I agree with you about the limitations of language. I started Tranquillity's Secret with a chapter on that very subject: "A Word About Language and the Written Word."
Yes, the eyes see motion directly, such as when we are watching someone running. They do not work like a camera clicking individual pictures and then sewing them together again.
In relation to the mind and why mind moves, I suggest that the understanding of the mind moving from frame to frame is incorrect.
I see mind as the name of an activity, not as an entity doing something, so that question of "why?" doesn't arise; it's just the activity of 'minding', or in my terms, naturing.
And so time, as felt duration, is what the word awareness should be pointing to in this process of naturing, even if we aren't using that word and are just relying on our understanding of its meaning.
It is my assertion that naturing is cognitive, in that the activity of naturing is impersonally recognized as it proceeds. I use the example of a skilled and fluent dancer who is viscerally aware of her dance, which is immanent within her -- not something objective outside of her. This is exactly how we each know our body -- proprioceptively.
And because time is only a formalism when it refers to anything other than felt duration, this naturing doesn't unfold; it reconfigures immediately (since there is no time). It's kind of like the causal creation idea of a multiverse, just without having to instantiate an entirely new universe. It's just a passage through a plenum of possibility.