Stella X. Yu : Research : Biological Vision


Visual Integration across Saccadic Eye Movements

Joint work with Dr. Tai Sing Lee and Dr. Takeo Kanade; 1998-99.


Seeing with Moving Eyes

In normal viewing, our eye movements form a sequence of fixations separated by saccades. Fixations are resting states of the eye, which stabilize the retinal image of a stationary object of interest. Saccades are ballistic eye movements that redirect our center of gaze from one point in space to another. We make 3 to 4 saccades every second, with each fixation lasting about 200 to 300 ms and each saccade about 5 to 100 ms.
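
As a quick consistency check on these figures, the saccade rate implied by the quoted durations can be computed directly. The sketch below assumes that viewing strictly alternates one fixation and one saccade; the function name is illustrative only.

```python
def saccades_per_second(fixation_ms: float, saccade_ms: float) -> float:
    """Saccade rate if viewing strictly alternates one fixation and one saccade."""
    cycle_ms = fixation_ms + saccade_ms
    return 1000.0 / cycle_ms

# Extremes of the quoted ranges, paired only for illustration.
fastest = saccades_per_second(fixation_ms=200, saccade_ms=5)    # shortest cycle
slowest = saccades_per_second(fixation_ms=300, saccade_ms=100)  # longest cycle
print(f"{slowest:.1f} to {fastest:.1f} saccades per second")
```

The resulting range, roughly 2.5 to 4.9 per second, brackets the 3 to 4 saccades per second quoted above.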

We need to make saccades because of the structure of our receptive surface. The central region of the retina, the fovea, is tightly packed with cones and gives the best visual detail and color. Only objects whose images fall on the fovea can be recognized for what they are. Moving from the center of the retina to its periphery, we travel from the most highly organized structure to a primitive eye, which has far more sparsely distributed rods that function under low illumination and does little more than detect movements of shadows, like the bug detector in a frog's eye. The periphery gives unconscious vision and directs the fovea to where its high acuity is likely to be needed [Gregory, 1990]. By moving our eyes around, we see the visual details of the entire scene. Even in fixation, our eyes undergo tremor, drift and microsaccades. Though these miniature eye movements might reflect noise in the fixation control system as it attempts to hold gaze steady, they could also be a mechanism that compensates for the quick adaptation of our photoreceptors in order to maintain perceptual clarity [Dodwell, 1971].

Figure 1 shows simulated pictures of what we see in fixation and during a saccade. Our eye movements sample the external light stream in time and space, taking snapshots of a scene in discontinuous fixations. With each saccade, the image of the scene sweeps across the retina at a velocity of up to 400 degrees per second. Not only is the resulting retinal stimulation heavily smeared, but such a speed also leaves little possibility of extracting visual information. After each saccade, as objects move in and out of our tiny foveal region, the retinal locations of objects in the scene are displaced, and the entire scene would be expected to appear to jump.

Figure 1. Scene, retinal images in one fixation and during one saccade.

These retinal facts imposed by eye movements contradict the clarity, continuity and stability of our visual perception. How do we fill in the blank period and the fuzzy input of saccadic eye movements in our perception? How do we accumulate information across saccades and integrate it into an uninterrupted whole? How do we differentiate world displacement from retinal displacement in saccades to maintain visual stability? These questions have puzzled vision researchers for over a century.

Build Mind into a Camcorder

When we hold a camcorder and walk around, what we see through it is what we would get from all the retinal events described above: jerky, choppy and fuzzy motion. How could we improve it through engineering by reflecting on the achievements of our visual system?

Here is my account, roughly corresponding to Marr's three-level classification [Marr, 1980]. At the information processing level, two mechanisms are important: an internal visual representation of the world and predictive remapping. It seems plausible that what we see is an image in our mind rather than the crude image of the world. We keep track of where the focus of our retinal image lies in our visual imagery, and we keep updating the imagery to keep it faithful to the outside world. We differentiate world displacement from retinal displacement because we know how we move our eyes and can cancel out this component. We avoid fuzziness because we can automatically shut out the blurred input while the eyes are moving. We maintain a stable perception of the world because the visual imagery is stable and is what we see. At the representation and algorithm level, two mechanisms could be used to interpolate and construct the visual imagery between fixations: inertia as a passive component and information transfer as an active component. Through the inertia of the receptive surface and of various visual-processing units, the short break and the smear noise caused by saccades can be smoothed out. Through active information transfer from the visual imagery to the constructed retinal image, perception becomes continuous and stable. The transferred information includes what and where: what will be seen in the fovea in the next fixation, and where will the new retinal image be located in the visual imagery?
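
The passive component can be illustrated with a toy model. The sketch below is an assumption made for illustration, not a model taken from the literature: a leaky integrator stands in for the inertia of the visual units, and a boolean gate stands in for shutting out the input during a saccade. The names `perceive`, `alpha` and `in_saccade` are hypothetical.

```python
def perceive(samples, in_saccade, alpha=0.5):
    """Leaky integration of retinal input, gated off during saccades.

    samples    : retinal input values, one per time step
    in_saccade : booleans, True while the eye is moving
    alpha      : weight of new input (smaller means more inertia)
    """
    state = samples[0]
    percept = []
    for value, moving in zip(samples, in_saccade):
        if not moving:
            # Fixation: blend the new input into the held state.
            state = (1 - alpha) * state + alpha * value
        # Saccade: the input is shut out and the held state simply persists.
        percept.append(state)
    return percept

# Steady input of 1.0, corrupted by smear (0.2) during a two-step saccade.
samples    = [1.0, 1.0, 0.2, 0.2, 1.0, 1.0]
in_saccade = [False, False, True, True, False, False]
print(perceive(samples, in_saccade))
```

With the gate in place, the smeared samples never reach the percept. Without it (all flags set to False), the percept dips during the saccade and recovers only gradually, which is the smoothing that inertia alone would provide.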

Visual Perception during Saccades

During saccades, perceptual ability decreases relative to fixation. Quantitative analysis of this decrease, so-called saccadic suppression, became possible with the development of fast eye-movement-contingent display systems. Using detection threshold and recognition rate as measures of perceptual ability, a large body of psychophysical results has accumulated on this topic. The visual tasks fall into two categories, what (detection) and where (localization), and the studies vary temporal, spatial and spectral parameters under various eye movement conditions.

Here are some confirmed facts about saccadic suppression. It precedes saccade onset and outlasts the landing of the saccade. It has been found in both foveal and peripheral vision, with different time courses [Mitrani et al, 1970]. It occurs with voluntary saccades, with involuntary saccades, during the fast phase of vestibular nystagmus, during involuntary saccades in the course of pursuit movements, during rapid passive movements of the eye, and during blinks and changes in accommodation. It is selective: during saccades, sensitivity to luminance contrast at low spatial frequencies is reduced, whereas sensitivity to color contrast is unaffected or even enhanced [Burr et al, 1994; Bridgeman & Macknik, 1995].

There has been heated debate on the causes of saccadic suppression and its role in visual stability [Matin, 1974]. In my view, this debate supports three important points. The first is that retinal stimulation during saccades ("smearing") contributes substantially to saccadic suppression by virtue of the spatial-temporal integration and various masking effects of visual processing. Supporting arguments include the following: very marked increases in suppression are found with increases in the illumination and/or complexity of the background on which stimuli are presented [Richards, 1969; Mitrani, 1971]; analogous suppression is found when the eye is held still and the environment is moved rapidly with a mirror [Woodworth, 1938]; for fixating subjects, rapid motion of the background elevates the visual threshold for detecting a brief flash of light [MacKay, 1970]; and two fixation fields with an interspersed grayout magnify the suppression [Chekaluk & Llewellyn, 1990]. The second is that there is residual extraretinal suppression. Supporting arguments include the following: suppression of phosphenes occurs when saccades are made in complete darkness [Riggs et al, 1974]; and saccadic suppression begins before the eye movement and recovers after the new fixation is reached [Honda, 1991]. This rules out the assertion that there should be no difference in perception between the saccading and the stationary eye, provided that the stimulus conditions are equivalent [Woodworth, 1906]. [Dodge, 1900] had already noted that saccadic smears are not ordinarily perceived, because of a central process that he envisaged as ignoring or inhibiting stimuli that would disturb clear vision. However, where central inhibition originates and how it works remain elusive. The third is that saccadic suppression is not the single factor in achieving visual stability, and the basic idea of outflow theory is accepted [Matin, 1982].

Figure 2. What are the causes for saccadic suppression?

Besides saccadic suppression, there must be some compensatory extraretinal mechanism operating such that the world does not appear displaced after the suppression ends. The basic idea of outflow theory, proposed by Helmholtz around 1867, is that there are two parallel discharges from the brain commanding an eye movement: one to the extraocular muscles, and a corollary discharge, or efference copy of the command, to the visual system to cancel out the retinal movement signals by subtraction. Its opposite, inflow theory, agrees on cancellation but postulates that the cancellation signals are an afference copy of the eye movement signals, fed back from the eye muscles into the brain when the eyes move. Two major arguments support outflow theory over inflow theory. One is the case in which there are no outflow signals: during passive eye movements, the world appears to swing around in the direction opposite to the movement of the eyes. The other is the case in which there are no retinal movement signals: after-images, which are fixed on the retina, do not move during passive eye movements but move with the eye during voluntary eye movements; and for patients whose eyes are paralyzed or whose eye muscles cannot function, the world moves in the direction their eyes should have moved, and they feel dizzy.

Figure 3. The basic ideas of the cancellation theory.
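
The cancellation idea and the supporting cases described above can be written as a one-line sum. The following is a minimal sketch under simplifying assumptions: one-dimensional movements in degrees, a perfectly accurate efference copy, and a sign convention in which rotating the eye by +x shifts the image of a still world by -x. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EyeMovement:
    commanded_deg: float   # movement the brain commanded (source of the efference copy)
    actual_deg: float      # movement the eye actually made

def perceived_shift(move: EyeMovement, world_shift_deg: float = 0.0,
                    image_fixed_on_retina: bool = False) -> float:
    """Perceived displacement = retinal motion signal + efference copy."""
    if image_fixed_on_retina:
        retinal_shift = 0.0            # an after-image never moves on the retina
    else:
        # Rotating the eye by +x shifts the image of a still world by -x.
        retinal_shift = world_shift_deg - move.actual_deg
    efference_copy = move.commanded_deg
    return retinal_shift + efference_copy

cases = {
    "voluntary saccade, still world": perceived_shift(EyeMovement(10, 10)),
    "passive eye movement":           perceived_shift(EyeMovement(0, 10)),
    "paralyzed eye":                  perceived_shift(EyeMovement(10, 0)),
    "after-image, passive movement":  perceived_shift(EyeMovement(0, 10), image_fixed_on_retina=True),
    "after-image, voluntary saccade": perceived_shift(EyeMovement(10, 10), image_fixed_on_retina=True),
}
for name, shift in cases.items():
    print(f"{name}: {shift:+.0f} deg")
```

A result of zero means the world looks stable, a negative value means apparent motion opposite to the eye movement, and a positive value means apparent motion in the direction of the commanded movement. An inflow account would replace `commanded_deg` with `actual_deg` in the last step and would therefore predict no apparent motion in the passive and paralyzed cases.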

The details of outflow theory have been explored in various tasks requiring the detection of object displacement during saccades. The questions are whether the efference copy is accurate and available to the visual system at all times, and how the cancellation process uses this signal to give us perceptual stability. [Bridgeman et al, 1974] observed that rapid displacement of a target goes undetected during a saccade if the saccade is more than three times larger than the target displacement. The authors interpreted their results by adding a threshold element to the algebraic sum of the corollary discharge and the visual signal. This suggests another role for saccadic suppression in visual stability. [Lennie & Sidwell, 1978] showed that errors in target localization following remembered saccades to a flashed target are closely matched by errors in the size of the saccades. This mislocalization could be caused by underestimation of the target location in peripheral vision and/or by improper registration of the extent of the eye movement. The authors concluded that the visual system relies on the eye movement being correct and locates the target at the spot that falls on the fovea. They also inferred that the extraretinal signal need not specify the size of the saccade accurately but need only indicate that the eyes have moved. [Mack et al, 1985] showed that saccades made after a brief flash of a target under an induced displacement condition are directed to its perceived position when that differs from both its retinal and its spatial position. [Honda, 1990, 1991] showed that, for both horizontal and vertical saccades, a target flashed immediately before or at the beginning of a saccade is mislocalized in the direction of the saccade, whereas a target flashed at the end of or immediately after the saccade is mislocalized in the opposite direction.
The author argued that the extraretinal positional signal does not cancel the movement of the target's image on the retina, because the time course of the signal, calculated by subtracting the retinal location from the perceived location of the target, does not coincide with that of the actual eye movement. The results also suggested that the extraretinal signals must originate from a common neural center more central than the separate oculomotor systems for horizontal and vertical saccades. [Cai et al, 1997] reported data on a 3-dot vernier alignment task in which the lower and upper dots are present until the flash of the middle dot. In the period preceding a saccade, the perceived relative position of the middle dot is systematically altered in the direction of the saccade. [Ross et al, 1997] showed that a briefly flashed bar is mislocalized toward the saccade destination, with the mislocalization peaking around saccade onset. This result, together with those of other experiments, including the perception of a natural scene flashed during saccades, indicates that there is a perceptual distortion, or more precisely a compression of visual space, before, during and after saccades, rather than the uniform shift of visual space driven by some extraretinal position signal that is commonly assumed.
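
The observation of [Bridgeman et al, 1974] quoted above amounts to a simple threshold rule. The sketch below only paraphrases that one sentence: the hard cutoff and the function name are assumptions for illustration, and actual detection is presumably graded rather than all-or-none.

```python
def displacement_detected(saccade_deg: float, target_shift_deg: float,
                          ratio: float = 3.0) -> bool:
    """Threshold rule paraphrasing the [Bridgeman et al, 1974] observation.

    A target displacement during a saccade is reported only when it exceeds
    the saccade amplitude divided by `ratio` (3, per the text above).
    """
    threshold_deg = abs(saccade_deg) / ratio
    return abs(target_shift_deg) > threshold_deg

print(displacement_detected(saccade_deg=12, target_shift_deg=2))   # False
print(displacement_detected(saccade_deg=12, target_shift_deg=5))   # True
```

Because the threshold scales with saccade size, larger saccades tolerate larger mismatches between the efference copy and the retinal signal, which is one way suppression could contribute to stability.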

From these various results we can conclude that visual perception of the retinal stimulation is suppressed during saccades, while internal visual processing undergoes dynamic and complicated changes to maintain visual stability. The results on remembered saccades also imply that visual matching after a new fixation is achieved could play an important role in visual stability as well. I believe that reconciling all these seemingly different results requires not only an explanation of the dynamics of saccadic suppression and of cancellation by extraretinal signals per se, but also an account of how the visual system registers objects and binds them to their locations, and of how it updates this map when the eyes move. These issues arise more directly when we explore visual integration across saccades.

Visual Integration across Saccades

Saccadic suppression contributes to our perceptual clarity and stability. However, how we fuse information across saccades seems more essential to understanding our visual stability.

Two extreme models have been proposed for how information accumulates across saccades. The first is the integrative visual buffer theory [McConkie & Rayner, 1976]. The integrative visual buffer is a spatiotopic fusion of retinotopic images from one fixation to the next, and is responsible for the perception of a stable and continuous visual world across eye movements. See Figure 4.

Figure 4. Integrative visual buffer theory is a spatiotopic fusion theory:
Eye scan path on an image; fusion after the first two fixations; fusion after the sequence of fixations.
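
What a spatiotopic fusion of retinotopic images would involve can be sketched in a few lines. This is a toy illustration of the hypothesis, not of a mechanism the visual system is known to use. It assumes that gaze position is known exactly in scene coordinates and that only a small square foveal patch is seen in detail; all names are hypothetical.

```python
def fuse_fixations(scene, fixations, fovea_radius=1):
    """Paste each fixation's foveal patch into a world-coordinate buffer.

    scene        : 2-D list of pixel values (the outside world)
    fixations    : list of (row, col) gaze positions in scene coordinates
    fovea_radius : half-width of the square patch seen in detail
    """
    rows, cols = len(scene), len(scene[0])
    buffer = [[None] * cols for _ in range(rows)]   # None = never seen in detail
    for gaze_r, gaze_c in fixations:
        for dr in range(-fovea_radius, fovea_radius + 1):
            for dc in range(-fovea_radius, fovea_radius + 1):
                # Retinotopic offset (dr, dc) -> spatiotopic position (r, c).
                r, c = gaze_r + dr, gaze_c + dc
                if 0 <= r < rows and 0 <= c < cols:
                    buffer[r][c] = scene[r][c]
    return buffer

scene = [[10 * r + c for c in range(6)] for r in range(4)]
buffer = fuse_fixations(scene, fixations=[(1, 1), (2, 4)])
for row in buffer:
    print(" ".join("--" if v is None else f"{v:02d}" for v in row))
```

Each fixation fills in the part of the buffer around the gaze position, and cells that no fixation has covered stay empty. The key step is the coordinate change from retinal offset to scene position, which requires an accurate gaze signal at every fixation.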

Though this theory is very attractive, as it matches our visual awareness directly, it receives little support from psychophysical experiments. The hypothesis has been tested from two complementary perspectives. One is to test integrative ability across saccades. [Irwin et al, 1983] showed that subjects are unable to perceive a composite visual pattern from two different visual patterns presented in the same spatial location but separated by an eye movement. The specific task asks subjects to indicate the missing dot when two completely different sets of 4 dots from the same 3 by 3 matrix of dots are presented. [Irwin et al, 1988] and [Sun & Irwin, 1987] claim that integration occurs on the basis of retinotopic but not spatiotopic coordinates across saccadic and pursuit eye movements, but these effects were shown to be artifacts of display persistence. Instead of seeking evidence for point-to-point integration, others have looked for facilitation of perception when a stimulus is presented in a peripheral location before the saccade. These studies include tests of word reading and picture naming. Changing the case of all the letters from one fixation to the next, which makes the visual appearance on each new fixation dramatically different from that on the previous one, does not affect the degree of facilitation, and in fact the change is rarely noticed, whereas significant facilitation occurs when the preview and the target share the first two or three letters [McConkie & Zola, 1979; Rayner et al, 1980]. Similar facilitation effects are found when the words are previewed extrafoveally [Rayner et al, 1978].
In picture naming, the same preview benefits are found when the size of the extrafoveal picture is varied by 10%; there is significant facilitation when the preview is visually similar but none when it is semantically similar [Pollatsek et al, 1984]; and substantial facilitation is found even if the spatial location of the identical preview is different [Pollatsek et al, 1990]. The other perspective is to test the ability to detect changes across saccades. It has already been noted in word reading that if the text is shifted a few character positions to the left or right during a saccade, so that the eye lands on a different letter or word than the one intended, subjects frequently make a small corrective saccade of the same magnitude as the shift but in the opposite direction, and they rarely indicate any awareness that the text has moved. Change blindness across saccades is well established by a series of experiments using visually rich natural images. Changing the size or color of a salient object, or switching hats or suits between two people, can go unnoticed, even when subjects directly fixate the changed region before and immediately after the change [Grimes, 1992].
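
The logic of the missing-dot task of [Irwin et al, 1983] described above is easy to make concrete. The sketch below is a hypothetical reconstruction of the stimulus from the one-sentence description in the text, not the authors' actual procedure. It shows that the task becomes trivial once the two frames can be superimposed point by point, which is why the subjects' failure counts as evidence against such fusion.

```python
import random

def make_missing_dot_trial(rng):
    """Split 8 of the 9 cells of a 3-by-3 matrix into two frames of 4 dots."""
    cells = [(r, c) for r in range(3) for c in range(3)]
    rng.shuffle(cells)
    missing, frame1, frame2 = cells[0], set(cells[1:5]), set(cells[5:9])
    return frame1, frame2, missing

def find_missing(frame1, frame2):
    """Answer the task by point-to-point fusion of the two frames."""
    all_cells = {(r, c) for r in range(3) for c in range(3)}
    (missing,) = all_cells - (frame1 | frame2)
    return missing

rng = random.Random(0)   # fixed seed, chosen arbitrarily
frame1, frame2, missing = make_missing_dot_trial(rng)
print(find_missing(frame1, frame2) == missing)
```

Neither frame alone identifies the missing cell, since each leaves five cells empty; only their union does.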

All these results provide evidence against low-level fusion of retinal images in an extended coordinate system within some short-term visual memory buffer. The other extreme model argues that there is no need for such a visual memory, since the outside world itself acts as a visual memory that we can access by looking again [Minsky, 1985]. Our visual stability is maintained not because we remember what we saw in the last fixation so well that we can align the new visual content of the current fixation with the old visual short-term memory, but because we forget the old visual content so fast that only high-level abstractions, e.g. semantic codes, need to be passed from one fixation to the next, and we can always look back to examine the visual details whenever we want [Irwin & Andrews, 1996]. Our stable visual world may be constructed out of a brief retinal image and a very sketchy, higher-level representation, along with a pop-out mechanism to redirect attention [Blackmore et al, 1995]. It seems safe to conclude that the link between the visual contents of one fixation and the next is set up at a high and abstract level, with the extraction of whatever information is necessary for the specific visual task investigated. As mentioned before, preview benefits seem to be location independent. It has also been found that subjects can make precise discriminations between patterns, e.g. lines of different lengths and shapes of slightly different sizes, even when the stimuli are viewed in separate fixations [Irwin et al, 1990; Palmer & Ames, 1992]. However, [Hayhoe et al, 1991] found that subjects could make precise angle judgments for three points viewed in successive fixations and argued that precise spatial information can also be maintained when the task requires it. [Irwin & Andrews, 1996] proposed a new trans-saccadic memory theory based on Treisman's feature integration theory.
The account includes representations of feature maps, a master map of locations and a long-term memory that stores knowledge about objects. A temporary object representation is formed in short-term memory when attention conjoins features, together with their coarse or partial location information, into unitary wholes. Because of the limited capacity of short-term memory, only a small number (3 to 4) of objects can be retained across saccades. With every saccadic eye movement, this visual short-term memory is used in a first-in-first-out manner. Rather than containing a detailed memory of the visual contents of successive fixations, trans-saccadic memory consists of objects that are produced before a saccade (as saccade initiation directs attention to the saccade target before saccade onset) and of the residual activation in feature maps and long-term memory. The objects in short-term memory, together with their activated neighboring objects in feature maps, explain the maintenance of low-level visual details across saccades. The objects activated in long-term memory explain high-level abstractions across saccades. According to this account, instability across saccades is detected only if one of the few objects encoded in trans-saccadic memory loses stability. [Irwin et al, 1994] showed that if the object to which the eyes are sent changes its spatial position during a saccade, instability is usually perceived, whereas if the object maintains its position, stability is usually perceived even when everything else in the scene changes position.
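
The capacity limit and the stability test in this account can be sketched as a small first-in-first-out store. This is a toy reading of the framework, assuming a capacity of 4 and an exact position comparison per object. The feature maps and long-term memory are left out, and the class, method and object names are hypothetical.

```python
from collections import deque

class TransSaccadicMemory:
    """Toy first-in-first-out store of attended objects and their positions."""

    def __init__(self, capacity=4):
        self.objects = deque(maxlen=capacity)   # oldest entry is dropped when full

    def attend(self, name, position):
        """Encode an attended object, e.g. the saccade target before saccade onset."""
        self.objects.append((name, position))

    def scene_is_stable(self, scene_after):
        """Stable unless a remembered object is found at a new position."""
        return all(scene_after.get(name) == pos for name, pos in self.objects)

memory = TransSaccadicMemory(capacity=4)
before = {"cup": (2, 3), "lamp": (8, 1), "book": (5, 5), "pen": (6, 2), "key": (1, 7)}
for name in ["cup", "lamp", "book", "pen", "key"]:   # "cup" is pushed out by "key"
    memory.attend(name, before[name])

moved_unremembered = dict(before, cup=(3, 3))   # only the forgotten object moves
moved_remembered = dict(before, key=(1, 8))     # the most recently attended object moves
print(memory.scene_is_stable(moved_unremembered))  # True
print(memory.scene_is_stable(moved_remembered))    # False
```

A change to an object that has already been pushed out of the store goes unnoticed, while a change to the most recently attended object, such as the saccade target, is flagged as instability.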

Questions and Objectives

Irwin's trans-saccadic memory seems promising for solving the mystery of visual stability. But it is still a framework theory; it lacks details of the computation carried out by the visual system during the dynamic transition of eye movements. We still do not know how the spatial map is updated across saccades so that the visual system can judge whether an object maintains or changes its spatial location. Nor does the theory account for the spatial distortion observed before, during and after saccades. In addition, how do we explain our intuition of a uniform visual scene in perception, a kind of integrative visual buffer, if only the objects in trans-saccadic memory are taken in and they are frequently washed out? Which of these objects in short-term memory are absorbed into long-term memory via memory consolidation, and how? We might not be able to detect big changes to a scene during a saccade. The semantic meaning is retained together with its vague visual content; when we recall, is the result simply a re-rendering of our semantic knowledge? Say we fail to encode the color of the background of a scene and do not detect its change during our saccades. If so, what value do we assume for this missing feature when we recall the scene, and how?

Though there have been many psychophysical results on saccadic suppression, change blindness and visual integration, very few neurophysiological studies investigate these issues directly. Some EEG data and single-unit recordings in LGN show saccadic suppression, but little work has directly addressed the reasons for and mechanisms of saccadic suppression in visual cells. If visual streams are interspersed with gaps brought about by saccadic eye movements, how much information is allowed in at each stage of the visual system? The answers to all these questions would contribute to a better understanding of the computations employed by the human visual system to maintain our visual stability. They would also help us find a way to build an intelligent camcorder and enjoy viewing through an aperture without feeling dizzy.


References

Abrams RA, et al, 1992. Adaptive Modification of Saccadic Eye Movements, Journal of Experimental Psychology: Human Perception and Performance, 18(4): 922-933.

Blackmore, Susan J., Brelstaff, Gavin, Nelson, Kay, & Troscianko, Tom, 1995. Is the Richness of Our Visual World an Illusion? Transsaccadic Memory for Complex Scenes, Perception, 24(9): 1075-1081.

Bridgeman B, et al, 1995. Saccadic Suppression Relies on Luminance Information, Psychological Research, 58(3): 163-168.

Bridgeman B, et al, 1990. Saccadic Suppression of Displacement is Strongest in Central Vision, Perception, 19(1): 103-11.

Burr DC, et al, 1994. Selective Suppression of the Magnocellular Visual Pathway during Saccadic Eye Movements, Nature, 371(6497): 511-513.

Cai RH, et al, 1997. Perceived Geometrical Relationships Affected by Eye-movement Signals, Nature, 386(6625): 601-604.

Chase R, et al, 1972. Suppression of Visual Evoked Responses to Flashes and Pattern Shifts During Voluntary Saccades, Vision Research, 12(2): 215-220.

Dodge, R., 1900. Visual Perception during Eye Movement, Psychological Review, vol. 7, 454-465.

Dodwell PC, 1971. On Perceptual Clarity, Psychol Review, 78(4): 275-89.

Duffy FH, et al, 1968. Electrophysiological Evidence for Visual Suppression prior to the Onset of a Voluntary Saccadic Eye Movement, Nature, 218(146): 1074-5.

Gegenfurtner, Karl R. & Sperling, George, 1993. Information Transfer in Iconic Memory Experiments, J Exp Psychol Hum Percept Perform, 19(4):845-866.

Gregory, Richard L., 1990. Eye and Brain, The Psychology of Seeing, Princeton University Press, Princeton, New Jersey.

Hayhoe, Mary, Lachter, Joel, & Feldman, Jerome, 1991. Integration of Form across Saccadic Eye Movements, Perception, 20(3): 393-402.

Hendry DP, 1975. Letter: Saccadic Velocities Determined by a New Perceptual Method. Vision Research, 15(1): 149-151.

Holding DH, 1971. The Amount Seen in Brief Exposures, Q J Exp Psychol, 23(1): 72-81.

Honda, Hitoshi, 1990. Eye Movements to a Visual Stimulus Flashed before, during or after a Saccade, In Jeannerod, M. (Ed.), Attention and Performance, vol. 13, pp. 567-582, Hillsdale: LEA.

Honda, Hitoshi, 1991. The Time Courses of Visual Mislocalization and of Extraretinal Eye Position Signals at the Time of Vertical Saccades, Vision Research, vol. 31, no. 11, pp. 1915-1921.

Ilg UJ, et al, 1993. Motion Perception during Saccades, Vision Research, 33(2):211-20.

Irwin, D. E., Brown, J. S., & Sun, J. -S., 1988. Visual Masking and Visual Integration across Saccadic Eye Movements, Journal of Experimental Psychology: General, vol 117, pp. 274-285.

Irwin, David E, 1991. Information Integration across Saccadic Eye Movements. Cognitive Psychology, 23(3):420-56.

Irwin, David E, et al, 1990. Visual Memory and the Perception of a Stable Visual Environment, Percept Psychophysics, 47(1):35-46.

Irwin, David E., McConkie, G. W., Carlson-Radvansky, L. A. & Currie, C., 1994. A Localist Evaluation Solution for Visual Stability across Saccades, Behavioral and Brain Sciences, 17, 265-266.

Irwin, David E. & Andrews, Rachel V., 1996. Integration and Accumulation of Information across Saccadic Eye Movements, Attention and Performance, vol. XII, pp. 125-155.

Juttner M, et al, 1993. Lateral Information Transfer across Saccadic Eye Movements, Percept Psychophys, 53(2):210-220.

Lennie, P. & Sidwell, A., 1978. Saccadic Eye Movements and Visual Stability, Nature, vol. 275, pp. 766-768.

Li WX, et al, 1990. The Influence of Saccade Length on the Saccadic Suppression of Displacement Detection, Percept Psychophys, 48(5): 453-8.

Li, WX, et al, 1990. Saccadic Suppression of Displacement: Influence of Postsaccadic Exposure Duration and of Saccadic Stimulus Elimination, Vision Research, 30(6): 945-55.

Mack A, et al, 1985. Perceived Position and Saccadic Eye Movements, Vision Research, 25(4): 501-505.

Matin E, 1974. Saccadic Suppression: a Review and an Analysis, Psychological Bulletin, 81(12):899-917.

Matin E, 1982. Saccadic Suppression and the Dual Mechanism Theory of Direction Constancy, Vision Research, 22(2): 335-336.

Matin E, et al, 1993. Saccadic Overhead: Information-processing Time with and without Saccades, Percept Psychophys, 53(4):372-380.

McConkie, G. W. & Rayner, K., 1976. Identifying the Span of the Effective Stimulus in Reading: Literature Review and Theories of Reading, In H. Singer & R. B. Ruddell (Eds.), Theoretical Models and Processes of Reading, pp. 137-162. Newark, DE: International Reading Association.

Michael JA, et al, 1967. Electrophysiological Correlates of Saccadic Suppression, Experimental Neurology, 17(2): 233-246.

Minsky, M., 1985. The Society of Mind, New York: Simon and Schuster.

Mitrani, Lenin, et al, 1970. Smearing of the Retinal Image during Voluntary Saccadic Eye Movements. Vision Research, 10(5): 405-409.

Mitrani, Lenin, et al, 1971. Is Saccadic Suppression Really Saccadic? Vision Research, 11(10): 1157-1161.

Mitrani, Lenin, et al, 1975. Various Background Pattern-effect on Saccadic Suppression, Act Nerv Super (Praha), 17(3): 161-164.

Nakamura, S., 1996. Effects of Background Stimulation upon Eye-movement Information, Percept Mot Skills, 82(2): 627-35.

Palmer, J. & Ames, C., 1992. Measuring the Effects of Multiple Eye Fixations on Memory for Visual Attributes, Perception and Psychophysics, 52, 295-306.

Pollatsek, A., Rayner, K., & Collins, W. E., 1984. Integrating Pictorial Information across Eye Movements, Journal of Experimental Psychology: General, vol. 113, pp. 426-442.

Pollatsek, Alexander, Rayner, Keith & Henderson, John M., 1990. Role of Spatial Location in Integration of Pictorial Information across Saccades, J Exp Psychol Hum Percept Perform, 16(1): 199-210.

Richards W., 1969. Saccadic Suppression, Journal of the Optical Society of America, 59(5): 617-623.

Rottach KG, et al, 1998. Properties of Horizontal Saccades Accompanied by Blinks. J Neurophysiol, 79(6): 2895-2902.

Sun, J. S. & Irwin, D. E., 1987. Retinal Masking during Pursuit Eye Movements: Implications for Spatiotopic Visual Persistence, Journal of Experimental Psychology: Human Perception and Performance, vol. 13, pp. 140-145.

Woodworth, R. S., 1906. Vision and Localization during Eye Movements, Psychological Bulletin, vol. 3, pp. 68-70.

Zuber BL, et al, 1966. Saccadic Suppression: Elevation of Visual Threshold Associated with Saccadic Eye Movements, Experimental Neurology, 16(1): 65-79.


Last updated on 04/27/01.