Question: Why is it a mistake to converge stereo cameras whose images
will be viewed by people (vs. analysed by a computer)?
Short Answer: Any 3D scene point (that is not in the plane that
contains the horizontally converged lens axes) projects onto the
converged sensors at different heights. When the resulting images are
rotated onto the plane of the viewing screen, corresponding image
points on the screen are vertically displaced, so the lines from the
eyes through the corresponding screen points do not intersect, i.e.,
they no longer define a 3D scene point. The underlying reason is
that rotating both images onto the screen plane destroys knowledge of
the initial camera geometry that is crucial to reconstructing the 3D
scene.
Long Answer:
Let the plane V of the viewing screen be vertical and perpendicularly
equidistant from the centers-of-projection L and R of the viewer's
eyes, and let line LR be horizontal. Any plane containing LR
intersects V in a horizontal line [1].
Let P be a world point that is to be depicted stereoscopically by
drawing it on the screen for the left eye at L', the intersection of
line LP with the screen, and by drawing it on the screen for the right
eye at R', the intersection of line RP with the screen. Points L, R,
and P determine a plane that contains LR, so L'R' is horizontal [2].
Conversely, given purportedly corresponding points L" and R" with L"R"
not horizontal: LR and L"R" are not parallel [3] and do not
intersect [4], so LR and L"R" are not coplanar [5]; hence LL" and RR"
cannot intersect [6], so L" and R" cannot actually correspond to a
world point.
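The argument above is easy to check numerically. The sketch below
(with illustrative coordinates: screen in the plane z = 0, eyes at
z = d, all names hypothetical) projects a world point through both eye
positions onto the screen and confirms that the two screen points
differ only horizontally:

```python
import numpy as np

# Eye centers of projection: horizontally separated, equidistant from
# the screen plane z = 0. Coordinates are (x, y, z); y is height.
# All numbers are illustrative.
d = 0.6                               # viewing distance (m), assumed
eye_left  = np.array([-0.032, 0.0, d])
eye_right = np.array([+0.032, 0.0, d])

def screen_intersection(eye, p):
    """Intersect the line from `eye` through world point `p`
    with the screen plane z = 0."""
    t = eye[2] / (eye[2] - p[2])      # parameter where z reaches 0
    return eye + t * (p - eye)

# Any world point behind the screen (z < 0):
P = np.array([0.1, 0.25, -1.5])
Lp = screen_intersection(eye_left, P)
Rp = screen_intersection(eye_right, P)

# The two screen points share the same height y; only x differs.
print(Lp[:2], Rp[:2])
```

The equal heights follow from the rectangle argument in note [1]; the
code merely exercises the same geometry with concrete numbers.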
Notice I have said nothing about the "gaze directions" of the eyes,
and nothing about the relationship between left and right retinal
points that the brain can or cannot fuse. It is purely a matter of
geometry that only horizontally separated points on the screen
correspond to actual points in 3D space.
So what happens when cameras (i.e., the perpendiculars to their sensor
planes) are converged and their images are printed on a viewing
screen? Consider the isosceles triangle formed by the centers of the
two sensors and the point where the normals to the sensors through
their centers intersect. A rectangle centered on, and perpendicular
to, the altitude of this triangle is captured by the sensors as
oppositely tapered trapezoids [7]. When
these trapezoids are drawn on the screen, corresponding points (e.g.,
the actual rectangle corners) are vertically disparate. By the above
discussion, they do not correspond to a 3D-world point.
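The vertical disparity can be demonstrated with a toy pinhole-camera
model (all numbers are illustrative): two cameras toed in toward a
common convergence point record an off-axis rectangle corner at
different heights on their sensors.

```python
import numpy as np

def rot_y(theta):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def project(P, C, R, f=0.035):
    """Pinhole projection of world point P for a camera at center C
    with orientation R (columns = camera axes in world coordinates)."""
    Pc = R.T @ (P - C)             # express P in the camera's frame
    return f * Pc[:2] / Pc[2]      # sensor coordinates (x, y); y is height

b, Z = 0.1, 2.0                    # baseline and convergence distance (assumed)
theta = np.arctan2(b / 2, Z)       # toe-in so the two axes meet at (0, 0, Z)
C_L, C_R = np.array([-b/2, 0.0, 0.0]), np.array([b/2, 0.0, 0.0])
R_L, R_R = rot_y(theta), rot_y(-theta)   # each camera rotated toward the middle

corner = np.array([0.5, 0.3, Z])   # off-axis corner of a fronto-parallel rectangle
y_left  = project(corner, C_L, R_L)[1]
y_right = project(corner, C_R, R_R)[1]
print(y_left, y_right)             # different heights: vertical disparity
```

Setting theta to zero (parallel axes) makes the two heights equal,
which is the point of the parallel-camera prescription below.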
How did this happen? Initially the sensors were not parallel. In
drawing both the images they recorded onto the same screen, crucial
knowledge about their actual relative orientation was discarded. It
is this discarding of information about the actual camera geometry
that is the origin of the problem.
Recognizing this suggests a way to repair the damage. Rather than
simply overlaying the two images, we can project
them back through an optical system that is converged exactly the same
way the sensors were converged originally, undoing the keystone
distortion. This arrangement is actually used for 3D slide shows and
movies, where left and right films are separately projected onto one
screen. Equivalently, a linear rectification algorithm can be used to
undo the keystone distortion. The optical and the algorithmic
solutions are both exact in the approximation of pinhole optics, but
for optics that use lenses they are inexact. One source of
inexactness is that even ideal Gaussian lenses exhibit depth-of-field
effects, which neither optics nor geometrical rectification corrects.
Another source of inexactness is that real lenses also have
aberrations, not all of which can be corrected by optical reversal and
which are generally impractical to correct algorithmically.
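In the pinhole approximation, the algorithmic rectification is a
homography built from the known toe-in rotation. A minimal sketch
(assuming the toe-in angle theta and focal length f are known; names
are hypothetical) maps a converged-sensor point to the point a
parallel camera at the same center would have recorded:

```python
import numpy as np

def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

f = 0.035
theta = 0.025                       # toe-in angle of this camera (assumed known)
K = np.diag([f, f, 1.0])            # idealized pinhole calibration matrix
R = rot_y(theta)                    # orientation of the converged camera

# Homography sending a converged-sensor point to the point a parallel
# camera at the same center would record: x_par ~ K R K^(-1) x_conv.
H = K @ R @ np.linalg.inv(K)

X = np.array([0.55, 0.3, 2.0])      # a world point, in camera-center coordinates
Y = R.T @ X                         # the same point in the converged camera's frame
x_conv = f * Y[:2] / Y[2]           # what the converged sensor records
x_par  = f * X[:2] / X[2]           # what a parallel camera would record

v = H @ np.array([x_conv[0], x_conv[1], 1.0])
x_rect = v[:2] / v[2]               # rectified point: keystone undone
print(np.allclose(x_rect, x_par))
```

The homography is exact only because a pinhole model is assumed; as
noted above, lens effects fall outside what it can correct.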
By the way, a tempting but incorrect answer to the original question
is that the eye is spherical, so to a rough approximation it receives
and perceives a 3D->2D projection whose shape is independent of where
it is pointed; only the location of the image on the retina changes.
[The reason it is a rough approximation is that the center of
projection is on the surface of the sphere, not at its center. The
reason the eye points is, of course, the foveal structure of the
retina.] The camera, with its flat film or CCD, in contrast
records a 3D->2D projection whose shape depends strongly on where it
is pointed. But this pseudo-answer evades the real issue, which is
the destruction of information about relative camera orientation when
the images are both rotated into the viewing plane. The question
should correctly be answered with reference only to the locations of
the centers of projection of the eyes; it should not be necessary to
invoke the detailed engineering of the eye.
I have described only the special case of the eyes horizontal and
equidistant from the viewing screen. It should be apparent from this
discussion that this is in fact the only correct viewing geometry.
That is, to achieve strictly correct viewing geometry it is necessary
not only for the camera axes to be parallel (and the sensors displaced
if necessary to overlay their fields of view), but for the viewer's
eyes to be located in the original camera positions. However, it is
not necessary for the viewer's eyes to point in the initial camera
pointing directions; that is a matter of individual preference.
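The parallel-axis, shifted-sensor arrangement can be sketched
numerically (baseline, focal length, and zero-disparity distance are
illustrative): shifting each sensor sideways by f*(b/2)/Z puts points
at distance Z in the screen plane, while keystone distortion, and with
it vertical disparity, never arises because nothing is rotated.

```python
f, b, Z = 0.035, 0.065, 3.0   # focal length, baseline, zero-disparity distance (assumed)
h = f * (b / 2) / Z           # sideways shift of each sensor toward the other camera

def disparity(depth):
    """Horizontal screen disparity of a point straight ahead at `depth`,
    for parallel-axis cameras whose sensors are shifted by h."""
    x_left  = f * (b / 2) / depth - h    # left image x after the sensor shift
    x_right = -f * (b / 2) / depth + h   # right image x after the shift
    return x_left - x_right              # = f*b/depth - f*b/Z

print(disparity(Z))           # 0.0: points at distance Z appear in the screen plane
print(disparity(2 * Z) < 0)   # farther points recede behind the screen
```

The choice of Z only sets which depth lands in the screen plane; it
does not affect the correctness of the geometry.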
Notes:
[1] "Any plane containing LR" can be defined by parallel lines LL' and
RR' perpendicular to LR with L' and R' in V. The perpendicular
equidistance assumption makes LL'R'R a rectangle. Thus L'R' is
parallel to LR. LR is assumed horizontal, so L'R' must be horizontal.
[2] Conclusion of previous paragraph.
[3] LR is horizontal but L"R" is not.
[4] L"R" is in the screen plane and LR is not.
[5] Coplanar lines must either intersect or be parallel.
[6] Intersecting lines are coplanar.
[7] This is called "keystone distortion".