\documentstyle[titlepage,11pt]{article} \pagestyle{myheadings} \markboth{}{} \setlength{\topmargin}{0in} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} \setlength{\textwidth}{6.5in} \setlength{\textheight}{8.5in} \begin{document} \begin{titlepage} \title{{\bf A Stereo Image Sequence of a Static Indoor Scene}} \author{Ning Cui\,${}^{ \sharp}$, Juyang Weng${}^{\ast \sharp}$ and Paul Cohen${}^{\sharp}$ \\ \\ \\ \\ \\ \\ ${}^{\sharp}$ Department of Electrical Engineering\\ Ecole Polytechnique de Montreal\\ POB 6079, Station ``A''\\ Montreal PQ, H3C 3A7, Canada \\ Tel: (514)340-4549, (514)340-4247\\ \\ ${}^\ast $ Centre de Recherche Informatique de Montreal\\ 3744 Jean-Brillant St., Montreal PQ, H3P 1P1, Canada\\ \\ Email: cui@ai.polymtl.ca, weng@ai.polymtl.ca, cohen@ai.polymtl.ca\\ \\ \\ } \date{} \end{titlepage} \maketitle \newpage \section{Introduction} \label{Intro} A digital stereo image sequence has been produced at the Computer Vision Laboratory of Ecole Polytechnique de Montreal for testing motion and structure estimation algorithms that operate on either stereo or monocular image sequences. The sequence contains 20 stereo pairs of images; if a monocular sequence is to be processed, either the left or the right image sequence can be used. Each image has $480\times 512$ pixels with 8 bits/pixel. The images are stored in ``raw'' format: each image file contains $512 \times 512$ bytes, of which the last 32 rows are meaningless. The two cameras were calibrated \cite{Weng:cali} to compute the internal and external parameters of each camera and to compensate for lens distortion. To provide some ground truth for the 3-D structure, many lines in the scene have been measured. The calibrated parameters and the motion and structure ground truth are given in the following sections. \section{Camera Setup and Motion} The stereo system consisted of two CCD cameras with $f = 8.5$\,mm lenses, mounted on the tip of a high-precision six-joint robot manipulator.
After each stereo image pair was grabbed, the manipulator was commanded to rotate by $2.25^{\circ}$ about a vertical axis. Other motion parameters are not known precisely. \section{Calibration Data for the Two Stereo Cameras} \label{calidata} Each camera has four basic internal parameters, namely the row focal length $f_{u}$, the column focal length $f_{v}$ and the principal-point coordinates $r_{0}$ and $c_{0}$, together with five distortion parameters $k_{1}$, $g_{1}$, $g_{2}$, $g_{3}$ and $g_{4}$ that correct for lens distortion (both radial and tangential). The position and orientation of each camera in the world coordinate system are described by six independent external parameters, corresponding to the rotation axis, the rotation angle and the translation vector. The relationship between the world and camera-centered coordinate systems is given by \begin{equation} \label{eq:1} {\bf x_{c} } = R {\bf x} + {\bf T} \end{equation} where ${\bf x } = (x, y, z)^{\top} $ represents the coordinates of any visible point in the fixed world coordinate system, and ${\bf x_{c} } = (x_{c}, y_{c}, z_{c})^{\top} $ represents its coordinates in the camera-centered coordinate system. The calibrated parameters of the left camera are shown in Table 1, and those of the right camera in Table 2.
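
Equation (\ref{eq:1}) uses a rotation matrix $R$, whereas the calibration tables list a rotation angle $\theta$ and a unit axis ${\bf n} = (n_{x}, n_{y}, n_{z})^{\top}$. As a sketch, assuming the usual right-hand rule and that $R$ rotates vectors rather than coordinate frames (the report does not state its convention), $R$ can be obtained from Rodrigues' formula
\[
R = \cos\theta \, I + (1-\cos\theta)\, {\bf n}{\bf n}^{\top} + \sin\theta
\left[ \begin{array}{ccc} 0 & -n_{z} & n_{y} \\ n_{z} & 0 & -n_{x} \\ -n_{y} & n_{x} & 0 \end{array} \right],
\]
where $I$ is the $3 \times 3$ identity matrix. Because ${\bf n}$ has unit length, it carries only two degrees of freedom, so the angle, the axis and the three translation components (presumably $t_{1}$, $t_{2}$, $t_{3}$ of ${\bf T}$) together account for the six external parameters.
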
\begin{table} \begin{center} \caption{Calibration data for the left camera ($f = 8.5$\,mm lens).} \vspace{0.3cm} \begin{tabular}{||lr|r||} \hline Focal length (pixels): & $f_{u}$ & $-639.10$ \\ & $f_{v}$ & 527.09 \\ \hline Principal point (pixels): & $r_{0}$ & 251.07 \\ & $c_{0}$ & 260.01 \\ \hline Distortion parameters: & $k_{1}$ & 0.17645 \\ & $g_{1}$ & $-0.00390$ \\ & $g_{2}$ & 0.00093 \\ & $g_{3}$ & 0.01522 \\ & $g_{4}$ & 0.00373 \\ \hline External parameters: & & \\ Rotation angle ($^{\circ}$) & $\theta$ & 6.2704 \\ \hline Rotation axis & $n_{x}$ & 0.8791 \\ & $n_{y}$ & 0.4119 \\ & $n_{z}$ & 0.2396 \\ \hline Translation (mm) & $t_{1}$ & $-147.74$ \\ & $t_{2}$ & $-191.45$ \\ & $t_{3}$ & 390.83 \\ \hline \end{tabular} \end{center} \end{table} \begin{table} \begin{center} \caption{Calibration data for the right camera ($f = 8.5$\,mm lens).} \vspace{0.3cm} \begin{tabular}{||lr|r||} \hline Focal length (pixels): & $f_{u}$ & $-639.11$ \\ & $f_{v}$ & 527.87 \\ \hline Principal point (pixels): & $r_{0}$ & 243.85 \\ & $c_{0}$ & 261.79 \\ \hline Distortion parameters: & $k_{1}$ & 0.17727 \\ & $g_{1}$ & $-0.00440$ \\ & $g_{2}$ & 0.00081 \\ & $g_{3}$ & 0.01768 \\ & $g_{4}$ & 0.01098 \\ \hline External parameters: & & \\ Rotation angle ($^{\circ}$) & $\theta$ & 6.8387 \\ \hline Rotation axis & $n_{x}$ & $-0.9249$ \\ & $n_{y}$ & 0.3654 \\ & $n_{z}$ & $-0.1041$ \\ \hline Translation (mm) & $t_{1}$ & $-150.28$ \\ & $t_{2}$ & $-200.69$ \\ & $t_{3}$ & 434.35 \\ \hline \end{tabular} \end{center} \end{table} If the relative configuration between the two cameras is specified by \begin{equation} \label{eq:2} {\bf x_{l} } = M {\bf x_{r}} + {\bf B}, \end{equation} where ${\bf x }_{r} = ( x _{r} \,, y _{r} \,, z _{ r} ) ^{\top}$ and ${\bf x}_{l} = ( x _{l} \,, y _{l} \,, z _{l} )^{\top}$ represent the same 3-D point in the right-camera and left-camera coordinate systems, respectively, then $M$ and ${\bf B}$ can be computed directly from the external parameters of the two cameras.
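
Inverting Eq.~(\ref{eq:1}) for the right camera gives ${\bf x} = R_{r}^{\top}({\bf x}_{r} - {\bf T}_{r})$, since $R_{r}$ is orthonormal; substituting this into Eq.~(\ref{eq:1}) for the left camera yields
\[
M = R_{l} R_{r}^{\top}, \qquad {\bf B} = {\bf T}_{l} - M {\bf T}_{r},
\]
where $R_{l}, {\bf T}_{l}$ and $R_{r}, {\bf T}_{r}$ are the external parameters of the left and right cameras. The Python sketch below evaluates these expressions from the values in Tables 1 and 2. It assumes, since the report does not say so explicitly, that each tabulated angle and axis is a right-handed rotation of vectors (converted with Rodrigues' formula) and that $(t_{1}, t_{2}, t_{3})$ form ${\bf T}$; the helper names are illustrative only. Because the rotation convention is an assumption, the printed result should be checked against the relative rotation and baseline quoted in the text before it is relied on.

```python
import math

# Hypothetical helper names; numeric values are transcribed from Tables 1 and 2.
# Assumption: each (angle, axis) pair is a right-handed rotation of vectors.


def rodrigues(theta_deg, axis):
    """3x3 rotation matrix for a rotation of theta_deg degrees about axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    nx, ny, nz = (a / norm for a in axis)  # renormalise the 4-digit axis
    t = math.radians(theta_deg)
    c, s, v = math.cos(t), math.sin(t), 1.0 - math.cos(t)
    return [
        [c + nx * nx * v, nx * ny * v - nz * s, nx * nz * v + ny * s],
        [ny * nx * v + nz * s, c + ny * ny * v, ny * nz * v - nx * s],
        [nz * nx * v - ny * s, nz * ny * v + nx * s, c + nz * nz * v],
    ]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


def relative_pose(R_l, T_l, R_r, T_r):
    """(M, B) with x_l = M x_r + B, given x_c = R x + T for each camera."""
    M = matmul(R_l, transpose(R_r))
    MT = matvec(M, T_r)
    B = [T_l[i] - MT[i] for i in range(3)]
    return M, B


def axis_angle(R):
    """(angle in degrees, unit axis) of R; assumes 0 < angle < 180 degrees."""
    cos_t = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    t = math.acos(max(-1.0, min(1.0, cos_t)))
    s = 2.0 * math.sin(t)
    axis = [(R[2][1] - R[1][2]) / s,
            (R[0][2] - R[2][0]) / s,
            (R[1][0] - R[0][1]) / s]
    return math.degrees(t), axis


# Tables 1 and 2: angles in degrees, translations in millimetres.
R_left = rodrigues(6.2704, (0.8791, 0.4119, 0.2396))
T_left = [-147.74, -191.45, 390.83]
R_right = rodrigues(6.8387, (-0.9249, 0.3654, -0.1041))
T_right = [-150.28, -200.69, 434.35]

M, B = relative_pose(R_left, T_left, R_right, T_right)
angle, axis = axis_angle(M)
print("rotation angle (deg):", round(angle, 4))
print("rotation axis:", [round(a, 4) for a in axis])
print("B (mm):", [round(b, 2) for b in B])
```

Translations are kept in millimetres, as in the tables, so ${\bf B}$ must be divided by 1000 before it is compared with the value in meters given in the text. Plain lists are used instead of a matrix library only to keep the sketch dependency-free.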
For this camera pair, $M$ is a rotation of $11.863801^{\circ}$ about the axis $(0.998870, 0.016901, 0.044419)^{\top}$, and the translation vector is ${\bf B} = (-0.001150, 0.095565, 0.006572)^{\top}$, in meters. Given this relative configuration and the depth range of the scene, the common field of view of the two cameras is about 362 pixels wide. The distortion-corrected projection of a point is computed from its image coordinates by \begin{equation} \label{eq:3} \frac{x_{c}}{z_{c}}={\hat u}+(g_{1}+g_{3}){\hat u}^{2}+g_{4}{\hat u}{\hat v}+ g_{1}{\hat v}^{2}+k_{1}{\hat u}({\hat u}^{2}+{\hat v}^{2}), \end{equation} \begin{equation} \label{eq:4} \frac{y_{c}}{z_{c}}={\hat v}+g_{2}{\hat u}^{2}+g_{3}{\hat u}{\hat v}+ (g_{2}+g_{4}){\hat v}^{2}+k_{1}{\hat v}({\hat u}^{2}+{\hat v}^{2}), \end{equation} where ${\hat u} = (r-r_{0})/f_{u}$ and ${\hat v} = (c-c_{0})/f_{v}$, and $(r, c)$ are the row and column coordinates of the image point in pixels. \section{Structure} To provide structure ground truth for the image sequence, a pyramid was placed in the scene; it is visible in most images of the sequence. The pyramid consists of seven layers, each 39\,mm high with a square cross-section. From top to bottom, the side widths of the seven layers are 41.5, 122, 160, 201, 241, 280 and 360\,mm. \begin{flushleft} {\Large \bf Acknowledgements} The help of Nguyen H. Hai, system analyst in our computer vision group, is gratefully acknowledged. \end{flushleft} \begin{thebibliography}{9} \setlength{\itemsep}{0in} \bibitem{Weng:cali} J. Weng, P. Cohen and M. Herniou, Calibration of stereo cameras using a non-linear distortion model, in {\em Proceedings of the 10th International Conference on Pattern Recognition}, Atlantic City, New Jersey, June 16--21, 1990, pp.~246--253. \end{thebibliography} \end{document}