\documentstyle[titlepage,11pt]{article}
\pagestyle{myheadings}
\markboth{}{}
\setlength{\topmargin}{0in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{8.5in}
\begin{document}
\begin{titlepage}
\title{{\bf A Stereo Image Sequence of a Static
Indoor Scene}}
\author{Ning Cui\,${}^{ \sharp}$, Juyang Weng${}^{\ast \sharp}$ and Paul Cohen${}^{\sharp}$ \\
\\
\\
\\
\\
\\
${}^{\sharp}$ Department of Electrical Engineering\\
Ecole Polytechnique de Montreal\\
POB 6079, Station ``A''\\
Montreal PQ, H3C 3A7, Canada \\
Tel: (514)340-4549, (514)340-4247\\
\\
${}^\ast $ Centre de Recherche Informatique de Montreal\\
3744 Jean-Brillant St., Montreal PQ, H3P 1P1, Canada\\
\\
Email: cui@ai.polymtl.ca, weng@ai.polymtl.ca, cohen@ai.polymtl.ca\\
\\
\\
}
\date{}
\end{titlepage}
\maketitle
\newpage
\section{Introduction}
\label{Intro}
A digital stereo image sequence has been produced at the Computer Vision
Laboratory of Ecole Polytechnique de
Montreal for testing motion and structure
estimation algorithms based on either
stereo or monocular image sequences. The sequence contains 20
stereo pairs of images. If a monocular image sequence is to be processed,
either the left or the right image sequence can be chosen.
Each image has $480\times 512$ pixels, with 8 bits/pixel. The
images are stored in ``raw'' format: each image file contains $512 \times 512$ bytes,
and the last 32 rows carry no image data.
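As a quick check of the file layout, and assuming that the bytes are stored
row by row with no header (a common meaning of ``raw'', but an assumption
here), each file holds
\[
512 \times 512 = 262144 \mbox{ bytes},
\]
of which the first $480 \times 512 = 245760$ bytes would be image data and the
last $32 \times 512 = 16384$ bytes padding. Under the same assumption, the
pixel at row $r$ and column $c$ (both counted from 0) would be found at byte
offset $512\,r + c$, for $0 \le r < 480$ and $0 \le c < 512$.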
The two cameras were calibrated in order to compute
the internal and external parameters
of each camera \cite{Weng:cali} and to compensate for lens distortions.
To provide some ground truth
for the 3-D structure, many lines in the scene have been measured. The
calibrated parameters and the motion and structure ground truth are given in
the following sections.
\section{Camera Setup and Motion}
The stereo system consisted of two CCD cameras with $f=8.5$\,mm lenses, mounted
on the tip of a high-precision six-joint robot manipulator.
After each stereo image
pair was grabbed, the manipulator was commanded to
rotate by an angle of $2.25^{\circ}$ around a vertical axis.
The other motion parameters are not known precisely.
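Taking the first stereo pair as the reference, and assuming that every
rotation step was executed exactly as commanded, the accumulated rotation
angle at the $k$-th stereo pair would be
\[
\phi_{k} = (k-1) \times 2.25^{\circ}, \qquad k = 1, \ldots, 20,
\]
so that the last pair is rotated by
$\phi_{20} = 19 \times 2.25^{\circ} = 42.75^{\circ}$ with respect to the first.
The symbol $\phi_{k}$ is introduced here for illustration only.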
\section{Calibration Data for the Two Stereo Cameras}
\label{calidata}
Each camera has four basic internal
parameters: row focal length $f_{u}$,
column focal length $f_{v}$, and principal point coordinates $r_{0}$ and
$c_{0}$. In addition, five distortion parameters $k_{1}$, $g_{1}$,
$g_{2}$, $g_{3}$ and $g_{4}$ are used to correct various lens distortions (both
radial and tangential). The
3-D position and orientation of each camera in the world
coordinate system are described by
six independent external parameters, which correspond to the
rotation axis, rotation angle and translation vector.
The relationship between the world and camera-centered coordinate systems
is given by:
\begin{equation}
\label{eq:1}
{\bf x_{c} } = R {\bf x} + {\bf T}
\end{equation}
where ${\bf x } = (x, y, z)^{\top} $ represents the coordinates of any
visible point in the fixed world coordinate system, while
${\bf x_{c} } = (x_{c}, y_{c}, z_{c})^{\top} $ represents its coordinates
in a camera-centered coordinate system.
The calibrated parameters for the left camera are shown in Table 1 and those
for the right camera in Table 2.
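The rotation matrix $R$ in (\ref{eq:1}) can be built from the tabulated
rotation axis ${\bf n} = (n_{x}, n_{y}, n_{z})^{\top}$ and rotation angle
$\theta$. A sketch, assuming that the tables follow the usual right-handed
axis-angle convention with a unit-length axis (the convention is not stated
explicitly here), is Rodrigues' formula
\[
R = \cos\theta \, I + (1-\cos\theta)\, {\bf n}{\bf n}^{\top}
+ \sin\theta \, [{\bf n}]_{\times},
\qquad
[{\bf n}]_{\times} =
\left[ \begin{array}{ccc}
0 & -n_{z} & n_{y} \\
n_{z} & 0 & -n_{x} \\
-n_{y} & n_{x} & 0
\end{array} \right],
\]
where $I$ is the $3 \times 3$ identity matrix. The translation vector in
(\ref{eq:1}) is ${\bf T} = (t_{1}, t_{2}, t_{3})^{\top}$. The angle $\theta$ is
tabulated in degrees and must be converted to radians before evaluating the
trigonometric functions in most software.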
\begin{table}
\begin{center}
\caption{Calibration data for the left camera ($f=8.5$\,mm lens).}
\vspace{0.3cm}
\begin{tabular}{||lr|r||} \hline
Focal length: & $f_{u}$ & -639.10 \\
& $f_{v}$ & 527.09 \\ \hline
Center coordinate: & $r_{0}$ & 251.07 \\
& $c_{0}$ & 260.01 \\ \hline
Distortion parameter: & $k_{1}$ & 0.17645 \\
& $g_{1}$ & -0.00390 \\
& $g_{2}$ & 0.00093 \\
& $g_{3}$ & 0.01522 \\
& $g_{4}$ & 0.00373 \\ \hline
External parameters: & & \\
Rotation angle ($^{\circ}$) & $\theta$ & 6.2704 \\ \hline
Rotation axis & $n_{x}$ & 0.8791 \\
& $n_{y}$ & 0.4119 \\
& $n_{z}$ & 0.2396 \\ \hline
Translation (mm) & $t_{1}$ & -147.74 \\
& $t_{2}$ & -191.45 \\
& $t_{3}$ & 390.83 \\ \hline
\end{tabular}
\end{center}
\end{table}
\begin{table}
\begin{center}
\caption{Calibration data for the right camera ($f=8.5$\,mm lens).}
\vspace{0.3cm}
\begin{tabular}{||lr|r||} \hline
Focal length: & $f_{u}$ & -639.11 \\
& $f_{v}$ & 527.87 \\ \hline
Center coordinate: & $r_{0}$ & 243.85 \\
& $c_{0}$ & 261.79 \\ \hline
Distortion parameter: & $k_{1}$ & 0.17727 \\
& $g_{1}$ & -0.00440 \\
& $g_{2}$ & 0.00081 \\
& $g_{3}$ & 0.01768 \\
& $g_{4}$ & 0.01098 \\ \hline
External parameters: & & \\
Rotation angle ($^{\circ}$) & $\theta$ & 6.8387 \\ \hline
Rotation axis & $n_{x}$ & -0.9249 \\
& $n_{y}$ & 0.3654 \\
& $n_{z}$ & -0.1041 \\ \hline
Translation (mm) & $t_{1}$ & -150.28 \\
& $t_{2}$ & -200.69 \\
& $t_{3}$ & 434.35 \\ \hline
\end{tabular}
\end{center}
\end{table}
If the relative
configuration between the two cameras is specified by
\begin{equation}
\label{eq:2}
{\bf x_{l} } = M {\bf x_{r}} + {\bf B},
\end{equation}
where ${\bf x }_{r} = ( x _{r} \,, y _{r} \,,
z _{ r} ) ^{\top}$
and ${\bf x}_{l} = ( x _{l} \,, y _{l} \,, z _{l} )^{\top}$
represent the same 3-D point in the right-camera and
left-camera systems, respectively,
then $M$ and ${\bf B}$ can be computed directly
from the external parameters of the two cameras.
For this setup, $M$ is a rotation by $11.863801^{\circ}$
about the axis
$(0.998870, 0.016901, 0.044419)^{\top}$, and the translation vector ${\bf B}$
equals
$(-0.001150, 0.095565, 0.006572)^{\top}$ (in meters).
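The computation mentioned above can be sketched as follows. Writing
(\ref{eq:1}) once for each camera, with $R_{l}$, ${\bf T}_{l}$ and $R_{r}$,
${\bf T}_{r}$ denoting the external parameters of the left and right cameras
(subscripted symbols introduced here for illustration), gives
${\bf x}_{l} = R_{l} {\bf x} + {\bf T}_{l}$ and
${\bf x}_{r} = R_{r} {\bf x} + {\bf T}_{r}$. Eliminating ${\bf x}$ by means of
${\bf x} = R_{r}^{\top} ({\bf x}_{r} - {\bf T}_{r})$ and comparing with
(\ref{eq:2}) yields
\[
M = R_{l} R_{r}^{\top}, \qquad {\bf B} = {\bf T}_{l} - M\, {\bf T}_{r} .
\]
Since the tabulated translations are in millimeters, ${\bf B}$ obtained this
way must be divided by 1000 to be expressed in meters. The tabulated
parameters are rounded, so values recomputed from them should be expected to
agree with those quoted above only approximately.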
Due to the relative configuration
of the two cameras and the depth range of the scene, the common field
of view of the two cameras is about 362 pixels wide.
The formulas used to obtain the image projections
after lens distortion correction are
\begin{equation}
\label{eq:3}
\frac{x_{c}}{z_{c}}={\hat u}+(g_{1}+g_{3}){\hat u}^{2}+g_{4}{\hat u}{\hat v}+
g_{1}{\hat v}^{2}+k_{1}{\hat u}({\hat u}^{2}+{\hat v}^{2}),
\end{equation}
\begin{equation}
\label{eq:4}
\frac{y_{c}}{z_{c}}={\hat v}+g_{2}{\hat u}^{2}+g_{3}{\hat u}{\hat v}+
(g_{2}+g_{4}){\hat v}^{2}+k_{1}{\hat v}({\hat u}^{2}+{\hat v}^{2}),
\end{equation}
where ${\hat u} =(r-r_{0})/f_{u}$ and ${\hat v}=(c-c_{0})/f_{v}$.
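As an illustrative numerical check of (\ref{eq:3}) and (\ref{eq:4}), consider
the left camera and a hypothetical image point with
${\hat u} = {\hat v} = 0.2$, which corresponds to
$r = r_{0} + 0.2 f_{u} = 123.25$ and $c = c_{0} + 0.2 f_{v} \approx 365.43$.
Then ${\hat u}^{2} = {\hat v}^{2} = {\hat u}{\hat v} = 0.04$ and
${\hat u}({\hat u}^{2}+{\hat v}^{2}) = {\hat v}({\hat u}^{2}+{\hat v}^{2}) = 0.016$,
so that
\[
\frac{x_{c}}{z_{c}} = 0.2 + 0.04\,(g_{1}+g_{3}) + 0.04\,g_{4} + 0.04\,g_{1}
+ 0.016\,k_{1} \approx 0.20327,
\]
\[
\frac{y_{c}}{z_{c}} = 0.2 + 0.04\,g_{2} + 0.04\,g_{3} + 0.04\,(g_{2}+g_{4})
+ 0.016\,k_{1} \approx 0.20366 .
\]
At this point the correction amounts to less than $2\%$ of the uncorrected
coordinates, and most of it comes from the radial term in $k_{1}$.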
\section{Structure}
In order to provide structure ground truth for the image sequence, a pyramid
was included in the scene. It can be seen in most of the images in the
sequence. The pyramid consists of seven layers. Each layer
is 39\,mm high and has four sides of
equal width. From top to bottom, the side widths of the seven layers are
41.5\,mm, 122\,mm, 160\,mm, 201\,mm, 241\,mm, 280\,mm and 360\,mm.
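Some derived dimensions may be convenient when checking reconstructed
structure. Assuming that the layers are stacked directly on one another, the
total height of the pyramid is
\[
7 \times 39\,\mbox{mm} = 273\,\mbox{mm}.
\]
Assuming further that the layers are centered on a common vertical axis (this
is not stated above), the horizontal step between consecutive layers is half
the difference of their side widths, which gives, from top to bottom,
40.25\,mm, 19\,mm, 20.5\,mm, 20\,mm, 19.5\,mm and 40\,mm.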
\begin{flushleft}
{\Large \bf Acknowledgements}
The help of Nguyen H. Hai,
system analyst in our computer vision group, is gratefully acknowledged.
\end{flushleft}
\begin{thebibliography}{9}
\setlength{\itemsep}{0in}
\bibitem{Weng:cali} J. Weng, P. Cohen and M. Herniou, Calibration of stereo
cameras using a non-linear distortion model, in {\em Proceedings
of the 10th International Conference on Pattern Recognition}, Atlantic City,
New Jersey, June 16--21, 1990, pp. 246--253.
\end{thebibliography}
\end{document}