Patents Pending


Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles


Fast and accurate 3D reconstruction of multiple dynamic rigid objects (e.g., vehicles) observed from wide-baseline, uncalibrated and unsynchronized cameras is challenging. On one hand, feature tracking works well within each view but is hard to correspond across multiple cameras with limited overlap in fields of view or due to occlusions. On the other hand, advances in deep learning have produced strong detectors that work across viewpoints but are still not precise enough for triangulation-based reconstruction. In this work, we develop a framework that fuses single-view feature tracks with multiview detected part locations to significantly improve the detection, localization and reconstruction of moving vehicles, even in the presence of strong occlusions. We demonstrate our framework at a busy traffic intersection by reconstructing over 40 vehicles passing within a 3-minute window. We evaluate the different components of our framework and compare to alternate approaches such as reconstruction using tracking-by-detection.

Key Insights

On one hand, structured semantic keypoints (SP) can be matched accurately across viewpoints but are not precise enough for triangulation-based reconstruction. On the other hand, unstructured points (UP), such as Harris corners, can be tracked precisely within a view but cannot be associated accurately across views. CarFusion exploits the rigid motion of cars to estimate the 3D locations of the structured points with the smallest deviation to the surrounding unstructured points over time.
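The rigidity prior described above can be illustrated with a small sketch: if the unstructured points on a car move rigidly between two time instants, the rigid motion estimated from their tracks (here via the standard Kabsch algorithm) also carries any structured keypoint on the same car. This toy example uses synthetic data and is only an illustration of the rigidity constraint, not the paper's actual joint optimization.

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch: estimate rotation R and translation t with Q ~ P @ R.T + t.
    P and Q are (N, 3) arrays of corresponding 3D points."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                # 3x3 covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # guard against reflections
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(0)
P = rng.random((20, 3))                      # unstructured tracks at time 0
theta = 0.3                                  # car turns and translates rigidly
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 0.0])
Q = P @ R_true.T + t_true                    # the same points at time 1

R, t = rigid_transform(P, Q)
# A structured keypoint observed at time 0 is carried by the same motion,
# so its time-1 location is constrained by the unstructured tracks.
x0 = np.array([0.5, 0.5, 0.5])
x1 = R @ x0 + t
```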

Approach


Our overall pipeline for dynamic 3D reconstruction of multiple cars from uncalibrated and unsynchronized video cameras. We use Car-Centric RANSAC (cRANSAC) to find correspondences across views for reconstruction, applying multiple physical constraints on the dimensions of the car to obtain accurate correspondences. Since our cameras are unsynchronized, we use unstructured points to align the different cameras via spatio-temporal bundle adjustment. We exploit the rigidity between the structured points and the tracks of the unstructured feature points over time to obtain a precise reconstruction of each moving vehicle. The reconstructions are reprojected into all the views and used to bootstrap the detectors.
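The geometric core that a cRANSAC hypothesis loop would repeat, triangulating a cross-view keypoint match and pruning it with a physical-dimension check, can be sketched as below. The helper names, camera setup and size thresholds are illustrative assumptions for this toy example, not the paper's actual parameters.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image coordinates."""
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def plausible_car(points3d, min_length=1.0, max_length=6.0):
    """Physical-dimension check used to prune wrong cross-view matches:
    the reconstructed keypoints must span a car-sized extent.
    (Thresholds are illustrative, not the paper's values.)"""
    extent = points3d.max(axis=0) - points3d.min(axis=0)
    return min_length <= np.linalg.norm(extent) <= max_length

# Synthetic example: two cameras observing car-sized 3D keypoints.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])   # camera 1 at the origin
P2 = np.hstack([np.eye(3), [[-2.0], [0.0], [0.0]]])  # camera 2, shifted
X_true = np.array([[0.0, 0.0, 5.0], [4.0, 0.0, 5.0], [4.0, 1.5, 5.0]])

recon = []
for X in X_true:
    x1 = P1 @ np.append(X, 1.0); x1 = x1[:2] / x1[2]  # project into view 1
    x2 = P2 @ np.append(X, 1.0); x2 = x2[:2] / x2[2]  # project into view 2
    recon.append(triangulate(P1, P2, x1, x2))
recon = np.array(recon)
```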


Dataset (14-keypoint annotations for 100,000 cars in 53,000 images)

We provide manual annotations of 14 semantic keypoints for 100,000 car instances (sedan, SUV, bus, and truck) from 53,000 images captured by 18 moving cameras at multiple intersections in Pittsburgh, PA. Please fill out the Google form to receive an email with the download links:

To view the labels, please run the following command:

python PathToData CamID_FrameID

To read the data in the COCO format, we provide a Python wrapper at CARFUSION_TO_COCO
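For reference, COCO-style keypoint annotations store each instance's keypoints as a flat [x1, y1, v1, x2, y2, v2, ...] list plus a count of labeled keypoints. The sketch below shows what packing 14 car keypoints into that layout looks like; the function name is hypothetical, and the actual conversion is done by the CARFUSION_TO_COCO wrapper linked above.

```python
def to_coco_keypoints(points, visibilities):
    """Pack 14 (x, y) keypoints into COCO's flat keypoint layout.
    visibilities follow the COCO convention:
    0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible."""
    flat = []
    for (x, y), v in zip(points, visibilities):
        flat.extend([float(x), float(y), int(v)])
    return {
        "keypoints": flat,                # 14 * 3 = 42 values
        "num_keypoints": sum(1 for v in visibilities if v > 0),
    }

# Toy instance: 12 visible keypoints, one occluded, one unlabeled.
ann = to_coco_keypoints([(10 * i, 5 * i) for i in range(14)],
                        [2] * 12 + [1, 0])
```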

Craig Sequence [Map]

Fifth Sequence [Map]

Morewood Sequence [Map]

Butler Sequence [Map]

Penn Sequence [Map]

4D Reconstruction

Craig Intersection (videos captured for 10 mins, played at 2X speed)

Morewood Intersection (videos captured for 10 mins, played at 4X speed)

Fifth Intersection (videos captured for 3 mins, played at 4X speed)

Butler Intersection (videos captured for 3 mins, played at 4X speed)

More Details

For an in-depth description of the technology behind CarFusion, please refer to our paper and the accompanying video.

"CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles"
N. Dinesh Reddy, Minh Vo, and Srinivasa Narasimhan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
[PDF][Poster] [Supp] [Bibtex]


This work was supported in part by a Heinz Foundation grant, NSF award CNS-1446601, ONR grant N00014-15-1-2358, a CMU University Transportation Center T-SET grant, and a PhD fellowship from Qualcomm.

Copyright © 2018 Dinesh Reddy