Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles
Fast and accurate 3D reconstruction of multiple dynamic rigid objects (e.g., vehicles) observed from wide-baseline, uncalibrated, and unsynchronized cameras is challenging. On one hand, feature tracking works well within each view but is hard to correspond across multiple cameras with limited overlap in fields of view or due to occlusions. On the other hand, advances in deep learning have produced strong detectors that work across different viewpoints but are still not precise enough for triangulation-based reconstruction. In this work, we develop a framework that fuses single-view feature tracks with multi-view detected part locations to significantly improve the detection, localization, and reconstruction of moving vehicles, even in the presence of strong occlusions. We demonstrate our framework at a busy traffic intersection by reconstructing over 40 vehicles passing within a 3-minute window. We evaluate the components of our framework and compare with alternative approaches such as reconstruction using tracking-by-detection.
On one hand, structured semantic keypoints (SP) can be matched accurately across viewpoints but are not precise enough for triangulation-based reconstruction. On the other hand, unstructured points (UP) such as Harris corners can be tracked precisely within a view but cannot be associated accurately across views. CarFusion exploits the rigid motion of cars to estimate the 3D locations of the structured points with the smallest deviation from the surrounding unstructured points over time.
Our overall pipeline for dynamic 3D reconstruction of multiple cars from uncalibrated and unsynchronized video cameras. We use Car-Centric RANSAC (cRANSAC) to find correspondences across views for reconstruction, applying multiple physical constraints on the dimensions of the car to obtain accurate correspondences. Since our cameras are unsynchronized, we use unstructured points to align the different cameras via spatio-temporal bundle adjustment. We exploit the rigidity between the structured points and the tracks of the unstructured feature points over time to obtain a precise reconstruction of each moving vehicle. The reconstructions are reprojected into all the views and used to bootstrap the detectors.
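The rigidity idea above can be illustrated with a small self-contained sketch (a deliberate simplification: 2D, synthetic data, and illustrative function names that are not part of the CarFusion code). The rigid motion of the car is estimated from the unstructured tracks with a closed-form least-squares fit, and a structured keypoint is then transferred through time by that estimated motion, which is how rigidity lets precise single-view tracks stabilize the noisier detected parts.

```python
import math

def rigid_fit_2d(P, Q):
    """Closed-form 2D rigid alignment (rotation + translation) of point set P onto Q."""
    n = len(P)
    cpx = sum(p[0] for p in P) / n; cpy = sum(p[1] for p in P) / n
    cqx = sum(q[0] for q in Q) / n; cqy = sum(q[1] for q in Q) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        px, py, qx, qy = px - cpx, py - cpy, qx - cqx, qy - cqy
        s_cos += px * qx + py * qy   # sum of dot products of centered pairs
        s_sin += px * qy - py * qx   # sum of cross products of centered pairs
    theta = math.atan2(s_sin, s_cos)  # least-squares rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cqx - (c * cpx - s * cpy)    # translation that maps the centroid of P onto Q
    ty = cqy - (s * cpx + c * cpy)
    return theta, (tx, ty)

def propagate_keypoint(s0, tracks):
    """Transfer a structured keypoint s0 (given in frame 0) through time by
    applying the rigid motion estimated from the unstructured tracks
    (a list of per-frame point lists, all tracked from frame 0)."""
    out = []
    for frame in tracks:
        theta, (tx, ty) = rigid_fit_2d(tracks[0], frame)
        c, s = math.cos(theta), math.sin(theta)
        out.append((c * s0[0] - s * s0[1] + tx,
                    s * s0[0] + c * s0[1] + ty))
    return out
```

The same least-squares fit extends to 3D via SVD (the Kabsch algorithm); the 2D closed form is used here only to keep the sketch dependency-free.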
We provide manual annotations of 14 semantic keypoints for 100,000 car instances (sedan, SUV, bus, and truck) from 53,000 images captured by 18 moving cameras at multiple intersections in Pittsburgh, PA. Please fill out the Google form to receive an email with the download links:
To view the labels, please run the following command:
python Visualize.py PathToData CamID_FrameID

To read the data in the COCO format, we provide a Python wrapper at CARFUSION_TO_COCO
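As a rough sketch of consuming the converted annotations, the snippet below groups keypoints by image. It assumes the standard COCO keypoint layout (a flat `[x1, y1, v1, x2, y2, v2, ...]` list per annotation); the function name and file name are illustrative, not part of the CARFUSION_TO_COCO wrapper.

```python
import json

def load_keypoints(coco):
    """Group COCO-style keypoint annotations by image id.

    Each annotation stores keypoints as a flat [x1, y1, v1, ...] list,
    where v is the visibility flag (0 = unlabeled, 1 = labeled but
    occluded, 2 = labeled and visible)."""
    by_image = {}
    for ann in coco["annotations"]:
        kps = ann["keypoints"]
        triplets = [(kps[i], kps[i + 1], kps[i + 2])
                    for i in range(0, len(kps), 3)]
        by_image.setdefault(ann["image_id"], []).append(triplets)
    return by_image

# Typical usage (file name is illustrative):
#   with open("annotations.json") as f:
#       by_image = load_keypoints(json.load(f))
```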
Craig Sequence [Map]
Fifth Sequence [Map]
For an in-depth description of the technology behind CarFusion, please refer to our paper, "CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles", and the accompanying video.