Abstract

Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale ground-truth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, we automatically identify unoccluded 3D objects and composite them into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art images with pseudo-groundtruth enable efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects such as vehicles and people in urban scenes.

Oral Presentation (15-min)

Clip-Art Data Generation with 3D-based Compositing

Given a time-lapse video, we automatically generate 2D/3D training data under severe occlusions. We first detect the objects in each frame and identify those that are unoccluded (fully visible). Each unoccluded object is then reconstructed in 3D using the ground plane and camera parameters. Given its 3D pose, each unoccluded object is composited back into the background at its original location (i.e., clip-art style) in a geometrically consistent manner, ensuring physically accurate and realistic occlusion configurations. The composited images, together with pseudo-groundtruth from off-the-shelf methods (e.g., segmentation, keypoints, shapes), are used to train a model that produces accurate 2D/3D object reconstructions under severe occlusions.
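The first two steps can be pictured with a minimal sketch. It assumes boxes in `(x1, y1, x2, y2)` pixel format, a pinhole camera `x ~ K (R X + t)` with world-to-camera rotation `R` and translation `t`, and a ground plane `n·X + d = 0` in world coordinates. The unoccluded test here is a hypothetical stand-in (no overlap with any other box, not touching the image border), not the paper's criterion, and lifting only the box's bottom-center onto the ground recovers a ground contact point rather than the full pose and shape.

```python
import numpy as np


def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def is_unoccluded(box, other_boxes, width, height, margin=2):
    """Hypothetical heuristic: keep boxes that overlap no other detection
    and are not cut off by the image border."""
    x1, y1, x2, y2 = box
    touches_border = (x1 <= margin or y1 <= margin
                      or x2 >= width - margin or y2 >= height - margin)
    return not touches_border and not any(boxes_overlap(box, o) for o in other_boxes)


def backproject_to_ground(pixel, K, R, t, n, d):
    """Intersect the viewing ray through `pixel` with the plane n·X + d = 0.

    Assumes the pinhole model x ~ K (R X + t), where R, t map world to camera.
    Returns the 3D world point where the ray meets the ground.
    """
    u, v = pixel
    C = -R.T @ t  # camera center in world coordinates
    # Ray direction in world coordinates through the pixel
    ray = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))
    denom = n @ ray
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is parallel to the ground plane")
    s = -(n @ C + d) / denom  # solve n·(C + s*ray) + d = 0 for s
    if s <= 0:
        raise ValueError("ground intersection lies behind the camera")
    return C + s * ray
```

For a detection that passes the test, the ground contact point would be `backproject_to_ground(((x1 + x2) / 2, y2), K, R, t, n, d)`. For example, with `R = I`, `t = 0`, a y-down camera whose `K` has focal length 100 and principal point (50, 50), and the ground 2 units below the camera (`n = (0, 1, 0)`, `d = -2`), pixel (50, 150) maps to the world point (0, 2, 2).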

Another example at a different location:

Comparison with 2D-based Compositing

Our 3D-based compositing method generates realistic and geometrically accurate occlusion configurations, in contrast to 2D-based compositing, which can produce physically implausible ones (e.g., cars and people overlapping in infeasible ways).
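The difference from 2D pasting is essentially one of ordering. Once each object has a 3D position, layers can be composited far-to-near (a painter's algorithm), so nearer objects cover farther ones and each object's visible mask follows from the layers pasted after it. The sketch below assumes every layer provides a full-frame RGB image, a boolean amodal (full-extent) mask, and a scalar distance from the camera; the function and field names are hypothetical, and it illustrates the ordering idea rather than the paper's compositing code.

```python
import numpy as np


def composite_by_depth(background, layers):
    """Paste object layers far-to-near; return the image and visible masks.

    background: (H, W, 3) array.
    layers: list of dicts (hypothetical format) with keys
        'rgb'   -> (H, W, 3) object appearance in full-frame coordinates
        'mask'  -> (H, W) bool amodal mask
        'depth' -> float distance from the camera (larger = farther)
    """
    image = background.copy()
    # Farthest first, so nearer objects overwrite (occlude) farther ones.
    # sorted() is stable, so equal depths keep their input order.
    order = sorted(range(len(layers)), key=lambda i: layers[i]["depth"], reverse=True)
    for i in order:
        m = layers[i]["mask"]
        image[m] = layers[i]["rgb"][m]

    # Visible (modal) mask = amodal mask minus every layer pasted later.
    rank = {idx: r for r, idx in enumerate(order)}
    visible = []
    for i, layer in enumerate(layers):
        vis = layer["mask"].copy()
        for j, other in enumerate(layers):
            if rank[j] > rank[i]:  # layer j was pasted on top of layer i
                vis &= ~other["mask"]
        visible.append(vis)
    return image, visible
```

The visible masks come out consistent with the pixels actually written, which is what a pseudo-groundtruth for occluded objects needs. A purely 2D paster has no distance to sort on, so it has no principled way to decide which object ends up in front, which is one way the implausible overlaps in the comparison can arise.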

Qualitative Results

Vehicle-People Occlusion

Vehicle-Vehicle Occlusion

People-People Occlusion

Paper

For an in-depth description of WALT3D, please refer to our paper.

WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion

Khiem Vuong, N. Dinesh Reddy, Robert Tamburo, Srinivasa G. Narasimhan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024.
[pdf] [supp] [bibtex]

Potential Societal Impact

We do not perform any human subjects research using data from these cameras.

Acknowledgements

This work was supported in part by an NSF Grant CNS-2038612, a US DOT grant 69A3551747111 through the Mobility21 UTC and grants 69A3552344811 and 69A3552348316 through the Safety21 UTC.

Contact

If you have any questions, please feel free to contact Khiem Vuong.

Last Modified: April 2nd, 2024

WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion

Khiem Vuong¹*, N. Dinesh Reddy²*, Robert Tamburo¹, Srinivasa G. Narasimhan¹

¹Carnegie Mellon University ²Amazon

CVPR 2024 (Oral, Top 0.8%)
