Given a single image of a scene, humans have few issues answering questions about its 3D structure, like “is this facing upwards?”, even though mathematically speaking this should be impossible. We similarly have few issues accounting for this 3D structure when answering viewpoint-independent questions like “is this the same carpet as the one in your office?”, even if the carpets were viewed from different viewpoints and have no pixels in common.

At the heart of the issue is that images are the result of two phenomena: the underlying 3D shape, which we call the 3D structure, and the canonical texture applied to this shape, which we call the style. In the 3D world these phenomena are distinct, but when we observe the world, they become mixed. Although the individual identities of structure and style are lost in the process, if we know about regularities in both phenomena, we can narrow down the possible combinations that could have produced our image.

This dissertation aims to better enable computers to understand images in a 3D way by factoring the image into 3D structure and style. The key is that we can take advantage of regularity in both phenomena to inform our interpretation. For instance, we do not expect carpet texture on ceilings or 75-degree angles between walls. By using regularities, especially ones discovered from large-scale data, we can winnow away the possible combinations of 3D structure and style that could have produced our image.

Thesis Committee:
Abhinav Gupta (Co-Chair)
Martial Hebert (Co-Chair)
Deva Ramanan
William T. Freeman (Massachusetts Institute of Technology)
Andrew Zisserman (University of Oxford)

Copy of Thesis Document

Boris Sofman
As an engineer and researcher with experience in building diverse robotic systems - from consumer products to off-road autonomous vehicles and bomb-disposal robots - Boris is making it his life’s work to create products that people would not expect to be possible. He earned a B.S., M.S. and Ph.D. from the Robotics Institute of Carnegie Mellon University. Boris is an avid tennis player, but finds that Anki doesn’t allow him to play nearly as often as he’d like.

Hanns Tappeiner
Hanns is passionate about creating products he always wanted but that didn't exist. He has worked extensively at refining the connections between operator and robot, developing deeper senses of feel and control. Hanns has designed robots for companies across the globe, in Germany, Italy, Austria, and the US. He studied at the University of Technology in Vienna before earning an M.S. in Robotics from the Robotics Institute at Carnegie Mellon University. On weekends, Hanns is probably working or outdoors somewhere with his motorcycle.

Faculty Host: Martial Hebert

Robots still struggle with everyday manipulation tasks. An overriding problem with robotic manipulation is uncertainty in the robot's state and calibration parameters. Small amounts of uncertainty can lead to complete task failure. This thesis explores ways of tracking and calibrating noisy robot arms using computer vision, with an aim toward making them more robust. We consider three systems with increasing complexity: a noisy robot arm tracked by an external depth camera, a noisy arm that localizes itself using a hand-mounted depth sensor looking at an unstructured world, and a noisy arm that only has a single hand-mounted monocular RGB camera estimating its state while simultaneously calibrating its camera extrinsics. Using techniques taken from dense object tracking, fully dense SLAM, and sparse general SLAM, we are able to automatically track the robot and extract its calibration parameters. We also provide analysis linking these problems together, while exploring the fundamental limitations of SLAM-based approaches for calibrating robot arms.
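As a toy illustration of the kind of estimation problem involved (not the thesis's actual method), the sketch below recovers a constant joint-angle offset of a hypothetical 2-link planar arm from camera observations of its end effector via Gauss-Newton least squares. The arm model, link lengths, and bias parameterization are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 2-link planar arm (assumed link lengths, in meters).
L1, L2 = 0.5, 0.3

def fk(q):
    """Forward kinematics: end-effector (x, y) for joint angles q = (q1, q2)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def estimate_bias(commanded, observed, iters=20):
    """Find a constant joint-offset b so fk(q + b) matches observations."""
    b = np.zeros(2)
    eye = np.eye(2)
    for _ in range(iters):
        jac_rows, res_rows = [], []
        for q, z in zip(commanded, observed):
            eps = 1e-6  # forward-difference step for the numerical Jacobian
            Ji = np.column_stack(
                [(fk(q + b + eps * e) - fk(q + b)) / eps for e in eye])
            jac_rows.append(Ji)
            res_rows.append(z - fk(q + b))  # residual at current estimate
        J = np.vstack(jac_rows)
        r = np.concatenate(res_rows)
        # Gauss-Newton update: solve the linearized least-squares problem.
        b = b + np.linalg.lstsq(J, r, rcond=None)[0]
    return b
```

The real systems in the thesis estimate far richer state (full joint trajectories plus camera extrinsics) from dense sensor data; this sketch only shows the least-squares core shared by such calibration problems.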

Thesis Committee:
Siddhartha Srinivasa (Co-chair)
Michael Kaess (Co-chair)
George Kantor
Andrew Davison (Imperial College, London)

Copy of Draft Document

Loose, granular terrain can cause rovers to slip and sink, inhibiting mobility and sometimes even permanently entrapping a vehicle. Traversability of granular terrain is difficult to foresee using traditional, non-contact sensing methods, such as cameras and LIDAR. This inability to detect loose terrain hazards has caused significant delays for rovers on both the Moon and Mars and, most notably, contributed to Spirit's permanent entrapment in soft sand on Mars. These delays are caused both by slipping in unidentified loose sand and by wasting time analyzing or completely circumventing benign sand. Reliable prediction of terrain traversability would greatly improve both the safety and the operational speed of planetary rover operations. This thesis leverages thermal inertia measurements and physics-based terramechanics models to develop algorithms for slip prediction in planetary granular terrain.

The ability of a rover to traverse granular terrain is a complex function of the geometry of the terrain, the rover's configuration, and the physical properties of the granular material, such as density and particle geometry. Vision-based traversability prediction methods are inherently limited: subsurface characteristics are not fully reflected in the visual appearance of the surface layer, so vision alone does not provide enough information to understand all the physical properties that influence mobility. The inherent difficulty of estimating traversability is compounded by the conservative nature of planetary rover operations: mission operators actively avoid potentially hazardous regions, which leaves little data and makes strictly data-driven regression approaches difficult.

Pre-proposal research has shown that thermal inertia is correlated with traversability and improves estimates of it. This has been demonstrated both in terrestrial experiments and with data from the Curiosity rover. Unlike visual appearance, the thermal properties of a material are influenced not only by the surface of the terrain but also by the physical properties of the underlying material. This thesis develops techniques for predicting the traversability of terrain by leveraging thermal inertia measurements to provide a greater understanding of material properties both at and below the surface.
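For background, thermal inertia is conventionally defined as I = √(kρc), where k is thermal conductivity, ρ is bulk density, and c is specific heat capacity. The small computation below uses this standard definition; the material values are rough illustrative numbers, not measurements from this work.

```python
import math

def thermal_inertia(k, rho, c):
    """Thermal inertia I = sqrt(k * rho * c), in J m^-2 K^-1 s^-1/2.
    k:   thermal conductivity (W m^-1 K^-1)
    rho: bulk density (kg m^-3)
    c:   specific heat capacity (J kg^-1 K^-1)"""
    return math.sqrt(k * rho * c)

# Illustrative (approximate) values only: loose dust conducts heat
# poorly and so has low thermal inertia, while consolidated rock
# stores and conducts heat well, giving high thermal inertia.
dust = thermal_inertia(k=0.01, rho=1300.0, c=800.0)  # ~102
rock = thermal_inertia(k=2.0, rho=2700.0, c=800.0)   # ~2078
```

This contrast is what makes thermal inertia a useful remote proxy for subsurface consolidation: two terrains with similar visual appearance can have very different thermal inertia.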

The proposed research will develop computationally efficient traversability prediction techniques. Thermal inertia and geometric features, such as angle of repose, will be used to estimate granular terrain properties. Surface geometry and soil parameters will then be used as inputs to a learning-based slip prediction algorithm. The algorithm will be trained on both in-situ and synthetic data to reduce overfitting and increase prediction accuracy. Synthetic data will be generated using state-of-the-art terramechanics simulators that produce accurate slip estimates given known terrain properties but are too computationally expensive for tactical rover planning. Evaluation will occur on data from the Mars rovers, and results will be compared to vision-only methods in order to understand in what situations the addition of thermal inertia improves traversability prediction.
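As a minimal stand-in for the proposed learned predictor (the actual algorithm and features are part of the proposed work), the sketch below fits an ordinary least-squares linear model from assumed terrain features, such as slope angle and thermal inertia, to a slip ratio bounded to [0, 1].

```python
import numpy as np

def fit_slip_model(X, y):
    """Least-squares linear model mapping terrain features to slip.
    X: (n, d) feature matrix (e.g. slope angle, thermal inertia);
    y: (n,) measured slip ratios. Illustrative stand-in only."""
    Xb = np.column_stack([X, np.ones(len(X))])  # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_slip(w, X):
    """Predict slip ratios, clipped to the physical range [0, 1]."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.clip(Xb @ w, 0.0, 1.0)
```

In practice a nonlinear model trained on both in-situ and simulator-generated data would replace the linear fit; the interface (features in, bounded slip ratio out) is the part this sketch is meant to show.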

Thesis Committee:
William "Red" Whittaker (Chair)
David Wettergreen
Steven Nuske
Issa Nesnas (Jet Propulsion Laboratory)

The last decade has seen remarkable advances in 3D perception for robotics. Advances in range sensing and SLAM now allow robots to easily acquire detailed 3D maps of their environment in real-time.

However, adaptive robot behavior requires an understanding of the environment that goes beyond pure geometry. A step above purely geometric maps are so-called semantic maps, which incorporate task-oriented semantic labels in addition to 3D geometry: in other words, a map of *what* is *where*. This is a straightforward representation that allows robots to use semantic labels for navigation and exploration planning.

In this proposal we develop learning-based approaches for semantic mapping with image and range sensors. We make three main contributions.

In our first contribution, which is completed work, we developed VoxNet, a system for accurate and efficient semantic classification of 3D point cloud data. The key novelty in this system is the integration of volumetric occupancy grids with 3D Convolutional Neural Networks (CNNs). The system showed state-of-the-art performance in 3D object recognition and helicopter landing zone detection.
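The input side of such a volumetric pipeline can be sketched as follows: a point cloud is discretized into a fixed-size occupancy grid that a 3D CNN can consume. This is a simplified binary version (VoxNet itself also considers probabilistic occupancy estimates); the grid origin, resolution, and shape below are arbitrary illustration values.

```python
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape):
    """Convert an (n, 3) point cloud into a binary occupancy grid,
    the kind of volumetric input a 3D CNN operates on.
    Simplified sketch: marks a voxel occupied if any point falls in it."""
    grid = np.zeros(grid_shape, dtype=np.float32)
    # Map each point to integer voxel indices relative to the grid origin.
    idx = np.floor((np.asarray(points) - origin) / voxel_size).astype(int)
    # Keep only points that fall inside the grid bounds.
    inb = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    ix, iy, iz = idx[inb].T
    grid[ix, iy, iz] = 1.0
    return grid
```

For example, with a 0.1 m voxel size and origin at (0, 0, 0), a point at (0.25, 0.05, 0.05) lands in voxel (2, 0, 0). The fixed grid shape is what lets a convolutional network with fixed-size filters be applied to variable-size point clouds.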

In our second contribution, motivated by the complementary information in image and point cloud data, we propose a CNN architecture fusing both modalities. The architecture consists of two interconnected streams: a volumetric CNN stream for the point cloud data, and a more traditional 2D CNN stream for the image data. We will evaluate this architecture for the tasks of terrain classification and obstacle detection in an autonomous All Terrain Vehicle (ATV).
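A toy forward pass can illustrate the fusion idea (the layer sizes, random weights, and late-fusion point here are assumptions, not the proposed architecture's actual design): each stream processes its own modality, the stream outputs are concatenated, and a shared layer maps the fused features to class scores.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fused_forward(vol_feat, img_feat, params):
    """Toy two-stream late fusion: per-stream linear+ReLU layers,
    concatenation, then a shared linear layer producing class scores.
    In a real network each stream would be a full (3D or 2D) CNN."""
    h_vol = relu(vol_feat @ params["W_vol"])  # volumetric (point cloud) stream
    h_img = relu(img_feat @ params["W_img"])  # image stream
    h = np.concatenate([h_vol, h_img])        # fusion by concatenation
    return h @ params["W_out"]                # per-class scores
```

The concatenation is the point at which the complementary information from the two modalities is combined: geometry from the point cloud stream, appearance from the image stream.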

In the final contribution, we propose a semantic mapping system for intelligent information gathering on Micro Aerial Vehicles (MAVs). In pursuit of a lightweight solution, we forego active range sensing and use monocular imagery as our main data source. This leads to various challenges, as we now must infer *where* as well as *what*. We outline our plan to solve these challenges using monocular cues, inertial sensing, and other information available to the vehicle.

Thesis Committee:
Sebastian Scherer (Chair)
Martial Hebert
Abhinav Gupta
Raquel Urtasun (University of Toronto)

Copy of Proposal Document

The students from the Double-Major in Robotics will demonstrate the robots they have built as part of the Capstone Course (16-474). Teams of students followed a Systems Engineering process to develop functional robots that meet specific performance requirements.

This year's projects include:

  • An autonomous airline trolley beverage cart
  • A mobile robot for autonomous detection and elimination of weeds
  • An autonomous robotic cart that follows a user while carrying the user's tools and other loads

Please stop by to see the demonstrations and talk to the teams!

Faculty Host: Dimi Apostolopoulos

Nine RI MRSD program student teams will use posters, videos, and hardware to show their project work on robots for package delivery, river taxi service, wellhead servicing, the Amazon Picking Challenge, undersea docking, 3D printing with COTS part inclusion, swarm-based facial recognition, autonomous parking, and autonomous landing on a moving shipdeck.

Faculty Host: John Dolan

Please visit the posters and learn about the exciting projects students are working on as part of the graduate course on Computer Vision.

Faculty Host: Deva Ramanan
