Stéphane Ross - Robotics Institute

Welcome

Hi, my name is Stéphane Ross. I am now a Software Engineer at Google X working on self-driving cars. My Ph.D. thesis, on iterative and interactive methods for learning good predictors for sequential decision and prediction problems, can be found here.

I recently completed my Ph.D. in Robotics at Carnegie Mellon University. During my Ph.D., I did research in machine learning for applications in control and perception under the supervision of Drew Bagnell. More specifically, my research focused on problems that arise at the intersection of Machine Learning and Dynamical Systems/Time Series, where one must learn to make a sequence of predictions to achieve a task. I developed novel techniques, based on online learning and interactions with the learner, that lead to efficient learning with good guarantees for these sequence prediction problems. I have applied these techniques in the context of imitation learning (learning from demonstrations), structured prediction in computer vision, model-based reinforcement learning, and most recently list optimization problems (e.g. ad placement, personalized news recommendation, grasp selection and trajectory optimization for robotic manipulation).

In the context of imitation learning, I have demonstrated the efficacy of these methods on two video game problems, making a computer learn how to play a 3D racing game and Mario Bros from input images and the corresponding actions taken by a human player. I have also used them to make a UAV learn, from pilot demonstrations, to fly through forest environments while avoiding trees seen through its camera.

On the computer vision/perception side, I showed how the same learning techniques could be applied to learn predictors for general Structured Prediction problems, where a high-dimensional output is constructed from a sequence of predictions (e.g. labeling the object present at each pixel in an image). I have applied this technique to large Structured Prediction problems, such as a LADAR 3D point cloud classification task and learning to identify the 3D geometry of a scene from a 2D image.

For Model-Based Reinforcement Learning, I showed how the technique could be slightly modified to learn models of dynamical systems that lead to good performance when planning with them. I have applied this technique to learning dynamics models of a simulated helicopter to perform aerobatic maneuvers.

For more information, look in the research section and the publications section.

Previous Work

I completed my M.Sc. in Computer Science at McGill University during the summer 2008 semester, under the supervision of Joelle Pineau, as part of the Reasoning and Learning Laboratory. My Master's research focused on developing several efficient algorithms that allow computers/robots to simultaneously learn and plan to achieve a task or long-term goal, under various sources of uncertainty akin to the ones a robot must face in the real world. In particular, I worked on several extensions of Model-Based Bayesian Reinforcement Learning methods to more complex problems, such as problems involving partial observability of the world (POMDPs) and continuous domains. I also worked on a more efficient extension of Model-Based Bayesian Reinforcement Learning that can discover and exploit hidden structure in the domain to learn more efficiently. The resulting algorithms allow one to find an optimal plan that trades off between 1) learning the probabilistic model of the system, 2) identifying the hidden state of the system, and 3) gathering rewards, so as to maximize the expected long-term rewards. This research has useful applications in Robotics and Human-Computer Interaction, where uncertainty on the parameters of the probabilistic system is common and, to date, not taken into account in the planning process. The final version of my M.Sc. thesis is available in the publications section.

I completed my B.Sc. in Computer Science at Laval University at the end of the winter 2006 semester. In summer 2005, I obtained an NSERC Undergraduate Student Research Award to work at the DAMAS Laboratory with Sébastien Paquet, a former Ph.D. student, on the RoboCupRescue project. I also assisted him and contributed to his research on novel hybrid POMDP approaches. During the last year of my Bachelor's degree, I worked part time at the DAMAS Laboratory under the supervision of Brahim Chaib-draa, and investigated the fields of multiagent reinforcement learning and game theory, particularly for their application in the RoboCupRescue project. In this environment, the problem is to find a way to make the different agents learn how to cooperate, as efficiently as possible, under partial observability and communication constraints. Furthermore, in summer 2006, I worked full time at the DAMAS Laboratory on new efficient algorithms for online search in POMDPs.