Modern autonomous driving systems still struggle to handle complex and variable multi-agent real-world scenarios. Some subsystems, such as perception, use deep learning-based approaches that leverage large amounts of data to generalize to novel scenes. Other subsystems, such as planning and control, still follow classic cost-based trajectory optimization approaches and require substantial effort to handle the long tail of rare events. Deep Reinforcement Learning (RL) has shown encouraging results in learning complex decision-making tasks, ranging from strategic games to challenging robotics tasks. Further, the dense reward structure and modest time horizons make autonomous driving a favorable prospect for applying RL. Because running RL online on vehicles poses practical challenges, and most self-driving companies have collected millions of miles of driving data, there is strong motivation to use off-policy RL algorithms to learn policies that can eventually work in the real world. We explore the use of an off-policy RL algorithm, Deep Q-Learning, to learn goal-directed navigation in a simulated urban driving environment. Since Deep Q-Learning methods are susceptible to instability and sub-optimal convergence, we investigate different strategies for sampling experiences from the replay buffer to mitigate these issues. We also explore combining an expert agent’s demonstration data with the RL agent’s experiences to speed up the learning process. We demonstrate promising results on the CoRL2017 and NoCrash benchmarks in CARLA.
Jeff Schneider (Advisor)
Zoom Participation Enabled. See announcement.