Log of Past CMU Reinforcement Learning Talks

RL talks and abstracts for 1992-93, 1993-94, 1994-95, 1995-96, 1996-97, 1997-98


		      Carnegie Mellon University
		      School of Computer Science

		 Organizer:  Justin.Boyan@cs.cmu.edu

1992-93

Sep 25	Sebastian Thrun
	Explanation-based Neural Networks for Robot Control

Oct  2  Long-Ji Lin
	Practice Thesis Defense

Oct  9  Scott Fahlman
	The Cascade algorithms for fast continuous function learning

Oct 23  Lonnie Chrisman
	Causal Differencing:  Explicit-Bias Q-Learning (EBQ)

Oct 30  Michael Littman
	A classification of reinforcement learning environments

Nov  6  Rich Caruana
	Multi-task learning:  thoughts and results

Nov 10  Gerry Tesauro (IBM)
	Practical Experiences in TD Learning

Nov 13  Jan Zytkow
	Creative response by combining simple experiences

Dec 15  Long-Ji Lin
	Thesis Defense

Jan 25  Michael Littman  (& Dave Ackley, video)
	Evolutionary Reinforcement Learning &
	Distributed Lamarckian Evolution

Feb  8  Erik Ydstie
	Inverse Adaptive Control Using Connectionist Networks

Feb 24  Andrew Moore
	Faculty candidate talk  -- Memory-Based Learning for Control

Mar 15  Avrim Blum
	Efficient path planning in unfamiliar geometric terrain

Mar 25  Mark Ring (Texas)
	Hierarchical Learning

Apr  9  Sebastian Thrun
	Problems with Function Approximation for Q-Learning

Apr 12  Joseph O'Sullivan
	Reinforcement Learning with Vision for the Xavier robot

Apr 19  Lonnie Chrisman
	Representing and Reasoning about Modeling Limitations

Apr 26	Sven Koenig
	Complexity Analysis of Reinforcement Learning

May 3   Gregory Karakoulas (National Research Council, Canada)
	Reinforcement Learning in Continuous State and Action Spaces

May 17  Geoff Gordon
	Continuous Q-functions are (sort of) PAC learnable

May 27  Justin Boyan
	A Distributed RL Scheme for Packet Routing

Jun 3	Ari Juels (Berkeley)
	Rethinking the Genetic Algorithm

Jun 17  Lonnie Chrisman and Michael Littman
	ML93 Preview:  RL in Environments with Hidden State


1993-94

  • Sep 20: Sebastian Thrun, Discovering skills in RL
  • Sep 27: Andrew Moore, Three new algorithms for fast, massive-scale cross-validation searches
  • Oct 4: Goang-Tay Hsu, Learning the backtracking policy for potential-guided path planning
  • Oct 11: Sebastian Thrun, Explanation-based neural network learning in chess
  • Oct 18: Rich Caruana presenting, Exploring the decision forest [Murphy & Pazzani (UCI)]
  • Oct 25: Andrew Moore, The Parti-Game algorithm for online variable resolution RL
  • Nov 1: Goang-Tay Hsu, Parameterization as path planning for redundant manipulators
  • Nov 12: Michael Littman (Brown), Learning probabilistic policies for game-playing and coping with hidden state
  • Nov 17: Marcos Salganicoff (U.Penn), A vision-based system for the learning of pushing manipulation
  • Nov 23: M. Sheppard (U. of Teesside, UK), The application of RL to the compensation of reactive power disturbances
  • Dec 8: Filippo Neri, Semi-automatic synthesis of a fuzzy logic controller
  • Dec 13: Sven Koenig, Planning for risk-sensitive agents
  • Jan 21: Astro Teller, The Evolution of Mental Models
  • Jan 27: Lonnie Chrisman, Reasoning about Actions at Multiple Levels of Granularity
  • Feb 3, 3:45, WeH 7500: Tom Mitchell, What drives learning - Current data or prior knowledge? (Distinguished Lecture)
  • Feb 10: Rich Caruana and Dayne Freitag, Greedy attribute selection
  • Feb 17: Discussion, What's our group all about?
  • Feb 24: Filippo Neri, Dealing with Huge Amounts of Data in an Object Reaching Task
  • Mar 3: Hank Wan, Spatial reasoning in rats: the sense of direction and place
  • Mar 10: Sebastian Thrun, Experiments in Robot Learning
  • Mar 17: Michael Littman, POMDP's: A framework for reinforcement learning with hidden state?
  • Theory Seminar, Friday Mar 25, 2:30, WeH 7220: B. K. Natarajan (HP Labs), Learning Functions via Occam's Razor with Application to Filtering
  • Mar 31: Justin Boyan, How to Approximate a Value Function
  • Friday April 8, 2:00, WeH 7220: Satinder Singh (MIT), Monte Carlo Learning in Environments with Hidden State
  • Friday April 15, 2:00, WeH 7220: Andrew McCallum (Rochester), Instance-Based State Identification
  • Friday April 22, 3:30, Baker Hall Adamson wing: Maja Mataric (MIT), Group Behavior and Learning in Autonomous Systems
  • April 28: Astro Teller, Parallel Algorithm Discovery and Orchestration
  • Friday April 29, 3:30, Baker Hall: Andrew Moore, Learning control with kernel-based function approximators and intense cross-validation
  • May 5: Geoff Gordon, Hierarchical Mixtures for Non-Experts
  • May 16, WeH 4623, 10:30: Lynn Stein (MIT AI Lab), Towards a Cognitive Robotics (AI Seminar)
  • Friday May 20, WeH 7220, 1:30: Rudolf Bauer (Siemens), Robust Obstacle Avoidance in Unknown and Cramped Environments
  • June 3: Joseph O'Sullivan, Usages of Action Models in Learning
  • June 10: Peter Stone, The Need for Different Domain-Independent Heuristics
  • Thursday June 23, 1:30: Nicolas Fiechter (Pitt), PAC Analysis of Reinforcement Learning
  • July 6: Bao-Liang Lu (Inst. of Physical and Chemical Research, Japan), A multi-sieving neural network architecture that decomposes learning tasks automatically
  • July 8: Mathias Heger, Risk-averse reinforcement learning
  • July 29: Sanjiv Singh, Learning to Predict Interaction Forces for an Excavating Robot

1994-95

  • Tuesday August 30: Dieter Fox (U. of Bonn), An Approach to Obstacle Avoidance and High-Speed Navigation Based on Sonar Sensors
  • September 7: Sebastian Thrun, Learning to Recognize Objects (An Incredibly Preliminary Chat)
  • September 21: Geoff Gordon, A Tutorial Chat About Maximum Likelihood Estimation and the EM Algorithm
  • September 28: Kan Deng, Kd-trees for Efficient Regression
  • September 30: Steve Omohundro (ICSI), Model Merging for Learning and Recognition (AI/RI Seminar)
  • October 5: Andrew Moore, Response surface methods: an introduction, and preliminary discussion of the Auton project
  • October 19: Erik Ydstie (Dept of Chemical Engineering), Adaptive linear quadratic control using policy iteration
  • October 26: Rich Caruana, Sex, Lies, and Gradient Descent (In a Multitask Backprop Net)
  • November 2: Scott Davies, NP-Completeness of Searches for Smallest Consistent Feature Sets
  • Friday November 4: Tony Stentz, The D* Algorithm for Real-Time Replanning (RI Seminar)
  • Tuesday November 8: Chris Atkeson, Learning Mappings vs. Learning Strategies in Robot Learning (AI Seminar)
  • November 9: Eating Pizza and Shooting the Breeze with Chris Atkeson
  • Nov 16: Shumeet Baluja, Using a saliency map for active spatial selective attention
  • Friday Nov 18, WeH 5409: Rich Sutton, On Reinforcement Learning and Function Approximation
  • Nov 23: Ping Zhang (University of Compiegne, France), Self-confidence increasing Q-learning
  • Tuesday Dec 6: Thomas Hofmann (University of Bonn), Pairwise Data Clustering and Multidimensional Scaling
  • Jan 18: Geoff Gordon, Stable function approximation in dynamic programming
  • Jan 25: Teddy Seidenfeld (CMU, Philosophy and Statistics), A representation of partially ordered preferences
  • Feb 3: Matthew McDonald (University of Western Australia), Learning useful approximate value functions
  • Feb 17: Robert Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting (theory seminar)
  • Mar 1: David Cohn (MIT B&CS), Active learning with statistical models
  • Mar 8: Rich Zemel, Learning to segment three-dimensional moving objects
  • Mar 15: John Lafferty, Gibbs-Markov Models
  • 1:30, Mar 31, WeH 5409: Sebastian Thrun, "Toward Lifelong Learning Robots"
  • Apr 5: Ken Lang, NewsWeeder: Learning to Filter Netnews
  • Apr 12: Peter Stone, Learning to play robotic soccer: the beginnings
  • Apr 19, Wean 4625: Dean Pomerleau, RALPH - Rapidly Adapting Lateral Position Handling System
  • Friday Apr 28: Stefan Schaal (Georgia Tech), A Constructive Learning Network Based on Nonparametric Regression: Receptive Field Locally Weighted Regression
  • May 3: Shumeet Baluja, Removing the Genetics from the Genetic Algorithm

1995-96

  • Sep 27: Rich Caruana, What To Do When You're Feeling Out of Sorts: Using Rankpropagation To Sort Patients By Pneumonia Risk
  • Oct 4: Kan Deng, Logistic regression for classification -- Can it be an alternative to neural nets? (slides.ps.Z)
  • 3:30, Tues Oct 10, WeH 5409: Leslie Kaelbling, Planning Under Uncertainty: A Markov Decision Process Approach (AI seminar)
  • Nov 1: Rich Caruana, Using the Future to Help Predict the Present: Multitask Learning for Pneumonia Risk Prediction
  • Nov 8: Peter Stone, Using Testing to Iteratively Improve Training (slides.ps.Z)
  • Nov 15: Anthony Robins (U. Otago, New Zealand, and CMU Psych), Rehearsal and pseudorehearsal as solutions to the catastrophic forgetting problem
  • 3:30, Fri Nov 17, WeH 4623: Mike Kearns (AT&T Bell Labs), Decision tree learning algorithms are boosting algorithms
  • Nov 22: Sebastian Thrun, Task clustering and selective transfer of knowledge across multiple learning tasks: Thoughts and Results
  • Dec 6: Geoff Gordon, Rank-Based Tests (slides.ps.Z)
  • Jan 17: Barak Pearlmutter (Siemens), Playing the Matching-Shoulders Lob-Pass Game with Logarithmic Regret
  • Jan 31: Dave Redish, Using the EM algorithm to analyze search behavior in gerbils: A case study with very noisy data
  • Feb 7: Mance Harmon (Wright-Patterson AFB), Spurious Solutions to the Bellman Equation
  • Feb 14: Shumeet Baluja, Novel Machine Learning Tools for Computer Vision
  • Feb 28: Leemon Baird, Scaling Issues in Reinforcement Learning
  • Mar 13: Frank Dellaert, A Developmental Model for the Evolution of Complete Autonomous Agents (SAB paper.ps.Z)
  • Mar 20: Michael Littman, Algorithms for Sequential Decision Making (slides.ps)
  • Mar 27: no meeting (spring break)
  • Apr 3: Andrew McCallum, Addressing Selective Attention and Hidden State using Utile Distinctions in Feature-Space and History-Space
  • Apr 10: Fabio Cozman, Quasi-Bayesian Theory for Planning to Observe
  • Apr 17: Rich Caruana, Features that Work Better as Extra Outputs than as Extra Inputs
  • Apr 24: Michael Trick (GSIA), Linear Programming Methods for Stochastic Dynamic Programs
  • May 1: Peter Stone, Multi-agent Systems (slides.ps.Z)
  • May 8: Justin Boyan, A Machine Learning Architecture for Optimizing Web Search Engines (8-page paper)
  • May 29: Sven Koenig and Yuri Smirnov, Environment Learning with Performance Guarantees
  • Jun 19: Marek Druzdzel (Pitt), Causal Ordering and Causal Discovery
  • Jul 24: Gary Boone (Georgia Tech), Efficient Reinforcement Learning: Model-Based Acrobot Control (compressed postscript)

1996-97

  • Sep 4: Wolfram Burgard, Position Estimation for Mobile Robots
  • Sep 11: Avrim Blum, Polynomial-time guarantees for the Perceptron Algorithm
  • Sep 18: Leemon Baird, Learning to emulate probability distributions
  • Sep 25: Frank Dellaert, Recognizing Emotion in Speech
  • Oct 2: Jeff Schneider, Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
  • Oct 9: Kan Deng, An Introduction to the Kalman Filter and its Application to Neural Net Training
  • Oct 16: Sridhar Mahadevan (Univ. of S. Florida), Improving the quality of industrial simulation using reinforcement learning
  • Oct 23: Sebastian Thrun, Landmark-based Localization
  • Oct 30: Peter Spirtes (CMU Philosophy), Automated Search for Bayesian Networks
  • Nov 6: Michael Nechyba, Cascade Learning with Extended Kalman Filtering
  • 3:30 Tue Nov 12: Tom Dietterich (Oregon State), A connectionist/symbolic hybrid architecture for real-time reinforcement learning (AI Seminar)
  • Nov 20: Mike Cox, Constructing a learning strategy under reasoning failure
  • Nov 27: Justin Boyan, Using prediction to improve combinatorial optimization search (paper)
  • Dec 4: Astro Teller, Putting Sub-solutions Back Together
  • Dec 11: Will Uther, Adversarial Reinforcement Learning
  • Dec 13: Bruce Digney (Defense Research Establishment, Canada), Learning Reactive Hierarchical Control Structures (FRC seminar)
  • Jan 22: Rich Caruana, MTL in kNN
  • 3:00pm, Jan 29, WeH 5409: Peter Stone, Layered Learning in Multiagent Systems (Thesis Proposal)
  • Feb 12: Geoff Gordon, Linear Programming, Lagrange Multipliers, and Duality
  • Feb 19: Hideki Asoh, Socially Embedded Learning of Office-Conversant Mobile Robot "Jijo-2"
  • Feb 26: Kan Deng, Learning to Recognize Time Series: Combining ARMA models with memory-based learning
  • Mar 5 and 12: no meeting scheduled
  • Mar 19: Mark Craven, [--]
  • Apr 2: Andrew Ng, Preventing "Overfitting" of Cross-Validation Data
  • 10:45am, Monday Apr 7, Wean 8220: Nick Szirbik (Technical University of Timisoara, Romania), Optimal NN Topology for Time Dependent Patterns
  • Apr 16: Natalie Japkowicz (Rutgers), A Novelty Detection Approach to Classification
  • Apr 30: Astro Teller
  • May 7: Geoff Gordon
  • Thursday May 15, Wean 1302: Jeremy Wyatt (Birmingham University), Evaluating Robot Learners
  • May 28: John Lafferty, Statistical Learning Algorithms Based on Bregman Distances
  • June 18: Jude Shavlik, A Large-Scale Test for Theory Refinement: The Wisconsin Adaptive Web Assistant
  • June 25: Phoebe Sengers, Learning for Believable Agents
  • July 16: Peter Stone, Task Decomposition and Dynamic Role Assignment for Real-Time Strategic Teamwork
  • July 30: Thorsten Joachims, Text Classification with Support Vector Machines

1997-98

  • Friday September 12, Wean 4601, 2:30: Michael Littman (Duke University), Solving Propositional Decision Processes
  • September 24: Tom Dietterich (Oregon State University), Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
  • October 1, Doherty Hall 2300A: Manuela Veloso et al., Demonstration of CMUnited robotic soccer team
  • October 8: John Langford, Research on and implications of Support Vector Machines
  • October 15: David Andre, Generalized Prioritized Sweeping (paper.ps, slides.ps)
  • October 22: Justin Boyan, Latest Results with STAGE on Satisfiability Problems (slides.ps)
  • October 29: Sebastian Thrun, A Probabilistic Approach for Concurrent Map Acquisition and Localization for Mobile Robots
  • November 5: Andrew Moore, AutoRSM: Memory Based Active Learning for Noisy Optimization and Control
  • November 12: Mark Craven, First-Order Learning for Web Mining
  • November 19: Scott Davies, Probabilistic Modeling for Combinatorial Optimization
  • November 26: NO TALK SCHEDULED
  • December 3: Tucker Balch, Behavioral Diversity in Learning Robot Teams
  • December 10: Jonathan Baxter (Australian National University), KnightCap: a chess program that learns by combining TD(lambda) with minimax search
  • December 17: NO TALK SCHEDULED
  • Happy Holidays!
  • January 7: Chats Restart - NO TALK SCHEDULED
  • January 14: Frank Dellaert, Application of Memory Based Learning for Car Detection and Robust Car tracking using Kalman filters
  • January 21: No talk scheduled
  • January 28: Joseph O'Sullivan, Building a life-long learning agent
  • February 4: Oded Maron, Learning from Extremely Ambiguous Examples
  • February 18: Will Uther, Structural Generalisation with Decision Forests
  • February 25: No one
  • March 4: Michael Bowling, Transferring learning from a simulator to real robots
  • March 11: Thomas Hofmann, Structuring Document Databases by Hierarchical Clustering and Abstraction
  • March 18: Bryan Singer, Learning State Features from Policies to Bias Exploration in Reinforcement Learning
  • March 25: Spring Break
  • April 1: Dieter Fox, Active Markov Localization for Mobile Robots
  • April 8: Astro Teller, Internal Reinforcement in a Connectionist Genetic Programming Approach
  • April 15: Geoff Gordon, A New Algorithm For Approximating Value Functions
  • June 23: Malcolm Ryan, RL-TOPs: A Hierarchical Task Decomposition Technique for Faster Reinforcement Learning.
  • November 4: Oren Etzioni (University of Washington), From AI Methodology to Internet Startup: The Story of the Softbots Project.
  • November 11: ICML / COLT 98 Overview, Description
  • November 18: Avrim Blum, Combining Labeled and Unlabeled Data with Co-Training
  • December 2: Kan Deng, Memory-based Time Series Detection
