Log of Past CMU Reinforcement Learning Talks

RL talks and abstracts for 1992-93, 1993-94, 1994-95, 1995-96, 1996-97, 1997-98

1992-93

		      Carnegie Mellon University
		      School of Computer Science

	    REINFORCEMENT LEARNING SEMINAR SERIES 1992-93
		 Organizer:  Justin.Boyan@cs.cmu.edu


Sep 25	Sebastian Thrun
	Explanation-based Neural Networks for Robot Control

Oct  2  Long-Ji Lin
	Practice Thesis Defense

Oct  9  Scott Fahlman
	The Cascade algorithms for fast continuous function learning

Oct 23  Lonnie Chrisman
	Causal Differencing:  Explicit-Bias Q-Learning (EBQ)

Oct 30  Michael Littman
	A classification of reinforcement learning environments

Nov  6  Rich Caruana
	Multi-task learning:  thoughts and results

Nov 10  Gerry Tesauro (IBM)
	Practical Experiences in TD Learning

Nov 13  Jan Zytkow
	Creative response by combining simple experiences

Dec 15  Long-Ji Lin
	Thesis Defense

Jan 25  Michael Littman  (& Dave Ackley, video)
	Evolutionary Reinforcement Learning &
	Distributed Lamarckian Evolution

Feb  8  Erik Ydstie
	Inverse Adaptive Control Using Connectionist Networks

Feb 24  Andrew Moore
	Faculty candidate talk: Memory-Based Learning for Control

Mar 15  Avrim Blum
	Efficient path planning in unfamiliar geometric terrain

Mar 25  Mark Ring (Texas)
	Hierarchical Learning

Apr  9  Sebastian Thrun
	Problems with Function Approximation for Q-Learning

Apr 12  Joseph O'Sullivan
	Reinforcement Learning with Vision for the Xavier robot

Apr 19  Lonnie Chrisman
	Representing and Reasoning about Modeling Limitations

Apr 26	Sven Koenig
	Complexity Analysis of Reinforcement Learning

May 3   Gregory Karakoulas (National Research Council, Canada)
	Reinforcement Learning in Continuous State and Action Spaces

May 17  Geoff Gordon
	Continuous Q-functions are (sort of) PAC learnable

May 27  Justin Boyan
	A Distributed RL Scheme for Packet Routing

Jun 3	Ari Juels (Berkeley)
	Rethinking the Genetic Algorithm

Jun 17  Lonnie Chrisman and Michael Littman
	ML93 Preview:  RL in Environments with Hidden State

1993-94

Sep 20: Sebastian Thrun, Discovering skills in RL

Sep 27: Andrew Moore, Three new algorithms for fast, massive-scale cross-validation searches

Oct 4: Goang-Tay Hsu, Learning the backtracking policy for potential-guided path planning

Oct 11: Sebastian Thrun, Explanation-based neural network learning in chess

Oct 18: Rich Caruana presenting, Exploring the decision forest [Murphy & Pazzani (UCI)]

Oct 25: Andrew Moore, The Parti-Game algorithm for online variable resolution RL

Nov 1: Goang-Tay Hsu, Parameterization as path planning for redundant manipulators

Nov 12: Michael Littman (Brown), Learning probabilistic policies for game-playing and coping with hidden state

Nov 17: Marcos Salganicoff (U.Penn), A vision-based system for the learning of pushing manipulation

Nov 23: M. Sheppard (U. of Teesside, UK), The application of RL to the compensation of reactive power disturbances

Dec 8: Filippo Neri, Semi-automatic synthesis of a fuzzy logic controller

Dec 13: Sven Koenig, Planning for risk-sensitive agents

Jan 21: Astro Teller, The Evolution of Mental Models

Jan 27: Lonnie Chrisman, Reasoning about Actions at Multiple Levels of Granularity

Feb 3, 3:45, WeH 7500: Tom Mitchell, What drives learning - Current data or prior knowledge? (Distinguished Lecture)

Feb 10: Rich Caruana and Dayne Freitag, Greedy attribute selection

Feb 17: Discussion, What's our group all about?

Feb 24: Filippo Neri, Dealing with Huge Amounts of Data in an Object Reaching Task

Mar 3: Hank Wan, Spatial reasoning in rats: the sense of direction and place

Mar 10: Sebastian Thrun, Experiments in Robot Learning

Mar 17: Michael Littman, POMDPs: A framework for reinforcement learning with hidden state?

Theory Seminar, Friday Mar 25, 2:30, WeH 7220: B. K. Natarajan (HP Labs), Learning Functions via Occam's Razor with Application to Filtering

Mar 31: Justin Boyan, How to Approximate a Value Function

Friday April 8, 2:00, WeH 7220: Satinder Singh (MIT), Monte Carlo Learning in Environments with Hidden State

Friday April 15, 2:00, WeH 7220: Andrew McCallum (Rochester), Instance-Based State Identification

Friday April 22, 3:30, Baker Hall Adamson wing: Maja Mataric (MIT), Group Behavior and Learning in Autonomous Systems

April 28: Astro Teller, Parallel Algorithm Discovery and Orchestration

Friday April 29, 3:30, Baker Hall: Andrew Moore, Learning control with kernel-based function approximators and intense cross-validation

May 5: Geoff Gordon, Hierarchical Mixtures for Non-Experts

May 16, WeH 4623, 10:30: Lynn Stein (MIT AI Lab), Towards a Cognitive Robotics (AI Seminar)

Friday May 20, WeH 7220, 1:30: Rudolf Bauer (Siemens), Robust Obstacle Avoidance in Unknown and Cramped Environments

June 3: Joseph O'Sullivan, Usages of Action Models in Learning

June 10: Peter Stone, The Need for Different Domain-Independent Heuristics

Thursday June 23, 1:30: Nicolas Fiechter (Pitt), PAC Analysis of Reinforcement Learning

July 6: Bao-Liang Lu (Institute of Physical and Chemical Research, Japan), A multi-sieving neural network architecture that decomposes learning tasks automatically

July 8: Mathias Heger, Risk-averse reinforcement learning

July 29: Sanjiv Singh, Learning to Predict Interaction Forces for an Excavating Robot

1994-95

Tuesday August 30: Dieter Fox (U. of Bonn), An Approach to Obstacle Avoidance and High-Speed Navigation Based on Sonar Sensors

September 7: Sebastian Thrun, Learning to Recognize Objects (An Incredibly Preliminary Chat)

September 21: Geoff Gordon, A Tutorial Chat About Maximum Likelihood Estimation and the EM Algorithm

September 28: Kan Deng, Kd-trees for Efficient Regression

September 30: Steve Omohundro (ICSI), Model Merging for Learning and Recognition (AI/RI Seminar)

October 5: Andrew Moore, Response surface methods: an introduction, and preliminary discussion of the Auton project

October 19: Erik Ydstie (Dept of Chemical Engineering), Adaptive linear quadratic control using policy iteration

October 26: Rich Caruana, Sex, Lies, and Gradient Descent (In a Multitask Backprop Net)

November 2: Scott Davies, NP-Completeness of Searches for Smallest Consistent Feature Sets

Friday November 4: Tony Stentz, The D* Algorithm for Real-Time Replanning (RI Seminar)

Tuesday November 8: Chris Atkeson, Learning Mappings vs. Learning Strategies in Robot Learning (AI Seminar)

November 9: Eating Pizza and Shooting the Breeze with Chris Atkeson

Nov 16: Shumeet Baluja, Using a saliency map for active spatial selective attention

Friday Nov 18, WeH 5409: Rich Sutton, On Reinforcement Learning and Function Approximation

Nov 23: Ping Zhang (University of Compiegne, France), Self-confidence increasing Q-learning

Tuesday Dec 6: Thomas Hofmann (University of Bonn), Pairwise Data Clustering and Multidimensional Scaling

Jan 18: Geoff Gordon, Stable function approximation in dynamic programming

Jan 25: Teddy Seidenfeld (CMU, Philosophy and Statistics), A representation of partially ordered preferences

Feb 3: Matthew McDonald (University of Western Australia), Learning useful approximate value functions

Feb 17: Robert Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting (theory seminar)

Mar 1: David Cohn (MIT B&CS), Active learning with statistical models

Mar 8: Rich Zemel, Learning to segment three-dimensional moving objects

Mar 15: John Lafferty, Gibbs-Markov Models

1:30, Mar 31, WeH 5409: Sebastian Thrun, "Toward Lifelong Learning Robots"

Apr 5: Ken Lang, NewsWeeder: Learning to Filter Netnews

Apr 12: Peter Stone, Learning to play robotic soccer: the beginnings

Apr 19, Wean 4625: Dean Pomerleau, RALPH - Rapidly Adapting Lateral Position Handling System

Friday Apr 28: Stefan Schaal (Georgia Tech), A Constructive Learning Network Based on Nonparametric Regression: Receptive Field Locally Weighted Regression

May 3: Shumeet Baluja, Removing the Genetics from the Genetic Algorithm

1995-96

Sep 27: Rich Caruana, What To Do When You're Feeling Out of Sorts: Using Rankpropagation To Sort Patients By Pneumonia Risk

Oct 4: Kan Deng, Logistic regression for classification -- Can it be an alternative to neural nets? (slides.ps.Z)

3:30, Tues Oct 10, WeH 5409: Leslie Kaelbling, Planning Under Uncertainty: A Markov Decision Process Approach (AI seminar)

Nov 1: Rich Caruana, Using the Future to Help Predict the Present: Multitask Learning for Pneumonia Risk Prediction

Nov 8: Peter Stone, Using Testing to Iteratively Improve Training (slides.ps.Z)

Nov 15: Anthony Robins (U. Otago, New Zealand, and CMU Psych), Rehearsal and pseudorehearsal as solutions to the catastrophic forgetting problem

3:30, Fri Nov 17, WeH 4623: Mike Kearns (AT&T Bell Labs), Decision tree learning algorithms are boosting algorithms

Nov 22: Sebastian Thrun, Task clustering and selective transfer of knowledge across multiple learning tasks: Thoughts and Results

Dec 6: Geoff Gordon, Rank-Based Tests (slides.ps.Z)

Jan 17: Barak Pearlmutter (Siemens), Playing the Matching-Shoulders Lob-Pass Game with Logarithmic Regret

Jan 31: Dave Redish, Using the EM algorithm to analyze search behavior in gerbils: A case study with very noisy data

Feb 7: Mance Harmon (Wright-Patterson AFB), Spurious Solutions to the Bellman Equation

Feb 14: Shumeet Baluja, Novel Machine Learning Tools for Computer Vision

Feb 28: Leemon Baird, Scaling Issues in Reinforcement Learning

Mar 13: Frank Dellaert, A Developmental Model for the Evolution of Complete Autonomous Agents (SAB paper.ps.Z)

Mar 20: Michael Littman, Algorithms for Sequential Decision Making (slides.ps)

Mar 27: no meeting (spring break)

Apr 3: Andrew McCallum, Addressing Selective Attention and Hidden State using Utile Distinctions in Feature-Space and History-Space

Apr 10: Fabio Cozman, Quasi-Bayesian Theory for Planning to Observe

Apr 17: Rich Caruana, Features that Work Better as Extra Outputs than as Extra Inputs

Apr 24: Michael Trick (GSIA), Linear Programming Methods for Stochastic Dynamic Programs

May 1: Peter Stone, Multi-agent Systems (slides.ps.Z)

May 8: Justin Boyan, A Machine Learning Architecture for Optimizing Web Search Engines (8-page paper)

May 29: Sven Koenig and Yuri Smirnov, Environment Learning with Performance Guarantees

Jun 19: Marek Druzdzel (Pitt), Causal Ordering and Causal Discovery

Jul 24: Gary Boone (Georgia Tech), Efficient Reinforcement Learning: Model-Based Acrobot Control (compressed postscript)

1996-97

Sep 4: Wolfram Burgard, Position Estimation for Mobile Robots

Sep 11: Avrim Blum, Polynomial-time guarantees for the Perceptron Algorithm

Sep 18: Leemon Baird, Learning to emulate probability distributions

Sep 25: Frank Dellaert, Recognizing Emotion in Speech

Oct 2: Jeff Schneider, Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning

Oct 9: Kan Deng, An Introduction to the Kalman Filter and its Application to Neural Net Training

Oct 16: Sridhar Mahadevan (Univ. of S. Florida), Improving the quality of industrial simulation using reinforcement learning

Oct 23: Sebastian Thrun, Landmark-based Localization

Oct 30: Peter Spirtes (CMU Philosophy), Automated Search for Bayesian Networks

Nov 6: Michael Nechyba, Cascade Learning with Extended Kalman Filtering

3:30 Tue Nov 12: Tom Dietterich (Oregon State), A connectionist/symbolic hybrid architecture for real-time reinforcement learning (AI Seminar)

Nov 20: Mike Cox, Constructing a learning strategy under reasoning failure

Nov 27: Justin Boyan, Using prediction to improve combinatorial optimization search (paper)

Dec 4: Astro Teller, Putting Sub-solutions Back Together

Dec 11: Will Uther, Adversarial Reinforcement Learning

Dec 13: Bruce Digney (Defense Research Establishment, Canada), Learning Reactive Hierarchical Control Structures (FRC seminar)

Jan 22: Rich Caruana, MTL in kNN

3:00pm, Jan 29, WeH 5409: Peter Stone, Layered Learning in Multiagent Systems (Thesis Proposal)

Feb 12: Geoff Gordon, Linear Programming, Lagrange Multipliers, and Duality

Feb 19: Hideki Asoh, Socially Embedded Learning of Office-Conversant Mobile Robot "Jijo-2"

Feb 26: Kan Deng, Learning to Recognize Time Series: Combining ARMA models with memory-based learning

Mar 5 and 12: no meeting scheduled

Mar 19: Mark Craven, [--]

Apr 2: Andrew Ng, Preventing "Overfitting" of Cross-Validation Data

10:45am, Monday Apr 7, Wean 8220: Nick Szirbik (Technical University of Timisoara, Romania), Optimal NN Topology for Time Dependent Patterns

Apr 16: Natalie Japkowicz (Rutgers), A Novelty Detection Approach to Classification

Apr 30: Astro Teller

May 7: Geoff Gordon

Thursday May 15, Wean 1302: Jeremy Wyatt (Birmingham University), Evaluating Robot Learners

May 28: John Lafferty, Statistical Learning Algorithms Based on Bregman Distances

June 18: Jude Shavlik, A Large-Scale Test for Theory Refinement: The Wisconsin Adaptive Web Assistant

June 25: Phoebe Sengers, Learning for Believable Agents

July 16: Peter Stone, Task Decomposition and Dynamic Role Assignment for Real-Time Strategic Teamwork

July 30: Thorsten Joachims, Text Classification with Support Vector Machines

1997-98

Friday September 12, Wean 4601, 2:30: Michael Littman (Duke University), Solving Propositional Decision Processes

September 24: Tom Dietterich (Oregon State University), Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

October 1, Doherty Hall 2300A: Manuela Veloso et al., Demonstration of CMUnited robotic soccer team

October 8: John Langford, Research on and implications of Support Vector Machines

October 15: David Andre, Generalized Prioritized Sweeping (paper.ps, slides.ps)

October 22: Justin Boyan, Latest Results with STAGE on Satisfiability Problems (slides.ps)

October 29: Sebastian Thrun, A Probabilistic Approach for Concurrent Map Acquisition and Localization for Mobile Robots

November 5: Andrew Moore, AutoRSM: Memory Based Active Learning for Noisy Optimization and Control

November 12: Mark Craven, First-Order Learning for Web Mining

November 19: Scott Davies, Probabilistic Modeling for Combinatorial Optimization

November 26: No talk scheduled

December 3: Tucker Balch, Behavioral Diversity in Learning Robot Teams

December 10: Jonathan Baxter (Australian National University), KnightCap: a chess program that learns by combining TD(lambda) with minimax search

December 17: No talk scheduled

Happy Holidays!

January 7: Chats restart; no talk scheduled

January 14: Frank Dellaert, Application of Memory Based Learning for Car Detection and Robust Car tracking using Kalman filters

January 21: No talk scheduled

January 28: Joseph O'Sullivan, Building a life-long learning agent

February 4: Oded Maron, Learning from Extremely Ambiguous Examples

February 18: Will Uther, Structural Generalisation with Decision Forests

February 25: No talk scheduled

March 4: Michael Bowling, Transferring learning from a simulator to real robots

March 11: Thomas Hofmann, Structuring Document Databases by Hierarchical Clustering and Abstraction

March 18: Bryan Singer, Learning State Features from Policies to Bias Exploration in Reinforcement Learning

March 25: Spring Break

April 1: Dieter Fox, Active Markov Localization for Mobile Robots

April 8: Astro Teller, Internal Reinforcement in a Connectionist Genetic Programming Approach

April 15: Geoff Gordon, A New Algorithm For Approximating Value Functions

June 23: Malcolm Ryan, RL-TOPs: A Hierarchical Task Decomposition Technique for Faster Reinforcement Learning

November 4: Oren Etzioni (University of Washington), From AI Methodology to Internet Startup: The Story of the Softbots Project

November 11: ICML / COLT 98 Overview, Description

November 18: Avrim Blum, Combining Labeled and Unlabeled Data with Co-Training

December 2: Kan Deng, Memory-based Time Series Detection
