**This page will be updated soon for Spring 2019**


16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning

Spring 2019

Instructor: Jean Oh (jeanoh@cmu.edu)
(It is important that you prefix the subject line with [16-785] when emailing the instructor.)
Location: Newell Simon Hall (NSH) 3002
Dates/Times: Monday & Wednesday, 09:00 - 10:20 AM
Office hours: Newell Simon Hall (NSH) 4521, Wed 3:00 - 5:00 PM

Course Description

This course covers topics in building cognitive intelligence for robotic systems. Cognitive capabilities constitute high-level, humanlike intelligence that exhibits reasoning or problem-solving skills. Capabilities such as semantic perception, language understanding, and task planning can be built on top of low-level robot autonomy that enables autonomous control of physical platforms. The topics generally bridge multiple technical areas, for example, the vision-language intersection and language-action/plan grounding. The course is composed of 50% lectures and 50% seminar classes. There are no explicit prerequisites for this class, but general background knowledge in AI and machine learning is assumed.

Course Goals

In this course, we will strive to answer the following research questions, among others, with the goal of developing cognitive capabilities for robots.
  • How can we make robots perform tasks by following natural language instructions?
  • How can we develop robots that can describe in natural language what they perceive through vision or explain what they are doing and why?
  • How can we fuse information coming in multiple modalities, e.g., language and images, to understand context-aware, semantic meanings of sensory data?
  • How do we measure the quality of information translated between different modalities, e.g., how do we measure the quality of language description given an image? What are the limitations and shortcomings of existing metrics?
  • How can we make use of semantic information digested from raw sensory inputs in the process of planning to solve a problem/task?
  • How do we measure the performance of computer vision algorithms outside benchmark datasets, e.g., on robots?
  • How should learned knowledge be stored? Do we need a universal representation for knowledge?
  • In what types of problems or conditions can deep learning achieve end-to-end learning?

Class Schedule

Please check this class website periodically for possible updates as the schedule may be subject to change.

A complete list of required reading can be found here.

Date Topic Reading/Homework
1/17 Course introduction Slides [pdf]
Homework: paper sign up on Canvas (Google doc).
Piazza has been set up.
1/22 Planning Slides [pdf] [ppt]
Paper sign up due.
1/24 Overview papers,
[LeCun et al., Nature'15]
[Kober et al., IJRR'13]
1/29 Reinforcement learning Part I: Dynamic Programming (DP) methods Slides [pdf]
1/31 Applications Afshaan's slides on image captioning & VQA [pdf]
Jean's slides on social navigation [pdf]
Social navigation [Kretzschmar et al., IJRR'16]
Image captioning [Karpathy et al., CVPR'15]
VQA [Antol et al., ICCV'15]
2/5 Reinforcement learning Part II: Monte Carlo methods, Temporal Difference learning Slides [pdf]
2/7 RL with human guidance [Griffith et al., NIPS'13]
[Knox & Stone AAMAS'12]
[Ling & Fidler NIPS'17]
2/12 Project proposal presentations;
Reinforcement learning Part III: Policy Gradient
Slides [pdf]
2/14 Inverse reinforcement learning [Ng & Russell ICML'00],
[Abbeel & Ng ICML'04]
[Coates et al., ICML'08]
2/19 Inverse reinforcement learning
Guest lecturer: Katharina Muelling
2/21 Deep learning basics (part 1) Slides [pdf]
2/26 AlexNet, Inception, & VGG AlexNet [Krizhevsky et al. NIPS'12],
Inception [Szegedy et al. CVPR'15],
VGG [Simonyan & Zisserman ICLR'15]
Extra: [Szegedy et al. AAAI'17]
2/28 Scheduling
Guest lecturer: Steve Smith
3/5 Deep learning basics (part 2) Slides [pdf]
3/7 Language understanding Slides [pdf]
3/12 Spring break
3/19 Convolutional Neural Networks (CNNs) for image understanding Slides [pdf]
3/21 Deep IRL,
Image segmentation (pixel-level classification)
[Wulfmeier et al. CoRR'15]
[Wulfmeier et al., IROS'16]
DenseNet [Huang et al. CVPR'17]
DeepLab [Chen et al., ArXiv'17]
U-Net [Ronneberger et al., MICCAI'15]
3/26 Modeling sequences using RNN Slides [pdf]
3/28 Project midterm presentations
4/2 Image captioning (image to text) Slides [pdf]
4/4 word2vec, skip-thought vectors word2vec [Mikolov et al. NIPS'13]
Skip-thought vectors [Kiros et al., NIPS'15]
4/9 Image synthesis using GANs
(text to image, image to image)
Slides [pdf]
4/11 Natural language direction generation [MacMahon et al. AAAI'06]
[Mei et al., AAAI'16]
[Daniele et al., HRI'17]
4/16 Intelligence architecture Slides [pdf]
4/18 BLEU, METEOR, ROUGE, CIDEr BLEU [Papineni et al., ACL'02]
CIDEr [Vedantam et al. CVPR'15]
MS COCO [Chen et al., ArXiv'15]
4/23 Evaluation design methodologies
Guest lecturer: Minkyung Lee
4/25 Blackbox vs. reasoning models [Levine, Finn et al., JMLR'16]
[Johnson et al. ICCV'17]
4/30 Final presentation Session 1
5/2 Final presentation Session 2 Additional papers [pdf]

Reading Material

There is no designated textbook for this course, but the following are recommended:

  • Deep learning - by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, The MIT Press ( available online ).
  • Reinforcement learning: an introduction - by Richard Sutton and Andrew Barto, The MIT Press ( available online ).
  • Planning algorithms - by Steven LaValle, Cambridge University Press ( available online )


Component Percentage

Homework 30
Class participation 30
Class project 40
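
As a rough illustration of how the percentages above combine, the following minimal sketch computes a weighted course total. It assumes each component is scored on a 0-100 scale; that scale, the dictionary keys, and the function name `course_total` are hypothetical choices for this example and are not part of the official grading procedure.

```python
# Component weights in percent, taken from the table above.
WEIGHTS = {"homework": 30, "participation": 30, "project": 40}


def course_total(scores):
    """Combine per-component scores (assumed 0-100 each) into a 0-100 total."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly: " + ", ".join(sorted(WEIGHTS)))
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    # Weighted sum, then divide by 100 because the weights are percentages.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


if __name__ == "__main__":
    example = {"homework": 90, "participation": 80, "project": 85}
    print(course_total(example))
```

With the example scores, the three contributions are 27, 24, and 34 points, for a total of 85.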

Homework (30%)

There will be weekly reading homework unless specified otherwise. Each week, there will be a list of reading material, e.g., either a book chapter or 2-3 technical papers. For each chapter or paper, write a report of up to one paragraph containing a summary and your observations, thoughts, or questions for class discussion. It is important to rephrase the text in your own words.

Although it is strongly recommended that students do all of the reading homework, each student will get one free pass, i.e., no homework for one week. A student who uses the free pass must still attend the class that discusses the material.

Class Participation (30%)

Class participation has two aspects. First, each student must take a lead role in a literature review class at least once: the student is expected to prepare a 20-minute presentation of 1-2 technical papers and lead the discussion. Second, students are encouraged to participate actively in paper discussions. Examples of active participation include raising or answering insightful or clarifying questions, and sharing an additional literature review of related work.

Class Project (40%)

A team can have at most three students. In all reports, include the names and the Andrew email addresses of the project members. For each report, up to 2 extra pages beyond the specified page limit are allowed, for references or graphics only.

Project proposal (10%, 5-page limit)

A proposal must include:

  • Project title (Use self-explanatory titles)
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Project schedule (include time commitment of each member)
  • Expected outcomes
  • Team members, contact information, and a bio sketch for each member describing their technical background and intended contributions to the project
  • 10-minute team presentation

Midterm report (10%, 5-page limit)

A midterm report must include:

  • In-depth literature review on existing work
  • Detailed technical approach
  • Schedule update (what has/hasn't been completed)
  • Detailed plan for experiments or other evaluation methods
  • Preliminary results
  • 10-minute team presentation

Final presentation (10%, 20 minutes)

Final report (10%, 10-page limit)

A final report must include:

  • Project title
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Evaluation plans, experimental settings
  • Results
  • Summary of technical contributions
  • Conclusion


Academic Integrity

We formally follow the guidelines in CMU's academic integrity policy (http://www.cmu.edu/policies/student-and-student-life/academic-integrity.html).

Reasonable Person Principle (RPP)

We informally follow the Reasonable Person Principle (RPP), a base culture of CMU's School of Computer Science, where everyone gives and gets the benefit of the doubt for trying to be reasonable. The four rules of RPP are the following:

  • Everyone will be reasonable.
  • Everyone expects everyone else to be reasonable.
  • No one is special.
  • Do not be offended if someone suggests you are not being reasonable.

Extensions and Late Assignments

Each student has up to 5 grace days that can be applied to any homework, in any combination, without penalty. (Note that there will NOT be any extension for the final project presentation and report.) For example, you can use all 5 days for the first homework assignment, or split them into 2 and 3 days for the first and second assignments, respectively. After the 5 grace days have been used up, there will be no additional extensions; 50% will be deducted 1 day after a due date, and no points will be given after 2 days.
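
The arithmetic of the grace-day policy can be sketched as follows. This is one possible reading of the rule, not an official grading script: it assumes lateness is counted in whole days, that remaining grace days are applied automatically before any penalty, and that a submission 2 or more days late (after grace) earns no points. The function name `apply_late_policy` and its parameters are hypothetical. The sketch applies to homework only, since the final project presentation and report have no extensions.

```python
def apply_late_policy(days_late, grace_left, score):
    """Return (adjusted_score, grace_remaining) for one homework submission.

    Assumptions (not stated by the syllabus): whole-day lateness, grace days
    are spent automatically, and 2+ penalized days means zero credit.
    """
    if days_late < 0 or grace_left < 0:
        raise ValueError("days_late and grace_left must be non-negative")

    # Spend grace days first; whatever lateness remains is penalized.
    grace_used = min(days_late, grace_left)
    penalized_days = days_late - grace_used

    if penalized_days == 0:
        multiplier = 1.0   # on time, or fully covered by grace days
    elif penalized_days == 1:
        multiplier = 0.5   # 50% deducted 1 day after the due date
    else:
        multiplier = 0.0   # no points (assumed to start at 2 days late)

    return score * multiplier, grace_left - grace_used


if __name__ == "__main__":
    grace = 5
    # The 2-day / 3-day split from the example above, then one more late day.
    for days_late in (2, 3, 1):
        adjusted, grace = apply_late_policy(days_late, grace, 100.0)
        print(days_late, adjusted, grace)
```

In the usage loop, the first two submissions consume all 5 grace days at full credit, so the third submission, 1 day late, receives half credit.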