16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning

Spring 2019

Instructor: Jean Oh (jeanoh@cmu.edu)
(It is important that you prefix the subject line with [16-785] when emailing the instructor.)
TA: Allan Wang (allanwan at andrew)
Location: Newell Simon Hall (NSH) 3002
Dates/Times: Monday & Wednesday, 09:00 - 10:20 AM
Office hours: Allan: NSH 4506, Mon/Wed 10:30 - 11:30 AM
Jean: NSH 4521, Wed 3:00 - 5:00 PM

Course Description

This course covers topics in building cognitive intelligence for robotic systems. Cognitive capabilities constitute high-level, humanlike intelligence that exhibits reasoning or problem-solving skills. Capabilities such as semantic perception, language understanding, and task planning can be built on top of the low-level robot autonomy that enables autonomous control of physical platforms. The topics generally bridge multiple technical areas, for example, the vision-language intersection and language-action/plan grounding. The course is composed of 50% lectures and 50% seminar classes. There are no explicit prerequisites for this class, but general background knowledge in AI and machine learning is assumed.

Course Goals

In this course, we will strive to answer the following research questions, and others, toward the goal of developing cognitive capabilities for robots.
  • How can we make robots perform tasks following natural language instructions?
  • How can we develop robots that can describe in natural language what they perceive through vision or explain what they are doing and why?
  • How can we fuse information coming in multiple modalities, e.g., language and images, to understand context-aware, semantic meanings of sensory data?
  • How do we measure the quality of information translated between different modalities, e.g., how do we measure the quality of language description given an image? What are the limitations and shortcomings of existing metrics?
  • How can we make use of semantic information digested from raw sensory inputs in the process of planning to solve a problem/task?
  • How do we measure the performance of computer vision algorithms outside benchmark datasets, e.g., on robots?
  • How should learned knowledge be stored? Do we need a universal representation for knowledge?
  • What types of problems or conditions is end-to-end learning suitable for?

Class Schedule

Please check this class website periodically for possible updates as the schedule may be subject to change.

* Guest lecture schedules are tentative and may move to different dates.

Date Topic Reading homework Due
1/14 Course introduction Slides [pdf]
Homework 1: task analysis (due 1/23)
1/16 Planning and learning -- Part I Slides [pdf]
1/21 No class (MLK day)
1/23 Planning and learning -- Part II Slides [pdf]
Homework 2: requirements for datasets/tasks (Spreadsheet) (due 1/30)
Homework 1 due
1/28 Reinforcement learning -- Part I Slides [pdf]
1/30 Paper discussion.
Pick one paper from each of the two topics: datasets and biases.
Datasets: [MS COCO]
[Visual Genome]
[Talk the Walk]
Homework 2 due
1/30 Project team formation
2/4 Reinforcement learning -- Part II Slides [pdf]
2/6 Project proposal presentations Written proposals
2/11 Deep learning basics Slides [pdf]
2/13 Paper discussion Evaluating datasets (MS COCO, VisualGenome, CLEVR, Talk the Walk)
[Visual Genome]
[Talk the Walk]
2/18 Inverse reinforcement learning Slides [pdf]
2/20 Deep learning tutorial (Allan Wang) Slides [pdf]
2/25 Language to plan translation Slides [pdf]
2/27 Paper discussion
[Feature-based prediction (Kuderer12)]
[Guided cost learning (Finn16)]
[Deep IRL (Wulfmeier15)]
(Optional: [Wulfmeier16])
3/4 Conversational systems (Guest lecturer*: Alex Rudnicky, LTI)
3/6 Multimodal approaches for affective computing (Guest lecturer*: Zakia Hammal, RI)
3/11 Spring break
3/18 Natural language understanding (Guest lecturer*: Florian Metze, LTI) Slides [pdf] Call for Papers: ICML 2019 Workshop on How2
3/20 Paper discussion [Coreferencing (Kong14)]
[DenseNet (Huang17)]
3/25 Paper discussion [Language grounding (Paul16)]
[Text style transfer (Shen17)]
3/27 Project midterm presentations Project midterm reports
4/1 Image to text translation Slides [pdf]
4/3 Paper discussion [Neural Module Networks (Andreas17)]
[Compositional Modular Networks (Hu16)]
4/8 Art and AI using generative models (Guest lecturer*: Eunsu Kang, MLD)
4/10 Paper discussion [Progressive GAN (Karras18)]
[Conditional GAN (Wang18)]
4/15 AI and Ethics (Guest lecturer*: Michael Skirpan, CMU) Mike's slides on data bias
Homework 3: Bias and Class Project
4/17 Paper discussion (Biases): The first two papers on the list will be presented. Please feel free to review any two and participate in the class discussion. [Zhao et al., 2017*]
[Burns et al., 2018*]
[Antol et al., 2015]
[Ramakrishnan et al., 2018]
[Torralba et al., 2011]
4/22 Science of integrated intelligence
4/24 Paper discussion [Limits and potentials of deep learning (Sunderhauf18)]
[Connection between GAN, IRL, and EBM (Finn et al. 2016)]
4/29 Final project presentations
5/1 Project final reports

Reading Material

There are no designated textbooks for this course, but the following are recommended:

  • Planning Algorithms - by Steven LaValle, Cambridge University Press (available online).
  • Reinforcement Learning: An Introduction - by Richard Sutton and Andrew Barto, The MIT Press (available online).
  • Deep Learning - by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, The MIT Press (available online).


Component              Percentage
Reading homework       30
Paper presentations    15
Review homework        10
Class participation     5
Class project          40
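
As a worked illustration of how the component percentages combine, the following minimal Python sketch computes a weighted course grade. The weights come from the table above; the function name `final_grade`, the component keys, and the assumption that each component is scored on a 0-100 scale are hypothetical and are not part of the official grading procedure.

```python
from typing import Mapping

# Component weights in percent, taken from the grading table.
WEIGHTS = {
    "reading_homework": 30,
    "paper_presentations": 15,
    "review_homework": 10,
    "class_participation": 5,
    "class_project": 40,
}


def final_grade(scores: Mapping[str, float]) -> float:
    """Return the weighted course grade on a 0-100 scale.

    Assumption (not stated in the syllabus): every component score
    is itself expressed on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    # Weights are percentages, so divide by 100 at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {
        "reading_homework": 90,
        "paper_presentations": 80,
        "review_homework": 100,
        "class_participation": 100,
        "class_project": 85,
    }
    print(final_grade(example))  # 88.0
```

With these made-up scores, the weighted sum is 27 + 12 + 10 + 5 + 34 = 88.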

Reading homework (30%)

There will be 10 reading homework assignments. For each seminar class, there will be a list of reading material, e.g., either a book chapter or 1-3 technical papers. For each chapter or paper, write a report of up to one paragraph on how and why you would modify the idea presented in it. Include other observations, thoughts, or questions for class discussion.

Although students are strongly encouraged to do all of the reading homework, each student gets one free pass, i.e., no homework for one week. A student who uses the free pass must still attend the class that discusses the material.

Paper presentations (15%)

Each student must take a lead role in a literature review class at least once. A student is expected to prepare a 20-minute presentation of 1-2 technical papers and lead the discussion.

Review homework (10%)

There will be 5 homework assignments. These assignments will help the students review the material discussed during the class.

Class Participation (5%)

Students are encouraged to participate actively in paper discussions. Examples of active participation include raising or answering insightful or clarifying questions, or sharing additional literature on related work.

Class Project (40%)

A team can have at most 3 students. In all reports, include the names and Andrew email addresses of the project members. Each report may exceed its specified page limit by up to 2 pages, which may contain only references or graphics.

Sample projects (from Independent Study)

Weekly project report (5%)

Each team is required to submit a written report on the project's progress (1 report per team). The expected length of a report is half a page, written either as a paragraph or as a bulleted list. Extra pages are allowed for figures and references.

Project proposal (5%, 5-page limit)

A proposal must include:

  • Project title (Use self-explanatory titles)
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Project schedule (include time commitment of each member)
  • Expected outcomes
  • Team members, contact information, and a bio sketch for each member describing their technical background and intended contributions to the project
  • 10-minute team presentation

Midterm report (10%, 5-page limit)

A midterm report must include:

  • In-depth literature review on existing work
  • Detailed technical approach
  • Schedule update (what has/hasn't been completed)
  • Detailed plan for experiments or other evaluation methods
  • Preliminary results
  • 10-minute team presentation

Final presentation (10%, 20 minutes)

Final report (10%, 10-page limit)

A final report must include:

  • Project title
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Evaluation plans, experimental settings
  • Results
  • Summary of technical contributions
  • Conclusion


Academic Integrity

We formally follow the guidelines in CMU's academic integrity policy (http://www.cmu.edu/policies/student-and-student-life/academic-integrity.html).

Reasonable Person Principle (RPP)

We informally follow the Reasonable Person Principle (RPP), part of the culture of CMU's School of Computer Science, under which everyone gives and gets the benefit of the doubt for trying to be reasonable. The four rules of RPP are the following:

  • Everyone will be reasonable.
  • Everyone expects everyone else to be reasonable.
  • No one is special.
  • Do not be offended if someone suggests you are not being reasonable.

Extensions and Late Assignments

Each student has up to 5 grace days that can be applied to any homework assignments, in any combination, without penalty. (Note that there will NOT be any extensions for the final project presentation and report.) For example, you can use all 5 days on the first homework assignment, or split them into 2 and 3 days for the first and second assignments, respectively. After the 5 grace days have been used up, there will be no additional extensions: 50% will be deducted 1 day after a due date, and no points will be given after 2 days.
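
The late policy above can be sketched as a small Python function. This is only an illustration under stated assumptions, not the official grading procedure: the names `apply_late_policy` and `LateResult` are hypothetical, grace days are assumed to be applied automatically to cover as many late days as possible, and "no points will be given after 2 days" is read as zero credit once a submission is 2 or more days late beyond the grace days.

```python
from dataclasses import dataclass

GRACE_DAYS_TOTAL = 5  # from the policy: up to 5 grace days per student


@dataclass
class LateResult:
    grace_days_used: int     # grace days consumed by this submission
    grace_days_left: int     # grace days remaining afterwards
    score_multiplier: float  # fraction of the earned score that is kept


def apply_late_policy(days_late: int,
                      grace_days_left: int = GRACE_DAYS_TOTAL) -> LateResult:
    """Apply grace days first, then the late penalty.

    Assumptions (not spelled out in the syllabus):
      * grace days are used automatically before any penalty applies;
      * 1 uncovered late day keeps 50% of the score;
      * 2 or more uncovered late days keep nothing.
    """
    if days_late < 0 or grace_days_left < 0:
        raise ValueError("days_late and grace_days_left must be non-negative")
    used = min(days_late, grace_days_left)
    uncovered = days_late - used  # late days not covered by grace days
    if uncovered == 0:
        multiplier = 1.0
    elif uncovered == 1:
        multiplier = 0.5
    else:
        multiplier = 0.0
    return LateResult(used, grace_days_left - used, multiplier)


if __name__ == "__main__":
    # The 2-and-3-day split from the example in the policy text.
    first = apply_late_policy(2)
    second = apply_late_policy(3, first.grace_days_left)
    print(first.score_multiplier, second.score_multiplier, second.grace_days_left)
```

Under these assumptions, both submissions in the 2-and-3-day split keep full credit and no grace days remain; a later homework submitted 1 day late would then keep half of its score.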