16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning

Spring 2021

Instructor: Jean Oh (jeanoh@cmu.edu)
(Please prefix the subject line with [16-785].)
Location: Zoom meeting link
Dates/Times: Tuesday & Thursday, 4:00 - 5:20 PM (Eastern Time)
Office hours: Zoom meeting link, Wed 5:00 - 7:00 PM


NOTE: This course will be fully virtual. Since the course is discussion-intensive, synchronous attendance is required.

Course Description

This course covers topics in building cognitive intelligence for robotic systems. Cognitive capabilities constitute high-level, humanlike intelligence that exhibits reasoning or problem-solving skills. Capabilities such as semantic perception, language understanding, and task planning can be built on top of low-level robot autonomy, which enables autonomous control of physical platforms. The topics generally span multiple technical areas, for example, the intersection of vision and language, and the grounding of language in actions or plans.

This course consists of 50% lectures and 50% seminar classes. Since this is a project-oriented course, we will place special emphasis on learning research skills, e.g., problem formulation, literature review, ideation, evaluation planning, results analysis, and hypothesis verification.

Prerequisites: There are no explicit prerequisites for this class, but general background knowledge in AI and machine learning is assumed.

Course Goals

In this course, we will strive to answer the following research questions, and beyond, toward the goal of developing cognitive capabilities on robots.
  • How can we make robots perform tasks following natural language instructions?
  • How can we develop robots that can describe in natural language what they perceive through vision or explain what they are doing and why?
  • How can we fuse information coming from multiple modalities, e.g., language and images, to understand the context-aware, semantic meaning of sensory data?
  • How do we measure the quality of information translated between different modalities, e.g., how do we measure the quality of a language description of an image? What are the limitations and shortcomings of existing metrics?
  • How can we make use of semantic information digested from raw sensory inputs in the process of planning to solve a problem/task?
  • How do we measure the performance of computer vision algorithms outside benchmark datasets, e.g., on robots?
  • How should learned knowledge be stored? Do we need a universal representation for knowledge?
  • How can we make robots learn to improve over time, e.g., by learning new skills?
Through class projects built around these research questions, we will learn to follow the basic steps of conducting research.

Class Schedule

Please check Canvas for the current schedule. Each entry below lists the class topic, followed where applicable by the associated reading topic or deliverable and its due date. All due times are 23:59 ET.

  • Course introduction | 1: Questionnaire | Fri, 2/5
  • Seminar overview | 2: Project team | Mon, 2/15
  • Planning and learning -- Part I + project brainstorming
  • Planning and learning -- Part II + project brainstorming | 3: Paper reading | 2/16, 2/18
  • Paper discussion + project teams | Robot learning & planning
  • Paper discussion + project brainstorming | Reinforcement learning, imitation learning
  • No Class
  • Project proposal presentations | Written proposals
  • Creativity in AI & Robotics (Guest lecturer: Robert Twomey, Carson Center for Emerging Media Arts, UNL)
  • Paper discussion | Creativity in AI & robotics
  • Vision-Language Navigation
  • Paper discussion | Vision-Language Planning
  • Exploration (Guest lecturer: Ji Zhang, RI)
  • Paper discussion | Autoencoders for robotics
  • Social Navigation
  • Paper discussion | Social navigation
  • Experiment design (Guest lecturer: Liz Carter, RI)
  • Project midterm presentations | Project midterm reports
  • Cognitive Architecture (Guest lecturer: Christian Lebiere, Psychology)
  • Paper discussion | Understanding context
  • No Class | Context representation, embeddings (recommended reading)
  • Evaluation methodologies
  • Paper discussion | Bias, AI & ethics
  • Science of integrated intelligence
  • Paper discussion | Intelligent robots (blue-sky papers)
  • Final project discussions
  • Final project presentations | Project final reports

Reading Material

There are no required textbooks for this course, but the following are recommended:

  • Planning Algorithms, by Steven LaValle, Cambridge University Press (available online).
  • Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto, The MIT Press (available online).
  • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, The MIT Press (available online).


Grading

Component              Percentage
Reading homework       20%
Paper presentations    20%
Class participation    20%
Class project          40%

Reading homework (20%)

There will be 9 reading homework assignments (plus 1 recommended reading). For each seminar class, there will be a list of reading material, e.g., a book chapter or 1-3 technical papers. For each chapter or paper, write a one-paragraph report on how and why you would modify the idea it presents. Include other observations, thoughts, or questions for class discussion.

Although students are strongly encouraged to complete all of the reading homework, each student gets one free pass, i.e., may skip the reading homework for one week.

Paper presentations (20%)

Each student must take the lead role in a literature review class at least once. The presenting student is expected to prepare a 20-minute presentation of 1-2 technical papers and lead the discussion.

Class Participation (20%)

Students are strongly encouraged to participate actively in class discussion. Examples of active participation in a paper discussion include raising or answering insightful or clarifying questions, or sharing additional literature on related work.

Class Project (40%)

A team can have at most 3 students. In all reports, include the names and Andrew email addresses of the project members. For each report, up to 2 pages beyond the specified page limit are allowed, for references or graphics only.

Sample projects

Weekly project report (5%)

Each team is required to submit one written report on the project's progress per week. A report is expected to be about half a page, written either as a paragraph or as a bulleted list. Extra pages are allowed for figures and references.

Project proposal (5%, 5-page limit)

A proposal must include:

  • Project title (use a self-explanatory title)
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Project schedule (include time commitment of each member)
  • Expected outcomes
  • Team members and contact information, with a bio sketch for each member describing their technical background and intended contributions to the project
  • 10-minute team presentation

Midterm report (10%, 5-page limit)

A midterm report must include:

  • In-depth literature review on existing work
  • Detailed technical approach
  • Schedule update (what has/hasn't been completed)
  • Detailed plan for experiments or other evaluation methods
  • Preliminary results
  • 10-minute team presentation

Final presentation (10%, 20 minutes)

Final report (10%, 10-page limit)

A final report must include:

  • Project title
  • Problem definition
  • Technical challenges
  • Proposed approach
  • Evaluation plans, experimental settings
  • Results
  • Summary of technical contributions
  • Conclusion


Academic Integrity

We formally follow the guidelines in CMU's academic integrity policy.

Reasonable Person Principle (RPP)

We informally follow the Reasonable Person Principle (RPP), a foundation of the culture in CMU's School of Computer Science, under which everyone gives and receives the benefit of the doubt for trying to be reasonable. The four rules of the RPP are:

  • Everyone will be reasonable.
  • Everyone expects everyone else to be reasonable.
  • No one is special.
  • Do not be offended if someone suggests you are not being reasonable.

Extensions and Late Assignments

Each student has a total of 5 grace days that can be applied to any homework, in any combination, without penalty (note that there will NOT be any extension for the final project presentation and report). For example, you can use all 5 days for the first homework assignment, or split them into 2 and 3 days for the first and second assignments, respectively. Once the 5 grace days have been used up, no additional extensions will be granted: 50% will be deducted 1 day after the due date, and no points will be given after 2 days.