Language Grounding to Vision and Control
Fall 2017, CMU 10-808

Instructor: Katerina Fragkiadaki
Lectures: Tuesday, 9:00am-12:00pm, 5222 Gates and Hillman Centers (GHC)
Office Hours: Tuesday 3:00-4:00pm, 8015 GHC

Class goals

This is a seminar course that will visit recent progress on the problem of language acquisition through the pairing of multiple modalities (vision, haptics, audio, etc.), as well as through active interaction with the world. The central questions and topics we will visit are:

How can language help accelerate the learning of an autonomous agent (if at all)?
How and why do humans acquire language?
Inductive biases for strong generalization
Architectures for agents capable of compositional grounding of language
State representations of visual scenes from video and of imagined scenes from story reading
Language for high-level planning and control
Neural-symbolic architectures for hierarchical symbolic grounding

Schedule

The following schedule is tentative; it will change based on time constraints and the interests of the people in the class. Lecture notes will be added as lectures progress.

Date	Topic	Readings	Presenters (Slides)
8/29	The Grounding Problem, Learning from Data vs. Programming with Language, Explanation-Based Learning, Course Overview	[1-6]	Katerina (Intro.pdf)
9/5	Grounding Language on Programs (I): Executable Semantic Parsing	[16-19]	Katerina, Tejas, Sarah (18.pdf, 16.pdf, 17.pdf)
9/12	Compositionality of Meaning and Recursive Networks, Pointer Networks	[20-24, 53, 69]	Katerina, Ricson (RNNsPNs.pdf, 20.pdf)
9/19	Grounding Language on Visual Concepts (I)	[43-47, 70-72]	Nikhil, Rishub, Ben, Katerina (43.pdf, 44.pdf, 70.pdf, 71.pdf)
9/26	Grounding Language on Visual Programs (II)	[35-37, 74-75]	Sumedha, Siliang, Sarah, Arjun (37.pdf, 75.pdf)
10/3	Grounding Language through Multi-Agent Collaboration	[48-51]	Manasi, Arjun, Ricson, Varun, Siliang, Deepika (49.pdf, 44.pdf)
10/10	Language and Memory State Representations: Architectures that Keep Track of State	[26-29, 73]	Mohit, Tanmay, Ricson, Varun, Siliang (27.pdf, 28.pdf, 29.pdf)
10/17	Neural-Symbolic Architecture, Rule-based NN	[7-8, 76-81]	Rishub, Varun, Ricson, Sumedha, Ben, Sarah (7.pdf, 78.pdf, 81.pdf)
10/24	Learning Theorem Proving, Neural-Symbolic Architectures, Rule-based NN	[11-14]	Ricson, Varun, Rishub, Nikhil (13.pdf)
10/31	Grounding Language on Programs: Program Induction	[30-34]	Mohit, Ben, Varun, Ricson, Sumedha (31.pdf, 32.pdf)
11/7	Conversational Agents	[88-91]	Ricson, Varun, Sarah, Rhea, Arjun, Deepika, Siliang (89.pdf, 90.pdf)
11/14	Grounding Language to Robotic Programs (I)	[57-59, 67-68]	Siliang, Deepika, Rishub, Ricson (57.pdf, 58.pdf, 59.pdf)
11/21	Grounding Language to Robotic Programs (II)	[60-65]	Ricson, Deepika, Siliang, Rhea, Sarah (60.pdf, 63.pdf, 64.pdf)
11/28	Weakly Supervised Semantic Parsing	[82-87]	Mohit, Sumedha, Arjun, Rhea, Ben, Deepika (85.pdf, 87.pdf)
12/5	Future of Language Grounding		Katerina

Resources

Readings

Grading

The grade is determined by a paper presentation, your participation in class (asking good questions, making connections between topics, etc.), and a final project. The final project can be a small innovation on top of methods and algorithms presented in the course, or your own project idea on topics covered in the course. The course grade is a weighted average of your participation in class (30%), your paper presentation (30%), and your final project (40%).
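
As a rough illustration of the weighting above, the following sketch computes a course grade from three component scores. The 0-100 scale, the component names, and the function name course_grade are assumptions made for the example; only the 30/30/40 weights come from the course description.

```python
# Weights as stated in the Grading section.
WEIGHTS = {"participation": 0.30, "presentation": 0.30, "project": 0.40}


def course_grade(scores):
    """Return the weighted average of the component scores.

    Assumes each score is on a 0-100 scale (an assumption for this example;
    the course description does not specify a scale).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"participation": 90, "presentation": 80, "project": 85}
print(round(course_grade(example), 2))
```

Because the final project carries the largest weight, a given change in the project score moves the course grade more than the same change in either other component.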

Prerequisites

This course assumes familiarity with computer vision, basic NLP concepts, machine learning, and deep learning.

Web design: Anton Badev