LSTM and its variants for visual recognition, VALSE 2016 [Intro][slides]

Abstract: LSTM Recurrent networks have been first introduced to address the sequential prediction tasks, and then extended to multidimensional image processing tasks such as image generation, object detection, object and scene parsing. It has achieved a big breakthrough on solving kinds of visual recognition tasks benefiting from the long-range memorization of LSTM networks. In this tutorial, I will first introduce the LSTM networks and its variants, and explain their interesting and powerful characteristics in sequential tasks and image processing. Especially, I will focus on explaining why LSTM networks can effectively boost the hierarchical feature representations for RGB and depth images, which is naturally complementary to CNNs. Then I will mainly overview the techniques of extending LSTM networks to kinds of concrete visual recognition tasks.

Semantic Object Parsing with Graph LSTM, ECCV 2016
Deep Structured Scene Parsing by Learning with Image Descriptions, CVPR 2016
Semantic Object Parsing with Local-Global Long Short-Term Memor, CVPR 2016
Human Parsing With Contextualized Convolutional Neural Network, ICCV 2015 [videos]
Deep Contextual Modeling for Visual Recognition, Invited talk at NanJing University of Science & Technology, 2016
Deep structured model for fine-grained image parsing, Invited talk at Institute of Information Engineering, CAS, 2016