Resources

Selected Projects




  • Semantic Object Parsing with Graph LSTM [PDF]

    We propose a novel Graph LSTM model that extends traditional LSTMs from sequential and multi-dimensional data to general graph-structured data. Instead of dividing an image into a fixed, regular grid of pixels or patches, as previous LSTMs did, Graph LSTM takes each arbitrary-shaped superpixel as a semantically consistent node of a graph, while the spatial neighborhood relations between superpixels are used to construct the undirected graph edges.

    European Conference on Computer Vision (ECCV), 2016 (Spotlight)
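
The graph construction described above (superpixels as nodes, spatial adjacency as undirected edges) can be sketched roughly as follows. This is only an illustration, not the paper's code: the function names `superpixel_graph` and `adjacency`, the use of 4-connectivity, and the toy label map are all assumptions made for the example.

```python
def superpixel_graph(labels):
    """Return undirected edges between superpixels that touch.

    `labels` is a 2-D list in which each entry is the superpixel id of
    that pixel. Two superpixels are linked when any of their pixels are
    4-connected neighbours (an assumption made for this sketch).
    """
    height = len(labels)
    width = len(labels[0]) if height else 0
    edges = set()
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            # Checking only right and down visits each neighbour pair once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width:
                    b = labels[ny][nx]
                    if a != b:
                        # Store edges in sorted order so (a, b) == (b, a).
                        edges.add((min(a, b), max(a, b)))
    return edges


def adjacency(edges):
    """Turn the edge set into a node -> set-of-neighbours mapping."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


# Toy 3x3 label map with three irregularly shaped superpixels.
labels = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
edges = superpixel_graph(labels)
graph = adjacency(edges)
```

Each node in `graph` would then carry the features of one superpixel, and a graph-structured recurrent model would pass information along the listed neighbours.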




  • Reversible Recursive Instance-level Object Segmentation [PDF]

    We propose a novel Reversible Recursive Instance-level Object Segmentation (R2-IOS) framework to address the challenging instance-level object segmentation task. R2-IOS consists of a reversible proposal refinement sub-network that predicts bounding box offsets for refining the object proposal locations, and an instance-level segmentation sub-network that generates the foreground mask of the dominant object instance in each proposal.

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
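
As a rough illustration of refining a proposal with predicted bounding box offsets, the sketch below applies a set of offsets to a box. The abstract does not state how the offsets are parameterized, so the center-shift and log-scale form used here (common in R-CNN-style detectors) is an assumption, and the name `apply_offsets` is hypothetical.

```python
import math


def apply_offsets(box, offsets):
    """Refine a proposal box (x1, y1, x2, y2) with offsets (dx, dy, dw, dh).

    Assumed parameterization: dx and dy shift the box center by a fraction
    of the box width and height; dw and dh rescale width and height in log
    space. This is not confirmed to be the paper's exact form.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets

    width = x2 - x1
    height = y2 - y1
    center_x = x1 + 0.5 * width
    center_y = y1 + 0.5 * height

    # Shift the center, then rescale the box around the new center.
    new_center_x = center_x + dx * width
    new_center_y = center_y + dy * height
    new_width = width * math.exp(dw)
    new_height = height * math.exp(dh)

    return (
        new_center_x - 0.5 * new_width,
        new_center_y - 0.5 * new_height,
        new_center_x + 0.5 * new_width,
        new_center_y + 0.5 * new_height,
    )


# Zero offsets leave the proposal unchanged; a positive dx moves it right.
unchanged = apply_offsets((0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 0.0, 0.0))
shifted = apply_offsets((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, 0.0, 0.0))
```

In a recursive scheme, the refined box would be fed back as the next proposal and the offsets predicted again.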




  • Towards Computational Baby Learning: A Weakly-supervised Approach for Object Detection [PDF]

    Intuitive observations show that a baby may inherently possess the capability of recognizing a new visual concept (e.g., chair, dog) by learning from only a few positive instances taught by parents or others, and that this recognition capability can be gradually improved by exploring and interacting with real instances in the physical world. Inspired by these observations, we propose a computational model for slightly-supervised object detection.

    IEEE International Conference on Computer Vision (ICCV), 2015




  • Human Parsing with Contextualized Convolutional Neural Network [PDF][Page with Data]

    We address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

    IEEE International Conference on Computer Vision (ICCV), 2015 (Oral)