ECCV 2016 VisStory
ECCV 2016 2nd Workshop on Storytelling with Images and Videos (VisStory)

This workshop is jointly organized with the Large Scale Movie Description and Understanding Challenge (LSMDC2016), in conjunction with ECCV 2016, Amsterdam, The Netherlands.

Important Dates

  • Submission deadline: August 30, 2016

  • Acceptance Notification: September 15, 2016

  • Workshop: Sunday, October 16, 2016

Location: Roeterseiland Room C1.04


The ability to craft and understand story narratives is one of the key cognitive tools humans use for communication. Media created by humans, whether in oral, written, or visual form, carry underlying stories that are essential to understanding their full meaning. Images and videos are no exception and are, in fact, prime examples. Furthermore, with the widespread availability of mobile recording devices and the emergence of social networking, image and video data about real-world objects, events, and activities are growing in volume and becoming richer in representation. This abundance of visual data makes the understanding of stories increasingly important for visualization, exploration, and semantic comprehension of the data. Recent research has shown that stories play an important role in the intuitive exploration of large collections of geotagged or timestamped images, the summarization of hours-long egocentric videos, boosting the accuracy of face, activity, and object recognition in unstructured images and videos, and obtaining automatic language descriptions from images. These applications, however, only scratch the surface of what may be possible with such semantically rich understanding.

This joint workshop aims to bring together experts from a variety of related fields, including vision, graphics, Web mining, language processing, and HCI, to provide a perspective on existing research and to initiate discussion of the next big challenges in data-driven and vision-oriented storytelling. Aside from exploring challenging research problems, the workshop also encourages ideas for creative and commercial applications in domains such as media, entertainment, and digital arts.

Topics of interest include, but are not limited to:

  • Event detection and storytelling in social media

  • Interactive exploration and summarization for large collections of images or videos

  • Story-based summarization and detection of egocentric videos

  • Question answering for visual content

  • Language descriptions for images and videos

  • Deep learning architectures for story representation

  • Story extraction from movies or comic books

  • Causality detection in images or videos

  • Story-based detection of human activities and objects in images and videos

  • Analysis of geo-tagged photo collections

  • Novel tasks with the Large Scale Movie Description and Understanding Challenge dataset

Call for Papers

The workshop will mostly consist of a selected set of invited talks given by leading researchers in the related fields. We welcome submissions of relevant work that has been recently published, is in progress, or is to be presented at other venues, including the ECCV main conference. We encourage extended abstracts of 2–6 pages. Authors of accepted submissions will be invited to give spotlight talks.


Other submission guidelines are as follows.

  • Submissions should follow the ECCV format.

  • The recommended paper length is 2–6 pages.

  • The review process will be double blind.

  • There will be no official proceedings.

Submission Site

Please submit papers in PDF format here (now closed).

Invited Speakers

Sanja Fidler
(University of Toronto)
Jason J. Corso
(University of Michigan)
Devi Parikh
(Georgia Tech)
Cees Snoek
(University of Amsterdam)


09:00 – 09:10 Organizers Introduction talk
09:10 – 09:40 Sanja Fidler (University of Toronto) TBD
09:40 – 10:10 Jason J. Corso (University of Michigan) Discovering and Describing Steps in Temporal Processes with Deep Embeddings
10:10 – 10:30 Coffee Break
10:30 – 11:10 Organizers LSMDC Challenge Introduction
11:10 – 11:25 Winner for Movie description: Video Description by Combining Strong Representation and a Simple Nearest Neighbor Approach
Gil Levi, Dotan Kaufman, Lior Wolf (Tel Aviv University), Tal Hassner (University of Southern California)
11:25 – 11:40 Winner for Movie annotation & retrieval and Movie fill in the blank: Video Captioning and Retrieval Models with Semantic Attention [paper]
YoungJae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim (Seoul National University)
11:40 – 12:20 Poster presentation
12:20 – 13:30 Lunch Break
13:30 – 14:00 Devi Parikh (Georgia Tech) Learning Common Sense from Stories
14:00 – 15:00 Spotlight Talks See the papers below.
15:00 – 15:30 Cees Snoek (University of Amsterdam) Recognizing events in videos without examples
15:30 – 15:40 Closing

Spotlight Papers

Each paper is allocated 10 minutes for its spotlight talk and 2 minutes for questions and switching between speakers.

  1. Huijuan Xu and Kate Saenko. Dual Attention Network for Visual Question Answering.

  2. Desara Xhura and John Brown. Why did You Choose Those Key-frames? Why did You Skip Those Parts? A User Study on Video Summarization.

  3. Atousa Torabi, Niket Tandon, Leonid Sigal. Learning Language-Visual Embedding for Movie Understanding with Natural-Language.

  4. Hendrik Heuer, Christof Monz, Arnold Smeulders. Generating Captions without Looking Beyond Objects.

  5. Tuan Do. Event-driven Movie Annotation using MPII Movie Dataset.


Program Committee

  • Eric P Xing (CMU, Advisory)

  • Yong Jae Lee (UC Davis)

  • Vicente Ordonez-Roman (U Virginia)

  • Hyun Oh Song (Google Research)

  • Makarand Tapaswi (KIT)

  • Bo Xiong (UT Austin)

  • Hao Zhang (CMU)