Structures and objects captured in image data are often idealized by the viewer. For example, buildings may seem perfectly straight, and repeating structures such as the kernels of an ear of corn may seem almost identical. In reality, however, such flawless behavior hardly exists. The goal in this line of work is to detect spatial imperfection, i.e., the departure of objects from their idealized models, given only a single image as input, and to render a new image in which the deviations from the model are either reduced or magnified. Reducing the imperfections allows us to idealize/beautify images, and can serve as a graphic tool for creating more visually pleasing images. Alternatively, increasing the spatial irregularities allows us to reveal useful and surprising information that is hard to perceive with the naked eye (such as the sagging of a house's roof). I will consider this problem under two distinct definitions of the idealized model: (i) ideal parametric geometries (e.g., line segments, circles), which can be automatically detected in the input image; and (ii) perfect repetitions of structures, which rely on the redundancy of patches in a single image. Each of these models has led to a new algorithm with a wide range of applications in civil engineering, astronomy, design, and materials defect inspection.
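
As a toy illustration of the first model (an ideal parametric geometry), and not the algorithm presented in the talk, the sketch below fits an ideal straight line to a set of roughly collinear points and then rescales each point's deviation from that line: a factor below 1 idealizes the shape, a factor above 1 exaggerates its imperfections. The function name and data are made up for illustration.

```python
# A toy sketch of the "ideal parametric geometry" idea: fit a straight line to
# roughly collinear points, then damp (alpha < 1) or exaggerate (alpha > 1)
# each point's deviation from that ideal line. Illustrative only; not the
# algorithm presented in the talk.
import numpy as np

def rescale_deviations(points, alpha):
    """points: (N, 2) array of samples along a nearly straight structure."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Principal direction of the point set acts as the fitted "ideal" line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction, normal = vt[0], vt[1]
    along = centered @ direction                 # position along the ideal line
    deviation = centered @ normal                # signed distance from the line
    adjusted = np.outer(along, direction) + np.outer(alpha * deviation, normal)
    return adjusted + centroid

# alpha = 0 snaps the points onto the fitted line ("idealize");
# alpha = 3 magnifies the wobble so it becomes visible to the eye.
x = np.linspace(0, 10, 50)
noisy = np.column_stack([x, 0.5 * x + np.random.normal(0, 0.05, x.size)])
straightened = rescale_deviations(noisy, alpha=0.0)
exaggerated = rescale_deviations(noisy, alpha=3.0)
```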

Tali Dekel is currently a Research Scientist at Google, working on developing computer vision and computer graphics algorithms. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William T. Freeman. Tali completed her Ph.D. studies at the School of Electrical Engineering, Tel Aviv University, under the supervision of Prof. Shai Avidan and Prof. Yael Moses. Her Ph.D. focused on the use of multi-camera systems to solve classic and innovative tasks in computer vision and computer graphics, including 3D structure and 3D motion estimation, content-geometry-aware stereo retargeting, and photo sequencing (recovering the temporal order of a distributed image set). In her postdoctoral studies, she has been working on new algorithms that detect and visualize imperfections/irregularities in a single image. Her research interests include computer vision and graphics, geometry, 3D reconstruction, motion analysis, and image visualization.

People love stories. Pictures allow for engaging storytelling, but this is still an expensive and exclusive art form: visual stories remain difficult to create. Editing a single image or a video clip has historically been easier than animation, which is dominated by keyframe synthesis despite its dramatically high cost. In this talk, I will describe my efforts to make animated storytelling more accessible. Some of this work has been featured in Photoshop and Illustrator, used by startups (3Gear Systems/NimbleVR, now Oculus; Mixamo, now Adobe; and FaceShift, now Apple), and, most recently, incorporated into Adobe Character Animator, a system for performance-based animation.

Jovan Popovic is a Senior Principal Scientist at Adobe Systems.  After receiving bachelor's degrees in mathematics and computer science in 1995, he attended the University of Washington and Carnegie Mellon University, where he earned a doctoral degree for his work in computer animation and geometric modeling. He was on the faculty at the Massachusetts Institute of Technology before moving to Seattle to join Adobe Research in 2008.  Since 2013, he has steered the vision, architecture, research, and implementation of the Adobe Character Animator, a new software product for performance-based animation.

Sponsored in part by Disney Research

The cell is the basic structural and functional unit of all living organisms. Inside a cell, macromolecular complexes act as nanomachines that participate in a wide range of processes. Recent revolutions in Electron CryoTomography enable 3D visualization of cell organization in a near-native state at molecular resolution. The resulting 3D images provide detailed information about all macromolecular complexes: their structures, their abundances, and their specific spatial locations and orientations inside the field of view. However, extracting this information is very challenging, and current methods usually rely on templates of known structure. Here, we formulate template-free structural analysis as a pattern mining problem and propose a new framework called "Multi Pattern Pursuit" to support de novo discovery of macromolecular complexes in cellular tomograms without using templates of known structures. Our tests on simulated and experimental tomograms show that our method is a promising tool for such analysis.

Dr. Min Xu is an Assistant Research Professor in the Computational Biology Department in the School of Computer Science at Carnegie Mellon University. He received degrees in Computational Biology, Computer Science, and Applied Mathematics, and has more than 16 years of research experience across various areas of Computational Biology. His current research focuses on modelling cell organization at molecular resolution from 3D images derived from Cellular Electron CryoTomography.

Although SFM and SLAM have achieved great success in the past decade, some critical issues have not been adequately addressed, which greatly restricts their application in practice. For example, how can we efficiently obtain long and accurate feature tracks and close complex loops across multiple sequences? How can we efficiently perform global bundle adjustment for large datasets with limited memory? How can we perform robust SLAM in dynamic environments? How can we handle fast motion and strong rotation? In this talk, I will introduce our recent work addressing these key issues. A live AR demo on a mobile device and a set of applications will also be presented.

Dr. Guofeng Zhang is an Associate Professor at the State Key Lab of CAD&CG, Zhejiang University. He received his BS and Ph.D. degrees in Computer Science from Zhejiang University in 2003 and 2009, respectively. He is currently a visiting scholar at the Robotics Institute of CMU, working with Michael Kaess and Martial Hebert. His research interests include structure from motion, SLAM, 3D reconstruction, augmented reality, and video segmentation and editing. He has published 20 papers in major journals (TPAMI, TIP, TVCG, TMM, CVIU) and conferences (ICCV, CVPR, ECCV, ISMAR) in the computer vision, graphics, and augmented/mixed reality areas. Based on these research achievements, the group he leads has developed several SFM/SLAM and 3D reconstruction systems, such as ACTS, LS-ACTS, RDSLAM, and RKSLAM, which can be downloaded from the ZJUCVG group website.

Today, we are moving faster than ever towards Weiser's seminal vision of technology woven into the fabric of our everyday lives. Not only have we adopted mobile and, more recently, wearable technologies that we depend on almost every hour of our waking lives, but there is also an Internet, and more recently an Internet of Things, that connects us to each other and to our surrounding environments. This unique combination of instrumentation and connectivity offers an opportunity to fundamentally change the way we learn and share knowledge with one another. In this talk, I will outline my research on interaction for wearables and the Internet of Things, and discuss how these technologies can be leveraged for learning, performance, and coordination of real-world physical tasks and activities.

Tovi Grossman is a Distinguished Research Scientist at Autodesk Research, located in downtown Toronto. Dr. Grossman’s research is in HCI, focusing on input and interaction with new technologies. In particular, he has been exploring how emerging technologies, such as wearables, the Internet of Things, and gamification can be leveraged to enhance learning and knowledge sharing for both software applications and real-world physical tasks. This work has led to a number of technologies now in Autodesk products used by millions of users, such as Autodesk Screencast and Autodesk ToolClip™ videos.

Dr. Grossman received a Ph.D. in Human-Computer Interaction from the Department of Computer Science at the University of Toronto. He has over 80 peer-reviewed journal and conference publications. Fourteen of these publications have received best paper awards and nominations at the ACM UIST and CHI conferences. He has also served as the Technical Program Co-Chair for the ACM CHI 2014 Conference, and the Program Co-Chair for the ACM UIST 2015 Conference.

Sponsored in part by Disney Research.

There is much to be gained from 3D recovery of clouds and aerosol distributions at high spatio-temporal resolution and with wide coverage. Algorithms have been developed for such tasks, including stereo triangulation (for clouds) and tomography (for aerosols). However, existing remote sensing instruments may lack the spatio-temporal resolution needed to properly exploit these algorithms. There is a need to capture frequent samples of the atmospheric radiance field from many viewpoints. To help achieve this, we developed a new imaging system based on a wide, dense, scalable network of upward-looking wide-angle cameras. The network uses low-cost units to enable large-scale deployment in the field. We demonstrate high-resolution 3D recovery of clouds based on data captured by a prototype system. We use space carving to recover the volumetric distribution of clouds. This method leads directly to cloud shapes, bypassing surface triangulations based on image correspondence. Furthermore, network redundancy solves various radiometric problems that exist in monocular or stereoscopic systems.
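
To make the space-carving step concrete, here is a minimal sketch of the basic idea under assumed inputs: each camera provides a 3x4 pinhole projection matrix and a binary cloud mask, and a voxel is kept only if it projects inside the mask in every view. The data layout and function below are illustrative assumptions, not the authors' implementation.

```python
# A minimal space-carving sketch: keep only voxels whose projection falls inside
# the cloud mask of every camera in the network. The pinhole camera matrices,
# masks, and data layout are assumptions for illustration.
import numpy as np

def carve(voxels, cameras):
    """
    voxels  : (N, 3) candidate voxel centers in world coordinates.
    cameras : list of (P, mask) pairs, P a 3x4 projection matrix and
              mask a binary HxW cloud segmentation for that view.
    Returns a boolean array marking voxels consistent with all views.
    """
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])
    for P, mask in cameras:
        proj = homog @ P.T                        # homogeneous image coordinates
        in_front = proj[:, 2] > 0
        z = np.where(np.abs(proj[:, 2]) < 1e-9, 1e-9, proj[:, 2])
        u = np.round(proj[:, 0] / z).astype(int)
        v = np.round(proj[:, 1] / z).astype(int)
        h, w = mask.shape
        inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]] > 0
        keep &= hit                               # carve away voxels rejected by this view
    return keep
```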

Work is joint with Dmitry Veikherman, Aviad Levis and Yoav Y. Schechner.

Sponsored in part by Disney Research.

In this talk I will present the use of motion cues, in particular long-range temporal interactions among objects, for computer vision tasks such as video segmentation, object tracking, pose estimation, and semantic segmentation. The first part of the talk presents a method to capture such interactions and to construct an intermediate-level video representation. We also use these interactions for object tracking, developing a tracking-by-detection approach that exploits occlusion and motion reasoning. This reasoning is based on long-term trajectories, which are labelled as object or background tracks with an energy-based formulation. We then show the use of temporal constraints for estimating articulated human poses, cast as an optimization problem, and present a new approximate scheme to solve it, with two steps dedicated to pose estimation.

The second part of the talk presents the use of motion cues for semantic segmentation. Fully convolutional neural networks (FCNNs) have recently become the state of the art for this task, but they rely on a large number of images with strong pixel-level annotations. To address this, we present Motion-CNN (M-CNN), a novel FCNN framework that incorporates motion cues and is learned from video-level weak annotations. Our learning scheme trains the network using motion segments as soft constraints, thereby handling noisy motion information. We demonstrate that M-CNN learned with 150 weak video annotations performs on par with state-of-the-art weakly-supervised methods trained with thousands of images.
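
As a hedged illustration of what "motion segments as soft constraints" can mean in practice (the exact loss, confidence values, and network used in the talk may differ), the sketch below builds per-pixel soft targets from a noisy motion segment plus the video-level class label and measures cross-entropy against them; all names and values are assumptions.

```python
# Sketch: turn a noisy motion segment and a video-level label into *soft*
# per-pixel targets, instead of trusting the segment as a hard ground truth.
import torch
import torch.nn.functional as F

def soft_motion_loss(logits, motion_mask, fg_class, fg_conf=0.8):
    """
    logits      : (C, H, W) per-pixel class scores from the FCNN.
    motion_mask : (H, W) binary motion segment for the frame (possibly noisy).
    fg_class    : index of the video-level object class (assumed nonzero; 0 = background).
    fg_conf     : confidence placed in the motion segment (the "softness" of the constraint).
    """
    log_prob = F.log_softmax(logits, dim=0)
    fg = (motion_mask > 0).float()
    # Inside the segment a pixel is the object with probability fg_conf and
    # background otherwise; outside the segment it is the reverse.
    target = torch.zeros_like(log_prob)
    target[0] = fg * (1.0 - fg_conf) + (1.0 - fg) * fg_conf
    target[fg_class] = fg * fg_conf + (1.0 - fg) * (1.0 - fg_conf)
    return -(target * log_prob).sum(dim=0).mean()

# Example usage with a hypothetical 21-class FCNN output for one frame.
logits = torch.randn(21, 240, 320)
motion_mask = torch.zeros(240, 320)
motion_mask[80:160, 100:220] = 1
loss = soft_motion_loss(logits, motion_mask, fg_class=15)
```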

Karteek Alahari has been an Inria permanent researcher (chargé de recherche) since October 2015. He has been at Inria since 2010, initially as a postdoctoral fellow in the WILLOW team in Paris, and then in a starting research position in Grenoble since September 2013. Dr. Alahari's Ph.D. from Oxford Brookes University, UK, was on efficient inference and learning algorithms. His postdoctoral work focused on new models for scene understanding problems defined on videos. His current research interests are models for human pose estimation, semantic segmentation, object tracking, and weakly supervised learning.

Host: Olga Russakovsky

Matrix completion is a generic framework that aims to recover a matrix from a limited number of (possibly noisy) entries. In this context, low-rank regularizers are often imposed so as to find matrix estimators that are robust to noise and outliers. In this talk I will discuss three recent advances in matrix completion, developed to solve three different vision applications: first, coupled matrix completion for joint head and body pose estimation; second, non-linear matrix completion for recognizing emotions from abstract paintings; and third, self-adaptive matrix completion for remote heart-rate estimation from videos.
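
For readers unfamiliar with the framework, here is a minimal, generic low-rank completion sketch (iterative singular-value thresholding, often called soft-impute). It illustrates the role of a low-rank regularizer but is not any of the three application-specific methods above; the threshold and iteration count are arbitrary choices.

```python
# Generic low-rank matrix completion via iterative singular-value thresholding.
import numpy as np

def soft_impute(M, observed, tau=1.0, n_iters=200):
    """
    M        : (m, n) matrix with arbitrary values at unobserved entries.
    observed : boolean (m, n) mask of known entries.
    tau      : singular-value soft-threshold (the low-rank regularization weight).
    """
    X = np.where(observed, M, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # shrink singular values -> low rank
        low_rank = (U * s) @ Vt
        # Keep observed entries fixed, fill the missing ones from the estimate.
        X = np.where(observed, M, low_rank)
    return low_rank

# Example: recover a rank-2 matrix from roughly half of its entries.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.5
estimate = soft_impute(truth, mask, tau=0.5)
```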

Xavier Alameda-Pineda received the M.Sc. degree in mathematics and in telecommunications engineering from the Universitat Politècnica de Catalunya – BarcelonaTech in 2008 and 2009 respectively, the M.Sc. degree in computer science from the Université Joseph Fourier and Grenoble INP in 2010, and the Ph.D. degree in mathematics/computer science from the Université Joseph Fourier in 2013. He worked towards his Ph.D. degree in the Perception Team at INRIA Grenoble Rhône-Alpes. He currently holds a postdoctoral position in the Multimodal Human Understanding Group at the University of Trento. His research interests are machine learning and signal processing for scene understanding, speaker diarization and tracking, sound source separation, and behavior analysis.


In recent years, deep learning has begun to dominate computer vision research, with convolutional neural networks becoming the standard machine learning tool for a wide range of tasks. However, one of the requirements for these methods to work effectively is a rich source of training data. As a result, parallel applications in "real-world" robotics, such as manipulation, are often still limited by the capacity to generate large-scale, high-quality data. In this talk, I will introduce some techniques I have developed to train robots using simulation, without the need to conduct costly real-world experiments. Specifically, I will talk about multi-view active object recognition, robotic grasping using physics simulation, and deep reinforcement learning for robotic arm control.

Ed Johns is a Dyson Fellow at Imperial College London, working on computer vision, robotics and machine learning. He received a BA and MEng from Cambridge University, followed by a PhD in visual recognition and localisation from Imperial College London. After post-doctoral work at University College London, he then took up a research fellowship and returned to Imperial to help set up the Dyson Robotics Lab with Professor Andrew Davison. He now works on visually-guided robot manipulation for domestic robotics.

Faculty Host: Michael Kaess


Deep learning has proven very successful in many applications that require advanced pattern matching, including computer vision. However, it is still unclear how deep learning can be applied to other tasks, such as logical reasoning. In this talk, I introduce two of our recent works in this direction: Visual Question Answering and Computer Go. We show that, with different architectures, we achieve state-of-the-art performance compared with existing approaches.

Yuandong Tian is a Research Scientist at Facebook AI Research, working on Deep Learning and Computer Vision. Prior to that, he was a Software Engineer on the Google self-driving car team from 2013 to 2014. He received his Ph.D. from the Robotics Institute, Carnegie Mellon University in 2013, and his Bachelor's and Master's degrees in Computer Science from Shanghai Jiao Tong University. He is the recipient of a 2013 ICCV Marr Prize Honorable Mention for his work on globally optimal solutions to nonconvex optimization in image alignment.

