We study the problem of learning geometric and physical object properties from visual input. Inspired by findings in cognitive science that even infants are able to perceive a physical world full of dynamic content, we aim to build models to characterize physical and geometric object properties from synthetic and real-world scenes. In this talk, I will present some models we recently proposed for 3D shape recognition and synthesis, and for physical scene understanding. I will also present a newly collected video dataset of physical events.
Jiajun Wu is a third-year Ph.D. student at Massachusetts Institute of Technology, advised by Professor Bill Freeman and Professor Josh Tenenbaum. His research interests lie at the intersection of computer vision, machine learning, and computational cognitive science. Before coming to MIT, he received his B.Eng. from Tsinghua University, China, advised by Professor Zhuowen Tu. He has also spent time working at research labs of Microsoft, Facebook, and Baidu.
With 3D printing, geometry that was once too intricate to fabricate can now be produced easily and inexpensively. In fact, printing an object perforated with thousands of tiny holes can actually be cheaper and faster than producing the same object filled with solid material. The expanded design space admitted by additive fabrication contains objects that can outperform traditional shapes and exhibit interesting properties, but navigating the space is a challenging task.
In this talk, I focus on two applications leveraging this design space. First, I discuss how to customize objects' deformation behaviors even when printing with a single material. By designing structures at the microscopic scale, we can achieve perfectly isotropic elastic behavior with a wide range of stiffnesses (over 10000 times softer than the printing material) and effective Poisson's ratios (both auxetic and nearly incompressible). Then, with an optimization at the macroscopic scale, we can decide where in the object to place these structures to achieve a user-specified deformation goal under prescribed forces.
Next I tackle a problem that emerges when using micro-scale structures: fragility, especially for the softer designs. I discuss how to efficiently analyze structures for their likelihood of failure (either brittle or ductile fracture) under general use. Finally, I show how to optimize a structure to maximize its robustness while still maintaining its elastic behavior.
Julian Panetta is a PhD candidate at NYU's Courant Institute, where he is advised by Denis Zorin. Julian is interested in simulation and optimal design problems, specifically focusing on applications for 3D printing. Before joining NYU, he received his BS in computer science from Caltech and did research at NASA's Jet Propulsion Lab.
Sponsored in part by Disney Research
Understanding and reasoning about our visual world is a core capability of artificial intelligence. It is a necessity for effective communication, and for question-answering tasks. In this talk, I discuss some recent explorations into visual reasoning to gain an understanding of how humans and machines tackle the problem. I’ll also describe how algorithms initially developed for visual understanding can be applied to other domains, such as program induction.
C. Lawrence Zitnick is a research manager at Facebook AI Research, and an affiliate associate professor at the University of Washington. He is interested in a broad range of topics related to artificial intelligence including object recognition, the relation of language and imagery, and methods for gathering common sense knowledge. He developed the PhotoDNA technology used by Microsoft, Facebook, Google, and various law enforcement agencies to combat illegal imagery on the web. Before joining FAIR, he was a principal researcher at Microsoft Research. He received the PhD degree in robotics from Carnegie Mellon University.
Everyone has some experience of solving jigsaw puzzles. When facing ambiguities in assembling a pair of pieces, a common strategy is to look for clues from additional pieces and make decisions among all relevant pieces together. In this talk, I will show how to apply this common practice to develop data-driven algorithms that significantly outperform pair-wise algorithms. I will start by describing a computational framework for the joint inference of correspondences among shape/image collections. Then I will discuss how similar ideas can be utilized to learn visual correspondences.
Qixing Huang is an assistant professor at the University of Texas at Austin. He obtained his PhD in Computer Science from Stanford University and his MS and BS in Computer Science from Tsinghua University. He was a research assistant professor at Toyota Technological Institute at Chicago before joining UT Austin. He has also worked at Adobe Research and Google Research, where he developed some of the key technologies for Google Street View.
Dr. Huang’s research spans the fields of computer vision, computer graphics, and machine learning. In particular, he is interested in designing new algorithms that process and analyze big geometric data (e.g., 3D shapes/scenes). He is also interested in statistical data analysis, compressive sensing, low-rank matrix recovery, and large-scale optimization, which provide the theoretical foundation for his research. Qixing has published extensively at SIGGRAPH, CVPR and ICCV, and has received NSF grants and various industry gifts. He also received the best paper award at the Symposium on Geometry Processing 2013.
Beginning with the philosophical and cognitive underpinnings of referring expression generation, and ending with theoretical, algorithmic and applied contributions in mainstream vision-to-language research, I will discuss some of my work through the years towards the ultimate goal of helping humans and computers to communicate. This will be a multi-modal, multi-disciplinary talk (with pictures!), aimed to be interesting no matter what your background is.
Meg Mitchell is currently a Senior Research Scientist in Google's Machine Intelligence Research in Seattle, WA. She works on advancing artificial intelligence in a way that is interpretable, understanding of art and literature, and respectful of user privacy. She works on vision-language and grounded language generation, focusing on how to help computers communicate based on what they can process. Her work combines computer vision, natural language processing, social media, many statistical methods, and insights from cognitive science. She continues to balance her time between language generation, applications for clinical domains, and core AI research.
Sponsored in part by Disney Research
Structures and objects, captured in image data, are often idealized by the viewer. For example, buildings may seem to be perfectly straight, or repeating structures such as corn kernels may seem almost identical. However, in reality, such flawless behavior hardly exists. The goal in this line of work is to detect spatial imperfections, i.e., departures of objects from their idealized models, given only a single image as input, and to render a new image in which the deviations from the model are either reduced or magnified. Reducing the imperfections allows us to idealize/beautify images, and can be used as a graphic tool for creating more visually pleasing images. Alternatively, increasing the spatial irregularities allows us to reveal useful and surprising information that is hard to perceive with the naked eye (such as the sagging of a house’s roof). I will consider this problem under two distinct definitions of idealized model: (i) ideal parametric geometries (e.g., line segments, circles), which can be automatically detected in the input image; and (ii) perfect repetitions of structures, which rely on the redundancy of patches in a single image. Each of these models has led to a new algorithm with a wide range of applications in civil engineering, astronomy, design, and material defect inspection.
Tali Dekel is currently a Research Scientist at Google, working on developing computer vision and computer graphics algorithms. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William T. Freeman. Tali completed her Ph.D. studies at the School of Electrical Engineering, Tel-Aviv University, under the supervision of Prof. Shai Avidan and Prof. Yael Moses. Tali’s Ph.D. focused on the use of multi-camera systems to solve classic and innovative tasks in computer vision and computer graphics including 3D structure and 3D motion estimation, content-geometry aware stereo retargeting, and photo sequencing (recovering the temporal order of a distributed image set). In her postdoc studies, she has been working on developing new algorithms that detect and visualize imperfections/irregularities in a single image. Her research interests include computer vision and graphics, geometry, 3D reconstruction, motion analysis, and image visualization.
People love stories. Pictures allow for engaging storytelling but this is still an expensive and exclusive art form — visual stories remain difficult to create. Editing a single image or a video clip has historically been easier than animation, where keyframe synthesis dominates despite its dramatically high costs. In this talk, I will describe my efforts to make animated storytelling more accessible. Some of this work has been featured in Photoshop and Illustrator, used by startups (3Gear Systems/NimbleVR, now Oculus; Mixamo, now Adobe; and FaceShift, now Apple), and, most recently, within Adobe Character Animator, a system for performance-based animation.
Jovan Popovic is a Senior Principal Scientist at Adobe Systems. After receiving bachelor's degrees in mathematics and computer science in 1995, he attended the University of Washington and Carnegie Mellon University, where he earned a doctoral degree for his work in computer animation and geometric modeling. He was on the faculty at the Massachusetts Institute of Technology before moving to Seattle to join Adobe Research in 2008. Since 2013, he has steered the vision, architecture, research, and implementation of the Adobe Character Animator, a new software product for performance-based animation.
Sponsored in part by Disney Research
The cell is the basic structural and functional unit of all living organisms. Inside a cell, macromolecular complexes are nanomachines that participate in a wide range of processes. The recent revolutions in Electron CryoTomography enable 3D visualization of cell organization in a near-native state at molecular resolution. The produced 3D images provide detailed information about all macromolecular complexes, their structures, their abundances, and their specific spatial locations and orientations inside the field of view. However, extracting this information is very challenging and current methods usually rely on templates of known structure. Here, we formulate a template-free structural analysis as a pattern mining problem and propose a new framework called "Multi Pattern Pursuit" for supporting de novo discovery of macromolecular complexes in cellular tomograms without using templates of known structures. Our tests on simulated and experimental tomograms show that our method is a promising tool for such analysis.
Dr. Min Xu is an Assistant Research Professor of Computational Biology at the Computational Biology Department in the School of Computer Science at Carnegie Mellon University. He received degrees in Computational Biology, Computer Science, and Applied Mathematics. He has more than 16 years of research experience in various Computational Biology areas. His current research focuses on Cellular Electron CryoTomography 3D-image-derived modelling of cell organization at molecular resolution.
Although SFM and SLAM have achieved great success in the past decade, some critical issues remain inadequately addressed, which greatly restricts their application in practice. For example, how to efficiently obtain long and accurate feature tracks and close complex loops for multiple sequences? How to efficiently perform global bundle adjustment for large datasets with limited memory space? How to perform robust SLAM in dynamic environments? How to handle fast motion and strong rotation? In this talk, I will introduce our recent work addressing these key issues. A live AR demo on a mobile device and a set of applications will be presented.
Dr. Guofeng Zhang is now an Associate Professor at the State Key Lab of CAD&CG, Zhejiang University. He received his BS and Ph.D. degrees in Computer Science from Zhejiang University, in 2003 and 2009, respectively. Currently, he is a visiting scholar at the Robotics Institute of CMU, working with Michael Kaess and Martial Hebert. His research interests include structure-from-motion, SLAM, 3D reconstruction, augmented reality, video segmentation and editing. He has published 20 papers in the major journals (TPAMI, TIP, TVCG, TMM, CVIU) and conferences (ICCV, CVPR, ECCV, ISMAR) in computer vision, graphics and augmented/mixed reality areas. Based on these research achievements, the group he led has successfully developed several systems for SFM/SLAM and 3D reconstruction, such as ACTS, LS-ACTS, RDSLAM and RKSLAM, which can be downloaded from the ZJUCVG group website.
Today, we are moving faster than ever towards Weiser’s seminal vision of technology being woven into the fabric of our everyday lives. Not only have we adopted mobile, and more recently, wearable technologies, that we depend on almost every hour of our waking lives, there is an internet, and more recently, an Internet of Things, that connects us to each other and our surrounding environments. This unique combination of instrumentation and connectivity offers an opportunity to fundamentally change the way in which we learn and share knowledge with one another. In this talk, I will outline my research in the areas of interaction for wearables and the Internet of Things, and discuss how these technologies can be leveraged for learning, performance, and coordination of real-world physical tasks and activities.
Tovi Grossman is a Distinguished Research Scientist at Autodesk Research, located in downtown Toronto. Dr. Grossman’s research is in HCI, focusing on input and interaction with new technologies. In particular, he has been exploring how emerging technologies, such as wearables, the Internet of Things, and gamification can be leveraged to enhance learning and knowledge sharing for both software applications and real-world physical tasks. This work has led to a number of technologies now in Autodesk products used by millions of users, such as Autodesk Screencast and Autodesk ToolClip™ videos.
Dr. Grossman received a Ph.D. in Human-Computer Interaction from the Department of Computer Science at the University of Toronto. He has over 80 peer-reviewed journal and conference publications. Fourteen of these publications have received best paper awards and nominations at the ACM UIST and CHI conferences. He has also served as the Technical Program Co-Chair for the ACM CHI 2014 Conference, and the Program Co-Chair for the ACM UIST 2015 Conference.
Sponsored in part by Disney Research.