Shoou-I Yu, 余守壹

I am currently a Research Scientist at Reality Labs Research Pittsburgh, Meta, led by Yaser Sheikh. I received my Ph.D. under the supervision of Alexander Hauptmann from the Language Technologies Institute, School of Computer Science, Carnegie Mellon University in 2016. During my undergrad I was advised by Jane Yung-jen Hsu. My research interests include (deep) multi-object tracking and large-scale video retrieval or analysis in general. Some of my more interesting YouTube videos: CVPR 13 demo video, deep features for multi-object tracking, and real-world gradient descent with momentum.

I like playing volleyball (No. 59) and watching baseball: Ichiro Suzuki, Shohei Ohtani (500k views!). I also try to travel often: glaciers! orcas! bears! jumping salmon! lava!

Email: iyu at alumni dot cmu dot edu

CV (updated 2022/9/24)

LinkedIn: https://www.linkedin.com/in/shoou-i-yu-0230b825/

Google scholar: https://scholar.google.com/citations?user=3YZTd_UAAAAJ

News

2021/12/12: Lab name changed to Reality Labs Research Pittsburgh, Meta. The work we do (media, Connect 2022).

2016/7/18: Joined Oculus Research @ Pittsburgh as a research scientist.

2016/5/13: Defended! My thesis is here. Slides are here.

2015/5/18 - 2015/8/7: Internship at Google Research Machine Perception group. Mentors: Paul Natsev, Balakrishnan Varadarajan.

2015/4/30: Successfully proposed (my thesis).

2015 Spring: A report from the Pittsburgh Supercomputing Center featuring our work.

2014/11/19: Presented our Multimedia Event Detection GPU work at the NVIDIA GPU Technology Theater @ Supercomputing 2014 (SC '14). [Recording]

2014/11/11: Presented our Multimedia Event Detection work at TRECVID 2014. Slides are here.

2013/12/19: The Marauder's Map paper was selected by the Huffington Post as one of the "13 Incredible Tech Inventions You Won't Believe You Missed In 2013"!

2013/02/25: The Marauder's Map Multi-Camera Multi-Object Tracking paper accepted at CVPR! [Demo video].


Education

Ph.D. in Language and Information Technologies, Carnegie Mellon University, Pittsburgh, PA (2012 ~ 2016)

M.S. in Language Technologies, Carnegie Mellon University, Pittsburgh, PA (2010 ~ 2012)

B.S. in Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan (2005 ~ 2009)


Publications

  1. Authentic volumetric avatars from a phone scan. Chen Cao, Tomas Simon, Jin Kyu Kim, Gabe Schwartz, Michael Zollhoefer, Shun-Suke Saito, Stephen Lombardi, Shih-En Wei, Danielle Belko, Shoou-I Yu, Yaser Sheikh, Jason Saragih. ACM Transactions on Graphics (TOG) 41.4 (2022): 1-19. [pdf][video]

  2. CodedStereo: Learned Phase Masks for Large Depth-of-Field Stereo. Shiyu Tan, Yicheng Wu, Shoou-I Yu, Ashok Veeraraghavan. CVPR 2021. [pdf]

  3. Supervision by registration and triangulation for landmark detection. Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020. [pdf]

  4. InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image. Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, Kyoung Mu Lee. ECCV 2020. [pdf][code and data]

  5. Epipolar transformers. Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu. CVPR 2020. [pdf][code]

  6. Self-supervised adaptation of high-fidelity face models for monocular performance tracking. Jae Shin Yoon, Takaaki Shiratori, Shoou-I Yu, Hyun Soo Park. CVPR 2019. [pdf]

  7. Supervision-by-registration: An unsupervised approach to improve the precision of facial landmark detectors. Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh. CVPR 2018. [pdf][code]

  8. Learning patch reconstructability for accelerating multi-view stereo. Alex Poms, Chenglei Wu, Shoou-I Yu, Yaser Sheikh. CVPR 2018. [pdf]

  9. The Solution Path Algorithm for Identity-Aware Multi-Object Tracking. Shoou-I Yu, Deyu Meng, Wangmeng Zuo, Alexander G. Hauptmann. CVPR 2016. [pdf][code and data] [spotlight presentation]

  10. Strategies for Searching Video Content with Text Queries or Video Examples. Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang. ITE Transactions on Media Technology and Applications 4.3 (2016): 227-238. [pdf]

  11. Text-to-video: a semantic search engine for internet videos. Lu Jiang, Shoou-I Yu, Deyu Meng, Teruko Mitamura, Alexander G Hauptmann. [pdf]

  12. Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization. Shoou-I Yu, Yi Yang, Xuanchong Li, Alexander G. Hauptmann. arXiv 1604.07468. [pdf]

  13. Content-Based Video Search over 1 Million Videos with 1 Core in 1 Second. Shoou-I Yu, Lu Jiang, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann. ICMR 2015. [pdf]

  14. Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos. Lu Jiang, Shoou-I Yu, Deyu Meng, Teruko Mitamura, Alexander G. Hauptmann. ICMR 2015. [pdf]

  15. Fast and Accurate Content-based Semantic Search in 100M Internet Videos. Lu Jiang, Shoou-I Yu, Deyu Meng, Yi Yang, Teruko Mitamura, Alexander G. Hauptmann. ACM MM 2015. [pdf][project page]

  16. Informedia@TRECVID 2014 MED and MER. Shoou-I Yu, Lu Jiang, Zhongwen Xu, Zhenzhong Lan, Shicheng Xu, Xiaojun Chang, Xuanchong Li, Zexi Mao, Chuang Gan, Yajie Miao, Xingzhong Du, Yang Cai, Lara Martin, Nikolas Wolfe, Anurag Kumar, Huan Li, Ming Lin, Zhigang Ma, Yi Yang, Deyu Meng, Shiguang Shan, Pinar Duygulu Sahin, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Teruko Mitamura, Richard Stern and Alexander Hauptmann. TRECVID Video Retrieval Evaluation Workshop, NIST, Gaithersburg, MD, November 2014. [pdf] [slides]

  17. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples. Shoou-I Yu, Lu Jiang, Alexander Hauptmann. In ACM MM 2014. [pdf]

  18. Unsupervised Video Adaptation for Parsing Human Motion. Haoquan Shen, Shoou-I Yu, Yi Yang, Deyu Meng, Alexander Hauptmann. In ECCV 2014. [pdf]

  19. Zero-Example Event Search using MultiModal Pseudo Relevance Feedback. Lu Jiang, Teruko Mitamura, Shoou-I Yu, Alexander G. Hauptmann. In ICMR 2014. [pdf]

  20. Self-paced Learning with Diversity. Lu Jiang, Deyu Meng, Shoou-I Yu, Zhen-Zhong Lan, Shiguang Shan, Alexander Hauptmann. In NIPS 2014. [pdf] [Supplementary Material]

  21. Resource Constrained Multimedia Event Detection. Zhen-zhong Lan, Yi Yang, Nicolas Ballas, Shoou-I Yu, Alexander Hauptmann. In MMM'14, 20th Intl. Conf. on Multimedia Modeling 2014. [pdf]

  22. Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization. Shoou-I Yu, Yi Yang, Alexander Hauptmann. In IEEE CVPR, 2013. [pdf] [Demo video] [2014 Nov. IEEE Signal Processing Magazine]

  23. Informedia@TRECVID 2013. Zhenzhong Lan, Lu Jiang, Shoou-I Yu, Chenqiang Gao, Shourabh Rawat, Yang Cai, Shicheng Xu, Haoquan Shen, Xuanchong Li, Yipei Wang, Waito Sze, Yan Yan, Zhigang Ma, Nicolas Ballas, Deyu Meng, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard Stern, Teruko Mitamura, Eric Nyberg, and Alexander Hauptmann. TRECVID Video Retrieval Evaluation Workshop, NIST, Gaithersburg, MD, November 2013. [pdf]

  24. E-LAMP: integration of innovative ideas for multimedia event detection. Wei Tong, Yi Yang, Lu Jiang, Shoou-I Yu, Lan Zhen-Zhong, Zhigang Ma, Waito Sze, Ehsan Younessian, Alexander Hauptmann. Journal of Machine Vision and Applications, 2013. [pdf]

  25. Multimedia Classification and Event Detection using Double Fusion. Zhen-zhong Lan, Lei Bao, Shoou-I Yu, Wei Liu, Alexander Hauptmann. Journal of Multimedia Tools and Applications, 2013. [pdf]

  26. Informedia E-Lamp @ TRECVID 2012, Multimedia Event Detection and Recounting. Shoou-I Yu, Zhongwen Xu, Duo Ding, Waito Sze, Francisco Vicente, Zhenzhong Lan, Yang Cai, Shourabh Rawat, Peter Schulam, Nisarga Markandaiah, Sohail Bahmani, Antonio Juarez, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard Stern, Teruko Mitamura, Eric Nyberg and Alexander Hauptmann. TRECVID Video Retrieval Evaluation Workshop, NIST, Gaithersburg, MD, November 2012. [pdf]

  27. Double Fusion for Multimedia Event Detection. Zhenzhong Lan, Lei Bao, Shoou-I Yu, Wei Liu, Alexander Hauptmann. In MMM'12, 18th Intl. Conf. on Multimedia Modeling, 2012. [pdf]

  28. Informedia @ TRECVID 2011, Multimedia Event Detection and Semantic Indexing. Lei Bao, Shoou-I Yu, Zhen-zhong Lan, Arnold Overwijk, Qin Jin, Brian Langner, Michael Garbus, Susanne Burger, Florian Metze, Alexander Hauptmann. TRECVID Video Retrieval Evaluation Workshop, NIST, Gaithersburg, MD, December 2011. [pdf]

  29. Informedia @ TRECVID 2010. Huan Li, Lei Bao, Zan Gao, Arnold Overwijk, Wei Liu, Long-fei Zhang, Shoou-I Yu, Ming-yu Chen, Florian Metze and Alexander Hauptmann. TRECVID Video Retrieval Evaluation Workshop, NIST, Gaithersburg, MD, December 2010. [pdf]

  30. A Content-Based Method to Enhance Tag Recommendation. Yu-Ta Lu, Shoou-I Yu, Tsung-Chieh Chang, Jane Yung-jen Hsu. In IJCAI '09: Proceedings of the Twenty-first International Joint Conference on Artificial Intelligence, pages 2064-2069, 2009. [pdf]

  31. Improved Factoring of RSA Modulus. Jiun-Ming Chen, Shoou-I Yu, Yi Ou-Yang, Po-Han Wang, Chi-Hung Lin, Po-Yi Huang, Bo-Yin Yang, Chi-Sung Laih. In Proceedings of the 25th Workshop on Combinatorial Mathematics and Computation Theory, Chung Hua University, Hsinchu Hsien, Taiwan, 2008. [pdf]


Others

Here is a link to my undergrad webpage.

Last Updated: 2022/9/24