Selected Publications
* indicates equal contribution or alphabetical ordering.
For all publications, please see Google Scholar.
Act to See: Emergent Active Visual Perception in Video CoT via Tool Use.
Martin Q. Ma, Yuxiao Qu, Willis Guo, Aditya Agrawal, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency.
Under Submission, 2025.
Paper, Code
Video Active Perception: Efficient Inference-Time Long-Form Video Understanding with Vision-Language Models.
Martin Q. Ma, Willis Guo, Aditya Agrawal, Ankit Gupta, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency.
ICCV Workshop, 2025.
[Paper], Code
Understanding Masked Autoencoders via Hierarchical Latent Variable Models.
Lingjing Kong*, Martin Q. Ma*, Guangyi Chen, Eric Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang.
Highlight, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper], [Video], [Slides], [Code]
Complex Transformer: A Framework for Modeling Complex-Valued Sequence.
Muqiao Yang*, Martin Q. Ma*, Dongyu Li*, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
Oral, Neural Information Processing Systems Science Meets Engineering of Deep Learning Workshop (NeurIPS SEDL), 2019.
[Paper], [Video], [Slides], [Code]