SCS Researchers Win Best Paper at MLSys 2025

Adam Kohlhaas | Monday, June 2, 2025

A collaborative research project led in part by SCS researchers Tianqi Chen and Ruihang Lai received a Best Paper award at the 2025 Conference on Machine Learning and Systems last month.

A collaborative research project led in part by researchers in Carnegie Mellon University's School of Computer Science has received top honors at the 2025 Conference on Machine Learning and Systems (MLSys). The paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," was named Best Paper at this year's event, held May 12-15 in Santa Clara, California.

Ruihang Lai, a Ph.D. student in the Computer Science Department (CSD), and Tianqi Chen, an assistant professor in both CSD and the Machine Learning Department, were among the paper's lead contributors. They joined forces with the University of Washington's Zihao Ye, who is a visiting scholar at CMU, to develop scalable solutions for deploying large language models (LLMs) in real-time environments.

FlashInfer introduces a high-performance attention engine optimized to serve LLMs. The project began as a joint effort between CMU; the University of Washington's Allen School of Computer Science and Engineering; and OctoAI, an AI systems startup acquired by NVIDIA. Initially designed to improve inference throughput and flexibility, FlashInfer has evolved into a widely used open-source library with production deployments and active contributions from across the AI systems community.

"It is amazing to see FlashInfer grow from a collaborative research project to a community project being used in major open-source LLM inference engines," Chen said.

More information about FlashInfer, including source code and documentation, is available through the project's repository. For more information about MLSys, visit the conference website.

For More Information

Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu