Marylee Williams
Thursday, September 25, 2025

A tool developed at Carnegie Mellon University could make it easier to identify patients at risk for cancer, bolstering early detection — a crucial step for improving treatment outcomes.
Researchers in CMU's School of Computer Science developed CATch Cancer Early With Healthcare Foundation Models (CATCH-FM), a non-invasive tool that prescreens for cancer using only the patient's electronic health record (EHR). In preliminary tests, CATCH-FM flagged as high risk 50% to 70% of the patients who were later diagnosed with lung, liver or pancreatic cancer. Hospitals could use CATCH-FM to identify which people need additional medical screenings, saving time and, hopefully, one day, lives.
"The CATCH-FM advanced healthcare foundation model and framework has already shown value in prior retrospective studies and holds great potential to expand into other cancer types and even noncancer medical fields, ultimately benefiting a broader population," said Dr. Wei-An Chang, the director of the medical intensive care unit at Kaohsiung Medical University Hospital (KMUH) in Taiwan. "By combining CMU and its spin-off startups' cutting-edge AI technologies with KMU's outstanding clinical capacity and biomedical research expertise, this collaboration is opening the door to a transformative future in smart health care."
CMU and a university spinout, Xlue Inc., are collaborating with KMUH to use technologies like CATCH-FM in clinical settings. Together, they aim to create a new paradigm for smart health care that fuses CMU's global leadership in AI with Xlue's technical capabilities and KMUH's medical expertise. The research team is working with KMUH to integrate CATCH-FM into the hospital's workflows, initially focusing on early lung cancer screening and matching patients with drug trial opportunities.
CATCH-FM uses autoregressive training, also known as next-token prediction, to forecast a patient's next health event based on what it has learned from the EHR data. It functions similarly to chatbots, like ChatGPT, but in this case the model predicts a patient's next medical event or diagnosis from their medical history and its previous training.
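To make the idea of next-event prediction concrete, here is a minimal sketch in Python. It is not CATCH-FM's implementation: it swaps the neural foundation model for a toy bigram count model, which predicts the next event from the most recent one only. The event codes, patient histories and function names are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up patient histories: each is a time-ordered list of event codes.
histories = [
    ["DX_COUGH", "RX_INHALER", "DX_COPD", "DX_LUNG_CANCER"],
    ["DX_COUGH", "RX_INHALER", "DX_COPD"],
    ["DX_HEPATITIS_B", "DX_CIRRHOSIS", "DX_LIVER_CANCER"],
    ["DX_COUGH", "RX_ANTIBIOTIC"],
]


def train_next_event_counts(sequences):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev_event, next_event in zip(seq, seq[1:]):
            counts[prev_event][next_event] += 1
    return counts


def next_event_probs(counts, last_event):
    """Return estimated probabilities for the event that follows last_event."""
    following = counts.get(last_event, Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # this event was never seen with a successor in training
    return {event: n / total for event, n in following.items()}


counts = train_next_event_counts(histories)
after_cough = next_event_probs(counts, "DX_COUGH")
for event, prob in sorted(after_cough.items(), key=lambda kv: -kv[1]):
    print(f"{event}: {prob:.2f}")
```

In this toy data, a cough is followed by an inhaler prescription in two of three histories and by an antibiotic in the third, so the sketch assigns those events probabilities of about 0.67 and 0.33. A foundation model follows the same predict-the-next-event principle but conditions on the patient's whole history rather than only the latest event.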
As part of academic collaborations, researchers trained CATCH-FM on EHR data from large-scale Taiwanese national health care research claims databases, which hold millions of patients' health records spanning two decades. They then fine-tuned the tool using datasets of patients with lung, liver and pancreatic cancers, allowing it to focus on cancer-specific patterns in EHRs. For instance, if a patient has certain diagnoses or has taken certain medications, CATCH-FM could identify the patient as high risk for lung cancer and prioritize them for additional screenings.

"We've seen how powerful large language models and foundation models are on different tasks," said Chenyan Xiong, an associate professor in CMU's Language Technologies Institute (LTI) who led the development of CATCH-FM. "My collaborators and I have always been passionate about health care and wondered whether these tools could be applied in this domain. We wanted to start with cancer risk prediction because we believe it can be deeply impactful. This work is really the marriage of AI technology and hospital collaboration to see progress in health care."
CATCH-FM required training on large-scale, anonymized health care records from Taiwan. Researchers trained it on sequences of medical codes, which are often used for diagnosis and billing, and found that this data was clearer, making it easier for the model to learn. The team built CATCH-FM rather than use an off-the-shelf LLM because the training data wasn't publicly available and patient privacy was a priority.
In a study, researchers tested CATCH-FM on past medical records of more than 30,000 patients who had been diagnosed with certain cancers to see how well the model could predict cancer risk. Following suggestions from clinicians, they split their evaluations into two groups: patients with no prior cancer history, and patients with a history of cancer but not the targeted cancers (lung, liver and pancreatic). Among patients with no prior cancer diagnosis, CATCH-FM correctly flagged as high risk about half of those who were later diagnosed with cancer. Among patients with a prior cancer diagnosis, the model flagged seven out of 10.
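The share of later-diagnosed patients that a model flags is commonly called sensitivity, or recall. As a rough illustration of the arithmetic behind figures like "about half" and "seven out of 10," the Python sketch below computes it on made-up data. The patient lists and the function name are hypothetical and are not drawn from the study.

```python
def sensitivity(flagged, diagnosed):
    """Fraction of later-diagnosed patients that the model flagged as high risk."""
    if len(flagged) != len(diagnosed):
        raise ValueError("flagged and diagnosed must have the same length")
    positives = sum(1 for d in diagnosed if d)
    if positives == 0:
        raise ValueError("no diagnosed patients; sensitivity is undefined")
    true_positives = sum(1 for f, d in zip(flagged, diagnosed) if f and d)
    return true_positives / positives


# Made-up example: 10 patients, 4 of whom were later diagnosed with cancer.
diagnosed = [True, True, True, True, False, False, False, False, False, False]
# The hypothetical model flags 3 patients; 2 of them are among the 4 diagnosed.
flagged = [True, True, False, False, True, False, False, False, False, False]

rate = sensitivity(flagged, diagnosed)
print(f"Sensitivity: {rate:.0%}")
```

Here two of the four diagnosed patients were flagged, so sensitivity is 50%. The measure says nothing about how many flagged patients turn out to be healthy, which is a separate quantity.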
"By collaborating with health care systems and clinicians, we start to build a path to bring the power of foundation models to the patients' health care domain," Xiong said. "For me, it's really promising to see how all these powerful AI tools, like foundation models and generative AI, have the potential to be transferred into the health care system. I think this has the potential to improve patient care and eventually save lives."
The SCS research team included Gary Gao, a master's student in the Computer Science Department; Liwen Sun, an LTI master's student; and Hao-Ren Yao, an LTI research scientist. The team also included Ophir Frieder from Georgetown University.
Xiong, Yao and Frieder launched Xlue Inc. to focus on health care foundation models and their clinical applications. To focus on hospital clinical data and prospective research, the researchers partnered with KMUH's Center for Responsible AI in Medicine, which promotes applications aligned with transparency principles to ensure reliability and accountability in AI-driven health care.
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu