Marylee Williams | Tuesday, July 22, 2025

Invasive plants wreak havoc on farms and natural ecosystems. They crowd out crops and native plants and compete with them for resources, resulting in millions of dollars in annual losses.
Tracking down, monitoring and ultimately eradicating invasive species is a priority for people working in agriculture, conservation and ecology, but the task is time-consuming and costly. Artificial intelligence could help, but scientists lack the data needed to develop a tool that can effectively identify and monitor these plants.
To help fill this data gap, researchers at Carnegie Mellon University worked with scientists at a conservation ranch in Montana to develop a method for training machine learning models to detect invasive species more effectively with limited data. The method lets researchers build AI tools that identify and monitor these plants.
The project currently focuses on leafy spurge, a noxious weed with small green flowers that is spreading rapidly across the pastures and grasslands of the Great Plains. It is toxic to livestock and can render whole hayfields inedible. Research into leafy spurge estimates that it causes more than $35 million in annual losses to the country's beef and hay production.
"These invasive plants are a serious problem," said Ruslan Salakhutdinov, a faculty member in CMU's School of Computer Science. "Leafy spurge can destroy the ecosystems around it. Building a machine learning tool to help identify it was tough because we didn't have massive amounts of data on this plant, even online. It became a problem of trying to build accurate models with limited data, and the solution has a big impact on the ecology and environment."
Salakhutdinov, the UPMC Professor of Computer Science in CMU's Machine Learning Department (MLD), worked on the project with Brandon Trabucco, an MLD doctoral student; Max Gurinas at Harvard University; and Kyle Doherty, a staff scientist at MPG Ranch, which manages more than 15,000 acres of conservation property in Western Montana for research.
Researchers in SCS wanted to use new generative AI tools to improve models already trained to detect leafy spurge from drone images collected at MPG Ranch. Salakhutdinov and Trabucco also wondered whether synthetic, AI-generated images of leafy spurge could supply the data needed to make the models work better.
The researchers introduced a method called DA-Fusion, which uses pretrained text-to-image diffusion models for data augmentation. The standard approach to data augmentation involves making simple changes to an image, like cropping or flipping it, to generate a new image from an existing one. DA-Fusion instead changes an image's semantic content: high-level features such as the object shown or the scene around it. For example, if an original image of leafy spurge showed the weed in a crop field, the augmented image might show the same weed in a different setting, perhaps among trees or grass at a different time of year. Or it could show the original background with a different type of plant.
DA-Fusion created diverse training data showing leafy spurge under various conditions, such as in snow or during a spring bloom. This spared ecologists in Montana from going out to gather data in every weather condition.
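To make the contrast concrete, here is a minimal sketch of the standard approach the article describes: flipping and cropping an existing image. It is not DA-Fusion and not the researchers' code. The function names, the tiny 4x4 grid standing in for a drone photo, and the crop size are all hypothetical choices made for illustration.

```python
import random


def horizontal_flip(image):
    """Return a copy of the image with each row reversed (mirrored left to right)."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window taken at a random position in the image."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng):
    """Flip with probability 0.5, then take a random crop."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)


# Hypothetical stand-in for a drone photo: a 4x4 grid of grayscale pixel values.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

rng = random.Random(0)  # seeded so repeated runs produce the same variants
variants = [augment(image, 3, 3, rng) for _ in range(4)]
```

Every variant is built from pixels already present in the original, so the plant, background and season never change. That limitation is what a semantic method such as DA-Fusion is meant to address by generating new settings with a diffusion model.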
"The costs to effectively manage a conservation ranch like MPG can be quite high, and a lot of that is due to the labor necessary to access remote areas and assess the presence of plants like leafy spurge," said Gurinas. "Machine learning techniques like the one we've developed allow a degree of automation that makes conservation efforts across wider areas more economically feasible."
By improving the diversity and quality of training data, researchers can improve the accuracy of machine learning models with fewer examples. Researchers agree that establishing this relationship between conservation scientists and machine learning researchers is critical for the future of agriculture and ecology.
"The exciting thing about the datasets that we're building is that they're unique. There aren't many ecological datasets out there for the machine learning community to sink their teeth into," said Doherty, who considered himself an "AI nerd" prior to the project, but expanded his knowledge while working with Salakhutdinov's lab. "I think people are interested in making an impact. You can solve the problems of restoration ecologists and combat climate change. It's meaningful work that's important to put out there."
Doherty and colleagues at MPG Ranch have made the leafy spurge dataset open for the machine learning community to explore.
Researchers plan to continue this collaboration, working to gather more data and improve the identification of leafy spurge and other flora and fauna, such as black bears. By creating the datasets and making them publicly available, the team hopes collaborations between the two fields become more feasible.
"These tasks are some of the most important we face as a society," Trabucco said. "Problems like leafy spurge are underserved, and maybe the advances that we see in machine learning can help us in the ways we've seen these tools solve other problems and unlock new abilities."
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu