For most parents, videos of the baby are a one-way affair: they might shoot hours of their newborn cooing or squirming or crawling, but even the proudest mom, dad or grandparent has better things to do than actually watch all those hours in which, let’s be honest, there’s not much happening.
When Eric Xing, a professor of machine learning at CMU, had a baby five years ago, he knew that even in small iPhone doses, his videos weren’t very satisfying to show off.
“The viewer and I needed to wait anywhere from 10 seconds to a minute” for the interesting parts of a video to appear, he recalls. “And then they would laugh or they would cheer, and they got satisfaction. But that’s not a very interesting experience, because most of the time we were waiting for the boring part in the videos to pass.”
Unlike most parents, Xing was ready to tackle the problem. For more than two years, he worked with Ph.D. student Bin Zhao (CS’11) to study the challenges of automatically editing videos down to only the most interesting parts.
“This has been a primary problem in computer vision,” Xing says. Previous attempts have been based on the assumption that the whole video is already shot and available for analysis—but Xing says machine learning offers a different, more efficient approach.
Their technology, called LiveLight, “basically watch[es] the movie with you, and while it’s watching, it has this human-like intelligence to establish a dictionary of what might be interesting, what might be boring, and then based on that, makes a judgment on the subsequent segments in the video.” In November, LiveLight was selected for Popular Science magazine’s “Best of What’s New” as one of the top 100 technological innovations of 2014. (Four of the inventions on the list, including LiveLight, originated at SCS.)
The exact nature of the algorithm behind LiveLight, Xing says, is the “special sauce” on which he’s obtained a patent. “While you’re watching the movie,” he says, “you basically are extracting key features from the movie—an interesting set of actions, or changes taking place during the movie, during the video.” Unlike earlier attempts, Xing and Zhao’s algorithm learns more about what’s “interesting” the more it watches by building a dictionary to describe what it’s seeing. “Once you use this memory to watch for the next few frames, you are going to be able to tell whether they’re similar to what you’ve watched before or whether it’s never been seen before in the dictionary…and then when you see something that is indeed very boring or repetitious, you are going to be able to label it as not useful, and you are going to delete it.”
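The article does not disclose LiveLight’s patented algorithm. The general idea Xing describes can still be illustrated with a toy sketch: keep a growing dictionary of what has been seen, and flag only what that dictionary cannot account for. The Python below is hypothetical and is not LiveLight’s method. It assumes each video segment has already been reduced to a numeric feature vector, such as a color or motion summary. It also substitutes plain nearest-neighbor distance and a fixed threshold for whatever scoring LiveLight actually uses. The names `summarize` and `threshold` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize(
    segments: List[Sequence[float]], threshold: float
) -> Tuple[List[int], List[Sequence[float]]]:
    """Scan segments in playback order and keep only the 'novel' ones.

    Hypothetical sketch: each segment is assumed to be a precomputed
    feature vector, and novelty is simply the distance to the closest
    dictionary entry (a stand-in for LiveLight's undisclosed scoring).
    Returns the indices of kept segments and the learned dictionary.
    """
    dictionary: List[Sequence[float]] = []
    keep: List[int] = []
    for i, feat in enumerate(segments):
        # How well does what we've already seen explain this segment?
        if dictionary:
            score = min(distance(feat, atom) for atom in dictionary)
        else:
            score = math.inf  # nothing seen yet, so the first segment is new
        if score > threshold:
            keep.append(i)           # unlike anything in the dictionary: keep it
            dictionary.append(feat)  # and remember it for later comparisons
        # Otherwise the segment resembles something already seen: drop it.
    return keep, dictionary
```

For example, suppose the segment features are `[[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [0, 0]]` and the threshold is 1.0. The sketch keeps segments 0 and 3, the first appearance of each distinct scene, and drops the near-repeats that follow. Because the dictionary only grows, the longer the sketch runs, the more of what it sees is already "explained" and labeled as not useful. This mirrors, in a very simplified way, Xing’s point that the system learns more the more it watches.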
As it’s learning, LiveLight can automatically edit the “interesting” parts together into a finished video, or it can offer a human editor a selection of “interesting” and “not interesting” moments to pick from.
Xing and Zhao quickly realized the technology was useful for more than just baby videos.
“We see a lot of long videos produced every day,” Xing says—not just endless hours of footage from surveillance cameras, but also streams of video being captured by GoPro cameras, Google Glass and other mobile video technology.
“It is very, very time-consuming for anyone to watch those videos and get the interesting moments out of them,” he says. “And suppose you are a Google Glass user and you turn your camera on to record, say, an hour’s worth of video. You know, just to upload all this information to some cloud server is a pretty expensive consumption of bandwidth. If an automatic program can carve that stuff out of the final version of the video, it will result in a shorter video that can be uploaded in a more economical way.”
Testing and refining the algorithm required an ample supply of mostly boring video, and Xing found it in abundance in various places, including “arbitrary movies from YouTube.” After more than two years of development, Xing and Zhao were confident enough in LiveLight to post several demonstration videos and to present the technology at the IEEE Conference on Computer Vision and Pattern Recognition in Columbus, Ohio.
They’re already working to commercialize LiveLight through a startup, PanOptus Inc., for which Zhao is serving as CEO.
“CMU has been an extremely friendly place for entrepreneurship,” Xing says. While PanOptus hasn’t yet signed any firm deals to put its technology into the marketplace, “we’ve had some serious customers who want to have our technology, and also some investors who want to put investments on that.”
In a world where most searches still involve text, Xing says LiveLight “is basically among the very few initial instances of doing serious machine learning on imagery data and visual data. It’s asking how to understand what people see instead of what people write or speak…and if there is a way to allow people to understand images and videos and be able to distill useful information out of them, I think that will really change the way people down the road communicate with each other and entertain each other.”
PanOptus isn’t the only company pursuing automated editing and interpretation of lengthy videos. John Sepassi, sales manager at IntelliVision, says his company is one of the survivors from an initial wave of interest in the field that immediately followed the 9/11 attacks. While he isn’t familiar with Xing’s LiveLight technology, Sepassi says demand for the sort of product he’s offering is growing. And as embedded processors become more and more powerful, he predicts the technology is going to be incorporated right into the cameras, “instead of being PC or server based.” His company’s clients are especially interested in real-time alerts when a surveillance camera picks up something unusual; that function, he says, is more desirable in the marketplace right now than the ability LiveLight offers to deliver an edited video after the fact.
As for Xing’s original subject, now five and a half years old, “he was very amused and proud” about being featured in the demonstration video for LiveLight. Xing uses the technology on his own iPhone, and he’s not shy about showing off his son’s video antics now that they’re being automatically edited.
“Even though it’s not perfect,” he says of his 30-second clips, “at least I’m not wasting people’s time.”
—Broadcaster and freelance writer Scott Fybush is based in Rochester, N.Y., where he operates NorthEast Radio Watch. This is his first Link byline.
The Link Magazine | 412-268-8721 | email@example.com