Mahadev Satyanarayanan, better known as Satya, is the Carnegie Group Professor of Computer Science. The founding director of Intel Research Pittsburgh, he was also a principal architect and implementer of the Andrew File System (AFS).

Much of Satya's research focuses on mobile and pervasive computing: the ability to move seamlessly from place to place and from platform to platform. One outcome of this work was Coda, an open-source system that supports distributed file access over intermittent networks. Key ideas from Coda were incorporated into Windows 2000 and Outlook 2003.

Satya's most recent research in this area has been the Internet Suspend/Resume system, or ISR, which allows users to leave one computer and pick up their work at exactly the same place on another. He spoke with Managing Editor Jason Togyer:
How did Coda evolve from AFS?
The more dependent you become on a distributed file system such as AFS, the more inconvenienced you are when one of its servers goes down. We wanted to preserve the positive features of AFS but make it more resistant to failure. The answer was Coda's disconnected operation mechanism, the forerunner of today's hot-sync mechanisms, which caches server data on local machines so users can keep working even when their network connections are lost. At the time, laptops were just starting to appear, and I realized that Coda was a perfect fit for the world of wireless computing. Even today, wireless networks are spotty.
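To make the idea of disconnected operation concrete, here is a minimal Python sketch. It assumes a toy in-memory "server" (a plain dict) and uses hypothetical names (`DisconnectableClient`, `replay_log`). While connected, reads refresh a local cache and writes go through to the server; while disconnected, reads are served from the cache and writes are logged, then replayed when the connection returns. It illustrates the general technique only and is not Coda's implementation, which must also detect and resolve conflicting updates.

```python
class DisconnectableClient:
    """Toy file client that keeps working while the server is unreachable.

    Hypothetical sketch: 'server' is a dict standing in for a remote file
    server. Conflict detection, which a real system needs, is omitted.
    """

    def __init__(self, server):
        self.server = server      # stand-in for the remote file server
        self.cache = {}           # local copies of files seen so far
        self.replay_log = []      # (path, data) updates made while offline
        self.connected = True

    def read(self, path):
        if self.connected:
            self.cache[path] = self.server[path]  # refresh the local copy
        elif path not in self.cache:
            raise FileNotFoundError(f"{path} is not cached and the server is unreachable")
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.connected:
            self.server[path] = data              # write through to the server
        else:
            self.replay_log.append((path, data))  # remember for reconnection

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        self.connected = True
        # Push every update made while offline to the server, in order.
        for path, data in self.replay_log:
            self.server[path] = data
        self.replay_log.clear()


# Usage: keep editing a file while the network is down.
server = {"notes.txt": "draft 1"}
client = DisconnectableClient(server)
client.read("notes.txt")             # caches "draft 1"
client.disconnect()
client.write("notes.txt", "draft 2") # logged locally, server untouched
print(client.read("notes.txt"))      # draft 2
print(server["notes.txt"])           # draft 1
client.reconnect()
print(server["notes.txt"])           # draft 2
```

Replaying the log in order keeps the sketch simple, but it silently overwrites any change another client made to the same file in the meantime, which is exactly the kind of conflict a real system has to catch.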
After we published a paper describing the evolution of Coda from AFS, a Japanese magazine, Nikkei Electronics, asked if they could translate the article. Of course, I can't read Japanese, so how could I know if the translation was accurate?
Well, they added cartoons by a Japanese artist, Gaich Muramatsu. One half of the first cartoon shows unhappy AFS users dealing with a broken network connection to the servers; the other half shows happy Coda users still working. I could see just from the cartoons that everyone understood it. They captured the idea perfectly.
How does ISR build on Coda?
If you run Windows XP at home and move to a machine running Windows Vista, you don't get the same user experience. Why not? At all times, you should have one world you're dealing with, not different worlds on different machines.
Most people think of cloud computing as data being stored in the cloud. With ISR, the entire machine is stored in the cloud, and if the right business model can be constructed, people could treat computers like furniture. If you check into a hotel, you don't have to bring your own furniture. Why should you lug around your computer? Just have your world magically appear, piece by piece as you access it, at any loaner computer.
Are there issues with platform portability?
Yes, but just about all of today's laptops and desktops use Intel processors, so that can be handled through virtual machine technology. I'd like to see ISR deployed at campus scale. All it would take is the resources to make that happen. I see no technical challenges, but we need more funding.
Does Diamond also build on Coda and AFS?
Diamond is completely unrelated! When I started the Intel lab, I asked myself: "Data caching is a powerful technique in hardware and software and on the Web, but is there any circumstance in which caching data is not useful?"
It turns out that caching and indexing work well when you're searching text, but not when you're searching rich content, such as images. Creating an index ahead of time for all conceivable queries you might pose about very rich data is almost impossible.
But caching the results of searches is useful. So rather than building an index, Diamond does discard-based search. For example, you highlight a portion of an image and tell Diamond to find similar images. You discard the results that don't fit your query, refine your query, then search again. Since the refined query typically has significant overlap with the previous queries, Diamond can reuse cached results from those queries.
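Discard-based search with result caching can be sketched in a few lines of Python. The sketch assumes each "image" has been reduced to a small dict of attributes and each query is a list of named filter functions; the class `DiscardSearch` and the filter names are hypothetical, and Diamond's actual architecture is not modeled. There is no index: every object is checked against the filters and discarded at the first one it fails, and each filter's verdict on each object is cached, so a refined query that reuses earlier filters only pays for the new ones.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# A filter is (name, predicate). The name doubles as the cache key, so this
# sketch assumes a given name always denotes the same test.
Filter = Tuple[str, Callable[[dict], bool]]


class DiscardSearch:
    """Toy index-free search: run filters over every object, discard
    objects that fail, and cache each (filter, object) verdict."""

    def __init__(self, objects: Dict[Hashable, dict]):
        self.objects = objects
        self.cache: Dict[Tuple[str, Hashable], bool] = {}
        self.evaluations = 0  # how many times a filter actually ran

    def _passes(self, flt: Filter, obj_id: Hashable) -> bool:
        name, predicate = flt
        key = (name, obj_id)
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = predicate(self.objects[obj_id])
        return self.cache[key]

    def search(self, filters: List[Filter]) -> List[Hashable]:
        hits = []
        for obj_id in self.objects:
            # all() short-circuits: an object is discarded at its first failing filter.
            if all(self._passes(f, obj_id) for f in filters):
                hits.append(obj_id)
        return hits


# Usage: a first query, then a refined query that reuses the first filter.
images = {
    "a": {"color": "red", "size": 10},
    "b": {"color": "red", "size": 50},
    "c": {"color": "blue", "size": 50},
}
is_red = ("is_red", lambda img: img["color"] == "red")
is_large = ("is_large", lambda img: img["size"] > 20)

engine = DiscardSearch(images)
print(engine.search([is_red]))            # ['a', 'b']
print(engine.evaluations)                 # 3
print(engine.search([is_red, is_large]))  # ['b']
print(engine.evaluations)                 # 5 (is_red was cached; is_large ran on a and b only)
```

Keying the cache by filter name is the simplifying assumption that makes reuse possible here; it is only safe if a name always stands for the same filter with the same parameters.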
Can Diamond be extended to data besides images?
Diamond knows nothing about the underlying data content. We've recently extended it to live data sets from webcams, and you could also use it to sort through audio. You couldn't just throw sound at it and say "search"; you'd first have to write a sound-specific plug-in. But once you did, it would do the search very efficiently.
What are some practical applications?
The medical and pharmaceutical communities have shown a lot of interest. Let's say you have a mammogram, and a tissue mass causes you concern. Should you call the patient in for a biopsy? With Diamond, you could highlight a portion of the image, ask it to show you other patients who had the same kinds of masses, and find out what their biopsy results were. Our goal here isn't to replace doctors, but to provide a Google-like tool that helps doctors with decision-making.
One application we've built on Diamond, called FatFind, automatically counts fat cells, which is useful in automating drug discovery. Another, called PathFind, allows pathologists to rapidly search through stained tissue samples.
(John Barna photo)
Jason Togyer | 412-268-8721 | jt3y [atsymbol] cs.cmu.edu