Who shares your data?

A CMU-led research team is taking a hard look at those lengthy privacy policies that most of us ignore—and it wants to help us understand them

In a way, our daily computing experience is a lot like the old song, “Dem Bones.” The app is connected to the operating system, the operating system’s connected to the platform, the platform’s connected to the provider, and the provider is connected to the advertising network.

All of those connections, sharing our personal information—do we ever really wonder where it’s going, who might be using it, and for what purposes?

“In the old days, you bought a computer from Dell, Microsoft made the operating system, and Intuit made the accounting software,” says Travis Breaux, assistant professor in CMU’s Institute for Software Research. “These were all big companies, and you paid something for the software you used.”

Today, software—increasingly in the form of “apps”—is often free. Think of Facebook or Google’s Gmail. But in exchange for using a free app, we often give up some of our privacy, in the form of personal information that can be traded among companies hoping to market their products or services to us.

Whether it’s through the apps they access from their smartphones, or their browsing activities on the Web, people are leaking information to a variety of players, often with very little understanding of what’s being collected, says Norman Sadeh, professor of computer science and director of CMU’s Mobile Commerce Laboratory. “Our studies have repeatedly shown that people are concerned about these practices, yet feel helpless when it comes to understanding what’s really happening, and what they can do to regain control over their information,” he says. (Editor’s Note: For more information on the same topic, see “Research Notebook” in this issue of The Link.)

Sadeh is leading a 42-month, $3.75 million research project to develop computer systems that can semi-automatically read website privacy policies and highlight their most important aspects. Ultimately, it could provide something like easy-to-understand letter grades for various aspects of privacy policies—sort of the way that Consumer Reports reviews cars and appliances.

Sponsored by the National Science Foundation through its Secure and Trustworthy Cyberspace program, the “Usable Privacy Policy Project” also includes Breaux; Noah Smith, CMU associate professor of language technologies and machine learning; Lorrie Cranor, CMU associate professor of computer science and engineering and public policy; Alessandro Acquisti, CMU associate professor of information technology and public policy; and law school researchers at Stanford and Fordham.

Prior research by Sadeh and his colleagues has shown that—although user privacy preferences can be fairly complex—a relatively small number of considerations often matter most to users. Rather than attempt to automatically read and understand the full content of each website’s privacy policy, the research team will look for text relevant to the issues people care about most. Then, it will develop algorithms that can automatically or semi-automatically understand the assurances (or lack thereof) given by those privacy policies, and summarize the findings in a short, easy-to-digest format—possibly something as simple as a letter grade.
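To get a feel for the idea, here is a toy sketch of that pipeline: scan a policy for a handful of issues users care about, then roll the findings up into a grade. The issue categories, trigger phrases, and grading rubric below are invented for illustration; the project itself relies on natural language processing and machine learning, not simple keyword matching.

```python
# Toy illustration of the pipeline described above: flag policy text
# about the issues users care most about, then summarize as a grade.
# Categories, phrases, and the grading scale are invented examples.

POLICY_ISSUES = {
    "third_party_sharing": ["share with third parties", "sell your information"],
    "location_tracking":   ["location data", "track your location"],
    "data_retention":      ["stored indefinitely", "retain your data"],
}

def flag_passages(policy_text):
    """Collect the sentences that touch on each issue of interest."""
    findings = {issue: [] for issue in POLICY_ISSUES}
    for sentence in policy_text.lower().split("."):
        for issue, phrases in POLICY_ISSUES.items():
            if any(phrase in sentence for phrase in phrases):
                findings[issue].append(sentence.strip())
    return findings

def letter_grade(findings):
    """Toy rubric: the more sensitive practices disclosed, the lower the grade."""
    flagged = sum(1 for passages in findings.values() if passages)
    return "ABCD"[min(flagged, 3)]

policy = ("We may share with third parties information such as location data. "
          "Your records are stored indefinitely.")
print(letter_grade(flag_passages(policy)))  # "D": all three issues flagged
```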

The Internet has made it possible to collect data about individuals on an unprecedented scale. Let’s say you use an app that helps you find restaurants. Things such as your location, your name, maybe even your phone book and email addresses, might go into that app’s database. Your location helps pinpoint nearby restaurants, while your contact list provides the app with the names of restaurants your friends also liked. From that data, the app can construct a pretty good profile of your tastes and recommend restaurants you’re likely to enjoy.
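A rough sketch of how such a recommendation might work looks something like the following. Every name, field, and data record here is hypothetical, not drawn from any real app; the point is simply that the two inputs driving the feature are your location and your contacts.

```python
# Hypothetical sketch: rank nearby restaurants by how many of the
# user's own contacts liked them. All names and fields are invented.
from collections import Counter

def recommend(user_city, contacts, friend_likes, restaurants):
    """Combine location data with the contact list to rank restaurants."""
    votes = Counter(place for friend, place in friend_likes
                    if friend in contacts)
    nearby = [r for r in restaurants if r["city"] == user_city]
    return sorted(nearby, key=lambda r: -votes[r["name"]])

restaurants = [{"name": "Primanti Bros.", "city": "Pittsburgh"},
               {"name": "Pamela's Diner", "city": "Pittsburgh"}]
friend_likes = [("alice@example.com", "Pamela's Diner")]
picks = recommend("Pittsburgh", {"alice@example.com"},
                  friend_likes, restaurants)
print(picks[0]["name"])  # "Pamela's Diner": a contact of the user liked it
```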

It seems innocuous. But someone with bad intentions—a burglar intent on breaking into your house, or a stalker trying to harass or harm you—could use the same information to figure out where you’re likely to be at any given time.

And even if you sign into applications using a throwaway email address, or don’t give your real name, it isn’t necessarily difficult to figure out your real identity; former SCS professor Latanya Sweeney, now at Harvard, proved in 2010 that the names of supposedly anonymous participants in medical research studies could be determined by comparing details of their treatments with publicly available data such as voter registration lists.
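At its core, that kind of re-identification amounts to a database join. A minimal sketch with fabricated records shows how a “de-identified” medical file can be matched against a public voter roll on quasi-identifiers such as ZIP code, birth date, and sex:

```python
# Minimal sketch of a linkage attack: join a "de-identified" medical
# dataset against a public voter roll on shared quasi-identifiers.
# All records below are fabricated for illustration.

medical = [  # no names, but quasi-identifiers remain
    {"zip": "15213", "dob": "1965-07-31", "sex": "F", "diagnosis": "asthma"},
]
voters = [  # public record that includes names
    {"zip": "15213", "dob": "1965-07-31", "sex": "F", "name": "Jane Roe"},
]

QUASI_IDS = ("zip", "dob", "sex")

def reidentify(medical, voters):
    """Attach a name to each medical record matching exactly one voter."""
    for rec in medical:
        matches = [v for v in voters
                   if all(v[k] == rec[k] for k in QUASI_IDS)]
        if len(matches) == 1:
            yield {**rec, "name": matches[0]["name"]}

for hit in reidentify(medical, voters):
    print(hit["name"], "->", hit["diagnosis"])  # Jane Roe -> asthma
```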

Corporations that run large app stores, such as Google’s Play or Apple’s iTunes, have detailed data-collection policies in place, but many independent developers lack the time, ability or interest to worry about user privacy. Even if they do, conflicting policies at different levels can lead to privacy leaks where information is shared without consent from users. “A single Facebook window could have three or more different privacy policies at work,” says Ashwini Rao, a Ph.D. student in CMU’s Institute for Software Research. “Individually, they may be OK, but when you put them together, you start seeing conflicts.”

Work recently done at Carnegie Mellon by Breaux and Rao illustrates the perils of conflicts between privacy policies. In a paper presented in July at the 21st IEEE International Requirements Engineering Conference in Rio de Janeiro, Breaux and Rao reported on their research into the privacy policies of Facebook, game developer Zynga and AOL, which places ads within Zynga games.

They found, for example, that Facebook’s policies tell app developers they cannot share user data with third-party advertisers, even if the user consents, while Zynga’s policies allow developers to share user data with the user’s permission. And, Rao says, they found a hidden data flow between Zynga and AOL’s advertisers over which the user has little or no control.

To compare the privacy policies of Facebook, Zynga and AOL, Breaux and Rao manually mapped them from natural language to formal logic, then compared them using proofs. “We applied conventional techniques from programming to privacy policies,” Breaux says. “Our results show that developers can check their data practices for conflicts with third-party privacy requirements.”
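The paper itself maps policy language into a formal logic and uses automated proofs; the toy Python below captures only the flavor of the approach, using an invented rule format to show how a “permit” in one policy can be mechanically caught contradicting a “forbid” in another, as in the Facebook–Zynga example above:

```python
# Simplified stand-in for the conflict check described above: encode
# each policy's data practices as formal rules, then check the
# combined set for contradictions. The rule format is invented; the
# actual work uses formal logic and automated proofs.

# Each rule: (policy, effect, practice)
rules = [
    ("Facebook", "forbid", ("share", "user_data", "with_user_consent")),
    ("Zynga",    "permit", ("share", "user_data", "with_user_consent")),
]

def find_conflicts(rules):
    """Report any practice that one policy permits and another forbids."""
    permits = {practice: policy for policy, effect, practice in rules
               if effect == "permit"}
    forbids = {practice: policy for policy, effect, practice in rules
               if effect == "forbid"}
    for practice in permits.keys() & forbids.keys():
        yield practice, permits[practice], forbids[practice]

for practice, allower, denier in find_conflicts(rules):
    print(f"{allower} permits what {denier} forbids: {practice}")
```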

Theoretically, developers could adapt Breaux and Rao’s technique to identify potential data leaks and privacy risks and mitigate them before releasing a new app to the public. But in practice, Breaux envisions that developers will someday use lightweight tools—perhaps similar to those being designed by the Usable Privacy Policy Project—to express their data practices while designing an application.

The goal isn’t to put roadblocks in the path of developers who want to roll out the “next big app.” Instead, it’s to give them—and end users—the tools they need to intelligently understand the implications of privacy policies. “Users are not willing to invest a lot of time learning how to use their privacy settings,” Sadeh says. “We need to empower them to make more informed decisions.”

—Jason Togyer (DC’96) is editor of The Link. He still doesn’t own a smartphone.

For More Information: 

Jason Togyer | 412-268-8721 | jt3y@cs.cmu.edu