Gross and Acquisti, Workshop-Privacy in Electronic Society 2005

From ScribbleWiki: Analysis of Social Media

Created and maintained by Sachin Agarwal (User:Sachina)

Related Page Social Networking and Privacy

Gross, R. and Acquisti, A., Information Revelation and Privacy in Online Social Networks. in Workshop on Privacy in the Electronic Society, (Alexandria, VA, 2005), ACM Press.


Information Revelation and Privacy in Online Social Networks

This paper summarizes the patterns of information revelation in online social networks and their privacy implications. The authors present a comprehensive analysis of data obtained from the profiles of 4,000 Carnegie Mellon students on Facebook, an online social networking site. They analyze the amount of information disclosed in these profiles and study how the privacy settings provided by Facebook are used. The authors also identify possible privacy attacks on such profiles and report how many users actually make use of the privacy preferences the site provides.

Information Revelation and Online Social Networking

The basic purpose of online social networking sites is usually to enable online interaction and communication among people, but usage patterns vary. Most of these sites show user profiles to other members, who can also view each user's network of relations on the site. The authors discuss the following patterns of personal information revelation:

  1. The pretense of identifiability: Each user's profile is identified by some form of profile ID. Generally, social networking sites encourage the use of real names to identify the profile owner. The site may enforce this through technical specifications, registration requirements, etc., as Facebook does. In the case of dating sites, real names may be filtered, creating a sort of weak pseudonymity between the actual person and his/her online identity (e.g., Friendster). Such sites try to achieve this by showing only the first name of the profile owner. Some other social networking sites may discourage the use of real names altogether. In spite of these different approaches to identifiability, most of these sites encourage users to put personal photographs in their profiles.
  2. The type of revealed information: The information revealed often revolves around the habits, hobbies or interests of the profile owner. It may also extend to semi-public information (current and previous schools and employers), private information (drinking and drug habits, sexual orientation) and open-ended entries.
  3. Visibility of information: Visibility varies considerably. Some social networking sites allow any user to view any member's profile. Others restrict access to members who are in the direct or extended network of the profile owner.

In spite of the risks, people are usually eager to disclose as much information as possible to as many other members of the site as possible, even after reading the recommendations of the hosting service.

Social Network Theory and Privacy

The definition of privacy varies from individual to individual and also depends on the person's social network. In some situations, a person may be willing to share information only with close friends, while in others he/she may be willing to share it with anonymous strangers but not with those who know him/her better.

Social network theorists have discussed the relevance of relations of different depth and strength in a person’s social network, and the importance of weak ties in the flow of information across different nodes in a network ([ Granovetter, 1973 ] and [ Granovetter, 1983 ]). Network theory has also been used to explore how distant nodes can get interconnected through relatively few random ties ([ Milgram, 1967 ], [ Milgram, 1977 ] and [ Watts, 2003 ]).

[ Strahilevitz, 2004 ] argues that how information is expected to flow from node to node in somebody’s social network should also inform that person’s expectations for the privacy of information revealed in the network. When social network theory is applied to the study of information revelation in online social networks, the following differences between the online and offline scenarios emerge:

  1. The offline social network ties are quite diverse in terms of how close and intimate a subject perceives a relation to be. The online social networks usually treat these connections as binary. [ Boyd, 2004 ] notes that “there is no way to determine what metric was used or what the role or weight of the relationship is. While some people are willing to indicate anyone as Friends, and others stick to a conservative definition, most users tend to list anyone who they know and do not actively dislike. This often means that people are indicated as Friends even though the user does not particularly know or trust the person”.
  2. [ Donath, 2004 ] note that “the number of weak ties one can form and maintain may be able to increase substantially, because the type of communication that can be done more cheaply and easily with new technology is well suited for these ties”.
  3. While an offline social network may include up to a dozen intimate or significant ties and 1000 to 1700 “acquaintances” or “interactions” ([ Donath, 2004 ] and [ Strahilevitz, 2004 ]), an online social network can list hundreds of direct “friends” and include hundreds of thousands of additional friends within just three degrees of separation from a subject.

Together, these points show that online social networks are vaster and have weaker ties than offline social networks.
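The jump from hundreds of direct friends to hundreds of thousands of users within three degrees can be illustrated with a rough sketch. The graph, the user names and the figure of 100 friends per user are assumptions chosen for illustration and are not taken from the paper. The first function counts actual reach in a given friendship graph; the second gives the largest possible reach when every user has the same number of friends and no friend lists overlap.

```python
from collections import deque

def reach_within(graph, start, max_hops):
    """Count distinct users reachable from `start` in at most `max_hops` friendship links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, hops + 1))
    return len(seen) - 1  # exclude the starting user

def upper_bound(friends_per_user, max_hops):
    """Largest possible reach if every user has `friends_per_user` friends
    and no two friend lists overlap (a deliberately optimistic assumption)."""
    total, layer = 0, friends_per_user
    for _ in range(max_hops):
        total += layer
        layer *= friends_per_user - 1  # each new user adds everyone except the friend they were reached through
    return total

# Hypothetical toy graph: a chain with one extra branch at "a".
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}

print(reach_within(toy_graph, "a", 1))  # 2
print(reach_within(toy_graph, "a", 3))  # 4
print(upper_bound(100, 3))              # 990100
```

Real networks overlap heavily, so actual reach is well below this bound, but the order of magnitude shows why binary "friend" links expose a profile to far more people than an offline circle would.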

Privacy Implications

The authors state that privacy implications depend mainly on the level of identifiability of the information, its recipients and its uses. A profile owner may be identified even from the meager information provided on a social networking site. Users often post the same or similar photographs on different social networking sites, which could be used to identify a profile owner across the multiple sites on which he/she holds an account. Similarly, other information such as demographic data and category-based representations of interests can be used to reveal identity. According to the authors, there are a few questions to ponder:

  1. To whom may identifiable information be made available? - The hosting site (which may use and extend the information) and the network itself (whose extension over time may not be completely known to the user). The ease of joining and the lack of security measures also threaten a person's information on social networking sites, since it may become accessible to third parties whom the user would not want to have it.
  2. How can the information be used? - Identity theft, physical stalking, embarrassment, price discrimination, blackmailing.


Distribution of CMU Facebook profiles for different user categories (Gross and Acquisti, 2005)
Gender distribution for different user categories (Gross and Acquisti, 2005)
Age distribution of Facebook profiles at CMU (Gross and Acquisti, 2005)
Percentages of CMU profiles revealing various types of personal information (Gross and Acquisti, 2005)

The authors present a detailed analysis of data obtained from the profiles of Carnegie Mellon students on Facebook (a social networking site) and discuss the information revelation concerns and potential privacy attacks that this data exposes. Facebook (at the time the paper was written) is a college-oriented social networking site that had spread to "573 campuses and 2.4 million users". It requires a college email account for a person to join a college network. Members have very granular control over the searchability and visibility of their personal information. The default settings make a profile searchable by anyone in the school network on Facebook, and its content is visible to all users on the school network by default. Facebook also collects other data such as IP addresses, which may be shared with third parties.

The authors downloaded 4050 profiles from the CMU network on Facebook in July 2005 for analysis. The data show that the majority of users are undergraduate students and that the majority of users are male, as shown in the figures. Other figures are as follows: 90.8% of profiles contain images, 87.8% reveal a birth date, 39.9% list a phone number and 50.8% list a current residence. 62.9% display a relationship status. Substantially more males (47.1%) than females (28.9%) displayed a phone number. Facebook also allows viewers to see the first and last name of the profile owner.

The authors also analyzed data validity and identifiability. They sorted profile names into three categories: Real Name (the name appears to be real; 89% of the data), Partial Name (only a first name is given; 3%) and Fake Name (the name is obviously fake; 8%). The analysis also showed that only 1.2% of users (18 female and 45 male) changed the default searchability setting from "everyone" to "only being searched by CMU profile members". Only 0.06% (3 profiles) changed profile visibility by restricting access to CMU users only. Based on these numbers, the authors make a very strong statement: "information suitable for creating a brief digital dossier consisting of name, college affiliation, status and a profile image can be accessed for the vast majority of Facebook users by anyone on the website". Additional personal data, such as political and sexual orientation, residence address, telephone number and class schedule, are made available by the majority of users to anybody else at the same institution. Since a college email account is all that is needed to join a college network, the same data is within reach of anyone who obtains even temporary control of a single email address at the institution.

Overview of the privacy risks and number of CMU profiles susceptible to it (Gross and Acquisti, 2005)

The privacy implications of such highly accessible Facebook data, as discussed in the paper, are as follows:

  1. Stalking: The large amount of information available on profiles allows a potential adversary to know a member's whereabouts for a good part of the day. Knowing a student's residence and the classes he/she is taking during the semester, anyone can easily reconstruct the schedule and work out where the student is likely to be. Another potential effect is cyber-stalking using AOL Instant Messenger, which allows users to add "buddies" without confirmation. The data show that 77.7% of profiles listed their AOL user ID on Facebook.
  2. Re-identification: This mainly deals with linking a dataset that lacks explicit identifiers such as name and address to another dataset that shares common attributes. A large number of people in the Facebook dataset disclosed their full birth date and gender on their profiles (88.8%), and 50.8% revealed their address and ZIP code. A potential adversary may use this information to re-identify sensitive data such as hospital records. Similarly, face re-identification may link a person's profiles on different websites. Another dangerous consequence could be SSN and identity theft using the name, address, birth date, ZIP code and similar information available on social networking sites.
  3. Building a digital dossier: The large amount of information available from user profiles may be used to build a digital dossier, which may have a long-term impact on members when they later enter sensitive jobs and their past information is still available somewhere.
  4. Fragile privacy protection: The possibly weak protection mechanisms of social networking sites threaten members' identities, since an attacker might turn profile data into public data. Attackers may use fake email addresses, manipulate users or even change the advanced search features in profiles, making all of the information public.
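The re-identification risk in item 2 can be sketched as a simple join on shared attributes. This is a minimal illustration and not the procedure used in the paper: the field names, the `link_records` helper and all records below are hypothetical and invented for the example. An "anonymized" record is treated as re-identified only when exactly one profile shares its birth date, gender and ZIP code.

```python
def link_records(profiles, anonymized, keys=("birth_date", "gender", "zip")):
    """Pair each anonymized record with a profile name when the shared
    attributes in `keys` match exactly one profile."""
    # Index profile names by their combination of shared attributes.
    index = {}
    for profile in profiles:
        index.setdefault(tuple(profile[k] for k in keys), []).append(profile["name"])

    matches = []
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches

# Hypothetical public profiles (names and values are made up).
profiles = [
    {"name": "Alice Example", "birth_date": "1985-03-14", "gender": "F", "zip": "15213"},
    {"name": "Bob Example", "birth_date": "1984-11-02", "gender": "M", "zip": "15213"},
    {"name": "Chris Example", "birth_date": "1984-11-02", "gender": "M", "zip": "15213"},
]

# Hypothetical dataset with names removed but the same attributes kept.
anonymized = [
    {"birth_date": "1985-03-14", "gender": "F", "zip": "15213", "diagnosis": "asthma"},
    {"birth_date": "1984-11-02", "gender": "M", "zip": "15213", "diagnosis": "fracture"},
    {"birth_date": "1990-01-01", "gender": "F", "zip": "15217", "diagnosis": "flu"},
]

print(link_records(profiles, anonymized))  # [('Alice Example', 'asthma')]
```

In this toy data the second record stays ambiguous because two profiles share the same attributes, and the third has no matching profile at all. The more attributes a profile reveals, the fewer such ambiguous cases remain.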

Related links and References
