Jaspreet Bhatia

jbhatia@cs.cmu.edu
4208 Wean Hall, Carnegie Mellon University

Semantic Incompleteness in Privacy Goals

Companies that collect personal information online often maintain privacy policies that are required to accurately reflect their data practices and privacy goals. To be comprehensive and flexible for future practices, policies contain ambiguity. Ambiguity in data practice descriptions undermines policies as an effective way to communicate system design choices to users. In this paper, we report an investigation to identify incompleteness by representing data practice descriptions as semantic frames. The approach uses grounded analysis to discover which semantic roles corresponding to a data action are needed to construct complete data practice descriptions. Our results include 698 data action instances obtained from 949 manually annotated statements across 15 privacy policies and three domains: health, news and shopping. We identified 2,316 instances of 17 types of semantic roles and found that the distribution of semantic roles across the three domains was similar. Incomplete data practice descriptions can affect the user’s perceived privacy risk, which we measure using factorial vignette surveys. We observed that user risk perception decreases when two roles are present in a statement: the condition under which a data action is performed, and the purpose for which the user’s information is used.
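As a rough illustration of the frame idea (not the paper's actual annotation scheme or its full 17-role inventory), a data practice statement can be modeled as a data action plus a set of role fillers, with incompleteness surfacing as missing roles. The role names in this minimal sketch are assumptions chosen for readability.

```python
from dataclasses import dataclass, field

# Hypothetical required roles for the sketch; the paper identifies 17 role types.
REQUIRED_ROLES = {"actor", "information_type", "purpose", "condition"}

@dataclass
class DataPracticeFrame:
    action: str                                # e.g., "collect", "share", "use"
    roles: dict = field(default_factory=dict)  # role name -> text span from the policy

    def missing_roles(self):
        """Roles required for a complete description that this frame lacks."""
        return REQUIRED_ROLES - set(self.roles)

frame = DataPracticeFrame(
    action="collect",
    roles={
        "actor": "we",
        "information_type": "your location data",
        "purpose": "to personalize content",
        # no "condition" role -> the description is flagged as incomplete
    },
)
print(frame.missing_roles())  # {'condition'}
```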

[Paper]

This work is supported by NSF Frontier Award #1330596 and NSF CAREER Award #1453139.

Privacy Goal Mining

Privacy policies describe high-level goals for corporate data practices. We have developed a semi-automated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall. Our results show that no single framework element alone is sufficient to extract goals; however, the overall framework compensates for the limitations of each element. Human annotators are highly adaptive at discovering annotations in new texts, but those annotations can be inconsistent and incomplete; dependency parsers lack sophisticated, tacit knowledge, but they can exhaustively search text for prospective requirements indicators; and while the lexicon may never completely saturate, its terms can be reliably used to improve recall.
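The sketch below gives a rough sense of how a dependency parse combined with a small lexicon can flag prospective goal indicators. It is not the framework's actual pipeline or lexicon: it assumes spaCy's small English model and a handful of made-up action terms.

```python
import spacy

# Minimal sketch, assuming the model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

ACTION_LEXICON = {"collect", "share", "use", "disclose", "store"}  # illustrative terms

def candidate_goals(text):
    """Yield (action verb, direct objects) pairs as prospective goal indicators."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB" and token.lemma_ in ACTION_LEXICON:
            objects = [c.text for c in token.children if c.dep_ in ("dobj", "obj")]
            yield token.lemma_, objects

text = "We collect your email address and may share usage data with our partners."
for action, objects in candidate_goals(text):
    print(action, objects)
```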

[Paper]

This work is supported by NSF Frontier Award #1330596. For more details about this project, please visit our Usable Privacy Project Website.

Vagueness in Privacy Policies

Vagueness undermines the ability of organizations to align their privacy policies with their data practices, which can confuse or mislead users and thus increase privacy risk. We have developed a theory of vagueness for privacy policy statements based on a taxonomy of vague terms derived from an empirical content analysis of privacy policies. The taxonomy was evaluated in a paired comparison experiment, and the results were analyzed using the Bradley-Terry model to yield a rank order of vague terms, both in isolation and in composition. The theory predicts how vague modifiers to information actions and information types can be composed to increase or decrease overall vagueness. We further provide empirical evidence, based on factorial vignette surveys, that increases in vagueness decrease users' acceptance of privacy risk and thus their willingness to share personal information.
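For readers unfamiliar with the Bradley-Terry model, the sketch below fits it to a small, made-up wins matrix using the standard iterative (MM) updates; the vague terms and comparison counts are hypothetical, not the study's data.

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry scores with the classic iterative (MM) updates.

    wins[i, j] = number of times term i was judged vaguer than term j.
    """
    n = wins.shape[0]
    comparisons = wins + wins.T          # total comparisons for each pair
    p = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            denom = sum(comparisons[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = wins[i].sum() / denom
        p /= p.sum()
    return p                             # higher score = judged vaguer overall

# Made-up paired-comparison counts for three hypothetical vague terms.
terms = ["generally", "may", "as needed"]
wins = np.array([[0, 6, 2],
                 [4, 0, 3],
                 [8, 7, 0]])

for term, score in sorted(zip(terms, bradley_terry(wins)), key=lambda x: -x[1]):
    print(term, round(float(score), 3))
```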

[Paper]

This work is supported by NSF Award #1330596, NSF Award #1330214 and NSA Award #141333.

Empirical Measurement of Perceived Privacy Risk

Personal data is increasingly collected and used by companies to tailor services to users, and to make financial, employment and health-related decisions about individuals. When personal data is inappropriately collected or misused, however, individuals may experience violations of their privacy. Despite the recent shift toward a risk-managed approach to privacy, there are, to our knowledge, no empirical methods to determine which personal data is most at risk. We conducted a series of experiments to measure perceived privacy risk, which is based on expressed preferences and which we define as an individual’s willingness to share their personal data with others given the likelihood of a potential privacy harm. These experiments control for one or more of the six factors affecting an individual’s willingness to share their information: data type, discomfort associated with the data type, data purpose, privacy harm, harm likelihood, and individual demographic factors such as age range, gender, education level, ethnicity and household income. To measure likelihood, we adapt Construal Level Theory from psychology to frame individual attitudes about risk likelihood based on social and physical distances to the privacy harm. The findings include predictions about the extent to which the above factors correspond to risk acceptance, among them that perceived risk is lower for induced disclosure harms than for surveillance and insecurity harms as defined in Solove's Taxonomy of Privacy. In addition, we found that likelihood was not a multiplicative factor in computing privacy risk perception, which challenges conventional concepts of privacy risk in the privacy and security community.
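To show the general shape of a factorial vignette design, the sketch below crosses a few illustrative factor levels into survey prompts. The factors, levels, and wording are stand-ins, not the study's actual six-factor instrument or its Construal Level Theory framing.

```python
from itertools import product

# Hypothetical factor levels; the study crosses six factors, including
# likelihood framed via social and physical distance to the harm.
factors = {
    "data_type": ["location", "browsing history", "health records"],
    "harm": ["surveillance", "insecurity", "induced disclosure"],
    "likelihood": ["a company in your city", "a company overseas"],
}

template = ("How willing are you to share your {data_type} with {likelihood}, "
            "given that doing so could lead to {harm}?")

vignettes = [template.format(**dict(zip(factors, combo)))
             for combo in product(*factors.values())]

print(len(vignettes))   # 3 * 3 * 2 = 18 vignette conditions
print(vignettes[0])
```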

[Paper]

This work is supported by NSF Award CNS-1330596, NSA Award #141333, and ONR Award #N00244-16-1-0006.