Information Flow Experiments

Determining Information Usage from the Outside

Using our rigorous statistical methodology, we analyzed ads served by Google and explored how they relate to the interests that Google's Ad Settings webpage says Google has inferred about people. We found:

  1. Discrimination: gender-based discrimination in job-related ads
  2. Opacity: browsing substance abuse websites leads to rehab ads despite Google's own Ad Settings showing no evidence of such tracking
  3. Choice: Google's Ad Settings allows some control over the ads you see

We detail these results and our larger research program below.

Discrimination

We went to Google's Ad Settings, a webpage that provides some information about, and some control over, the profile that Google maintains on you.

Google Ad Settings

Across hundreds of browsers, we randomly set the gender in the profile to either “female” or “male” and then visited job-related websites. We found that the “male” instances were much more likely than the “female” instances to receive ads promoting high-paying jobs.

Top ads for identifying the female group

                                                               Times shown to
Ad Title                   Ad URL                              Females    Males
Jobs (Hiring Now)          www.jobsinyourarea.co                    45        8
4Runner Parts Service      www.westernpatoyotaservice.com           36        5
Criminal Justice Program   www3.mc3.edu/Criminal+Justice            29        1
Goodwill - Hiring          goodwill.careerboutique.com             121       39
UMUC Cyber Training        www.umuc.edu/cybersecuritytraining       38       30

Top ads for identifying the male group

                                                               Times shown to
Ad Title                   Ad URL                              Females    Males
$200k+ Jobs - Execs Only   careerchange.com                        311     1816
Find Next $200k+ Job       careerchange.com                          7       36
Become a Youth Counselor   www.youthcounseling.degreeleap.com        0      310
CDL-A OTR Trucking Jobs    www.tadrivers.com/OTRJobs                 0        8
Free Resume Templates      resume-templates.resume-now.com           8       10

Opacity

We randomly assigned browsers either to visit websites associated with substance abuse or not to visit them. The browsers that visited those websites received ads for a rehab center, while the browsers that did not visit them received none.

Rehab Ad
Top ads for identifying browsers that visited websites associated with substance abuse

                                                               Times shown to
Ad Title                   Ad URL                              No Substance Abuse  Substance Abuse
The Watershed Rehab        www.thewatershed.com/Help                            0             2276
Watershed Rehab            www.thewatershed.com/Rehab                           0              362
The Watershed Rehab        (none)                                               0              771
Veteran Home Loans         www.vamortgagecenter.com                            22               33
CAD Paper Rolls            paper-roll.net/Cad-Paper                             0               21

Top ads for identifying browsers that did not visit the websites

                                                               Times shown to
Ad Title                   Ad URL                              No Substance Abuse  Substance Abuse
Alluria Alert              www.bestbeautybrand.com                              9                0
Best Dividend Stocks       dividends.wyattresearch.com                         54               24
10 Stocks to Hold Forever  www.streetauthority.com                            118               76
Delivery Drivers Wanted    get.lyft.com/drive                                  54               14
VA Home Loans Start Here   www.vamortgagecenter.com                            41                9

Despite this clear change in the ads shown, Google's Ad Settings showed no inferred interests. The transparency tool was opaque!

Similarly, visiting websites associated with disabilities resulted in more ads related to disabilities (such as for www.abilitiesexpo.com). This time Ad Settings did show inferred interests, but none were related to disabilities.

Choice

We found that removing interests from Ad Settings decreased the number of ads received related to those interests. In particular, we had a set of web browsers visit websites related to online dating. We then randomly selected half of the browsers and removed any interests related to online dating from them. Dating ads distinguished the browsers that kept the interests from those that had them removed.

Top ads for identifying the group that kept dating interests

                                                               Times shown to
Ad Title                   Ad URL                                 Kept  Removed
Are You Single?            www.zoosk.com/Dating                   2433       78
Top 5 Online Dating Sites  www.consumer-rankings.com/Dating        408       13
Why can't I find a date?   www.gk2gk.com                            51        5
Latest Breaking News       www.onlineinsider.com                     6        1
Gorgeous Russian Ladies    anastasiadate.com                        21        0

Top ads for identifying agents in the group that removed dating interests

                                                               Times shown to
Ad Title                   Ad URL                                 Kept  Removed
Car Loans w/ Bad Credit    www.car.com/Bad-Credit-Car-Loan           8       37
Individual Health Plans    www.individualhealthquotes.com           21       46
Crazy New Obama Tax        www.endofamerica.com                     22       51
Atrial Fibrillation Guide  www.johnshopkinshealthalerts.com          0       25
Free $5 - $25 Gift Cards   swagbucks.com                             5       32

In our experiments paper, we present these and other results in detail.

For each of the experiments above, we used machine learning to find the differences in ads. The machine learning algorithm created a classifier that identifies which experimental group each browser belongs to based upon the ads it receives. The tables above show the ads most used by the classifier in this task.
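The idea of ranking ads by how much a classifier relies on them can be sketched as follows. This is a minimal illustration, not the AdFisher implementation: it assumes a plain logistic regression trained by gradient descent, and the ad names, counts, and helper functions (`train_logistic`, `predict`) are hypothetical toy values chosen for the example.

```python
import math

def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    rows   -- one feature vector per browser (how often it saw each ad)
    labels -- 1 or 0, the experimental group of each browser
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return the predicted group (1 or 0) for one browser."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

# Hypothetical toy data: counts of three made-up ads seen by eight browsers.
ad_names = ["ad_A", "ad_B", "ad_C"]
rows = [[5, 0, 2], [4, 1, 1], [6, 0, 3], [3, 0, 2],   # group 1
        [0, 3, 2], [1, 4, 1], [0, 2, 3], [0, 5, 2]]   # group 0
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(rows, labels)

# Ads with the largest absolute weights are the ones the classifier relies on most.
ranked = sorted(zip(ad_names, weights), key=lambda pair: abs(pair[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.2f}")

training_accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(rows, labels)
) / len(rows)
```

In this sketch a positive weight marks an ad that points toward group 1 and a negative weight marks one that points toward group 0. Accuracy on the training data alone says little, so a meaningful check would measure accuracy on separately collected data.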

We then used a second round of data collection and statistical tests to confirm that each of these experiments shows a statistically significant effect. The tests looked at the accuracy of the classifier and computed a p-value: the probability of seeing accuracy at least that high if the experimental group had no effect on the ads. P-values below 0.05 are typically considered significant. Ours are far smaller, which is strong evidence that the behaviors we experimented with caused the differences in the ads shown.

Statistical Significance of Results

Experiment       Classifier Accuracy   P-Value
Discrimination                   93%   0.0000053
Opacity                          81%   0.0000053
Choice                           74%   0.0000053

We used permutation testing, a statistical approach that avoids making questionable assumptions common in other works. For more information about our statistical analysis, see our methodology paper.
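A permutation test on classifier accuracy can be sketched roughly as below. This is an illustration under stated assumptions rather than the exact procedure from our papers: it shuffles group labels uniformly at random, estimates the p-value from a fixed number of random permutations, and applies an add-one correction so the estimate is never zero. The function names and toy data are hypothetical.

```python
import random

def accuracy(predictions, labels):
    """Fraction of browsers whose predicted group matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def permutation_p_value(predictions, labels, n_permutations=10000, seed=0):
    """Estimate how often shuffled labels give accuracy >= the observed one.

    If group membership had no effect on the ads, the labels would be
    exchangeable, so shuffling them shows what accuracy chance alone produces.
    """
    rng = random.Random(seed)
    observed = accuracy(predictions, labels)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(predictions, shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction (an assumption of this sketch) avoids reporting p = 0.
    return (at_least_as_high + 1) / (n_permutations + 1)

# Hypothetical toy data: 20 held-out browsers, 18 of them classified correctly.
labels = [1] * 10 + [0] * 10
predictions = list(labels)
predictions[0] = 0    # one group-1 browser misclassified
predictions[10] = 1   # one group-0 browser misclassified

p_value = permutation_p_value(predictions, labels)
print(f"accuracy = {accuracy(predictions, labels):.2f}, p-value = {p_value:.5f}")
```

Because the null distribution is built from the data by shuffling, this kind of test does not need to assume a particular distribution for the ad counts. A classifier that always predicts the same group gets the same accuracy under every shuffle, so its p-value is 1.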

Information flow analysis has largely ignored the setting where the analyst has neither control over nor a complete model of the analyzed system. We formalize such limited information flow analyses and study an instance of it: detecting the usage of data by websites. We prove that these problems are ones of causal inference. Leveraging this connection, we push beyond traditional information flow analysis to provide a systematic methodology based on experimental science and statistical analysis. Our methodology allows us to systematize prior works in the area viewing them as instances of a general approach and to develop a statistically rigorous tool, AdFisher, for detecting information usage.

AdFisher uses machine learning to automate the selection of a statistical test. We use it to find that Google's Ad Settings is opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse will change the ads shown but not the settings page. We also found that setting the gender to female results in getting fewer instances of an ad related to high-paying jobs than setting it to male.

We make our tool, AdFisher, freely available on GitHub at https://github.com/tadatitam/info-flow-experiments.

The code used for running our experiments and the raw data from them are available below with each publication that details the results.

Amit Datta, Michael Carl Tschantz, and Anupam Datta
Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination
Privacy Enhancing Technologies Symposium (PETS) 2015
Read the paper: official version, preprint
Tech report arXiv:1408.6491: version 1, version 2
Download the code and raw data: version 1, version 2
Read additional details here

Michael Carl Tschantz, Amit Datta, Anupam Datta, and Jeannette M. Wing
A Methodology for Information Flow Experiments
The IEEE Computer Security Foundations Symposium (CSF) 2015
Read the paper: official version, preprint
Tech report arXiv:1405.2376
Read the TR here
Download the code and raw data here

Amit Datta, Anupam Datta, Suman Jana, and Michael Carl Tschantz
Poster: Information Flow Experiments to study News Personalization
Poster at the IEEE Symposium on Security and Privacy, 2015
Read the paper: official abstract, preprint

Michael Carl Tschantz, Anupam Datta, and Jeannette M. Wing
Information Flow Investigations
CMU Tech Report CMU-CS-13-118
Read the paper here

Michael Carl Tschantz, Anupam Datta, and Jeannette M. Wing
Information Flow Investigations: Extended Abstract
Abstract for 5-Minute Talk at CSF 2013
Read the paper here

Press coverage: Al Jazeera America, MIT Tech Review, Pittsburgh Post-Gazette, Yahoo Tech, Wired, NYT's TheUpshot, Washington Post's The Intersect