Information Flow Experiments
Determining Information Usage from the Outside
Using our rigorous statistical methodology, we analyzed ads served by Google and explored how they relate to the interests that Google claims, on its Ad Settings webpage, to infer about people. We found:
- Discrimination: gender-based discrimination in job-related ads
- Opacity: browsing substance abuse websites leads to rehab ads despite Google's own Ad Settings showing no evidence of such tracking
- Choice: Google's Ad Settings allows some control over the ads you see
We detail these results and our larger research program below.
Findings
Discrimination
We went to Google's Ad Settings, a webpage that provides some information about, and some control over, the profile that Google maintains on you.
Across hundreds of browser instances, we randomly set the profile's gender to either “female” or “male” and then visited job-related websites. The “male” instances were much more likely than the “female” instances to receive ads promoting high-paying jobs.
Ad Title | Ad URL | Times shown (Females) | Times shown (Males) |
---|---|---|---|
Jobs (Hiring Now) | www.jobsinyourarea.co | 45 | 8 |
4Runner Parts Service | www.westernpatoyotaservice.com | 36 | 5 |
Criminal Justice Program | www3.mc3.edu/Criminal+Justice | 29 | 1 |
Goodwill - Hiring | goodwill.careerboutique.com | 121 | 39 |
UMUC Cyber Training | www.umuc.edu/cybersecuritytraining | 38 | 30 |

Ad Title | Ad URL | Times shown (Females) | Times shown (Males) |
---|---|---|---|
$200k+ Jobs - Execs Only | careerchange.com | 311 | 1816 |
Find Next $200k+ Job | careerchange.com | 7 | 36 |
Become a Youth Counselor | www.youthcounseling.degreeleap.com | 0 | 310 |
CDL-A OTR Trucking Jobs | www.tadrivers.com/OTRJobs | 0 | 8 |
Free Resume Templates | resume-templates.resume-now.com | 8 | 10 |
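To put the table above in rough quantitative terms, the short calculation below compares the impression counts for the "$200k+ Jobs - Execs Only" ad. It assumes the two groups contained about the same number of browsers, which random assignment makes plausible but which this page does not state. The variable names are illustrative.

```python
# Impression counts copied from the table above for the
# "$200k+ Jobs - Execs Only" ad (careerchange.com).
female_impressions = 311
male_impressions = 1816

# How many times more often the ad was shown to the "male" group.
ratio = male_impressions / female_impressions

# Fraction of all impressions of this ad that went to the "male" group.
male_share = male_impressions / (male_impressions + female_impressions)

print(f"male/female ratio: {ratio:.2f}")
print(f"share shown to males: {male_share:.1%}")
```

A raw ratio like this describes the sample only; whether the difference is statistically significant is a separate question that the Methodology section addresses.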
Opacity
We randomly assigned browsers either to visit websites associated with substance abuse or not to visit them. The browsers that visited those websites received ads for a rehab center; the browsers that did not visit them received none.
Ad Title | Ad URL | Times shown (No Substance Abuse) | Times shown (Substance Abuse) |
---|---|---|---|
The Watershed Rehab | www.thewatershed.com/Help | 0 | 2276 |
Watershed Rehab | www.thewatershed.com/Rehab | 0 | 362 |
The Watershed Rehab | (none) | 0 | 771 |
Veteran Home Loans | www.vamortgagecenter.com | 22 | 33 |
CAD Paper Rolls | paper-roll.net/Cad-Paper | 0 | 21 |

Ad Title | Ad URL | Times shown (No Substance Abuse) | Times shown (Substance Abuse) |
---|---|---|---|
Alluria Alert | www.bestbeautybrand.com | 9 | 0 |
Best Dividend Stocks | dividends.wyattresearch.com | 54 | 24 |
10 Stocks to Hold Forever | www.streetauthority.com | 118 | 76 |
Delivery Drivers Wanted | get.lyft.com/drive | 54 | 14 |
VA Home Loans Start Here | www.vamortgagecenter.com | 41 | 9 |
Despite this clear change in the ads shown, Google's Ad Settings showed no inferred interests. The transparency tool was opaque!
Similarly, visiting websites associated with disabilities resulted in more ads related to disabilities (such as for www.abilitiesexpo.com). This time Ad Settings did show inferred interests, but none were related to disabilities.
Choice
We found that removing interests from the Ad Settings decreased the number of ads received related to those interests. In particular, we had a set of web browsers visit websites related to online dating. We then randomly selected half of the browsers and removed any interests related to online dating from their Ad Settings. Dating ads distinguished the two groups: browsers that kept the interests received far more of them than browsers that had the interests removed.
Ad Title | Ad URL | Times shown (Kept) | Times shown (Removed) |
---|---|---|---|
Are You Single? | www.zoosk.com/Dating | 2433 | 78 |
Top 5 Online Dating Sites | www.consumer-rankings.com/Dating | 408 | 13 |
Why can't I find a date? | www.gk2gk.com | 51 | 5 |
Latest Breaking News | www.onlineinsider.com | 6 | 1 |
Gorgeous Russian Ladies | anastasiadate.com | 21 | 0 |

Ad Title | Ad URL | Times shown (Kept) | Times shown (Removed) |
---|---|---|---|
Car Loans w/ Bad Credit | www.car.com/Bad-Credit-Car-Loan | 8 | 37 |
Individual Health Plans | www.individualhealthquotes.com | 21 | 46 |
Crazy New Obama Tax | www.endofamerica.com | 22 | 51 |
Atrial Fibrillation Guide | www.johnshopkinshealthalerts.com | 0 | 25 |
Free $5 - $25 Gift Cards | swagbucks.com | 5 | 32 |
Our experiments paper, listed under Publications, presents these and other results in detail.
Methodology
For each of the experiments above, we used machine learning to find the differences in ads. The machine learning algorithm created a classifier that identifies which experimental group each browser belongs to based upon the ads it receives. The tables above show the ads most used by the classifier in this task.
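As a rough illustration of the idea, the sketch below trains a classifier on per-browser ad counts and measures how well it tells two experimental groups apart. It is not AdFisher's code: a nearest-centroid classifier stands in for whatever model the tool actually uses, and the ad titles, group names, and counts are invented toy data.

```python
from collections import Counter

def featurize(ads, vocabulary):
    """Turn a browser's list of ad titles into a count vector over a fixed vocabulary."""
    counts = Counter(ads)
    return [counts[title] for title in vocabulary]

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train(browsers, labels, vocabulary):
    """Compute one centroid per experimental group."""
    by_group = {}
    for ads, label in zip(browsers, labels):
        by_group.setdefault(label, []).append(featurize(ads, vocabulary))
    return {label: centroid(vectors) for label, vectors in by_group.items()}

def predict(model, ads, vocabulary):
    """Assign a browser to the group with the nearest centroid (squared Euclidean distance)."""
    x = featurize(ads, vocabulary)
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    # Iterate over sorted labels so that ties are broken deterministically.
    return min(sorted(model), key=distance)

def accuracy(model, browsers, labels, vocabulary):
    """Fraction of browsers whose group the classifier identifies correctly."""
    hits = sum(predict(model, ads, vocabulary) == y for ads, y in zip(browsers, labels))
    return hits / len(labels)

# Toy data (hypothetical): each browser is the list of ad titles it received.
vocabulary = ["exec jobs", "hiring now", "resume help"]
train_browsers = [
    ["exec jobs"] * 5 + ["resume help"],
    ["exec jobs"] * 4,
    ["hiring now"] * 3 + ["resume help"],
    ["hiring now"] * 4,
]
train_labels = ["treatment", "treatment", "control", "control"]
test_browsers = [["exec jobs"] * 3, ["hiring now"] * 2 + ["resume help"]]
test_labels = ["treatment", "control"]

model = train(train_browsers, train_labels, vocabulary)
test_accuracy = accuracy(model, test_browsers, test_labels, vocabulary)
print(f"held-out accuracy: {test_accuracy:.0%}")
```

Accuracy is measured on browsers the classifier did not see during training, so a high score reflects a real difference between the groups' ads rather than memorization.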
We used a second round of data collection and statistical tests to validate that each of these experiments shows a statistically significant effect. Our tests took the accuracy of the classifier on the new data as the test statistic and computed a p-value: the probability of seeing an accuracy at least that high if the experimental group had no effect on the ads. P-values below 0.05 are typically considered significant. Ours are much smaller, which is strong evidence that the behaviors we experimented with caused the observed differences in ads.
Experiment | Classifier Accuracy | P-Value |
---|---|---|
Discrimination | 93% | 0.0000053 |
Opacity | 81% | 0.0000053 |
Choice | 74% | 0.0000053 |
We used permutation testing, a statistical approach that avoids making questionable assumptions common in other works. For more information about our statistical analysis, see our methodology paper.
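A minimal sketch of a permutation test with classifier accuracy as the test statistic is shown below. It is an illustration under stated assumptions rather than the procedure from the methodology paper: the function names, the number of permutations, and the toy predictions and labels are all hypothetical.

```python
import random

def accuracy(predictions, labels):
    """Fraction of units whose predicted group matches its label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def permutation_p_value(predictions, labels, n_permutations=10000, seed=0):
    """Estimate how often randomly reassigned labels match the fixed
    predictions at least as well as the true labels do."""
    rng = random.Random(seed)
    observed = accuracy(predictions, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so every reshuffling of them is equally likely.
        rng.shuffle(shuffled)
        if accuracy(predictions, shuffled) >= observed:
            at_least_as_extreme += 1
    # Adding one to both counts includes the observed labelling itself
    # and keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Toy data (hypothetical): 20 browsers, 18 of them classified correctly.
labels = ["treatment"] * 10 + ["control"] * 10
predictions = (["treatment"] * 9 + ["control"]
               + ["control"] * 9 + ["treatment"])

p_value = permutation_p_value(predictions, labels)
print(f"observed accuracy: {accuracy(predictions, labels):.0%}")
print(f"estimated p-value: {p_value:.5f}")
```

The test relies only on the random assignment of browsers to groups, so it needs no assumption about how the ads are distributed. With this estimator the smallest value that can be reported is 1 / (n_permutations + 1).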
Research Abstract
Information flow analysis has largely ignored the setting where the analyst has neither control over nor a complete model of the analyzed system. We formalize such limited information flow analyses and study an instance of it: detecting the usage of data by websites. We prove that these problems are ones of causal inference. Leveraging this connection, we push beyond traditional information flow analysis to provide a systematic methodology based on experimental science and statistical analysis. Our methodology allows us to systematize prior works in the area viewing them as instances of a general approach and to develop a statistically rigorous tool, AdFisher, for detecting information usage.
AdFisher uses machine learning to automate the selection of a statistical test. We use it to find that Google's Ad Settings is opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse will change the ads shown but not the settings page. We also found that setting the gender to female results in getting fewer instances of an ad related to high paying jobs than setting it to male.
Software
We make our tool, AdFisher, freely available on GitHub at https://github.com/tadatitam/info-flow-experiments.
The code used for running our experiments and the raw data from them are available below with each publication that details the results.
Publications
- Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. Privacy Enhancing Technologies Symposium (PETS) 2015. Read the paper: official version, preprint. Tech report arXiv:1408.6491: version 1, version 2. Download the code and raw data: version 1, version 2. Read additional details here.
- A Methodology for Information Flow Experiments. The IEEE Computer Security Foundations Symposium (CSF) 2015. Read the paper: official version, preprint. Tech report arXiv:1405.2376. Read the TR here. Download the code and raw data here.
- Poster: Information Flow Experiments to study News Personalization. Poster at the IEEE Symposium on Security and Privacy, 2015. Read the paper: official abstract, preprint.
- Information Flow Investigations: Extended Abstract. Abstract for 5-Minute Talk at CSF 2013. Read the paper here.
People
- Amit Datta, grad student, Carnegie Mellon University, amitdatta@cmu.edu
- Anupam Datta, Associate Professor, Carnegie Mellon University, danupam@cmu.edu
- Michael Carl Tschantz, Researcher, International Computer Science Institute, mct@icsi.berkeley.edu
- Jeannette M. Wing, Corporate Vice President, Microsoft Research, wing@microsoft.com