LATANYA SWEENEY
Assistant Professor
Institute for Software Research International
www
Perhaps the biggest clash between technology and society involves
privacy. Maintaining privacy and confidentiality in a globally
networked, technically empowered society is difficult, tricky,
and fun. Data privacy (or, more precisely, data anonymity) is emerging
as a new area of computer science: the study of computational
solutions for releasing information about entities (such as people,
companies, or governments) so that certain properties (such as identity)
are controlled while the data remain practically useful. Statisticians
and earlier computer scientists have studied parts of these problems,
but their solutions are insufficient in today's technically empowered
society. So, in data anonymity, we develop new approaches and tools
for today's computational environment.
My colleagues and I (in the Laboratory for International Data Privacy,
which I direct) take a two-pronged approach to data anonymity.
On the one hand, we work as data detectives; on the other, we work
as data protectors.
As data detectives, we develop computer systems that learn information
about people by combining disparate pieces of information (often called
data re-identification and data fusion). Work in this area includes
constructing sensitive profiles of individuals, companies, or governments
from publicly and semi-publicly available data. We also conduct
re-identification experiments: given data believed to be anonymous, we
re-identify the individuals who are the subjects of the data by
supplying their identities.
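One common way to carry out a re-identification experiment of this kind is to link a "de-identified" table to a named public list on fields the two share. The Python sketch below assumes a toy setting: the released records have had names removed but still carry ZIP code, date of birth, and sex, and a separate public list (for example, a voter roll) carries names alongside the same fields. All records, names, and the `link` function are hypothetical illustrations, not the laboratory's actual tools.

```python
from collections import defaultdict

# Hypothetical "anonymous" release: names removed, but ZIP code,
# date of birth, and sex (quasi-identifiers) left in place.
released = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "dob": "1970-11-02", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public list that pairs names with the same fields.
public = [
    {"name": "Alice Smith", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "dob": "1962-03-14", "sex": "M"},
    {"name": "Carl Brown", "zip": "02138", "dob": "1970-11-02", "sex": "M"},
    {"name": "Dan Green", "zip": "02138", "dob": "1970-11-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released, public):
    """Map each released record's index to a name when exactly one public entry matches."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])
    matches = {}
    for i, record in enumerate(released):
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[i] = candidates[0]
    return matches


print(link(released, public))  # {0: 'Alice Smith', 1: 'Bob Jones'}
```

The first two released records each match exactly one named person and are re-identified; the third matches two people, so it remains ambiguous in this toy example.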
Working as data detectives teaches us the kinds of attacks that must be
thwarted to adequately protect data. As data protectors, we then turn that
experience around: we build systems that properly de-identify information
so that we can offer scientific guarantees that no one can be re-identified.
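As an illustrative choice (the text above does not say which guarantees the laboratory uses), one well-known formal criterion is k-anonymity: every combination of quasi-identifier values in the released data must be shared by at least k records, so matching released records against outside data on those fields can narrow any person down to no fewer than k candidates. The Python sketch below assumes toy fields and a hypothetical `generalize` rule that keeps only a three-digit ZIP prefix and the birth year. Choosing generalizations that keep the data as useful as possible is the hard part and is not attempted here.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def generalize(record):
    """Hypothetical coarsening rule: 3-digit ZIP prefix and birth year only."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["dob"] = record["dob"][:4]
    return out


def is_k_anonymous(records, k, fields=QUASI_IDENTIFIERS):
    """True if every combination of the given fields occurs in at least k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical toy records.
records = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1945-02-10", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02141", "dob": "1970-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "dob": "1970-05-19", "sex": "M", "diagnosis": "flu"},
]

print(is_k_anonymous(records, 2))                           # False
print(is_k_anonymous([generalize(r) for r in records], 2))  # True
```

Every raw record is unique on ZIP code, date of birth, and sex, so the release fails k = 2; after generalization each combination appears twice, so it satisfies k = 2 (but not k = 3).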
I term this two-pronged approach computational disclosure control: the
study of computational techniques for discovering and controlling the
inferences that can be drawn from disclosed data, so that the identities
of individuals and other entities in the released data cannot be
recognized even though the data remain practically useful.