Privacy Preserving Speech Processing
A Tricky Issue
The following headline appeared recently on ABC News: "Exclusive: Inside Account of U.S. Eavesdropping on Americans". The article goes on to say: "Despite pledges by President George W. Bush and American intelligence officials to the contrary, hundreds of US citizens overseas have been eavesdropped on as they called friends and family back home...". This highlights a serious problem: the NSA wants to monitor telephone conversations to determine whether any of them matter to national security, but doing so has resulted in an intolerable invasion of the privacy of hundreds of ordinary citizens. Presumably, the agency is mainly interested in detecting certain voices or identifying certain key phrases. However, even if it did so with automated systems, current technologies would still give it full access to the subjects' recordings.
Fortunately, it may be possible to devise technological solutions to this problem. A promising approach lies in the use of Secure Multiparty Computation (SMC) protocols.
Secure Multiparty Computation Protocols
SMC protocols were first introduced by Andrew Yao (1982) in his seminal paper [1], which presented a solution to the so-called millionaires' problem: two millionaires want to find out which of them is richer without divulging their actual worth to one another. The solution has the parties exchange partial information in such a way that at no point does either party learn anything about the other's wealth beyond which of them is richer.
A simpler illustrative example: a group of people at a party want to find out their average income without revealing what each of them earns. A naive solution is the following. The first member of the group adds a random number to their income and passes the result to the second member. The second person adds his or her own income to this number and passes it on to the third, and so on. The number makes a loop and returns to the first person, who simply subtracts the random number they initially added, divides the sum by the number of people and announces the result. Thanks to the random number added by the first person, nobody along the way learns what anyone besides themselves actually earns; in other words, the privacy of the individuals in the computation is preserved.
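The ring protocol can be simulated in a few lines. This is a minimal single-process sketch; the function name `ring_average`, the modulus and the sample incomes are illustrative assumptions, and doing the arithmetic modulo a large number (rather than with plain integers) is a choice made here so that every forwarded value looks uniformly random to anyone who does not know the first person's mask.

```python
import secrets
from typing import List, Tuple


def ring_average(incomes: List[int], modulus: int = 2**64) -> Tuple[float, List[int]]:
    """Simulate the naive ring protocol for an average income.

    Assumptions (illustrative, not part of the original description):
    incomes are non-negative integers whose sum is below `modulus`, and
    all arithmetic is done modulo `modulus`.
    """
    if not incomes:
        raise ValueError("need at least one party")
    if sum(incomes) >= modulus:  # simulation-only sanity check
        raise ValueError("modulus must exceed the total income")
    mask = secrets.randbelow(modulus)           # first person's secret random number
    message = (mask + incomes[0]) % modulus     # first person -> second person
    transcript = [message]                      # values seen "on the wire"
    for income in incomes[1:]:
        message = (message + income) % modulus  # each person adds their income and passes it on
        transcript.append(message)
    total = (message - mask) % modulus          # back at the first person: remove the mask
    return total / len(incomes), transcript


average, wire = ring_average([50_000, 72_000, 61_000])
print(average)  # 61000.0
```

Because the mask is uniform modulo `modulus`, each individual value in `wire` is, on its own, independent of the incomes. The sketch still inherits the weaknesses of the naive protocol: every party must behave honestly, and no two parties may pool what they see, since the people immediately before and after someone in the ring can subtract their two messages to recover that person's income.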
Why is this solution naive? Because it assumes that all parties are honest: the first person, for example, could simply announce a wrong average and nobody would notice. The mathematics of SMC has since become increasingly sophisticated, and modern protocols can ensure not only privacy but also that all parties follow the protocol honestly.
Privacy Preserving Audio Classification
In earlier work [2,3], Smaragdis and Shashanka have shown how SMC protocols can be used to build privacy-preserving classifiers based on Gaussian mixture models (GMMs) and hidden Markov models (HMMs), and how these models can in turn be trained in a privacy-preserving manner.
The protocols proposed by Smaragdis and Shashanka allow data to be classified with a GMM-based or HMM-based classifier in such a way that the entity who owns the classifier never sees any meaningful form of the data being classified. The entity in charge of the data, in turn, learns neither the parameters of the classifier nor the outcome. Furthermore, Smaragdis and Shashanka propose a mechanism for training such classifiers without clear access to the data.
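The protocols in [2,3] are not reproduced here. As background on why GMM classifiers suit the SMC setting, the following minimal sketch (assuming NumPy, with hypothetical helper names `gaussian_loglik` and `packed_matrix`) checks a standard identity in the clear: the log-likelihood of a single Gaussian component equals one quadratic form xb^T W xb in the augmented vector xb = [x, 1], where the matrix W depends only on the classifier's parameters. No secure computation is performed.

```python
import numpy as np


def gaussian_loglik(x: np.ndarray, mu: np.ndarray, cov: np.ndarray) -> float:
    """Direct evaluation of log N(x; mu, cov), used as the reference value."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (mahalanobis + logdet + d * np.log(2.0 * np.pi)))


def packed_matrix(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pack one Gaussian's parameters into a (d+1)x(d+1)
    matrix W such that log N(x; mu, cov) == xb @ W @ xb with xb = [x, 1]."""
    d = mu.shape[0]
    prec = np.linalg.inv(cov)                # precision matrix P (symmetric)
    _, logdet = np.linalg.slogdet(cov)
    W = np.zeros((d + 1, d + 1))
    W[:d, :d] = -0.5 * prec                  # quadratic term: -1/2 x^T P x
    W[:d, d] = prec @ mu                     # cross term x^T P mu, paired with the trailing 1
    W[d, d] = -0.5 * (mu @ prec @ mu + logdet + d * np.log(2.0 * np.pi))  # constant term
    return W


# Check the identity on a random, well-conditioned Gaussian.
rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
cov = A @ A.T + d * np.eye(d)                # symmetric positive definite covariance
mu = rng.standard_normal(d)
x = rng.standard_normal(d)
xb = np.append(x, 1.0)                       # augmented feature vector [x, 1]
print(np.isclose(xb @ packed_matrix(mu, cov) @ xb, gaussian_loglik(x, mu, cov)))  # True
```

Since xb^T W xb is the sum of the entries of W multiplied elementwise by those of xb xb^T, the score is an inner product between a vector built from the data owner's features and one built from the classifier owner's parameters, which is the kind of quantity secure inner-product protocols operate on. For a mixture, the log of each component weight can be folded into the bottom-right entry of its W, and the class score is then a log-sum-exp over components.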
Privacy Preserving Speech Processing
State-of-the-art speaker verification, speaker recognition and speech recognition systems are based largely on GMMs and HMMs. The above protocols provide a basic framework for extending these systems to a fully privacy-preserving form.
However, many challenges remain. The primary challenge is computational: SMC protocols require many exchanges of information between multiple parties for each computation, and speeding up the computation generally comes at the cost of some privacy. This privacy-computation tradeoff is a serious bottleneck even in the simplest cases.
In more complex problems such as word spotting or recognition, additional complexity is introduced by priors in the form of language models or word probabilities. Information can leak through the manner in which prior probability terms are accessed and graph nodes are visited. Current secure protocols are either insufficient to address these issues or too slow to be effective.
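To make the access-pattern concern concrete, here is a toy sketch with hypothetical function names (`direct_lookup`, `oblivious_lookup`) and a made-up table of prior log-probabilities. A direct lookup touches only the requested entry, so the position read reveals which word is being scored; a data-oblivious lookup touches every entry in a fixed order and selects the wanted one arithmetically. It only illustrates the leakage and its cost; it is not a secure protocol, and plain Python gives no constant-time guarantees.

```python
from typing import List


def direct_lookup(table: List[float], index: int, trace: List[int]) -> float:
    trace.append(index)                     # the single position read reveals `index`
    return table[index]


def oblivious_lookup(table: List[float], index: int, trace: List[int]) -> float:
    result = 0.0
    for position, value in enumerate(table):
        trace.append(position)              # same positions, same order, for every index
        selector = float(position == index) # 1.0 only at the requested entry
        result += selector * value          # only the requested entry contributes
    return result


log_probs = [-2.3, -1.2, -4.0, -0.7]        # made-up prior log-probabilities
trace_direct: List[int] = []
direct_lookup(log_probs, 2, trace_direct)
print(trace_direct)                         # [2]: the access reveals which entry was needed

trace_1: List[int] = []
trace_3: List[int] = []
print(oblivious_lookup(log_probs, 1, trace_1), oblivious_lookup(log_probs, 3, trace_3))  # -1.2 -0.7
print(trace_1 == trace_3)                   # True: identical access pattern
```

The oblivious version reads every table entry instead of one, a small instance of the privacy-computation tradeoff: for a full language model or decoding graph, hiding which terms are used multiplies the work accordingly.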
The Tricky Issue and Other Benefits
Using the methods we are developing in this project, we envision a mechanism whereby the NSA could obtain publicly accepted forms of legal sanction to look for prespecified voices or phrases in incoming data. The privacy-preserving framework would ensure that the agency is notified only when these occur and has no access to the voice data itself, thus preserving citizens' privacy.
This is not a perfect solution: its efficacy will depend on the accuracy of the underlying speaker-ID or keyword-spotting algorithm that has been secured, and the chosen keywords may also occur in innocent conversations. It is nonetheless a vast improvement over the current situation.
This work also has implications for voice research.
Governments and corporations legally hold large quantities of audio recordings for which they would like to develop processing algorithms, but they often lack the necessary in-house expertise. Privacy concerns prevent them from contracting the research out, since the data cannot be shared without lengthy legal clearance procedures. Our work would enable them to let outside researchers work on these data without actually having access to them. Many other areas of broad impact can similarly be identified.
- Paris Smaragdis, Adobe Inc.
- Madhusudana Shashanka, Mars Inc.
[1] Yao, A.: "Protocols for Secure Computations", Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, Chicago, Illinois, November 1982, pp. 160-164.
[2] Shashanka, M. and Smaragdis, P.: "Secure sound classification: Gaussian mixture models", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
[3] Smaragdis, P. and Shashanka, M.: "A framework for secure speech recognition", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, 2007, pp. 1404-1413.