PITTSBURGH—Scientists at Carnegie Mellon University have discovered that our ears use the most efficient way to process the sounds we hear, from babbling brooks to wailing babies. These results represent a significant advance in understanding how sound is encoded for transmission to the brain, according to the authors, whose work is published with an accompanying "News and Views" editorial in the Feb. 23 issue of Nature.
The research provides a new mathematical framework for understanding sound processing and suggests that our hearing is highly optimized in terms of signal coding—the process by which sounds are translated into information by our brains—for the range of sounds we experience. The same work also has far-reaching, long-term technological implications, such as providing a predictive model to vastly improve signal processing for better-quality compressed digital audio files and designing brain-like codes for cochlear implants, which restore hearing to the deaf.
To achieve their results, the researchers took a radically different approach to analyzing how the brain processes sound signals. Abstracting from the neural code at the auditory nerve, they represented sound as a discrete set of time points, or a "spike code," in which acoustic components are represented only in terms of their temporal relationship with each other. That's because the intensity and basic frequency of a given feature are essentially "kernelized," or compressed mathematically, into a single spike. This is similar to a player piano roll, which can reproduce any song by recording which note to press and when: the spike code encodes any natural sound in terms of the precise timings of the elemental acoustic features. Remarkably, when the researchers derived the optimal set of features for natural sounds, they corresponded exactly to the patterns observed by neurophysiologists in the auditory nerves.
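The piano-roll analogy can be made concrete with a short sketch. This is only an illustration of the general idea, not the researchers' actual method: the damped-sinusoid kernel shape, the function names `make_kernel` and `decode`, and every numeric value are assumptions chosen for the example. Each spike records when a feature occurs, which feature it is and how strongly it sounds, and the waveform is rebuilt by adding one scaled copy of the matching kernel at each spike time.

```python
import math

def make_kernel(freq_hz, sample_rate=8000, length=64, decay=400.0):
    """Hypothetical acoustic feature: a short damped sinusoid (an assumed shape)."""
    return [
        math.exp(-decay * n / sample_rate)
        * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(length)
    ]

def decode(spikes, kernels, num_samples):
    """Rebuild a waveform by adding one scaled kernel copy per spike.

    Each spike is (start_sample, kernel_id, amplitude), much like one hole
    in a piano roll: which note, when, and how hard.
    """
    signal = [0.0] * num_samples
    for start, kernel_id, amplitude in spikes:
        for offset, value in enumerate(kernels[kernel_id]):
            position = start + offset
            if 0 <= position < num_samples:
                signal[position] += amplitude * value
    return signal

# Two assumed features and three spikes describe 256 samples of sound.
kernels = [make_kernel(440.0), make_kernel(1200.0)]
spikes = [(0, 0, 1.0), (100, 1, 0.5), (120, 0, -0.8)]
waveform = decode(spikes, kernels, num_samples=256)
```

In this toy case three spikes stand in for 256 waveform samples, which hints at why a sparse set of precisely timed spikes can be a compact description of a sound.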
"We've found that timing of just a sparse number of spikes actually encodes the whole range of nature sounds, including components of speech such as vowels and consonants, and natural environment sounds like footsteps in a forest or a flowing stream," said Michael Lewicki, associate professor of computer science at Carnegie Mellon and a member of the Center for the Neural Basis of Cognition (CNBC). "We found that the optimal code for natural sounds is the same as that for speech. Oddly enough, cats share our own optimal auditory code for the English language."
"Our work is the only research to date that efficiently processes auditory code as kernalized spikes," said Evan Smith, a graduate student in psychology at the CNBC. Until now, scientists and engineers have relied on Fourier transformations—initially discovered 200 years ago—to separate and reconstitute parameters like frequency and intensity as part of traditional sound signal processing. "Our new signal processing framework appears far more efficient, effective and concise in conveying a rich variety of natural sounds than anything else," Lewicki said.
Smith and Lewicki's approach dissects sound based only on the timing of compressed "spikes" associated with three classes of sound: vowel-like sounds (such as cat vocalizations), consonant-like sounds (such as rocks hitting one another) and sibilant-like sounds (such as ambient noise).
To gather sounds for their research, the scientists traipsed through the woods and recorded cracking branches, crunching leaves and wind rustling through leaves before returning to the laboratory to decode the information contained in this rich set of sounds. They also discovered what they consider the most "natural" sound: if they play back a random set of spikes, it sounds like running water.
"We're very excited about this work because we can give a simple theoretical account of the auditory code which predicts how we could optimize signal processing to one day allow for much more efficient data storage on everything from DVDs to iPods," Lewicki said.
"For instance, if we could use a cochlear implant to 'talk' to the auditory nerve in a more natural way via our discovered coding, then we could quite possibly design implants that would convey sounds to the brain that are much more intelligible," he said.
The authors' research, which combines computer science, psychology, neuroscience and mathematics, is funded by the National Institutes of Health and the National Science Foundation.
The CNBC is dedicated to understanding the neural mechanisms that give rise to cognitive processes, including learning and memory, language and thought, perception and attention, and planning and action. The CNBC faculty includes researchers with primary and joint appointments in the departments of Biological Sciences, Computer Science, Psychology, Robotics and Statistics at Carnegie Mellon; and Bioengineering, Mathematics, Neurobiology, Neurology, Neuroscience, Psychiatry and Psychology at the University of Pittsburgh. See http://www.cnbc.cmu.edu for more information.
Byron Spice | 412-268-9068 | bspice [atsymbol] cs.cmu.edu