Audio Rendering of STEM Content

These are a series of projects that we undertook to render STEM content in audio, with the aim of assisting people with print disabilities. Today they struggle even to access such content, and pursuing a STEM education remains a distant dream. It is surprising that, despite all the advances in technology, anyone should still have to struggle this hard for an education.

For example, consider something as basic as a math equation. In its visual form, mathematics gives the reader a very fine level of granularity: a single glance at an equation conveys which parts are subscripts and which are superscripts. Now picture the situation of a visually impaired student. Yes, we can use a text-to-speech (TTS) engine to read the equation aloud. But that does not get us far, because math is complex, and speaking an equation like an ordinary sentence leaves the student more confused than before. In other words, today's TTS systems do not offer the granularity that is available when we SEE an equation, and without that clarity it is unreasonable to expect anyone, sighted or not, to comprehend the mathematics.
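To make the ambiguity concrete, here is a minimal sketch in Python. It is an illustration only, not code from these projects: the classes `Sym`, `Add`, and `Pow` and the function `speak_flat` are hypothetical names invented for this example. It shows two different equations that come out as exactly the same words when read as a flat sentence.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal expression tree (illustration only).
@dataclass
class Sym:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Pow:
    base: "Expr"
    exponent: "Expr"


Expr = Union[Sym, Add, Pow]


def speak_flat(e: Expr) -> str:
    """Read an expression the way a plain TTS engine would: as a flat sentence."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Add):
        return f"{speak_flat(e.left)} plus {speak_flat(e.right)}"
    return f"{speak_flat(e.base)} to the power {speak_flat(e.exponent)}"


a = Add(Pow(Sym("x"), Sym("2")), Sym("1"))  # x^2 + 1
b = Pow(Sym("x"), Add(Sym("2"), Sym("1")))  # x^(2 + 1)

print(speak_flat(a))
print(speak_flat(b))
```

Both calls produce the same sentence, so a listener has no way of telling where the superscript ends. A sighted reader resolves this at a glance.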

Code

Slides

Accepted Publications

SIGNIFICANCE OF PARALINGUISTIC CUES IN THE RENDERING OF MATHEMATICAL EQUATIONS


SAMPLE WAVEFILES


Technique 1: Original TTS

Technique 2: Pauses and Intonation Variation

Technique 3: Special Sounds

Technique 4: 3D Audio

Technique 5: Special Sounds + Audio Spatialization

Venkatesh Potluri, Sai Krishna, Priyanka Srivastava, Kishore Prahallad

ICON 2014

Text-to-speech (TTS) systems hold promise as an information access tool for literate and illiterate users alike, including the visually challenged. Current TTS systems can convert typical text into natural-sounding speech. However, auditory rendering of mathematical content, specifically equation reading, is not a trivial task. Mathematical equations have to be read so that structure such as parentheses, superscripts, and subscripts is conveyed to the listener accurately. Earlier works have attempted to use pauses as acoustic cues to indicate some of the semantics associated with the mathematical symbols. In this paper, we first analyse the acoustic cues that human beings employ while speaking mathematical content to (visually challenged) listeners, and then propose four techniques that render the observed patterns in a text-to-speech system. The evaluation considered eight aspects such as listening effort, content familiarity, accentuation, intonation, etc. Our objective metrics show that a combination of the proposed techniques could render mathematical equations using a TTS system as well as a human reader does.
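As a rough illustration of the pause-and-intonation idea described in the abstract, the sketch below emits SSML markup in which a superscript is set off by pauses and spoken at a raised pitch. This is not the implementation from the paper. The class names, the function names `cue` and `to_ssml`, the 300 ms pause, and the +20% pitch shift are all assumptions chosen for the example. Only `<speak>`, `<break>`, and `<prosody>` are standard SSML elements, and how a given TTS engine honours them varies.

```python
from dataclasses import dataclass
from typing import Union
from xml.sax.saxutils import escape


# Hypothetical, minimal expression tree (illustration only).
@dataclass
class Sym:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Pow:
    base: "Expr"
    exponent: "Expr"


Expr = Union[Sym, Add, Pow]

# Assumed cue values; the paper's actual durations and pitch shifts are not given here.
PAUSE = '<break time="300ms"/>'
RAISED_PITCH = "+20%"


def cue(e: Expr) -> str:
    """Render an expression as an SSML fragment with pause and pitch cues."""
    if isinstance(e, Sym):
        return escape(e.name)
    if isinstance(e, Add):
        return f"{cue(e.left)} plus {cue(e.right)}"
    # Superscript: a pause marks its start and end, and a raised pitch marks its extent.
    return (
        f"{cue(e.base)} to the power {PAUSE}"
        f'<prosody pitch="{RAISED_PITCH}">{cue(e.exponent)}</prosody>{PAUSE}'
    )


def to_ssml(e: Expr) -> str:
    """Wrap the cued fragment in an SSML document root."""
    return f"<speak>{cue(e)}</speak>"


a = Add(Pow(Sym("x"), Sym("2")), Sym("1"))  # x^2 + 1
b = Pow(Sym("x"), Add(Sym("2"), Sym("1")))  # x^(2 + 1)

print(to_ssml(a))
print(to_ssml(b))
```

In the first output only "2" sits inside the raised-pitch span, while in the second the span covers "2 plus 1", so the two equations no longer sound alike.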

SYNTHESIS OF STATISTICAL CONTENT USING AUDIO CUES


Venkatesh Potluri, Sai Krishna, Priyanka Srivastava, Kishore Prahallad

30th Annual International Technology and Persons with Disabilities Conference 2015

Current TTS systems can convert typical text into natural-sounding speech. However, rendering mathematical content (equations, bar graphs, and pie charts) in audio is not a trivial task, and it cannot be achieved effectively with the mainstream TTS technology found in most screen readers. In most cases, ambiguity arises when the listener cannot identify where a demarcation, such as a bracketed group, superscript, or subscript, begins and ends. Earlier works have attempted to use pauses as acoustic cues to indicate some of these. Beyond equations, present-day TTS systems have little or no capability to render bar graphs and pie charts in audio. In this work, we perform an experiment to measure the effectiveness of mathematical equations synthesised using a traditional TTS system. We analyse the acoustic cues that human beings employ while speaking mathematical content to (visually challenged) listeners and propose four techniques that render the observed patterns in a text-to-speech system.