According to the Ethnologue, there are currently about 7000 living languages in the world. Similar estimates are reported by other sources [1] [2]. Most of these languages have been ignored by language technology.
The Languages of the World project is a collaboration of leading research institutes around the world, coordinated by the Language Technologies Institute at Carnegie Mellon University. Its affiliates include the developers of the most advanced speech and language systems.
The Languages of the World project aims to develop language technologies for all living languages. Technologies will include speech recognition, synthesis, understanding and translation.
The various languages of the world are known to be interconnected -- they can be categorized into a hierarchy of groups that not only characterizes the lexical and phonetic similarity of different languages, but also represents a genealogy that identifies the order in which they emerged from earlier ancestral languages.
Linguists have long known that studying languages in isolation is not sufficient -- the ensemble must be considered as a whole.
The project applies the same principle from a technology standpoint.
What do we know about Chinese that helps us with Mongolian? What do we know about Chinese that helps us comprehend Turkic languages? Does knowledge of Tamil/Telugu help us with Brahui (in Baluchistan)? What about Finno-Ugric?
Can knowledge of grammatical structure in Romanian be used to develop models blindly for Spanish?
Relationships and modelling paradigms. Vowel length is not a distinctive feature in English, but it is in Hindi. HMM-based models are not great for this -- how should it be modelled, and will the approach generalize to other languages? Pharyngealization in Arabic. Retroflexion is an allophone of R in English, but a distinct sound in Tamil.
How does the modelling/adaptation work? How do we generalize?
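The genealogical hierarchy mentioned above suggests one simple way to start asking "which known language helps with this one?". The sketch below is only an illustration, not the project's method: it stores a small, heavily simplified family tree as child-to-parent links and measures how many steps apart two languages are in that tree. The table `PARENT` and the functions `ancestors`, `tree_distance` and `closest_resourced` are hypothetical names invented for this example, and tree distance is an assumed stand-in for real lexical or phonetic similarity.

```python
# Hypothetical, heavily simplified family tree (child -> parent).
# Intermediate branches are omitted; this is illustrative data only.
PARENT = {
    "Romance": "Indo-European",
    "Spanish": "Romance",
    "Romanian": "Romance",
    "Indo-Aryan": "Indo-European",
    "Hindi": "Indo-Aryan",
    "Tamil": "Dravidian",
    "Telugu": "Dravidian",
    "Brahui": "Dravidian",
}


def ancestors(language):
    """Return the chain from a language up to the root of its family."""
    chain = [language]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def tree_distance(a, b):
    """Count the edges between a and b, or return None if unrelated."""
    up_a = ancestors(a)
    depth_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(up_a):
        if node in depth_in_b:
            # First shared node is the lowest common ancestor.
            return i + depth_in_b[node]
    return None


def closest_resourced(target, resourced):
    """Pick the resourced language nearest to target in the tree."""
    scored = [(tree_distance(target, r), r) for r in resourced]
    scored = [(d, r) for d, r in scored if d is not None]
    return min(scored)[1] if scored else None


if __name__ == "__main__":
    print(tree_distance("Spanish", "Romanian"))
    print(closest_resourced("Brahui", ["Hindi", "Tamil"]))
```

In this toy tree, Romanian and Spanish are two steps apart, and Tamil is chosen over Hindi as the starting point for Brahui. A real system would need a much richer similarity measure, since genealogical closeness alone does not guarantee that models transfer.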

A center for data, systems (recognition and synthesis), tools, and algorithms. Participants contribute, share, collaborate.
Resources including systems and data for all languages in the world.
Funding, visiting positions, etc.
Build systems for everything, including languages for which we have NO resources, based on similarity etc. Challenges are great.
For well-studied languages, research on acoustic/production level relationships. Why can an Indian not say "Daddy" the way an American does? How does this affect synthesis for an Indian speaker using models/units obtained from an American speaker?
Mission: To help people communicate better across boundaries of native language, geography and culture.

