According to the Ethnologue, there are currently about 7000 living languages in the world. Similar estimates are reported by other sources [1] [2]. The majority of them have received little or no attention from language technology.
The Languages of the World project is a collaboration of leading research institutes around the world, coordinated by the Language Technologies Institute at Carnegie Mellon University. Its affiliates include the developers of some of the most advanced speech and language systems.
The Languages of the World project aims to develop language technologies for all living languages. Technologies will include speech recognition, synthesis, understanding and translation.
The various languages of the world are known to be interconnected -- they can be categorized into a hierarchy of groups that not only characterize the lexical and phonetic similarity of different languages, but also represent a genealogy that identifies the order in which they emerged from earlier ancestral languages.
It has long been known by linguists that studying languages in isolation is not sufficient -- one must consider the ensemble.
Apply the same principle from a technology standpoint.
What do we know about Chinese that helps us with Mongolian? What do we know about Chinese that helps us comprehend Turkic?
Does knowledge of Tamil/Telugu help us with Brahui (in Baluchistan)? What about Finno-Ugric?
Can knowledge of grammatical structure in Romanian be used to develop models blindly for Spanish?
Relationships and modelling paradigms. Vowel length is not a feature in English, but it is in Hindi. HMM-based models are not great for this -- how should it be handled, and will the approach generalize to other languages? Pharyngealization in Arabic. Retroflexion is an allophone of R in English, but a unique sound in Tamil.
How does the modelling/adaptation work? How do we generalize?
A center for data, systems (recognition and synthesis), tools, and algorithms. Participants contribute, share, collaborate.
Resources including systems and data for all languages in the world.
Funding, visiting positions, etc.
Build systems for everything, including languages for which we have NO resources, based on similarity etc. Challenges are great.
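One way to picture "based on similarity" is to pick, for a language with no resources, the closest related language that does have them. The sketch below is only an illustration under stated assumptions: the `PARENT` table is a tiny, simplified genealogy written for this example, and the helper names (`ancestors`, `tree_distance`, `closest_donor`) are hypothetical, not part of any project tool. Real similarity measures would also draw on lexical and phonetic evidence.

```python
# Toy genealogy: each language or group maps to its parent group.
# Simplified for illustration only; not a linguistic reference.
PARENT = {
    "Spanish": "Romance",
    "Romanian": "Romance",
    "Romance": "Indo-European",
    "English": "Germanic",
    "Germanic": "Indo-European",
    "Hindi": "Indo-Aryan",
    "Indo-Aryan": "Indo-European",
    "Tamil": "Dravidian",
    "Telugu": "Dravidian",
    "Brahui": "Dravidian",
}


def ancestors(lang):
    """Return the chain [lang, parent, grandparent, ...] up to the root."""
    chain = [lang]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def tree_distance(a, b):
    """Number of edges between a and b in the tree, or None if unrelated."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            # First shared ancestor: add the steps taken on each side.
            return steps_a + up_b.index(node)
    return None


def closest_donor(target, resourced):
    """Pick the resourced language nearest to target, or None if none is related."""
    scored = [(tree_distance(target, r), r) for r in resourced]
    scored = [(d, r) for d, r in scored if d is not None]
    return min(scored)[1] if scored else None


if __name__ == "__main__":
    # Hypothetical scenario: Brahui has no resources; which language could lend models?
    print(closest_donor("Brahui", ["English", "Hindi", "Tamil"]))
```

Tree distance is a deliberately crude proxy. It ignores contact between neighbouring languages and treats every branch as equally long, so it is at best a starting point for choosing where to borrow models from.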
For well-studied languages, research on acoustic/production level relationships. Why can an Indian speaker not say "Daddy" the way an American does? How does this affect synthesis for an Indian speaker using models/units obtained from an American?
The Languages of the World project, organized by the Center for Innovations in Spoken Language and the Language Technologies Institute at Carnegie Mellon University, aims to develop an interconnected network
Mission:
To help people communicate better across boundaries of native language, geography and culture.
The Languages of the World project is a collaborative effort of an international network of people learning, teaching and gathering knowledge about language and communication. If you join us, together we can do the following:
- We will have fun.
Playing with language is fun. Working with other people is fun.
Making new friends is fun.
- We will learn.
Some of us will be studying a new language. All of us will
learn by doing. Some of us will learn by teaching others.
- We will teach.
We are a network of people studying and teaching each other our
languages.
- We will help other people.
We are a network of friendly, caring people. We will help other
people, inside and outside our network.
- We will help the community.
We will be developing reference resources and collecting data
about language. These resources will be made available to the
research community and to the general public.
- We will be developing advanced technology to support these goals.