Harshit:

Aasish and I will be talking next week about our project accompanying the seminar. We have been working with Alan to explore methods for generating resources for endangered or less-resourced languages. Many such languages are linguistically similar to more resource-rich languages, or 'pivots'.

We have explored methods that exploit this similarity to create monolingual and multilingual dictionaries and to study language distance. More specifically, we are working on generating a speech lexicon for Pashto using resources and data in Farsi, Urdu and Hindi. We'll also talk about the nature of the cognates we found and their relation to language distance.
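
As a rough illustration of how cognates and language distance can be connected, here is a minimal sketch, not the method used in the project. It assumes words are given as sequences of phoneme symbols, treats a pair as a cognate candidate when its length-normalized edit distance falls under a threshold, and averages that distance over aligned word pairs as a crude distance between two languages. The function names, the 0.34 threshold and the toy input are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, in the range [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def is_cognate_candidate(a, b, threshold=0.34):
    """Hypothetical rule: similar enough forms are cognate candidates."""
    return normalized_distance(a, b) <= threshold


def mean_distance(pairs):
    """Crude language distance: mean normalized distance over word pairs."""
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


# Toy phoneme sequences for illustration only, not real lexicon entries.
pairs = [(["n", "a", "m"], ["n", "u", "m"]),
         (["d", "a", "s", "t"], ["l", "a", "s"])]
print(mean_distance(pairs))
```

A real system would more likely compare phonetic features rather than treat every substitution as equally costly, and would need to separate true cognates from loanwords and chance resemblances.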

Here are some related papers:

M. Davel & E. Barnard (2005). Bootstrapping Pronunciation Dictionaries: Practical Issues. Interspeech.
http://www.meraka.org.za/pubs/davel05practical.pdf

John Kominek & Alan W. Black (2006). Learning pronunciation dictionaries: language complexity and word selection strategies. HLT-NAACL.
http://www.aclweb.org/anthology/N06-1030

T. Mark Ellison & Simon Kirby (2006). Measuring language divergence by intra-lexical comparison. ACL.
http://www.aclweb.org/anthology/P/P06/P06-1035.pdf