Kornel Laskowski

Former Student (c/o R Stern)
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

kornel AT cs DOT cmu DOT edu
Carnegie Mellon University
407 S Craig St, SCR 218
Pittsburgh PA, 15213
USA
Phone: +1 412 268 2518
Fax: +1 412 268 5578

KTH Speech, Music and Hearing
Lindstedtsvägen 24
SE-100 44 Stockholm
Sweden
Phone: +46 8 790 97 51
Fax: +46 8 790 78 54

Extended Degree-of-Overlap (EDO) Model: A Normative Implementation in C

The multi-port EDO (MPEDO) model provides bigram transition probabilities over multi-participant vocal activity overlap. It "ties" transitions in the actual multi-participant vocal activity space, whose states are specific to the number and index assignment of participants, within an alternate space that is independent of both the number of participants and their index assignment. It is intended to provide likelihoods over multi-participant vocal activity chronograms, to aid vocal activity detection in multi-party settings, and to enable prediction of vocal activity deployment in those same settings. The model was developed with Tanja Schultz of the Language Technologies Institute at Carnegie Mellon University (now at the Karlsruhe Institute of Technology) and Mari Ostendorf of the Department of Electrical Engineering at the University of Washington (when she was visiting the Karlsruhe Institute of Technology).
The more recent single-port EDO (SPEDO) model was developed with Mattias Heldner (now at the Department of Linguistics at Stockholm University) and Jens Edlund at the Department of Speech, Music and Hearing at the Royal Institute of Technology.
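
As a rough illustration of what tying transitions in a participant-independent space can look like, the sketch below maps a transition between two multi-participant states to a key that does not change when participants are renumbered. It assumes each state is a bitmask (bit k set when participant k is vocalizing) and that the key consists of three counts: participants active before, participants active in both frames, and participants active after. Both the key's form and the names edo_key, popcount32, and edo_tie are hypothetical and chosen for illustration; none of this is taken from the edo distribution.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tied key: describes a transition without reference to
   which participants are involved (an assumption, not the edo API). */
typedef struct {
    int n_from;  /* participants vocalizing at frame t-1 */
    int n_both;  /* participants vocalizing at both t-1 and t */
    int n_to;    /* participants vocalizing at frame t */
} edo_key;

/* Count the set bits of x (portable, no compiler builtins). */
int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Map a transition between two states (bit k = participant k is
   vocalizing) to its participant-independent key. */
edo_key edo_tie(uint32_t from, uint32_t to)
{
    edo_key key;
    key.n_from = popcount32(from);
    key.n_both = popcount32(from & to);
    key.n_to   = popcount32(to);
    return key;
}
```

Under these assumptions, the transitions 0101 -> 0110 and 0011 -> 1010 receive the same key (two speakers before, one continuing, two after), so a single probability estimate could be shared between them and across conversations with different numbers of participants.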

edo-1.2.9.tar.gz (15 Aug 2011)
- implements a K- and T-independent perplexity over observed speech activity using the single-port model (SPEDO)
- replicates the figures in (Laskowski, Edlund & Heldner, ICASSP 2011) using Makefile.ICASSP2011
- replaces Figure 1 in (Laskowski, Edlund & Heldner, ICASSP 2011) with a corrected version
- obsoletes edo-1.0.5.tar.gz (03 Nov 2010)
edo-1.0.5.tar.gz (03 Nov 2010)
- includes sorting and random number generation code derived from Numerical Recipes in C (2nd ed., 30 Oct 1992)
- implements a K- and T-independent perplexity over observed speech activity using the multi-port model (MPEDO)
- replicates results from Errata 1 to (Laskowski, 2010) using lex.Q (available below) and the scripts ACL2010_Table1.sh and ACL2010_Table2.sh
- additionally implements several guessing baselines for comparison
- obsoletes edo-1.0.0.tar.gz (26 Oct 2010)
dcm-1.0.0.tar.gz (03 Nov 2010)
- implements perplexity over observed speech activity using direct compositional models (cf. Laskowski, 2010)
- replicates results from Errata 1 to (Laskowski, 2010) using lex.Q and the scripts ACL2010_Table1.m, ACL2010_Table2_CD.m, and ACL2010_Table2_UI.m (packaged in the distribution)
- the MATLAB implementation is suboptimal in speed, size, and clarity, and is provided for the purposes of comparison with edo-1.0.5.tar.gz
lex.Q.tar.gz (26 Oct 2009)
- exemplar data (from the ICSI Meeting Corpus) for exercising the models on this page
- derived from the (essentially) continuous-time, forced-alignment-mediated, and participant-attributed speech activity references in the ICSI Meeting Recorder Dialog Act annotations (version icsi_mrda+hs_corpus_050512.tar.gz)
- sampled at a frame step of 100 ms, a frame size of 100 ms, and a within-frame threshold of 0.5
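
The sampling parameters listed for lex.Q can be pictured with a short sketch: for one participant, continuous-time speech intervals are converted into a binary frame sequence by measuring how much of each frame is covered by speech. This is only an assumed reading of "within-frame threshold" (a frame is marked active when the covered fraction is at least the threshold); the interval type and the functions overlap_ms and sample_activity are hypothetical and are not part of any distribution on this page. Times are held in integer milliseconds, and the intervals are assumed not to overlap one another.

```c
#include <assert.h>
#include <stddef.h>

/* One speech interval [start_ms, end_ms) for a single participant. */
typedef struct {
    long start_ms;
    long end_ms;
} interval;

/* Length in ms of the overlap between [a0, a1) and [b0, b1). */
long overlap_ms(long a0, long a1, long b0, long b1)
{
    long lo = a0 > b0 ? a0 : b0;
    long hi = a1 < b1 ? a1 : b1;
    return hi > lo ? hi - lo : 0;
}

/* Fill frames[0..n_frames-1] with 1 where the fraction of the frame
   covered by speech is at least `threshold`, and 0 elsewhere.
   Frame t spans [t*step_ms, t*step_ms + size_ms). */
void sample_activity(const interval *iv, size_t n_iv,
                     long step_ms, long size_ms, double threshold,
                     unsigned char *frames, size_t n_frames)
{
    assert(step_ms > 0 && size_ms > 0);
    for (size_t t = 0; t < n_frames; t++) {
        long f0 = (long)t * step_ms;
        long f1 = f0 + size_ms;
        long covered = 0;
        for (size_t i = 0; i < n_iv; i++)
            covered += overlap_ms(iv[i].start_ms, iv[i].end_ms, f0, f1);
        frames[t] = ((double)covered >= threshold * (double)size_ms) ? 1 : 0;
    }
}
```

With a step and size of 100 ms and a threshold of 0.5, speech over [30, 180) ms and [250, 290) ms covers 70 ms, 80 ms, 40 ms, and 0 ms of the first four frames, so only the first two would be marked active.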

References:

The EDO model was first proposed in

Kornel Laskowski and Tanja Schultz (2006), Unsupervised Learning of Overlapped Speech Model Parameters for Multichannel Speech Activity Detection in Meetings. In proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2006), Toulouse, France, 14-19 May, pp993-996. [poster: p1, p2]

but was not explicitly thus named until

Kornel Laskowski and Tanja Schultz (2007), Modeling Vocal Interaction for Segmentation in Meeting Recognition. In Machine Learning for Multimodal Interaction (A. Popescu-Belis, S. Renals, and H. Bourlard, eds.), Springer Berlin/Heidelberg (Lecture Notes in Computer Science 4892), pp259-270. Presented at the 4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI2007), Brno, Czech Republic, 28-30 June. [slides]

Its application to multi-pass ASR in meetings was described in

Kornel Laskowski, Christian Fügen, and Tanja Schultz (2007), Simultaneous Multispeaker Segmentation for Automatic Meeting Recognition. In proceedings of the 15th EURASIP European Signal Processing Conference (EUSIPCO2007), Poznań, Poland, 03-07 September, pp1294-1298. [slides]

Most recently, it was applied to predicting the future within conversations, under the assumption of conditional dependence in

Kornel Laskowski (2010), Modeling Norms of Turn-Taking in Multi-Party Conversation. In proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL2010), Uppsala, Sweden, 11-16 July, pp999-1008. [slides] [errata 1, 03 Nov 2010]

and conditional independence in

Kornel Laskowski, Jens Edlund, and Mattias Heldner (2011), A Single-Port Non-Parametric Model of Turn-Taking in Multi-Party Conversation. In proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2011), Praha, Czech Republic, 22-27 May, pp5600-5603. [poster]

Last modified: Tue 16 Aug 2011 0308hrs GMT