Language Technologies Institute
School of Computer Science
Carnegie Mellon University

kornel AT cs DOT cmu DOT edu
407 S Craig St, SCR 218
Pittsburgh PA, 15213
Phone: +1 412 268 2518
Fax: +1 412 268 5578

KTH Speech, Music and Hearing
Lindstedtsvägen 24
SE-100 44 Stockholm
Phone: +46 8 790 97 51
Fax: +46 8 790 78 54

Mel Spectral Flux (MSF): A Normative Implementation in C

MSF, the negative logit of the cosine distance between consecutive Mel spectra, is an easy-to-compute instantaneous-frame correlate of speaking rate. The representation was developed with Anna Hjalmarsson, at the Department of Speech, Music and Hearing at KTH, to detect final lengthening. It is currently being explored to detect final creak, to characterize voice quality, to aid in second-language learning, and to quantify rate of speech (ROS) in general conversational settings.

To reproduce the results from (Hjalmarsson & Laskowski, 2011):
  • obtain the file dealsnippets.tar.gz from Anna Hjalmarsson and place it in SOMEPATH
  • obtain Makefile.INTERSPEECH2011 and place it in SOMEPATH
  • change directory to SOMEPATH and run "make -f Makefile.INTERSPEECH2011 all"
  • the Makefile downloads, builds and executes all required code to produce encapsulated PostScript of Figures 1, 3 and 4, as well as the text for Table 2 and other miscellaneous numerical results found in the paper

The MSF representation was introduced in

