How to estimate performance?
Accuracy = % of text words matched by recognizer output
- Coarse-grained
- Sensitive to missed words
- Doesn’t penalize requests for help
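A minimal sketch of the accuracy measure, assuming the text and the recognizer output are already tokenized into word lists. The function name `accuracy` and the use of Python's `difflib.SequenceMatcher` for alignment are illustrative assumptions, not the alignment method used in the original system.

```python
from difflib import SequenceMatcher


def accuracy(text_words, recognized_words):
    """Percent of text words matched by the recognizer output.

    Hypothetical helper: aligns the two word lists in order with
    difflib and counts the text words that fall in matching blocks.
    """
    if not text_words:
        raise ValueError("text_words must not be empty")
    matcher = SequenceMatcher(None, text_words, recognized_words, autojunk=False)
    # Each matching block is a run of identical words in both lists.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(text_words)


if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    heard = "the cat on the mat".split()
    print(f"{accuracy(text, heard):.1f}%")
```

In this example one text word ("sat") is missed, so 5 of 6 text words are matched. The score reflects only which words were matched, which is why it is coarse-grained: a slow, halting reading and a fluent one get the same number.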
Inter-word latency = time interval between aligned text words
- Finer-grained
- Sensitive to hesitations, insertions
- Robust to many speech recognizer errors
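A minimal sketch of inter-word latency, assuming each aligned text word comes with start and end times in seconds. The `AlignedWord` type, the function name, and the choice to measure from the end of one aligned word to the start of the next are assumptions made for illustration; the source only says "time interval between aligned text words".

```python
from typing import NamedTuple


class AlignedWord(NamedTuple):
    """A text word aligned to the recognizer output (hypothetical type)."""
    word: str
    start: float  # seconds
    end: float    # seconds


def inter_word_latencies(aligned):
    """Return (word, latency) pairs for each aligned text word after the first.

    Latency is taken here as the gap between the end of the previous
    aligned text word and the start of the current one. Anything spoken
    in that gap (hesitations, inserted words) lengthens the latency.
    """
    return [
        (cur.word, cur.start - prev.end)
        for prev, cur in zip(aligned, aligned[1:])
    ]


if __name__ == "__main__":
    reading = [
        AlignedWord("the", 0.0, 0.3),
        AlignedWord("cat", 0.5, 0.9),
        AlignedWord("sat", 2.0, 2.4),  # long pause before "sat"
    ]
    for word, latency in inter_word_latencies(reading):
        print(f"{word}: {latency:.2f} s")
```

Because only the timing of aligned text words is used, a misrecognized word in the gap changes little, while a hesitation or insertion before a word shows up directly as a longer latency for that word.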