At this stage of the project we have not yet been able to run formal evaluations, though we carried out component-based tests throughout the six months that the project was active.
The whole system (running in untethered mode) takes around 2-3 seconds to translate a typical utterance, measured from when the speaker stops speaking to when the system begins speaking the translation; performance is thus just over real time. However, recognition can take 1-2 seconds longer in adverse acoustic environments.
Despite the poor audio input hardware of PDAs, the system works well in a variety of environments, including offices and outdoors. In harsher environments, performance improves if the system is given a few utterances to adapt to. We have found that performance degrades in environments with substantial background speech, such as bars and restaurants.
In informal tests we have observed accuracy of greater than 80%.
The system is set up for the domain of medical interviews, and has only basic vocabulary for greetings and numbers outside that domain. Although there is shared coverage, it is assumed that the English speaker is the doctor and the Arabic speaker is the patient.
Although coverage is difficult to quantify fully, the English language model covers many hundreds of sentence types, each with as many as dozens of possible variations (e.g., diseases, ailments, and body parts). The Arabic side is more constrained, but still covers a few hundred sentence types.