Just a suggestion...
Why not use a text-to-speech (TTS) engine to do this, unless we choose not to use TTS mode?
That way we would only have to do voice files for unique and/or difficult names. In our past experience with the I-mark system, the major complaint was that the voice quality was poor and that it had very few controls for voice type, speed, gender, and language support. But it did do the basic English TTS part well. It had difficulty with name pronunciation, mostly in Spanish, but I feel that was largely due to the specific TTS engine we were using at the time. TTS has come a long way since then and has been endorsed by a great number of organizations for the visually impaired around the world.
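To make the idea a bit more concrete, here is a rough Python sketch of the fallback I have in mind: play a recorded clip when one exists for a name, otherwise hand the name to the TTS engine, unless TTS mode is switched off. Everything in it is an assumption for illustration only: the function names, the one-.wav-file-per-name layout, and the play_file and speak_text callables, which stand in for whatever audio player and TTS engine we would actually end up using.

```python
from pathlib import Path
from typing import Callable, Optional


def recording_for(name: str, recordings_dir: Path) -> Optional[Path]:
    """Return the recorded clip for this name, or None if there is none.

    Assumes (hypothetically) one .wav file per name, e.g. "jose_nunez.wav".
    """
    file_name = name.strip().lower().replace(" ", "_") + ".wav"
    candidate = recordings_dir / file_name
    return candidate if candidate.is_file() else None


def announce(
    name: str,
    recordings_dir: Path,
    play_file: Callable[[Path], None],   # placeholder for the audio player
    speak_text: Callable[[str], None],   # placeholder for the TTS engine
    tts_enabled: bool = True,
) -> str:
    """Prefer a recorded clip; fall back to TTS only if TTS mode is on."""
    clip = recording_for(name, recordings_dir)
    if clip is not None:
        play_file(clip)
        return "recording"
    if tts_enabled:
        speak_text(name)
        return "tts"
    # No clip and TTS mode is off: nothing to play for this name.
    return "skipped"
```

With something like this, the only names that need a voice file are the ones the engine gets wrong; everything else goes through TTS automatically.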
Again, just food for thought...