Sweating the details

SCS startup Safaba develops smarter translations for specialized clients

Say a marketing executive at a large company needs to translate new product information on her company website from English to French within a few hours. She's not fluent enough in French to do it herself, so she has several options--she can hire a human translator or a translation agency. She can also use a readily available online translation program, such as those offered by Google and Microsoft's Bing.

Translating an entire website using human translators would be time-consuming and costly, and the commonly available free online translation services have drawbacks.

"When people think of language technology and translation, most people (think) of free services, like Google, that are basically designed to support a very broad range of individual users who might want to translate anything," says Alon Lavie, research professor at CMU's Language Technologies Institute and president and CEO of the startup Safaba Translation Solutions, which provides specialized computer-based machine translation software for global corporations.

"What Google and Microsoft are trying to do is build systems that generate the best possible translation without knowing anything specific about your company and the specific terminology and other language characteristics that you and your company typically use," Lavie says. While those translations are often understandable, he says they're not quite good enough for representing your company in another market.

To cover their bases as broadly as possible, Lavie says, free translation services build their translations from a wide range of different resources. The resulting quality can vary from one sentence to the next. Anyone who has translated a webpage from another language to English, using a free online service, is familiar with awkward translations. Sure, a native English speaker may be able to figure out the gist of a computer-generated translation phrase such as "from now officially Cologne," but it doesn't sound right.

The company exec who used an online, mass-market translation program to translate her website from English to French could end up with text using either slang (informal language instead of formal) or terms from the wrong domain or industry. Lavie uses the example of the English word "tablet." Imagine translating "tablet" on the website of a computer company as if it were a medicine "tablet" or pill, he says.

Unlike free services that aim for very broad audiences but provide somewhat crude translations, Safaba targets large organizations--such as companies in the Fortune 500--that need fast but high-quality translations of high volumes of documents, often incorporating very specific corporate language.

"We use machine-learning to give them what they need," Lavie says.

The global market for translation services is estimated at $30 billion annually, with machine translation currently accounting for a fraction of this market. As demand for large volumes of content translation increases and the technology improves, experts believe that demand for machine translation will continue to expand.

"The industry itself is growing," says Olga Beregovaya, a vice president at Welocalize Inc., which is using Safaba's translation engine to provide specialized, on-demand translation services to major clients with global audiences, such as Dell and PayPal.

While working on research projects as a faculty member at the Language Technologies Institute, Lavie became involved in the Association for Machine Translation in the Americas, or AMTA. Through AMTA, Lavie saw that the machine translation market needed programs that better captured certain specialized nuances of language.

Lavie teamed up with his former Ph.D. classmate Bob Olszewski (CS'01), who was working at the time at a different LTI spin-off company that creates software for language tutoring. The two began collaborating on a system that businesses could use to translate content into multiple languages.

With support from the SCS-based entrepreneurship program Project Olympus and CMU's Center for Technology Transfer and Enterprise Creation, Safaba was launched in summer 2009 with just Olszewski and Lavie. Since then, it's grown to include 11 employees. (Project Olympus recently merged with the similar Don Jones Center for Entrepreneurship in CMU's Tepper School of Business to create the Carnegie Mellon University Center for Innovation and Entrepreneurship.)

To create custom translation programs, companies provide Safaba with source materials that they've already had translated by humans--the same documents in, for example, English and French. Those documents already contain approved translations of specialized language such as technical descriptions or trademarked corporate slogans. Safaba's software then analyzes the materials sentence by sentence, building highly reliable statistical models. It uses those models to translate new documents and to adapt the generated translations to the individual company's unique language.
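As a rough illustration of the idea of learning from sentence-aligned documents, here is a toy sketch in Python. It is not Safaba's method, which the article does not describe in detail. It assumes a handful of made-up English-French sentence pairs and scores word pairs with the Dice coefficient, a simple co-occurrence measure; the function name dice_lexicon and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_lexicon(pairs):
    """Pick, for each source word, the target word it co-occurs with most.

    pairs: list of (source_sentence, target_sentence) strings that are
    translations of each other. Scoring uses the Dice coefficient:
    2 * joint_count / (source_count + target_count).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Count every source/target word pair seen in the same sentence pair.
        joint.update(product(s_words, t_words))

    lexicon = {}
    # Sorted iteration makes tie-breaking deterministic.
    for (s, t), c in sorted(joint.items()):
        score = 2 * c / (src_count[s] + tgt_count[t])
        if score > lexicon.get(s, ("", 0.0))[1]:
            lexicon[s] = (t, score)
    return lexicon


# Hypothetical, already-translated company material (assumption).
pairs = [
    ("the tablet is new", "la tablette est nouvelle"),
    ("charge your tablet", "chargez votre tablette"),
    ("the keyboard is new", "le clavier est nouveau"),
]

lexicon = dice_lexicon(pairs)
print(lexicon["tablet"])
```

Because "tablet" and "tablette" appear together in every sentence pair that contains either word, the sketch pairs them, which is the computer-company sense Lavie's example calls for. Real statistical translation systems use far richer models and far more data than this.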

Safaba's software can handle multiple languages, and the company hopes eventually to build its services directly into content management systems and other backend authoring technologies, allowing businesses to generate live translations as the original texts are being written.

Only a relatively small amount of material is needed to train Safaba's programs to adapt quickly and accurately to the specific language of a particular company, Lavie says. And because Safaba's software is learning from professionally translated documents, the translations it generates include fewer of those awkward phrases that confuse native speakers.

However, the reliance on a company's own translations can lead to problems of its own. If the translated materials include errors, the program's models will "learn" those errors, too. "There often isn't a single translation that is correct in all possible contexts," Lavie says.

To prevent errors from going public, human translators still have to give final approval before content goes live, says Beregovaya of Welocalize. "We always edit Safaba output," she says. "The benefit here is that post-editing is faster than human translation, and cheaper. Safaba output is good, but (it) doesn't read as human."

Beregovaya says having access to machine-learning models developed at Carnegie Mellon gives Safaba an edge over competitors. "Initially, we went with them because of CMU's name recognition," she says. Welocalize then held a "technology bake-off" between Safaba and two competing translation services, and Safaba was the clear victor.

Welocalize is currently bidding on a project where Safaba's engine would provide translations between 35 different languages.

"We are pragmatic," Beregovaya says. "If it works better, our clients save money and we also save money."

For More Information:

Jason Togyer | 412-268-8721 | jt3y@cs.cmu.edu
