The METEOR Automatic Machine Translation Evaluation System
Alon Lavie
Abhaya Agarwal
Michael Denkowski
Carnegie Mellon University
Pittsburgh, PA, USA
Download METEOR
Please send any questions and bug reports to Michael Denkowski at mdenkows AT cs DOT cmu DOT edu.
News
- Version 0.8 is a complete rewrite of METEOR as a C++ library with wrappers in Java
and Python. A length penalty has also been added. See the README
for more information, including an overview of the API features.
- Starting with version 0.7, METEOR has parameter sets optimized for
different criteria. For more details, see the README.
- Starting with version 0.6, METEOR supports French, German and Spanish
in addition to English.
- METEOR's internal parameters have changed. Please refer to
Lavie & Agarwal, 2007 for details.
About METEOR
METEOR is a system that automatically evaluates the output of machine
translation engines by comparing it to one or more reference
translations. For a given pair of hypothesis and reference strings,
the evaluation proceeds in a sequence of stages, with different
criteria being used at each stage to find and score unigram
matches. By default, at the first stage all exact matches are detected
between the two strings. In the second stage, all stem matches are
detected using the Snowball stemmers, and in the third stage, all
synonym matches are detected using data extracted from the WordNet 3
database.
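The staged matching described above can be sketched in Python. This is a simplified illustration under stated assumptions, not METEOR's implementation: `toy_stem` is a hypothetical suffix stripper standing in for the Snowball stemmers, `TOY_SYNONYMS` is a hypothetical hand-written table standing in for the WordNet 3 data, and the sketch assumes that each word takes part in at most one match and that later stages only consider words left unmatched by earlier ones. It also matches greedily left to right instead of choosing among competing alignments, and it stops at unigram precision and recall rather than reproducing METEOR's scoring parameters (see Lavie & Agarwal, 2007).

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a Snowball stemmer: strips a few English suffixes.
def toy_stem(word: str) -> str:
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical stand-in for synonym data extracted from WordNet.
TOY_SYNONYMS: Dict[str, Set[str]] = {"big": {"large"}, "large": {"big"}}

def exact(h: str, r: str) -> bool:
    return h == r

def stem(h: str, r: str) -> bool:
    return toy_stem(h) == toy_stem(r)

def synonym(h: str, r: str) -> bool:
    return r in TOY_SYNONYMS.get(h, set())

# Stages run in this order; each one only sees words still unmatched.
STAGES: List[Tuple[str, Callable[[str, str], bool]]] = [
    ("exact", exact),
    ("stem", stem),
    ("synonym", synonym),
]

def align(hyp: str, ref: str) -> List[Tuple[int, int, str]]:
    """Return sorted (hyp_index, ref_index, stage) triples of one-to-one unigram matches."""
    h_toks = hyp.lower().split()
    r_toks = ref.lower().split()
    used_h: Set[int] = set()
    used_r: Set[int] = set()
    matches: List[Tuple[int, int, str]] = []
    for name, matcher in STAGES:
        for i, h in enumerate(h_toks):
            if i in used_h:
                continue
            # Greedily take the first still-unmatched reference word that fits.
            for j, r in enumerate(r_toks):
                if j not in used_r and matcher(h, r):
                    matches.append((i, j, name))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(matches)

def precision_recall(hyp: str, ref: str) -> Tuple[float, float]:
    """Unigram precision and recall over the matches found by align()."""
    h_len, r_len = len(hyp.split()), len(ref.split())
    if h_len == 0 or r_len == 0:
        return 0.0, 0.0
    m = len(align(hyp, ref))
    return m / h_len, m / r_len
```

For example, aligning "the cats sat on a big mat" against "the cat was sitting on the large mat" pairs "the", "on" and "mat" in the exact stage, "cats"/"cat" in the stem stage and "big"/"large" in the synonym stage, giving a unigram precision of 5/7 and a recall of 5/8. The returned triples are a plain monolingual word alignment, which is why an aligner of this kind can be useful outside of scoring.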
The base system is written in C++ and also includes APIs in Java and
Python to allow anyone to easily incorporate METEOR scoring into existing
systems. The sentence aligner can also function independently of the scorer
and can thus be used in other systems that require monolingual sentence
alignment.
METEOR supports the SGML input file format used by Bleu and
NIST's Machine Translation Evaluation system. Thus all translation
data that can be evaluated using Bleu (such as the TIDES data) can
also be directly evaluated using METEOR. METEOR also supports a
simple plaintext mode, which is several times faster.
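The difference between the two input modes can be illustrated with a small Python reader. This is a sketch under assumptions, not METEOR's input code: it handles only a simplified NIST-style layout in which `<doc docid="...">` blocks contain `<seg id="...">text</seg>` elements, and it assumes the plaintext mode holds one segment per line with no markup. The helper names `read_sgml` and `read_plaintext` are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified NIST-style SGML: <doc docid="..."> blocks holding <seg id="...">text</seg>.
DOC_RE = re.compile(r'<doc\b[^>]*\bdocid="([^"]*)"[^>]*>(.*?)</doc>', re.S | re.I)
SEG_RE = re.compile(r'<seg\b[^>]*\bid="?([^">\s]*)"?[^>]*>(.*?)</seg>', re.S | re.I)

def read_sgml(text: str) -> List[Tuple[str, str, str]]:
    """Return (docid, segid, segment) triples in document order."""
    segments: List[Tuple[str, str, str]] = []
    for docid, body in DOC_RE.findall(text):
        for segid, seg in SEG_RE.findall(body):
            # Collapse internal whitespace so both readers yield identical strings.
            segments.append((docid, segid, " ".join(seg.split())))
    return segments

def read_plaintext(text: str) -> List[str]:
    """Assumed plaintext layout: one segment per line, no markup to parse."""
    return [" ".join(line.split()) for line in text.splitlines()]
```

The plaintext reader needs no pattern matching at all, which suggests one reason a plaintext mode can be cheaper than SGML parsing; the speedup the page reports is METEOR's own and is not measured by this sketch.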
References
- [Agarwal & Lavie, 2008] Agarwal, A. and A. Lavie. "Meteor, M-BLEU and M-TER:
Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation
Output", Proceedings of the Workshop on Statistical Machine Translation
at the 46th Annual Meeting of the Association for Computational Linguistics
(ACL-2008), Columbus, June 2008.
- [Lavie & Agarwal, 2007] Lavie, A. and A. Agarwal. "METEOR: An Automatic Metric for MT
Evaluation with High Levels of Correlation with Human Judgments",
Proceedings of the Workshop on Statistical Machine Translation
at the 45th Annual Meeting of the Association for Computational Linguistics
(ACL-2007), Prague, June 2007.
- [Banerjee & Lavie, 2005] Banerjee, S. and A. Lavie. "METEOR: An Automatic Metric for MT
Evaluation with Improved Correlation with Human Judgments",
Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures
for MT and/or Summarization at the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL-2005),
Ann Arbor, Michigan, June 2005.
This page was last modified on 8 May 2007.