The METEOR Automatic Machine
Translation Evaluation System
Alon Lavie
Abhaya Agarwal
Michael Denkowski
Carnegie Mellon University
Pittsburgh, PA, USA
Download METEOR
Please send any questions and bug reports to Michael Denkowski at mdenkows (at) cs (dot) cmu
(dot) edu.
News
- METEOR version 1.0 has been released. New features include HTER parameters, weightable
matcher modules, and support for paraphrase information.
- A pure Java version of METEOR, version 0.8.3, has been released.
- Version 0.8 has been completely rewritten as a C++ library with wrappers in Java
and Python. A length penalty has also been added. See the README
for more information or look at some of the API features.
- Starting with version 0.7, METEOR has parameter sets optimized for
different criteria. For more details, see the README.
- Starting with version 0.6, METEOR supports French, German and Spanish
in addition to English.
- Parameters inside METEOR have changed. Please refer to
Lavie & Agarwal, 2007
for details.
About METEOR
METEOR is a system that automatically evaluates the output of machine
translation engines by comparing them to one or more reference
translations. For a given pair of hypothesis and reference strings,
the evaluation proceeds in a sequence of stages, with different
criteria being used at each stage to find and score unigram
matches. By default, in the first stage all exact matches are detected
between the two strings. In the second stage, all stem matches are
detected using the Snowball stemmers. In the third stage, all
synonym matches are detected using data extracted from the WordNet 3
database. As of version 1.0, an optional fourth stage detects
single-word matches according to a paraphrase database.
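The staged matching described above can be illustrated with a minimal Python sketch. It is not METEOR's implementation: the `toy_stem` function and the `TOY_SYNONYMS` table are hypothetical stand-ins for the Snowball stemmers and the WordNet data, and the greedy left-to-right pairing, with each later stage looking only at words left unmatched by earlier stages, is an assumption made for illustration. The sketch finds matches only and does not compute a score.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for WordNet synonym data (illustration only).
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
}


def toy_stem(word: str) -> str:
    """Crude suffix stripper, for illustration only (not a Snowball stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_match(hyp_word: str, ref_word: str) -> bool:
    return hyp_word == ref_word


def stem_match(hyp_word: str, ref_word: str) -> bool:
    return toy_stem(hyp_word) == toy_stem(ref_word)


def synonym_match(hyp_word: str, ref_word: str) -> bool:
    return ref_word in TOY_SYNONYMS.get(hyp_word, set())


# Stages run in this order; each stage uses a different matching criterion.
STAGES: List[Tuple[str, Callable[[str, str], bool]]] = [
    ("exact", exact_match),
    ("stem", stem_match),
    ("synonym", synonym_match),
]


def staged_matches(hypothesis: str, reference: str) -> List[Tuple[int, int, str]]:
    """Return (hypothesis index, reference index, stage name) for each match.

    Assumption: words are paired greedily, left to right, and a word matched
    in one stage is not reconsidered in later stages.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    used_hyp: Set[int] = set()
    used_ref: Set[int] = set()
    matches: List[Tuple[int, int, str]] = []

    for stage_name, matcher in STAGES:
        for i, hyp_word in enumerate(hyp):
            if i in used_hyp:
                continue
            for j, ref_word in enumerate(ref):
                if j in used_ref:
                    continue
                if matcher(hyp_word, ref_word):
                    matches.append((i, j, stage_name))
                    used_hyp.add(i)
                    used_ref.add(j)
                    break
    return sorted(matches)


if __name__ == "__main__":
    for match in staged_matches("the cats chased a car",
                                "the cat chased an automobile"):
        print(match)
```

In this toy example, "the" and "chased" are paired in the exact stage, "cats" and "cat" in the stem stage, and "car" and "automobile" in the synonym stage, while "a" and "an" remain unmatched.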
The latest version of the system is written in pure Java with a full
API to allow easy incorporation of METEOR scoring into existing
systems. The sentence aligner can function independently of the scorer
and thus be used in other systems that require monolingual sentence alignment.
As of version 1.0, METEOR also includes a trainer which can be used to
retune the metric's parameters to new data.
METEOR supports the SGML input file format used by Bleu and
NIST's Machine Translation Evaluation system. Thus all translation
data that can be evaluated using Bleu (such as the TIDES data) can
also be directly evaluated using METEOR. METEOR also supports a
much faster plaintext mode.
References
- [Lavie & Denkowski 2009] Lavie, Alon and Denkowski, Michael. "The METEOR Metric for
Automatic Evaluation of Machine Translation", to appear in special issue of the Machine
Translation Journal [pdf]
- [Agarwal & Lavie 2008] Agarwal, Abhaya and Lavie, Alon. "Meteor, m-bleu and m-ter:
Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation
Output", Proceedings of Workshop on Statistical Machine Translation
at the 46th Annual Meeting of the Association for Computational Linguistics
(ACL-2008), Columbus, June 2008. [pdf]
- [Lavie & Agarwal 2007] Lavie, A. and A. Agarwal. "METEOR: An Automatic Metric for MT
Evaluation with High Levels of Correlation with Human Judgments",
To appear in Proceedings of Workshop on Statistical Machine Translation
at the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007),
Prague, June 2007. [pdf]
- [Banerjee & Lavie 2005] Banerjee, S. and A. Lavie. "METEOR: An Automatic Metric for MT
Evaluation with Improved Correlation with Human Judgments",
Proceedings of Workshop on Intrinsic and Extrinsic Evaluation Measures
for MT and/or Summarization at the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL-2005),
Ann Arbor, Michigan, June 2005. [pdf]
This page last modified on: October 5, 2009