Translation Quality Report. January 2024

The goal of this report is to compare translation quality between the old and new language models. The new models improve not only translation quality but also performance and memory usage. We used the BLEU metric and primarily the Flores-101 test set in this report.

BLEU is the most widely used metric for machine translation evaluation. The Flores-101 test set was released by Facebook Research and offers the widest language-pair coverage.

Quality metrics description

BLEU

BLEU is an automatic metric based on n-grams. It measures the precision of n-grams in the machine translation output compared to the reference, combined with a brevity penalty that penalizes overly short translations. We use a particular implementation of BLEU called sacreBLEU, which reports corpus-level scores rather than segment-level scores.
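
As a brief illustration, below is a minimal sketch of computing a corpus-level BLEU score with the sacreBLEU Python package; the hypothesis and reference sentences are placeholder examples, not data from this report.

```python
# Minimal sketch: corpus-level BLEU with sacreBLEU (pip install sacrebleu).
# The sentences below are placeholder examples, not data from this report.
import sacrebleu

# Machine translation outputs, one string per segment.
hypotheses = [
    "The cat sits on the mat.",
    "There is a book on the table.",
]

# Reference translations: a list of reference streams aligned with the hypotheses.
references = [[
    "The cat is sitting on the mat.",
    "A book is lying on the table.",
]]

# sacreBLEU reports a single corpus-level score rather than per-segment scores.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```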

References

  • Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. “BLEU: a Method for Automatic Evaluation of Machine Translation.” ACL (2002).
  • Post, Matt. “A Call for Clarity in Reporting BLEU Scores.” WMT (2018).

COMET

COMET (Crosslingual Optimized Metric for Evaluation of Translation) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using token or sentence embeddings. Unlike string-based metrics, COMET is trained to predict different types of human judgments, such as post-editing effort, direct assessment, or translation error analysis.
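
As a brief illustration, below is a minimal sketch of scoring a translation with the open-source unbabel-comet package; the checkpoint name and the example segments are assumptions for the sketch, not the exact setup used in this report.

```python
# Minimal sketch: COMET scoring with the unbabel-comet package (pip install unbabel-comet).
# The checkpoint name and segments are assumptions; any reference-based COMET model works similarly.
from comet import download_model, load_from_checkpoint

# Download and load a pretrained COMET checkpoint (assumed model name).
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each sample contains the source sentence, the machine translation, and the reference.
data = [
    {
        "src": "Der Hund läuft im Park.",
        "mt": "The dog runs in the park.",
        "ref": "The dog is running in the park.",
    },
]

# predict() returns per-segment scores and a corpus-level system score (comet >= 2.0 API).
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)
```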

References

  • Rei, Ricardo, Craig Stewart, Ana C. Farinha, and Alon Lavie. “COMET: A Neural Framework for MT Evaluation.” EMNLP (2020).

Lingvanex On-premise Software Updates

New version - 1.19.0.

Changes in functionality:

  • Improved Arabic transliteration for text translation.
  • Added demo page for the Speech Recognizer.
  • Added the ability to translate voice transcriptions in the Slack-Bot, plus some minor bug fixes. (Notice: after updating the Lingvanex Slack-Bot, it is necessary to update the manifest and reinstall the bot.)

New version - 1.20.0.

Changes in functionality:

  • Added an option to restrict language auto-detection to the languages deployed on the server (see the documentation for details).
  • Minor improvements in translation quality.

New version - 1.21.0.

Changes in functionality:

  • Added a new implementation of language auto-detection.
  • Improved demo page for the Speech Recognizer.
  • Minor improvements in translation quality.

Improved Language Models

BLEU Metrics

Figure: BLEU scores for the improved language models, January 2024.

COMET Metrics

Figure: COMET scores for the improved language models, January 2024.

Language pairs

Note: A smaller model size on the hard drive means lower GPU memory consumption, which leads to decreased deployment costs. Smaller models also translate faster. The approximate GPU memory usage is calculated as the model's size on the hard drive multiplied by 1.2.
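
To make the rule of thumb concrete, here is a small sketch that applies the 1.2 factor to a few placeholder model sizes; the sizes are illustrative, not actual Lingvanex model sizes.

```python
# Sketch of the rule of thumb above: estimated GPU memory ≈ model size on disk × 1.2.
# The model sizes are placeholder values, not actual Lingvanex model sizes.
def estimate_gpu_memory_gb(model_size_on_disk_gb: float) -> float:
    """Approximate GPU memory needed to serve a model, per the 1.2x rule."""
    return model_size_on_disk_gb * 1.2

for size_gb in (0.5, 1.0, 2.0):
    print(f"{size_gb:.1f} GB on disk -> ~{estimate_gpu_memory_gb(size_gb):.2f} GB of GPU memory")
```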

