The goal of this report is to compare translation quality between old and new language models. The new models improve not only translation quality but also performance and memory usage. We use the BLEU and COMET metrics and primarily the Flores 101 test set.
BLEU is the most widely used metric for machine translation evaluation. The Flores 101 test set was released by Facebook Research and offers the broadest language pair coverage.
QUALITY METRICS DESCRIPTION
BLEU
BLEU is an automatic metric based on n-grams. It measures the precision of n-grams of the machine translation output against the reference, combined with a brevity penalty that punishes overly short translations. We use a particular implementation of BLEU, called sacreBLEU, which outputs corpus-level scores rather than segment-level scores.
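As an illustration, here is a minimal sketch of computing a corpus-level score with the sacreBLEU Python API; the hypothesis and reference sentences are invented placeholders, not data from this report.

```python
# Minimal sketch: corpus-level BLEU with sacreBLEU.
# The sentences below are illustrative placeholders.
import sacrebleu

hypotheses = [
    "The cat sat on the mat.",
    "It is raining heavily today.",
]
# One reference stream: the i-th reference corresponds to the i-th hypothesis.
references = [[
    "The cat is sitting on the mat.",
    "It rains heavily today.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # a single corpus score, not per-segment scores
```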
References
- Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. “BLEU: a Method for Automatic Evaluation of Machine Translation.” ACL (2002).
- Post, Matt. “A Call for Clarity in Reporting BLEU Scores.” WMT (2018).
COMET
COMET (Crosslingual Optimized Metric for Evaluation of Translation) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using token or sentence embeddings. Unlike other metrics, COMET is trained to predict different types of human judgments, such as post-editing effort, direct assessment, or translation error analysis.
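As a hedged illustration, the sketch below scores a segment with the Unbabel COMET package, assuming the wmt22-comet-da reference-based checkpoint; the example sentences are placeholders, not data from this report.

```python
# Minimal sketch: scoring with a reference-based COMET model.
# Assumes the `unbabel-comet` package and the Unbabel/wmt22-comet-da checkpoint;
# the example sentences are illustrative placeholders.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

data = [{
    "src": "Der Hund bellt.",      # source sentence
    "mt": "The dog is barking.",   # machine translation output
    "ref": "The dog barks.",       # human reference translation
}]

output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 runs on CPU
print(output.system_score)  # corpus-level score averaged over segments
```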
References
- COMET - https://machinetranslate.org/comet
- COMET: High-quality Machine Translation Evaluation - https://unbabel.github.io/COMET/html/index.html#comet-high-quality-machine-translation-evaluation
ON-PREMISE PRIVATE SOFTWARE UPDATES
New version: 1.30.0.
Changes in functionality:
- Added the ability to accelerate the speech recognizer's denoiser using the GPU.
- Added the ability to configure the default speech recognizer parameters for each language separately.
- Added the ability to translate a document into multiple languages sequentially without having to re-upload it on the demo page.
- Fixed document translation errors.
LANGUAGE PAIRS
Note: A smaller model size on disk means lower GPU memory consumption, which reduces deployment costs; smaller models also translate faster. Approximate GPU memory usage is calculated as on-disk model size × 1.2 (see the sketch after the table).
| Language Pair | Current Model's Size, MB | Test Data | Previous Model's BLEU | Current Model's BLEU | BLEU Difference | Previous Model's COMET | Current Model's COMET | COMET Difference |
|---|---|---|---|---|---|---|---|---|
| English - Japanese | 190.63 | Flores 101 | 36.77 | 39.62 | +2.85 | 90.33 | 91.56 | +1.23 |
| English - Lithuanian | 113.91 | Flores 101 | 30.84 | 31.28 | +0.44 | 89.61 | 90.11 | +0.50 |
| English - Czech | 113.91 | Lingvanex | 47.73 | 48.94 | +1.21 | 91.66 | 92.09 | +0.43 |
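To make the note above concrete, here is a small sketch that applies the × 1.2 estimate to the model sizes from the table; the multiplier is the approximation given in the note, not a measured value.

```python
# Sketch: estimating GPU memory from on-disk model size,
# using the approximation from the note above (disk size * 1.2).
model_sizes_mb = {
    "English - Japanese": 190.63,
    "English - Lithuanian": 113.91,
    "English - Czech": 113.91,
}

for pair, size_mb in model_sizes_mb.items():
    gpu_mb = size_mb * 1.2
    print(f"{pair}: ~{gpu_mb:.0f} MB of GPU memory")
# e.g. English - Japanese: ~229 MB of GPU memory
```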