NMT Data Studio

Everything you need to create a machine translation system for any domain

Machine Translation Toolkit

Data Preparation
Parse, filter, and mark up parallel and monolingual corpora; create blocks for test and validation data (a data-split sketch follows this list)
Model Training
Train a custom neural architecture with parallel job lists, GPU analytics, and quality estimation
Deployment
When model training finishes, the model can be automatically deployed as an API or made available for download for offline use
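
For the Data Preparation step, here is a minimal sketch of how test and validation blocks can be carved out of an aligned parallel corpus. This is not Data Studio's own code: the file names (corpus.en, corpus.de), the split sizes, and the one-sentence-per-line plain-text format are all assumptions made for illustration.

```python
import random

def split_parallel_corpus(src_path, tgt_path, valid_size=2000, test_size=2000, seed=42):
    """Split an aligned parallel corpus into train/validation/test blocks."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        pairs = [(s.rstrip("\n"), t.rstrip("\n")) for s, t in zip(fs, ft)]

    # Shuffle first so the held-out blocks are representative of the whole corpus.
    random.Random(seed).shuffle(pairs)
    test = pairs[:test_size]
    valid = pairs[test_size:test_size + valid_size]
    train = pairs[test_size + valid_size:]
    return train, valid, test

if __name__ == "__main__":
    train, valid, test = split_parallel_corpus("corpus.en", "corpus.de")
    print(len(train), "train /", len(valid), "valid /", len(test), "test pairs")
```

The key point is that the test and validation blocks are held out once, before any filtering or training, so quality estimates are not inflated by overlap with the training data.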

NMT Control Panel

To manage translation work without running tests from the console, use a single dashboard for all tasks, from preparing and filtering data to deploying trained models. In the picture below: on the right is the list of tasks and the GPU servers on which models are being trained; in the center are the parameters of the neural network; and below them are the datasets that will be used for training.

Data Management

Work on a new language began with dataset preparation. We took datasets from open sources such as Wikipedia, the European Parliament proceedings, Paracrawl, Tatoeba, and others. Around 5 million translated lines are enough to reach average translation quality.
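
As an illustration of how such open-source corpora can be combined, the sketch below merges several TSV files (source and target separated by a tab, one pair per line) into a single aligned corpus and counts the pairs against the rough 5-million-line target. The file names are hypothetical and the TSV layout is an assumption; real releases from Paracrawl, Tatoeba, or Europarl may ship in other formats.

```python
from pathlib import Path

# Hypothetical file names; each file is assumed to hold "source<TAB>target" per line.
TSV_SOURCES = ["paracrawl.en-de.tsv", "tatoeba.en-de.tsv", "europarl.en-de.tsv"]

def merge_tsv_corpora(tsv_paths, out_src="merged.en", out_tgt="merged.de"):
    """Merge several TSV parallel corpora into one pair of aligned plain-text files."""
    pairs = 0
    with open(out_src, "w", encoding="utf-8") as fs, open(out_tgt, "w", encoding="utf-8") as ft:
        for path in tsv_paths:
            if not Path(path).exists():
                continue  # skip corpora that have not been downloaded yet
            with open(path, encoding="utf-8") as f:
                for line in f:
                    parts = line.rstrip("\n").split("\t")
                    if len(parts) < 2 or not parts[0] or not parts[1]:
                        continue  # skip malformed or empty lines
                    fs.write(parts[0].strip() + "\n")
                    ft.write(parts[1].strip() + "\n")
                    pairs += 1
    return pairs

if __name__ == "__main__":
    total = merge_tsv_corpora(TSV_SOURCES)
    print(f"{total:,} sentence pairs collected; roughly 5,000,000 give average quality")
```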

Deployment and API

Datasets are lines of text translated from one language to another. A tokenizer splits the text into tokens and builds dictionaries from them, sorted by how frequently each token occurs. Tokens can be single characters, syllables (subword units), or whole words.
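
Below is a minimal sketch of that frequency-sorted dictionary idea using plain whitespace tokens; production systems typically switch to character or subword units instead, as noted above. The corpus file name, vocabulary size, and special symbols are illustrative assumptions, not Data Studio's actual tokenizer.

```python
from collections import Counter

def build_vocab(corpus_path, max_size=32000):
    """Build a token dictionary sorted by how often each token occurs in the corpus."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.strip().split())  # whitespace tokens for illustration only

    # Reserve slots for the special symbols most NMT toolkits expect.
    vocab = ["<pad>", "<unk>", "<s>", "</s>"]
    vocab += [tok for tok, _ in counts.most_common(max_size - len(vocab))]
    return {tok: idx for idx, tok in enumerate(vocab)}

if __name__ == "__main__":
    vocab = build_vocab("merged.en")
    print(len(vocab), "entries; most frequent:", list(vocab)[4:14])
```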

Quality Evaluation

After the datasets were uploaded to the database, it turned out that they contained many misspelled words and poor translations. To achieve good quality, they must be heavily filtered. You can also buy datasets that have already been filtered to high quality.
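
The sketch below shows the kind of filtering heuristics commonly applied to noisy parallel data: length limits, a source/target length-ratio check, removal of untranslated copies and URL-laden lines, and de-duplication. The thresholds and file handling are assumptions for illustration, not Data Studio's actual pipeline.

```python
import re

def keep_pair(src, tgt, min_len=1, max_len=250, max_ratio=2.5):
    """Simple heuristics for dropping noisy sentence pairs before training."""
    src_toks, tgt_toks = src.split(), tgt.split()
    if not (min_len <= len(src_toks) <= max_len and min_len <= len(tgt_toks) <= max_len):
        return False  # too short or too long
    ratio = max(len(src_toks), len(tgt_toks)) / max(1, min(len(src_toks), len(tgt_toks)))
    if ratio > max_ratio:
        return False  # lengths too different to be a plausible translation
    if src.strip() == tgt.strip():
        return False  # untranslated copy of the source
    if re.search(r"https?://", src) or re.search(r"https?://", tgt):
        return False  # crawled boilerplate
    return True

def filter_corpus(src_in, tgt_in, src_out, tgt_out):
    """Write only the pairs that pass the heuristics, de-duplicating along the way."""
    seen, kept = set(), 0
    with open(src_in, encoding="utf-8") as fs, open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, open(tgt_out, "w", encoding="utf-8") as out_t:
        for s, t in zip(fs, ft):
            s, t = s.rstrip("\n"), t.rstrip("\n")
            if (s, t) in seen or not keep_pair(s, t):
                continue
            seen.add((s, t))
            out_s.write(s + "\n")
            out_t.write(t + "\n")
            kept += 1
    return kept
```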

Made with Data Studio

An example of a translation system built with Lingvanex Data Studio

Request a Quote