NMT Data Studio
Everything you need to create a machine translation
system for any domain
Machine Translation Toolkit
NMT Control Panel
To manage translation work quickly, without running tests from the console, use a handy dashboard that covers every task, from preparing and filtering data to deploying translation tests. In the picture below: on the right is the list of tasks and the GPU servers on which models are being trained; in the center are the parameters of the neural network; and below are the datasets that will be used for training.
Work on a new language begins with dataset preparation. We took datasets from open sources such as Wikipedia, the European Parliament proceedings, ParaCrawl, Tatoeba, and others. About 5M translated lines are enough to reach average translation quality.
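As a minimal sketch of this collection step: many open corpora (Tatoeba, ParaCrawl) are distributed as tab-separated "source&lt;TAB&gt;target" lines, so preparation can start by parsing such lines and checking the total against the rough 5M-line target. The function names here are illustrative, not from the article.

```python
# Sketch: read "src\ttgt" parallel lines, skip malformed rows, and check
# whether the corpus has reached the rough size needed for average quality.
def load_parallel_tsv(lines):
    """Yield (source, target) pairs from tab-separated lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:
            yield parts[0], parts[1]

TARGET_LINES = 5_000_000  # rough size cited for average translation quality

def enough_data(pair_count):
    return pair_count >= TARGET_LINES

sample = ["Hello.\tHallo.", "malformed row", "How are you?\tWie geht's?"]
pairs = list(load_parallel_tsv(sample))  # keeps only the two valid pairs
```

In practice each source corpus would be streamed from disk and concatenated before counting.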
Deployment and API
Datasets are lines of text translated from one language to another. A tokenizer then splits the text into tokens and builds dictionaries from them, sorted by token frequency. A token can be a single character, a syllable, or a whole word.
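The dictionary step above can be sketched as follows. This is a simplification that splits on whitespace; production NMT tokenizers usually operate on subwords (e.g. BPE), but the frequency-sorted dictionary idea is the same.

```python
from collections import Counter

# Sketch: split sentences into tokens and build a dictionary
# sorted by descending token frequency, as described in the text.
def build_vocab(sentences):
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    # most_common() returns tokens ordered by descending frequency
    return [tok for tok, _ in counts.most_common()]

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = build_vocab(corpus)
# "the" appears 3 times, "cat"/"sat" twice, "dog"/"ran" once
```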
After the datasets were uploaded to the database, it turned out that they contained many misspelled words and poor translations. To achieve good quality, they must be heavily filtered. You can also buy datasets that have already been filtered to high quality.
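A minimal sketch of such filtering, using common heuristics (drop empty, untranslated, overlong, or badly length-mismatched pairs). The thresholds are illustrative assumptions, not values from the article.

```python
MAX_LEN = 200     # assumed maximum tokens per side
MAX_RATIO = 2.0   # assumed maximum source/target length ratio

def keep_pair(src, tgt):
    """Return True if the sentence pair passes simple quality heuristics."""
    s, t = src.split(), tgt.split()
    if not s or not t:
        return False
    if src.strip().lower() == tgt.strip().lower():
        return False  # identical sides: likely untranslated
    if len(s) > MAX_LEN or len(t) > MAX_LEN:
        return False
    ratio = max(len(s), len(t)) / min(len(s), len(t))
    return ratio <= MAX_RATIO

pairs = [("Hello there.", "Hallo du."),
         ("Hello", "Hello"),                  # untranslated, dropped
         ("a", "one two three four five")]    # length mismatch, dropped
filtered = [p for p in pairs if keep_pair(*p)]
```

Real pipelines add further passes, such as language identification and deduplication, on top of heuristics like these.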
Made with Data Studio
An example of a translation system made with Lingvanex Data Studio