Lingvanex Data Studio

Everything you need to create machine translation
system for any domain

Machine Translation Toolkit

Data Preparation

Parse, filter, markup parallel and monolingual corpora. Create blocks for test and validation data

Model Training

Train custom neural architecture with parallel job lists, GPU analytics and quality estimation


When model training finishes it can be automatically deployed as API or available to download for offline use

From Novice to Expert

Lingvanex dashboard combines the latest linguistic and statistical techniques that are used to train the software to customer domains and improve translation quality. In the picture below: on the right is a list of tasks and GPU servers on which models are being trained. In the center are the parameters of the neural network, and below are the datasets that will be used for training.

Work with Parallel Data

Working on a new language began with datasets preparation. The dashboard has many predefined datasets from open sources such as Wikipedia, European Parliament, Paracrawl, Tatoeba and others. To reach an average translation quality, 5M translated lines are enough.

Dictionary and Tokenizer Tuning

Datasets are lines of text translated from one language to another. Then the tokenizer splits the text into tokens and creates dictionaries from them, sorted by the frequency of meeting the token. The token can be either single characters, syllables, or whole words. With Lingvanex Data Studio you can control the whole process of creating SentencePiece token dictionaries for every language separately.

Data Filtering and Quality Estimation

More than 20 filters are available to filter parallel and monolingual corpora to get the quality dataset from opensource or parsed data. You can markup named entities, digits and any other tokens to train system to leave some words untranslated or translated in specific way.

Create your Translation System

with the good quality and features  in 1 DAY

Contact us now

And get free access to our solutions!