What is a translation corpus?
Like most machine-learning systems, machine translation (MT) requires massive amounts of data to produce intelligent results.
A translation corpus is a large, structured set of texts paired with their translations in another language. Machine translation algorithms are often trained using datasets created by human translators to achieve high-quality output.
For companies seeking to improve their machine translation engines, Gengo can source large amounts of translation data across 70+ language pairs. Our crowd of 22,000+ human translators will deliver the volume you need to build and train an effective machine translation system.
We can quickly prepare massive translation datasets of 500K+ segments per language pair, with minimal lead time.
All data is translated by human translators (no post-edited machine translation, or PEMT), cleanly segmented, and aligned for easy input into your system.
We offer clear, competitive per-segment pricing depending on the volume and language(s) you need.
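For a rough idea of what "segmented and aligned" data can look like on the receiving end, here is a minimal sketch in Python. It assumes a tab-separated layout with one source segment and its translation per line. That layout, the sample sentences, and the helper name `load_aligned_segments` are illustrative assumptions, not a description of Gengo's actual delivery format.

```python
import csv
import io

# Hypothetical sample data: one aligned segment pair per line,
# source and target separated by a tab. The last row is deliberately broken.
SAMPLE_TSV = (
    "Hello.\tBonjour.\n"
    "How are you?\tComment allez-vous ?\n"
    "\tOrphan target\n"
)

def load_aligned_segments(handle):
    """Return (source, target) pairs, skipping rows that are not cleanly aligned."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # not exactly one source and one target field
        source, target = (field.strip() for field in row)
        if source and target:  # drop pairs with an empty side
            pairs.append((source, target))
    return pairs

pairs = load_aligned_segments(io.StringIO(SAMPLE_TSV))
for source, target in pairs:
    print(f"{source} -> {target}")
```

Because each line already holds exactly one source segment and its translation, a training pipeline can read the pairs directly without a separate alignment step.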
Looking for a particular type of content? Gengo segments content into 23 different categories:
Machine translation retraining
We can identify and correct errors in your machine translation output to produce natural, error-free translations.
Need a particular language pair or content type? We can create tailored corpora built specifically for your system.