What is a parallel text corpus?
Like most machine-learning systems, machine translation (MT) requires massive amounts of data to produce intelligent results.
A parallel text translation corpus is a large, structured set of texts in one language paired with their translations in another. Machine translation algorithms are often trained on parallel corpora created by human translators to achieve high-quality output.
Gengo can source large amounts of parallel text data across 70+ language pairs for companies seeking to improve or develop machine translation engines. Our crowd of 22,000+ human translators will deliver the volume you need to build and train an effective machine translation system.
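To illustrate what aligned parallel data can look like in practice, here is a minimal Python sketch. It assumes a tab-separated layout with one source segment and one target segment per line; that layout, the sample sentences, and the `read_parallel_segments` helper are illustrative assumptions, not a description of Gengo's actual delivery format.

```python
import csv
import io

# Hypothetical sample data: one tab-separated source/target pair per line.
SAMPLE_TSV = "Hello.\tBonjour.\nThank you very much.\tMerci beaucoup.\n"


def read_parallel_segments(handle):
    """Return a list of (source, target) pairs, skipping incomplete rows."""
    pairs = []
    # QUOTE_NONE treats quotation marks in the text as ordinary characters.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # not exactly one source and one target segment
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            pairs.append((source, target))
    return pairs


pairs = read_parallel_segments(io.StringIO(SAMPLE_TSV))
for source, target in pairs:
    print(f"{source} -> {target}")
```

Each pair keeps a source segment next to its translation, which is the property that lets an MT training pipeline consume the data directly.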
Why Gengo?

Scale
We can quickly prepare massive translation datasets of 500K+ segments per language pair, with minimal lead time.
Quality
All data is translated by human translators (no post-edited machine translation, or PEMT), cleanly segmented, and aligned for easy input into your system.
Value
We offer clear, competitive per-segment pricing depending on the volume and language(s) you need.
Content categories
Looking for a parallel translation corpus for a particular type of content? Gengo sorts content into 23 different categories:
- Art & Entertainment
- Automotive
- Business & Industrial
- Human Resources
- Education
- Family & Parenting
- Finance
- Food & Drink
- Medicine, Health & Fitness
- Hobbies & Interests
- Home & Garden
- Law, Govt & Politics
- News
- Pets
- Real Estate
- Religion & Spirituality
- Science
- Retail
- Society & Culture
- Sports
- Style & Fashion
- Technology & Computing
- Travel