Since machine translation became commercially available, commercial users have faced the challenge of identifying use cases where MT adds value for their company. Over the years, three major use case categories seem to have evolved:
The use cases that address the “big data” challenge mainly aim to make large volumes of content available to more potential consumers. Whether MT helps is judged directly from the behavior of the millions of users who interact with the translated content. In every case, the financial impact is ultimately measured in “revenue”.
Google – case study
Google Translate is a translation service that provides instant translations between dozens of different languages. It can translate words, sentences and web pages between any combination of our supported languages. With Google Translate, we hope to make information universally accessible and useful, regardless of the language in which it’s written. When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation for you. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. This process of seeking patterns in large amounts of text is called “statistical machine translation”. Since the translations are generated by machines, not all translations will be perfect. The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages.
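The idea of "detecting patterns in documents that have already been translated" can be illustrated with a toy sketch. It is not Google's system: the four-sentence corpus, the function names and the use of the Dice coefficient as an association score are all assumptions made for illustration. Production statistical MT relies on far larger corpora, word alignment models, phrase tables and language models.

```python
from collections import Counter
from itertools import product

# Hypothetical sentence-aligned English-German corpus (illustrative data only).
PARALLEL = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]

def train(parallel):
    """Count how often each source word, target word and word pair co-occur."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in parallel:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update(product(s_words, t_words))
    return src_count, tgt_count, pair_count

def best_translation(word, model):
    """Return the target word with the highest Dice score, or None if unseen."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

model = train(PARALLEL)
print(best_translation("house", model))  # haus
print(best_translation("the", model))    # das
```

With only four sentence pairs the counts already separate "haus" from "das" as the partner of "house". The same mechanism explains why more human-translated text for a language pair tends to give better guesses.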
eBay – case study
eBay has developed its very own machine translation tools to help fuel expansion in Russia. The tools allow Russian buyers to get accurate translations of eBay seller listings and communications in real time, making international transactions on eBay much easier. In 2013, eBay began to use machine translation in the eBay environment to enable people across country boundaries to communicate and deal with each other. eBay rolled out its first machine translation technology in January, translating Russian to/from English in real time. Machine translation is used to match user queries and products in the inventory across languages (query translation), and to present the results to the user translated into the language of the user’s original query. This opens up new avenues for sellers around the world to reach global buyers, across language barriers. Currently, eBay translates millions of queries and query results daily per language.
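The query translation flow described above (translate the query, match it against the inventory, translate the results back) can be sketched as follows. This is not eBay's implementation: the glossaries, the inventory and the function names are hypothetical, and the word-by-word lookup merely stands in for a real MT engine.

```python
# Hypothetical toy glossaries; a real system would call an MT engine here.
RU_TO_EN = {"красное": "red", "платье": "dress", "синие": "blue", "джинсы": "jeans"}
EN_TO_RU = {en: ru for ru, en in RU_TO_EN.items()}

# Hypothetical English-language listing titles.
INVENTORY = ["red dress", "blue jeans"]

def translate(text, glossary):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(glossary.get(word, word) for word in text.lower().split())

def search(query_ru):
    """Cross-language search: Russian query in, Russian result titles out."""
    query_en = translate(query_ru, RU_TO_EN)  # 1. query translation
    terms = set(query_en.split())
    # 2. keep listings that contain every translated query term
    hits = [title for title in INVENTORY if terms <= set(title.split())]
    # 3. present results in the language of the original query
    return [translate(title, EN_TO_RU) for title in hits]

print(search("джинсы"))  # ['синие джинсы']
```

A word-by-word lookup ignores word order and grammatical agreement, which is one reason a real deployment uses a trained MT engine for both directions.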
Microsoft – case study
Bing Translator (previously Live Search Translator and Windows Live Translator) is a user-facing translation portal provided by Microsoft as part of its Bing services to translate texts or entire web pages into different languages. All translation pairs are powered by the Microsoft Translator statistical machine translation platform and web service, developed by Microsoft Research, as its backend translation software. Two transliteration pairs (between Chinese Traditional and Chinese Simplified) are provided by Microsoft’s Windows International team.
Facebook – case study
Facebook’s latest acquisition could help it connect users across language barriers. It has just announced that it has acquired the team and technology of Pittsburgh’s Mobile Technologies, a speech recognition and machine translation startup that developed the app Jibbigo. From voice search to translated News Feed posts, Facebook could do a lot with this technology.
Facebook tells me “We’ll continue to support the [Jibbigo] app for the time being.” Jibbigo launched in 2009, and allows you to select from over 25 languages, record a voice snippet in that language or type in some text, and then get a translation displayed on screen and read aloud to you in a language of your choosing.
Machine Translation and Professional Translators Community
During the last couple of years, machine translation post-editing has become one of the hottest, most discussed topics in the translation industry, as evidenced by conferences, forums and webinars. What is the motivation driving this newfound interest?
Translators are motivated to use machine translation output since the quality of engines has reached the point where using them leads to proven productivity gains, ranging between 30 and 300%. Thus, machine translation is becoming a commonplace productivity tool similar to translation memories. The emergence of several post-editing standards that are tied to the desired quality levels of the final output allows translators to have additional income opportunities translating content that previously remained monolingual.
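To make the 30 to 300% range concrete, the gain can be read as the relative increase in throughput over translating from scratch. The sketch below assumes throughput is measured in words per hour; the function name and the figures of 500, 650 and 2000 words per hour are illustrative assumptions, not reported measurements.

```python
def productivity_gain(baseline_wph, post_edit_wph):
    """Percentage increase in throughput (words per hour) over the baseline."""
    if baseline_wph <= 0:
        raise ValueError("baseline_wph must be positive")
    return (post_edit_wph - baseline_wph) * 100 / baseline_wph

# Hypothetical throughputs: 500 words/hour when translating from scratch.
print(productivity_gain(500, 650))   # 30.0
print(productivity_gain(500, 2000))  # 300.0
```

Under this reading, a 300% gain means four times the baseline throughput, not three times.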
Machine translation is still not perfect, and post-editors face recurring challenges such as lexical coverage, word order, compound formation, word form agreement, omissions and several more. However, one recent positive development, the establishment of feedback loops to the MT engine developers and deployment managers, gives translators more confidence in future engine improvements.
Just as with human translation, post-editing throughput can vary and depends on:
- language pair
- content type & complexity
- domain knowledge
- quality requirements
- use of automatic QA tools
- quality of training data and reference material
AMTA is actively supporting machine translation as a productivity tool both for language service providers and freelance translators. In all of our past conferences, members of the professional translation community presented their findings on MT adoption as a part of the commercial user track (URL to the past conference proceedings). Synchronizing AMTA and ATA (American Translators Association) conferences for several years helped both organizations drive attendance and draw interest to the topic of machine translation as a productivity tool for translators.
Training professional post-editors
What is post-editing?
The “term used for the correction of machine translation output by human linguists/editors” (Veale and Way 1997)
“Checking, proof-reading and revising translations carried out by any kind of translating automaton” (Gouadec 2007)
The choice of whether to translate from scratch (“human translation”) or post-edit machine-translated output is driven by the suitability of the source content for machine translation. To date, the professional translator community has reported achieving the best results when post-editing the more formal, organized and structured content types, whose repetitive syntactic patterns and predictable use of terminology make them easier for machine translation engines to handle:
• Annual Corporate Reports
• Light Marketing (as opposed to “transcreation”)
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization) keywords
• e-Learning Content
• User Guides and Product Manuals
• Internal Corporate Communications
• Knowledge Bases
• Proposals / Draft Applications
The decision on which post-editing quality level to select is mainly determined by the visibility, perishability and target audience of the content, or its “utility”, the term lately adopted by the industry. The content utility also dictates the number of errors and the error types that can be tolerated for the given content type. As outlined in the TAUS post-editing guidelines, it is essential that post-editors are given a very explicit and clear set of instructions that describe the desired quality levels.
At the moment, two levels of post-editing are recognized as the industry standard: “good enough”, often referred to as “light post-editing”, and “publishable”, where the final output quality is expected to be on par with translation performed from scratch.
The table below is an example of two levels of post-editing:
The best results are achieved when post-editors and machine translation engine developers maintain a continuous, constructive dialog about the challenges translators face when post-editing MT output. This allows post-editors to build a sense of ownership of the engines and helps developers tailor their engine roadmap to the actual needs of translators.
While “adequacy”-related feedback helps with selecting the appropriate engine training data, “fluency” and “readability” feedback helps with fine-tuning the core engine functionality, including language-related and engineering issues such as the handling of metadata and locale-specific conventions.
Recognizing the need to develop a skilled and qualified post-editing workforce, several major Language Service Providers now publish their “how-to” introductory courses on the principles and best practices for post-editing machine translation output:
Industry Initiatives on Post-editing Machine Translation Output
Below are URLs to some of the industry initiatives relevant to the use of machine translation as a post-editing productivity tool. The list is maintained in a “work in progress” mode and is updated regularly.
TAUS Post-Editing Guidelines (created in partnership with CNGL): general post-editing guidelines for “good enough” and “human translation level” post-editing. TAUS has also recently published guidelines on pricing post-editing work and measuring post-editors’ productivity.
TAUS Dynamic Quality Framework: a set of tools and methodologies for evaluating post-editors’ productivity, selecting a machine translation engine most suitable for a specific project and reviewing machine translation and post-editing errors in a structured environment.
QTLaunchpad: European Commission-funded collaborative research initiative dedicated to overcoming quality barriers in machine and human translation and language technologies.
2013 MT Summit Workshop on Post-Editing Technologies and Practice: a recent workshop on post-editing organized by Dr. Sharon O’Brien, Michel Simard and Lucia Specia (follow the URLs to their professional web pages to see more publications).
There are several LinkedIn groups dedicated to post-editing of machine translation output and other translation automation tools:
Automated Language Translation (MT/Machine Translation): a group focusing on discussions around the trends and developments in machine translation, with members from both the development and user sides.