Machine Translation: Road to Hell or Heaven Sent? Machine versus human translation




The other day one of my mining clients rang me and said: “Tea, we have 1,000,000 words here in Brazilian Portuguese and I need them translated into English within a few days – I need to know what it says.”

In the past I would have quoted the fabulous words of Darryl Kerrigan, played by Michael Caton in the wonderful Australian film The Castle. Darryl’s abiding claim to fame and his response to almost anything was: “You’re dreaming.”

A million words? In a couple of days? Was my client out of his mind? I can work magic but there are limits! Well, you might think I was out of my mind but instead of channeling Darryl, I didn’t bat an eyelid and just said: “Sure, let’s draw up the schedule and get started ASAP.”

The text consisted of request-for-tender descriptions for the mining industry, and my client, a mining corporation, wanted to know if any of the tenders were relevant to them. Time was running out, and in the past they would simply have missed out on a potential tender opportunity.

Machine Translation (MT) combined with post editing was the only solution, and it is the gold standard when a high volume is required at “for information only” quality.

Err… you mean… you pasted the text into Google Translate? No, surely not! For us language professionals, intelligent use of MT has absolutely nothing to do with Google Translate.

As I get asked daily about machine versus human translations, I thought I’d give you some handy pointers about MT, what it really is, when to use it and its benefits and pitfalls.

What is MT?

Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as Portuguese) to another (such as English).

Why is this so difficult?

To produce any translation, human or automated, the meaning of a text in the original language must be fully conveyed in the target language, i.e. the translation. On the surface this seems straightforward, but it is far more complex. Translation is not a mere word-for-word substitution. A translator must interpret and analyse all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax, semantics etc. in both languages, as well as familiarity with each local region.

What is Rule-based versus Statistical MT?

Rule-based machine translation relies on countless built-in linguistic rules and bilingual dictionaries for each language pair.

Statistical machine translation utilises statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora. You can also combine the two approaches and have a hybrid MT system.

For us, MT is mostly a customised engine fed with high quality data (e.g. millions of translated words of, say, mining terminology) hosted on a private server.

What’s the quality like?

You know the old principle: Garbage in – Garbage out. Your computer output is only as good as the quality of the data you have been feeding your engine. In our case, a couple of million words of previously translated mining material had been fed into the engine and still the text needed to be “cleaned up” by so-called “post editors”, qualified translators or linguists who are trained for post-editing.

So, it’s not publication quality?

I always say: Quality is what the client defines as quality. In this case, my client required the translation for “information only” purposes. The first stage required “raw MT output only”, because they needed to run a keyword search to isolate any relevant calls for tender. The quality was good enough for this, though certainly not for publication or for easy reading. Once the small number of calls for tender that were potentially of interest had been identified, the rest could be discarded immediately, which left us with only 8% of the text to look at. These 80,000 words had to be “cleaned up” by our mining translators, i.e. post editors, so that they made sense. This was so-called “Light Post Editing”, still for information purposes and hence still not publication quality, but it suited the purpose.
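
For readers who like to see the mechanics, the keyword triage on raw MT output might look roughly like the sketch below. It is only an illustration: the keyword list, the tender IDs and the sample texts are all made up, and the article does not say which tool the client actually used for the search.

```python
import re

# Hypothetical keyword list; the real search terms are not given in the article.
KEYWORDS = ["drilling", "haulage", "ore processing"]


def is_relevant(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the raw MT text (case-insensitive, whole words)."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered) for k in keywords
    )


def triage(tenders, keywords=KEYWORDS):
    """Return the IDs of tenders worth post editing and the share of words they represent."""
    kept = [tid for tid, text in tenders.items() if is_relevant(text, keywords)]
    total_words = sum(len(text.split()) for text in tenders.values())
    kept_words = sum(len(tenders[tid].split()) for tid in kept)
    share = kept_words / total_words if total_words else 0.0
    return kept, share


# Made-up raw MT output, keyed by a made-up tender ID.
sample = {
    "T1": "Supply of drilling equipment for open pit mine",
    "T2": "Catering services for the municipal school",
    "T3": "Ore processing plant maintenance contract",
}
kept, share = triage(sample)
print(kept, f"{share:.0%} of words kept for light post editing")
```

In the project described here the equivalent of `share` came out at about 8%, i.e. 80,000 of the 1,000,000 words.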

What do Post Editors do?

The linguists who “clean up” the raw MT output have specific instructions according to the degree of post editing (PE) required: light, medium or full. They might just ensure correct use of terminology and not worry about style, grammar and preferences, or they may re-translate some sections. In this case, the “lightly post edited” section (the 8%) revealed that only two calls for tender were relevant. These two sections of 25,000 words received “Full Post Editing” and were given to the client, who could respond to them in time.

What kind of texts are suitable for MT?

Companies like Bosch and Siemens have been using MT+PE for many years. Their technical writers have specific pre-editing guidelines on how to write, so their documentation is suitable for MT in the first place: Avoid long sentences, use language logically and precisely (adhere to the literal meaning of words), avoid idioms, figurative language and cultural references, don’t omit words, and adhere strictly to punctuation rules.
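
One of these guidelines, avoiding long sentences, is easy to check automatically. The sketch below is a toy illustration only: the 25-word threshold is an assumption (the article gives no number), the sentence splitting is deliberately naive, and it says nothing about how Bosch or Siemens actually enforce their guidelines.

```python
import re

MAX_WORDS = 25  # hypothetical threshold; pick whatever your style guide says


def long_sentences(text, max_words=MAX_WORDS):
    """Return the sentences in text that contain more than max_words words."""
    # Naive split after ., ! or ? followed by whitespace.
    # Good enough for a sketch; abbreviations like "e.g." will confuse it.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


# Made-up source text: one short sentence and one 34-word sentence.
doc = "Tighten the bolt. Then " + "carefully " * 30 + "check the seal."
flagged = long_sentences(doc)
for sentence in flagged:
    print(f"Too long ({len(sentence.split())} words): {sentence[:40]}...")
```

A check like this only covers sentence length; idioms, figurative language and omitted words still need a human eye.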

When should you not even consider MT?

Yes, you guessed it. Literary and any kind of rhetorical texts are not suitable for MT. Neither are life sciences/medical and legal texts, marketing, sales and advertising copy, or corporate communication in general, just to name a few.

What’s a smart engine?

Your MT improves over time; therefore it is a long-term investment. Every time the post editors correct the machine translated text, this is fed back immediately into the MT engine and the output is improved instantly for the rest of the document.

How expensive is it?

This depends on the language combination, industry, volume and how much data including previous TMs (translation memories) and existing translations are available. You pay subscription fees for the MT engine + post editing. Price per 100 words for MT output differs per language but can be down to 20% of that of traditional translation. The real ROI does not come immediately. Time saving, however, is enormous.
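
To make the “down to 20%” figure concrete, here is a back-of-the-envelope sketch. The rate per 100 words and the subscription fee are invented purely for illustration, post-editing fees are left out, and real pricing depends on all the factors listed above.

```python
def human_cost(words, rate_per_100):
    """Cost of traditional translation at a given rate per 100 words."""
    return words / 100 * rate_per_100


def mt_output_cost(words, rate_per_100, mt_ratio=0.20, subscription=0.0):
    """Cost of MT output priced at mt_ratio of the human rate, plus a flat subscription fee.

    Post-editing fees are NOT included; add them separately.
    """
    return human_cost(words, rate_per_100) * mt_ratio + subscription


words = 80_000       # the volume that went to light post editing in this project
rate = 20.0          # hypothetical price per 100 words for traditional translation
subscription = 500.0 # hypothetical engine subscription fee

print("Traditional:", human_cost(words, rate))
print("MT output:  ", mt_output_cost(words, rate, subscription=subscription))
```

The 20% is the best case mentioned above, so treat `mt_ratio` as a parameter to vary by language pair rather than a fixed number.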

So why not just use Google Translate? It’s free after all!

Well, it’s not free if you connect it to your system through a plug-in – but that’s not the issue. Firstly, Google owns all content that is passed through Google Translate, so there goes your confidentiality out the window. We translators adhere to a strict code of ethics including confidentiality, and that would be lost immediately. Secondly, it’s not a private customised engine, but a general one that is fed with any data and unauthorised edits. It has come a long way and can show surprisingly good results in certain languages for certain sentences, and it will always remain a fantastic option for non-professional use by private people who want to get the gist of something non-confidential.

Back to my client… well, the MT solution was heaven sent for them as they could submit the tenders on time; but it can surely be a road to hell if you rely on the raw MT output and you and your company lose credibility by providing ridiculous linguistic material.

And as far as The Castle is concerned – by keeping up with the latest technologies and international industry best practices, constantly innovating and being aware of what is fit for purpose, I don’t quote Darryl Kerrigan as much as I used to years ago.

You are dreaming? Not anymore! I seem to find language solutions for the impossible these days.

Tea Dietterich

8 thoughts on “Machine Translation: Road to Hell or Heaven Sent? Machine versus human translation”

  1. Very interesting article, Tea! Thank you so much for sharing it here as well! You give a very comprehensive example where Post-Editing of Machine Translation can actually make sense.

    I wonder if this model will ever replace the traditional translation in some industries? What do you think?

    1. Dmitry, thanks for your comment. The translators will never be replaced by technology or new models, but translators will be replaced by translators using technology or translators applying new models.
      “Traditional” is a relative word. What is traditional for us is not traditional for the new generation. Hence, we need to stay nimble and agile and adapt to new models and transition into emerging digitalisation with the richness and the benefit of historic knowledge, if that makes sense.

    2. “Traditional”… great point, Tea. Back in the dinosaur age of computers (ca 1993?), translation memories were the innovation and IBM Selectric II typewriters were traditional.

      10 years ago when SMT was new (a century ago in computer years), big agencies stuffed translation memories with raw SMT output and expected translators to sort out the mess with old post-editing best practices created 50 years ago. I call this practice the clip-art catalog style of translation. Remember the early days of computer graphic arts where Photoshop and CorelDraw came with a clip-art catalog of stick-art images? Pick one, touch up and publish. Today, that could be described as “traditional” for some translators.

      I think we’re migrating to a new age where translators have direct access to the engine. Adaptive systems like Lilt are good for some things. Slate Desktop has its benefits. They are distinct. Some translators will like one over another. Overall, I think we’re moving in the right direction and translators have more choice than ever.

  2. Tea,

    Thank you for sharing your experiences with MT. It’s important to remember that each project interacts differently with MT, and there are many different MT systems.

    Google arguably delivers the lowest common denominator with one-size-fits-all suggestions that may or may not satisfy a professional’s needs. Your professional judgement is what makes the difference in your experience.

    I suggest that whether heaven or hell, discovering quickly and acting fast to use or abandon MT on a given project is better than being stuck in a perpetual purgatory of ambiguous results.

    Isabella Massardo published an MT review (link to massardo.com) where she also suggested Google is a satisfactory solution after a cursory quality review but no actual project use. Please be on the lookout for my guest post on her site (I’ll re-post here on TOM) where I show an actual apples-to-apples comparison between a personalized MT engine and Google’s results. This TOM post previews the upcoming complete analysis: link to theopenmic.co

    1. Hi Tom,
      Very pertinent references and comments. Thanks for sharing this. You also nailed it by saying that the key is discovering quickly whether to use the MT output or discard it. That itself is a skill already, and the translator needs to ensure “it doesn’t do his/her head in to deal with that decision”; otherwise the benefit of time saving is not there for the smart translator, who will be shortchanged.

      1. Tea, I’ll repeat a customer experience you can look up on our support site. Igor Goldfarb is an EN-RU patent translator who created a Slate engine with 4 years of his personal TMs. The average segment length is over 50 words. He reports that he times himself doing his work using memoQ. He reads the source, reads Slate’s suggestion and comprehends both. Then he decides to make corrections (if any), or trash the suggestion and translate from scratch. For 30% or more of his work (with very long average sentence length, nonetheless), he corrects/completes the segment and moves to the next segment within 60 seconds. (Self-deprecating comment: I can’t even read English that fast!) In other words, output from his Slate engine made from his personal TMs requires extremely little or no correction for about 1/3 of his work. He doesn’t pay any subscription fees to Google or any other online service because he owns his engine running on his own computer. So, it’s emotionally very easy to throw out bad segments because he didn’t pay for them. He has no special “data scientist” skills and it took him about 24 hours of work to make his first engine, which is the one he reported on. Stay tuned for my analysis of Isabella’s engine. We’ll publish on Thursday.

  3. Thank you for the useful article, Tea. However, this statement “Quality is what the client defines as quality” is arguable. Quality is either high or low, regardless of what the client says. I would rephrase it as “Acceptable quality is what the client defines as acceptable”. Anyway, I totally agree that translators should benefit from MT as much as possible to keep up with modern technologies.



The Open Mic
