Feeding the translation robot

Anyone who has ever worked on a machine translation task knows that the output of a translation engine usually compares badly with the work of a human translator. But although the poor quality of that output can be good for a laugh, translators can still influence future output. Thinking about the path to better results is not only useful for later but also offers some insight into our current translation practices.

Machine translation as a resort for the future

One of the arguments in the discussion among translators about the rise of machine translation is that translation engines are unable to deliver creative translations. If translation engines can translate documents at all, the assumption goes, they can only handle short documents or repetitive tasks. Until now they simply could not cope with marketing texts or any other texts where translations could not be logically or statistically determined.

It is understandable why translation engines have so far not been able to convince the translation community of their proficiency. Today most of them are based on statistical machine translation (SMT), a technology that uses algorithms to calculate the best possible match for a sentence. Put simply, in SMT the translation engine first builds an inventory of all words, word combinations, and their translations in a massive corpus. It then uses those input data to calculate a translation. This mathematical model offers no room for creativity: if a = 1 in math, it cannot be 2. So if the corpus always renders ‘euro’ as ‘dollar’, the engine will never translate it as ‘pound’ (just to give an example that everyone can follow). SMT will therefore always stick to the rules and never add the flair and puns that can be found in perfect creative translations.
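To make the ‘if a = 1, it cannot be 2’ point concrete, here is a deliberately tiny sketch in Python. It is not how a real SMT engine works internally (real systems score whole phrases and sentences with probability models); it only assumes a hypothetical toy corpus of aligned Dutch-English word pairs and always picks the most frequent translation. All names and data in it are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of aligned (source, target) word pairs.
# A real engine would derive its statistics from a massive corpus.
ALIGNED_PAIRS = [
    ("huis", "house"),
    ("huis", "house"),
    ("huis", "home"),
    ("groot", "big"),
    ("groot", "large"),
    ("groot", "big"),
]


def build_table(pairs):
    """Count how often each source word was translated by each target word."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate_word(table, word):
    """Return the most frequent translation, or the word itself if unseen."""
    if word not in table:
        return word
    return table[word].most_common(1)[0][0]


def translate(table, sentence):
    """Translate word by word; the same input always gives the same output."""
    return " ".join(translate_word(table, w) for w in sentence.split())


table = build_table(ALIGNED_PAIRS)
print(translate(table, "groot huis"))  # prints "big house"
```

However often it is asked, this toy model will never answer ‘large home’: the less frequent, perhaps more stylish option loses to the statistics every time.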

Researchers and companies alike are now putting their money on Neural Machine Translation (NMT). NMT is a hot topic nowadays, with Google claiming that ‘in some cases human and [Google’s Neural Machine Translation] translations are nearly indistinguishable’.
Still, the majority of translation engines do not make use of NMT for financial and logistical reasons, while Google’s claim is highly contested by industry professionals. This fuss and infighting in the camp of MT professionals will ensure that human translators stay in the lead for now.

The lack of promising results in terms of creativity and the current lead of human translators in most fields do not mean that we should rest on our laurels, however. Working with many technical clients and agencies, I see that more and more companies are experimenting with translation engines. Some of them are starting from scratch, investing in a machine framework, using a cloud operator or building one themselves, while inputting all the content they can find. Others have already progressed somewhat, fine-tuning their machines with ever fresh input content and feedback from users. Although it will still take years before MT has a massive market share and outpaces human translators, I cannot deny that it is happening and slowly progressing.

It is therefore not a bad idea to start looking at translation engines and learning how to use them. Boarding the MT train at a later stage is still possible without too much risk, but the earlier you get involved in and used to machine translation, the better you can influence future output.

Influencing the output of translation engines

One way to influence the output results is by taking care of the input yourself. Only by translating content that is used to train translation engines can you make sure that you will reach the quality level you wish to. If the translation engine is set up properly you can ensure that future translations by a machine will stick closely to your style and won’t require much in the way of costly and time-consuming edits.

At the same time it must be said that training translation engines to suit your style and needs is only possible to the maximum extent if you own a translation engine or if you are the only translator for a particular client and/or domain. If you are part of a team of translators, it is much more difficult to influence translation engines. You will therefore never benefit as much from the translation engines you helped to train. You also risk seeing your client move on and keep using your valuable input while never making use of your translation services again.

So translating for translation engines has the most benefit and the brightest future when you own one. If you are considering adding machine translation to your skills in the future, it might be best to consider investing in your own engines. That is why I invested in Slate Desktop (review) last year. With Slate Desktop I can create my own translation engines offline and retrain them again and again to improve the quality of the machine translations until that quality is close to my own translation style.

Of course there are alternatives, like SDL Language Cloud. My preference for Slate Desktop at the time was based on the fact that it does not rely on the cloud and therefore seems to me less vulnerable to data theft.

Lessons from a translation engine

The very fact that I started using Slate Desktop taught me some important lessons. The first was that creative translations are rendered useless by an algorithm and that engines will not easily be able to cope with creativity in translations. In the past I had always translated creative texts directly with my CAT tools while now I use my translation memories to generate a translation engine. That engine, however, did not know how to make use of the texts. It was simply unable to translate creativity and flair into a fluent sentence because it only used a mathematical model for translation. This resulted in useless sentences with multiple nouns and exclamation marks that completely missed the point (or puns).

The second lesson I learned was that using synonyms did not work well either. Whereas in ‘translations with flair’ a synonym was sometimes the best option to convey the message, my engines got stuck on that flair.

A third lesson Slate Desktop taught me was that you can only train engines well with highly specialized translation memories. Because I often used a general translation memory to store all my creative translations, that way of working proved to be completely useless. Creative translations are simply too varied to be useful for a mathematical translation. On the other hand, the machine translations sometimes offered help in that the sentences generated contained words I did not come up with myself and which I sometimes had not used for years. In that way the engine threw up alternatives I could use to bring my translation to even higher levels.

Translating for machines

The only way to circumvent the above problems is by adapting your translations to the logic of a translation engine – which actually is a 180-degree change in the approach to translation. Indeed, translations should have style and have to be written as if they were not translations. Translating your texts so that mathematical algorithms can deal with them is then really a bit awkward. Yet this is a great approach to make the most out of translation engines.

If you are using a translation engine (or plan to use one), it would be good to take note of how it produces translations. Are they awkward, illegible, or utter nonsense? You then have a chance to influence future machine translation results by adapting your translations to the logic underlying the engine. Basically it boils down to the following points:

Make sure you do a literal translation. Literal does not mean that the syntax of the translation should mirror the syntax of the source text, but that you leave out as much flair and creativity as possible.
Do not split translations into more sentences than is strictly necessary. Sometimes it is unavoidable, and your translation engine will learn how to deal with it, but it may still have difficulties with this even after long training.
Translate every word consistently. Many languages have synonyms and other words that, in particular contexts, can be translated identically or slightly differently. Give every word its own fixed translation to avoid nuanced differences in machine translations.
Make sure that every tag is in the right place. Slate Desktop does not place tags in the translation, but other translation engines do. By positioning tags in the right way you make sure that the engine will do it itself in the future.
Use a specific translation memory for each and every domain (or client, but domain seems to work better). Every domain has its own specialties and oddities, and training an engine with a specific memory will avoid confusion in its ‘brains’. Indeed, a ‘nut’ in technical documentation is entirely different from a ‘nut’ in a food recipe, isn’t it?
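The consistency and per-domain points above can be checked mechanically before a translation memory is used for training. The following is a minimal sketch in Python, assuming the memory has already been exported to plain (domain, source, target) tuples. That layout, the function name and the sample segments are hypothetical and not part of Slate Desktop or any CAT tool.

```python
from collections import defaultdict

# Hypothetical sample segments: (domain, source sentence, target sentence).
SAMPLE_SEGMENTS = [
    ("technical", "Tighten the nut.", "Draai de moer aan."),
    ("technical", "Tighten the nut.", "Zet de moer vast."),
    ("food", "Add one nut.", "Voeg één noot toe."),
]


def find_inconsistent_segments(segments):
    """Return sources that have more than one translation within a domain.

    The result maps (domain, normalized source) to the sorted list of
    competing target sentences.
    """
    seen = defaultdict(set)
    for domain, source, target in segments:
        # Normalize lightly so trivial case or spacing differences
        # in the source do not hide a duplicate.
        key = (domain, source.strip().lower())
        seen[key].add(target.strip())
    return {
        key: sorted(targets)
        for key, targets in seen.items()
        if len(targets) > 1
    }


for (domain, source), targets in find_inconsistent_segments(SAMPLE_SEGMENTS).items():
    print(domain, "|", source, "|", targets)
```

Keying on the domain as well as the source keeps the technical ‘nut’ and the food ‘nut’ apart, so only genuine inconsistencies inside one domain are reported. Whether to harmonize the flagged segments or drop them from the training set remains a judgment call for the translator.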

Back to creativity

One might argue that this approach is a genuflection to translation engines. Indeed, if this is how to approach translation for the future, creativity will die and machines will win. That, however, is only a part of the truth. As with each translation, a creative translation – or even transcreation – has to be checked and edited after it has been translated. That is the same approach you should use after starting to use machine translations. After you have trained your translation engine and obtained satisfying results (i.e. legible sentences without too many editing requirements), you can safely let it loose on your translations. As soon as it is ready for it, you can output your translation and start the creative process. Of course, for small tasks avoiding the machine translation step can save you time, but for larger tasks this approach may work well. Simply edit and adapt the robot’s output to give it your human touch.

And never feed your creativity back into the robot. Robots simply cannot cope with creativity. Period.

13 thoughts on “Feeding the translation robot: Thoughts on translating input for translation engines”

Elena Alieva says:

April 5, 2017 at 14:04

Hi Pieter, thanks for sharing, this is all really interesting!
I can’t imagine feeding a text into a machine and getting decent results. Though maybe it depends on the language pair. Google has been able to translate Ukrainian to Russian pretty well for years now.
Have you already obtained any “satisfying results” or is it a work in progress still?

1. Pieter Beens says:
  
  April 6, 2017 at 01:40
  
  Hi Elena, thank you for your feedback.
  Translation engines are really on a diet. If you feed them the wrong ingredients they will vomit, and if you give them an overload it is even worse. In my experience so far only a decent meal will do the trick, but even then quality is not guaranteed. A simple comma or change in word order can produce entirely different results. The best results so far I have seen with manuals, with my engine sometimes delivering great and flawless results. However, the results differ greatly from segment to segment. I suppose I have to train my robots a lot in order to get better results. So that’s a matter of time.
  
2. says:
  
  April 12, 2017 at 11:56
  
  Hi Elena. Pieter was one of our first customers. He had the confidence in us and the faith in a then-nonexistent Slate Desktop to buy at a pre-launch promotional discount with a no-refund policy!!! He insisted on being the first to publish a blog back in Feb last year. There was no way I was going to let him down.
  
  I remember one email from Pieter saying that he was not getting the good results he had hoped for. I was worried. I offered to review his work and coach him through his troubles, but he insisted on doing it himself. Except for the few times in the early days when we had to step in to fix some serious bugs, Pieter’s review here is his own work and a total surprise to me. I’m happy he found his own way to get through the dark times. I’m proud that he’s our customer.
  
  I wish I could guarantee that Slate Desktop will give you good results from your TMs in your language pair in your field of work. Sadly, no one can do that. You have to build your own engines and test them.
  
  Today, we offer a 30-day money-back policy — no questions asked. So, I can guarantee that we will be here to help you through the process and we’ll refund your money if you request it.
  
3. says:
  
  April 15, 2017 at 23:14
  
  Elena, I forgot to mention. Ukrainian is not a supported language now, but I think we can readily add it. Let me know.
  
  1. Elena Alieva says:
    
    April 17, 2017 at 17:49
    
    Thank you for your reply, Tom! Very kind of you to suggest adding another language just because I mentioned it here but no, Ukrainian isn’t one of my languages (that’s why I needed Google Translate for it) 🙂
    I’d definitely think about using Slate Desktop. Do you plan to develop a Mac version?
    
    1. says:
      
      April 17, 2017 at 20:58
      
      Thanks, Elena. I think I need to back up just a bit. You need to convert TMs to an engine before Slate Desktop creates suggestions. So, if you don’t have RU<>UK TMs, SD won’t work for that pair. That’s why it’s a professional’s tool.
      
      Re Mac, we plan native OS X support in version 2.x. For now, we have reports that customers who run Trados Studio and memoQ on OS X (with Parallels Desktop, VMware Fusion, or VirtualBox) can also run Slate Desktop.
      
Olayemi Olabenjo says:

April 6, 2017 at 03:27

Although I sometimes use Google Translate to help me search for vocabulary while translating, I have never considered MT in the light of this write-up. The article is quite interesting, and I am considering trying out some of the ideas I got from it. My only concern is that the languages I handle are laden with diacritics, synonyms and tonal marks, and I keep wondering if a machine would be able to do justice to the process of translating from and into the languages.
Kudos to this author for an insightful article.

1. Pieter Beens says:
  
  April 6, 2017 at 03:36
  
  Thank you for your comment Olayemi!
  The main concern is that you need to feed large specific TMs in order to realize the best possible output. MT will still stumble on synonyms etc. but I bet there is no such thing as perfect machine translations. As for synonyms: literal translations work best and you need to edit the machine translation afterwards in order to make the best of it. First and foremost you need to invest in your own engine as feeding Google Translate or other engines is difficult and useless (in that your work will be destroyed by other users).
  
2. says:
  
  April 15, 2017 at 23:23
  
  Olayemi, what languages do you work with? The tone markers and diacritics are never a problem. Synonyms depend on being present in your TMs or being managed through terminology files. We support CJK, but not Thai, Khmer, Laotian, Burmese. Tom
  
  1. Olayemi Olabenjo says:
    
    August 15, 2017 at 01:29
    
    Hi Tom, I work in Nigerian languages: Hausa, Igbo and Yoruba, from and into English/German.
    
    1. says:
      
      August 16, 2017 at 07:04
      
      Thanks, Olayemi. Sorry we don’t support these languages with Slate. The “kernel” MT component will work with them. The “tokenizer” tool separates punctuation/symbols/etc. from words. Each language has its own rules about how to break these apart. Contact me through my profile page if you’d like to experiment a little.
      
says:

April 12, 2017 at 08:00

Pieter, thank you for this landmark perspective. You’re a pioneer among the growing number of translators who are benefiting from this new desktop technology. It’s nice to see you reached the same conclusions as the academic researchers describe in their reports with “all those lines and numbers” (George Carlin’s Hippy Dippy Weatherman). That is, some types of work, like creative writing in literary works and marketing campaigns, yield inferior results with this technology. We have a longer list on our website under the “Domains” section.

As you said, companies are experimenting. I created Slate Desktop so translators can fight fire with fire. I’m starting a whole new section on our support site with articles that describe improvement strategies for expert users. Fortunately, we’ve designed the software for beginners to experience good results from the start.

Biljana Stojanovic says:

October 24, 2017 at 07:41

Great post, Pieter! Very interesting and useful explanation. If we have to train our own translation engines, it would be better to start as soon as possible.
