On April 19 Google announced the introduction of the new Google Translate algorithm for English to Dutch translations. It was soon flooded with criticism from industry professionals and media alike. One point, however, is often overlooked: Google Translate now understands the art of merging words better than many Dutch speakers do.
The new Google Translate
Back in April Google announced an update of the English ↔ Dutch language pair for Google Translate, its translation engine. The update came after a major update had been rolled out to a number of other language pairs, which were low-hanging fruit for the technology giant. Since April 19th, Google Translate for English ↔ Dutch has been based on what Google itself describes as cutting-edge ‘neural translation’ technology.
According to Google, neural translation is a big improvement over the old phrase-based system, because whole sentences are now translated at once rather than piece by piece. ‘This makes for translations that are usually more accurate and sound closer to how people speak the language, especially when you translate a whole sentence,’ Google says.
The old Google Translate, based on phrase-based machine translation, was introduced back in 2006. In the meantime Google’s Translate team set out to achieve more and developed the Google Neural Machine Translation system (GNMT), which uses state-of-the-art training techniques to achieve large improvements in translation quality. Between the phrase-based era and the arrival of GNMT, researchers came up with many techniques to improve neural machine translation (NMT). Some of them are mentioned in Google’s announcement of the Neural Machine Translation system: handling rare words by mimicking an external alignment model, using attention to align input and output words, and breaking words into smaller units to cope with rare words. On their own, these techniques did not make NMT fast or accurate enough for production use, but in April Google decided that GNMT was finally mature enough to be introduced to the masses.
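To make the idea of ‘breaking words into smaller units’ a little more tangible, here is a minimal Python sketch. It is not Google’s actual method: the function name `segment`, the tiny vocabulary and the greedy longest-match strategy are all assumptions made purely for illustration. It only shows how a long, rare Dutch compound can be covered by a handful of known pieces.

```python
def segment(word, vocab):
    """Split a word into known pieces, greedily taking the longest match first.

    Illustrative only: the vocabulary and the greedy strategy are assumptions,
    not a description of how GNMT really segments words.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# Hypothetical mini-vocabulary of frequent Dutch word parts.
VOCAB = {"bank", "rekening", "nummer", "reken", "ing"}

print(segment("bankrekeningnummer", VOCAB))
```

With this toy vocabulary, the rare compound ‘bankrekeningnummer’ (‘bank account number’) is split into pieces the system already knows, so it never has to treat the whole word as unknown.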
Experiences with Google’s Neural Machine Translation system
The introduction of the update for Google Translate was surrounded by considerable buzz and attracted a great deal of attention from industry professionals and media alike. It created the image of a highly intelligent engine that was much better at translating sentences. It was also one of the first occasions on which the general public could get its hands on an almost tangible product of machine learning. Machine learning is an abstract topic for many people, and Google Translate makes its results visible to a large audience.
The hilarious fails and bloopers that Google Translate had produced in the past also played a role in the media attention. The introduction of GNMT in the Netherlands almost implied that these fails and bloopers would finally come to an end and that Google Translate would yield results comparable to human translations. Media and users alike became curious about the results.
Algemeen Dagblad, a Dutch newspaper, put Google Translate to the test. The results (in Dutch) can be read here. The editor used Google Translate to translate some test sentences supplied by Google, which yielded good results. However, for a list of typical Dutch sentences the new neural translation still failed, producing hilarious sentences or results that simply found a mare’s nest. (Not ‘miss the shelf’, incidentally: that is the literal translation of ‘de plank misslaan’, ‘finding a mare’s nest’, that Google Translate produced while I was writing this blog post.) So, in the end, users and language professionals still relegated Google Translate to the back seat. As colleague translator Michele Hutchison says: ‘You can see progress when it comes to grammar and syntax, but not when it comes to understanding the content and meaning of a sentence’.
Colleague Els Hoefman did her own test at The Open Mic and came to more or less the same conclusions: ‘The new and improved version is definitely better with grammar and that seems a great step forward. The result is still poor, though.’
That is how the new Google Translate is now rated in the Netherlands: it has improved a bit, but not by a giant leap. The results are still poor.
An overlooked aspect of GNMT
Since the introduction of the new machine translation algorithm to the public I have used it several times to do a quick lookup or to verify a term. In many cases I noticed a degradation of the translation quality. Since April 19th Google Translate no longer necessarily produces consistent translations. The translation engine seems to have become fluid, producing different terms for similar words. If it was ever trustworthy, it lost that dependability back in April.
Judging by the results, however, Google Translate did make a leap forward in one respect. It finally seems to understand some basic grammar rules that are often not even understood by native Dutch speakers. Both Hutchison and Hoefman mentioned the improvement in their feedback on Google Translate’s results, but in general the translation fails have received more attention than the things that have improved.
Whatever the technology behind the neural machine translation algorithm, it finally understands how two separate words should be merged in Dutch. Basic Dutch words like ‘bankrekening’ are written as several words in English (i.e. ‘bank account’). The previous version of Google Translate often split up words that it didn’t understand, producing a Dutch translation that followed English grammar rules. Now Google Translate merges word combinations even if that produces a whole new word, which is not a problem in Dutch. Professionals agree that merging words to create a new word is more in line with the grammar rules than splitting up words to avoid a spelling error. Because the results feel so natural, this grammatical improvement often goes unnoticed, but it is a clear example of what technology and knowledge of a language can do. The current results in no way give rise to expectations that Google Translate will produce human-like results in the near future. Nevertheless, GNMT is showing its first insights into the rules behind a language. In a culture where young people no longer comprehend language rules, GNMT has the potential to be a force to be reckoned with.