“Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English” Did Microsoft exaggerate?

This article did draw some attention, and people have criticized the testing method, the test set… Well, I honestly can't judge that.

Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English

https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf

For the past three years I have been running comparative tests on human and machine translations, using new tools and relying on skilled human evaluators, all linguists of some kind. The MT output we evaluate is produced by various competing engines, so our observations are not limited to the Microsoft Chinese > English language pair.

We have seen three things worth mentioning:

  1. MT is improving fast, but… sometimes the fluency is better than the accuracy. Depending on the type of text you are translating, this can work in your favor… or not.
  2. MT engines learn much more from your work, so if you use an MT system with a feedback loop (like LILT), you benefit far more than before. Of course, this only pays off if you always translate similar texts, or if you work for the same customer(s).
  3. Editing MT output is often less of a hassle these days: usually just replacing, moving, or adding a word or word group. You might believe (and our customers may have heard) that it is “less work”, but that is not always true: in the past it was much easier to spot mistakes, whereas today the fluency can be misleading… Our tools record the time people spend evaluating translation quality, and we see that better MT output does not mean humans can judge that quality faster. Spotting mistakes is still brain-work.
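The kind of light-touch post-editing described in point 3 can be quantified. Below is a minimal sketch, using Python's standard-library `difflib`, that counts word-level replacements, insertions, and deletions between raw MT output and its post-edited version; the sentence pair is invented purely for illustration and is not from our evaluation data.

```python
from difflib import SequenceMatcher

def edit_operations(mt_output: str, post_edit: str) -> dict:
    """Count word-level edit operations between MT output and its post-edit."""
    mt_words = mt_output.split()
    pe_words = post_edit.split()
    ops = {"replace": 0, "insert": 0, "delete": 0}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, mt_words, pe_words).get_opcodes():
        if tag == "replace":
            # A replace span may cover unequal lengths; count the larger side.
            ops["replace"] += max(i2 - i1, j2 - j1)
        elif tag == "insert":
            ops["insert"] += j2 - j1
        elif tag == "delete":
            ops["delete"] += i2 - i1
    return ops

# Invented example: one word replaced, one word added.
mt = "The engine produce a fluent translation"
pe = "The engine produces a very fluent translation"
print(edit_operations(mt, pe))  # {'replace': 1, 'insert': 1, 'delete': 0}
```

Counts like these capture the mechanical effort of post-editing, but, as noted above, they say nothing about the time spent deciding whether an edit is needed at all.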

Finally: not all neural MT language pairs have improved in the same way, and as ever, everything depends on the document you need translated. Some documents are simply not fit for machines.

As professionals, we need to stay informed and recognize that there is some truth in a perhaps too-positive message. But the trend is clear.

Gert Van Assche

About Gert Van Assche

At Datamundi we pay a fair price to linguists and translators who evaluate (label/score/tag) human translations and machine translations for large-scale NLP research projects.

3 thoughts on “‘Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English’ Did Microsoft exaggerate?”

  1. It’s highly unlikely that they exaggerated. Here are two other articles that review the Microsoft announcement:

    Microsoft defined “Human parity” as when a human translator achieves equivalence to the machine. A bit backwards, don’t you think?
    link to linkedin.com

    If (Microsoft’s “human parity”) definitions hold for incompetent translators and judges, MT has been at human parity since the 1950s.
    link to linkedin.com

  1. Sorry, Gert. I let this one slip through the cracks. No contradiction with your points.

        It’s unlikely that Microsoft exaggerated about human parity. They translated ZH->EN. Then they had people with inadequate English skills evaluate the results.

        It is easy to achieve human parity when the humans are incompetent. That’s nothing new. MT has been at parity with incompetent humans for decades.



The Open Mic

Where translators share their stories and where clients find professional translators.
