“Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English” Did Microsoft exaggerate?
This article drew some attention, and people criticized the testing method and the test set… Well, I honestly don’t know about that.
https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf
For the past 3 years I’ve been doing a lot of comparative tests of human and MT translations, using new tools and relying on skilled human evaluators, all linguists of some kind. The MT output we evaluate has been produced by various competing engines, so our observations are not limited to Microsoft’s Chinese > English language pair.
We’ve seen 3 things that are worth mentioning:
- MT is improving fast, but… sometimes the fluency is better than the accuracy. Depending on the type of text you are translating, this can work in your favor… or not.
- MT engines learn much more from your work, so if you are using an MT system with a feedback loop (like LILT), you benefit much more than before. Of course, this only pays off if you always translate similar texts, or if you work for the same customer(s).
- Editing MT output is often less of a hassle these days: usually just replacing, moving, or adding a word or word group. You would believe (and our customers may have heard) that it is “less work”, but that is not always true: in the past it was much easier to spot mistakes. Today the fluency can be misleading… Our tools record the time people spend evaluating translation quality, and we see that better MT output does not mean humans can judge the quality faster. Spotting mistakes is still brain-work.
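To make “how much editing remains” concrete: post-editing effort is often approximated with a word-level edit distance between the MT output and the final human edit (the idea behind metrics such as HTER). A minimal sketch, assuming plain whitespace tokenization; the function name and example sentences are my own illustration, not from any of the tools mentioned above:

```python
def word_edit_rate(mt_output: str, post_edit: str) -> float:
    """Rough post-editing effort proxy: word-level edit distance
    (insertions, deletions, substitutions) divided by the length
    of the post-edited reference."""
    hyp = mt_output.split()
    ref = post_edit.split()
    # Classic dynamic-programming Levenshtein distance over words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / max(len(ref), 1)

mt = "The engine translate the text fluent"
pe = "The engine translates the text fluently"
print(round(word_edit_rate(mt, pe), 2))  # 2 of 6 words changed -> 0.33
```

Note that a low edit rate only measures mechanical changes; as the point above argues, it does not capture the time spent *finding* the mistakes in fluent output.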
Finally: not all neural MT language pairs have improved to the same degree, and as always, everything depends on the document you need translated. Some documents are simply not fit for machines.
As professionals, we need to stay informed and recognize that there is some truth in a perhaps overly positive message. But the trend is clear.
It’s highly unlikely that they exaggerated. Here are two other articles that review the Microsoft announcement:
Microsoft defined “Human parity” as when a human translator achieves equivalence to the machine. A bit backwards, don’t you think?
link to linkedin.com
If (Microsoft’s “human parity”) definitions hold for incompetent translators and judges, MT has been at human parity since the 1950s.
link to linkedin.com
What are you trying to say, Tom? Does it contradict what I wrote?
Sorry, Gert. I let this one slip through the cracks. No contradiction with your points.
It’s unlikely that Microsoft exaggerated about human parity. They translated ZH->EN, then used incompetent, supposedly English-skilled people to evaluate the results.
It’s easy to achieve human parity when the humans are incompetent. That’s nothing new. MT has been at parity with incompetent humans for decades.