Big Data and Little Translators – a Marriage Made in Heaven or Hell?


Every few months, the ATA publishes an article representing the views of “the translation industry”, or of what the ATA euphemistically calls “stakeholders”, i.e. members of the American Translators Association who are not translators. Since you don’t have to be a translator to be a member of the ATA, “stakeholders” is currently the generic term du jour for the many non-translating members of the American Translators Association. I have written several posts on my silly blog about the propagandistic articles written by these non-translating stakeholders, articles that seem to be aimed mostly at putting translators in their proper place, namely as obedient peons of “the translation industry”, because articles of this type unfortunately appear often in the ATA Chronicle. For example, in this post, which is now almost five years old, I compared the propagandistic nature of these articles in the ATA Chronicle to the predictable propaganda saturating our establishment media.

Although the origin of the term Big Data relates to automatic correlation of market trends, customer preferences and other characteristics useful for businesses, the name “big data” itself seems to have been designed to impress potential “translation industry” stakeholder clients and to intimidate little translators at the same time. If I am not mistaken, it is also supposed to replace what used to be called “content tsunami” a few years ago. The name suggests to me, and probably to most people, Orwell’s Big Brother and his style of strict enforcement of law and order in a pliant population scared to death by the omnipresent eyes and ears of Big Brother who is armed with big data and other diabolical tools.

My simple mind naturally cannot even begin to understand all the cool details and fascinating implications of WHAT BIG DATA MEANS TO THE LANGUAGE SECTOR, as per the title of the last section in the article by Don DePalma in the July/August issue of the ATA Chronicle.

Perhaps that is why I would find Mr. DePalma’s numbers, statistics, and conclusions for us little translators quite frightening … if I did not also find them at times more than just a little silly.

As the late, great Miguel Lorens, a Spanish-to-English financial translator whose blog was so enjoyable, wrote in 2012 in an article titled “Future Schlock: Common Sense, Nonsense and the Law of Supply and Demand”, basic laws of physics are sometimes simply ignored in Don DePalma’s analyses of things present in order to arrive at the desired conclusions about the cunningly predicted greatness of things to come. This is how Miguel Lorens illustrated DePalma’s approach to reality in the post he wrote four years ago:

“[…] Or imagine your friend giving you a tour of his new five-bedroom house. ‘And this is the guest room. However, the law of gravity doesn’t apply here, for whatever reason.’ You peer inside and see a bed, a dresser and a cocker spaniel floating around in zero gravity. What would you do? Would you follow your friend out to the garden to have cocktails as the furniture and the dog float round and round? Or would you devote your entire life to finding out why the law of gravity doesn’t hold in your friend’s guest bedroom?”

The question “Where Does Language Fit In with Big Data?” (the somewhat overbearing title – overbearing to language – of the ATA Chronicle article) is of course much less important to us than the question of where we, little translators, fit in with Big Data.

You probably guessed it already – little translators can only fit in with Big Data if they are obedient enough to listen to the wise voice of “the translation industry” and become diligent, indefatigable post-processors of the gunk spat out by Big Data.

As Don DePalma puts it, “Big data has increased the volume of content dramatically. At the same time, automated content enrichment and analytical tools based on big-data science [emphasis mine; I did not know that there was such a science] will enable the training of more sophisticated tools to help humans translate the growing volume of content and enable machines to close the yawning gap between what’s generated and what’s actually translated” [emphasis mine again].

Let’s think about this short and relatively simple statement for a moment. OK, so the volume of content has increased dramatically. It sounds like an introduction to revolutionary change, but is this change more revolutionary than, for example, the dramatically increased volume of data when the Chinese invented paper, replacing stone tablets, bones and tortoise shells, when Gutenberg invented the printing press, when the internet was invented by DARPA (Defense Advanced Research Projects Agency), or when the telephone became a tiny portable computer potentially containing hundreds or thousands of apps? And so on and so forth.

More stuff is out there, so new technologies are being developed to do more stuff with the stuff that is out there. When you put it like that, it sounds more like something that has been happening for centuries … because that is all it really is.

There is an enormous amount of content floating around in the blogosphere, to select just one part of what the article identifies as belonging to Big Data, and some of this content is very interesting. But this also means that a lot of it is completely useless stuff that will probably never be translated.

So how can one use this content in statistics identifying what needs to be translated, and put a price tag on it (as part of the Big Data from which “the translation industry” can extract a profit), if it is identified by “automated content enrichment and analytical tools” and the like? Unless, of course, somebody such as a company CEO decides that all of the largely useless PR content of a propagandistic corporate blog needs to be translated. Sure, this is likely to happen, but if this content is translated with machine translation and then post-processed by us little translators, will it increase sales or improve the company’s brand image, or will it do the opposite of what was intended?

And if automatic content enrichment (whatever that is) and analytical tools (whatever they are) are used instead of a well-functioning human brain to select the content to be translated (instead of a CEO’s decision), these tools are guaranteed to pick unnecessary content and generate still more propagandistic PR nonsense, only this time in other languages.

So far I have published exactly 620 posts on my silly blog; this will be my 621st post in five and a half years. About four or five of them have been translated so far into four or five languages with my permission, and probably more without my permission, because other translators/bloggers wanted to share what I am saying with other translators and bloggers who do not understand English.

It had nothing to do with what is called Big Data, or content tsunami, or automated content enrichment and analytical tools based on “big-data science”.

Instead, it had to do with Small Data: data that human intellect, rather than an algorithm, has judged important enough to be translated and worth a human translator’s time, even without compensation for a significant amount of work.

To come back to the ATA Chronicle article, the final conclusions in Don DePalma’s article are actually quite hopeful when it comes to prospects for human translators … although I suspect that he might have cunningly incorporated them into the article largely to placate us little translators who read the magazine, and to get us used to the idea that Big Data is really good for us:

“Even if machines generated the lion’s share of translation [meaning pseudo-translation, Mad Patent Translator] and humans did a smaller percentage, the sheer absolute volume of human translation would increase for high-value sector such as life sciences, other precise sectors, and belles lettres. In turn, the perceived value of human translation would increase. Why? Because when you bring in a live human, it means the transaction is very, very important …

As interlingual communication becomes transparent, we predict that the number of situations where high-value transactions occur – i.e. those requiring human translators and interpreters – will go up, not down. If provider rates increase and companies use MT to address a larger percentage of their linguistic needs, human translators could benefit as they are paid well to render the most critical content supporting the customer experience and other high-value interactions […]

Although it has not happened yet, we speculate that MT driven by these phenomena could remove the ‘cloak of invisibility’ from translators, giving them greater recognition and status.”

Up until now, the overall impact of what is referred to as language technology on the working environment of human translators, meaning mostly Computer-Assisted Translation (CAT) and machine translation, has been largely negative.

Initially, translators were promised pie in the sky in the form of CAT tools that would dramatically increase the number of words they would be able to translate per day, leading to much higher compensation for their work. Instead, they had to spend a considerable amount of money – I understand that Trados software costs 800 euros – and an even more considerable amount of uncompensated time learning and using this software, only to be told in the end that they must provide discounts to “the translation industry” for what are called “fuzzy matches” and “full matches”, a disgraceful invention of “the translation industry” that amounts to nothing more and nothing less than extortion and wage theft.

The unfortunate fact that most translation agencies generally pay translators lower rates than they did 10 or 15 years ago is clearly due not only to the globalization and corporatization of “the translation industry”, but also to the impact of “fuzzy matches” and “full matches”.

I doubt very much that MT and Big Data will “remove the ‘cloak of invisibility’ from translators, giving them greater recognition and status.” The opposite has been happening so far, as “the translation industry” is definitely more interested in keeping translators invisible than in making them more visible. That is why “the translation industry” also invented the term “Language Service Provider” and replaced the term “translation agency” with the acronym “LSP”: to make it appear as if it were the translation agencies, which in reality act as brokers, rather than the translators who provide the language services.

I don’t know how Don DePalma came up with this idea, but it makes no sense to me.

But I do agree with his other conclusion: “[…] We predict that the number of situations where high-value transactions occur – i.e. those requiring human translators and interpreters – will go up, not down.”

I am also hoping that the following conclusion may be correct: “If provider rates increase and companies use MT to address a larger percentage of their linguistic needs, human translators could benefit as they are paid well to render the most critical content supporting the customer experience and other high-value interactions”.

If I try to project this expectation on my field, namely patent translation, I can see how technological changes, mostly the availability of machine translation for most types of patent applications, have gradually changed the type of materials that I am translating now as opposed to what I was translating some 15 years ago.

Since machine translations of patent applications were not available 15 years ago, there was more work available to me in that area than there is today, and I believe that some of this work, possibly a substantial amount of it, consisted of translations of patents that were not really needed.

Because it was impossible to know anything about the content of these patent applications (if, for instance, they were referenced as cited literature in a search report), they had to be translated, for example for litigation purposes. Now it is possible to look at an MT file and the figures in a patent application to eliminate patent applications that are not directly applicable to the issue at hand, even if the original document is in a language that is completely incomprehensible to most people, like Japanese.

I believe that this is why a higher percentage of utility models (“lesser inventions”) need to be translated now: machine translations are not available for utility models, either in Japanese or in German. Moreover, since older Japanese utility models are often poorly legible, it is basically impossible to convert their PDF files (the only format in which they are available) into a digital file that would make any sense whatsoever once it has been run through a machine translation program.

Another change that I see in my field is that while the number of translations of existing patent applications that are needed for prior art research has decreased to some extent, again probably due to the availability of machine translations that are “good enough” to establish the basic points of a patent application, the number of patent translations for filing, for example of German, Japanese and French patent applications to be filed in English, is increasing because more and more patents are being filed all the time.

So how would I answer my own question: is the marriage between Big Data and little translators a marriage made in heaven, or in hell?

Well, I think that it depends on whether we, the little translators, allow Big Data to abuse us in this marriage. If we simply submit to Big Data’s demands and to the capricious whims of “the translation industry”, and go along with whatever a nasty spouse demands of us, it will be a marriage made in hell, because for the most part we will be working as, and thought of as, mere post-processors of MT gunk who can also do actual translations when required.

But it could also be a marriage kind of made in heaven for translators, if we, the little translators, forget the notion that we are powerless against the brute we married and concentrate on the most critical content, i.e. the highest value-added content dug out of the infinite oceans of Big Data by “big-data science”, assuming there is such a thing.

In conclusion, my advice to most translators is this: you don’t need to be married to Big Data or to “the translation industry”. Stay or become single and try to work mostly for direct clients. Why stay in an abusive relationship with “the translation industry” or Big Data when, with a little bit of work and thinking, you can eventually become happily divorced?

Steve Vitek

About Steve Vitek

Translation of patents from Japanese, German, French, Russian, Czech, Slovak and Polish since 1987. Blogs at, website at

4 thoughts on “Big Data and Little Translators – a Marriage Made in Heaven or Hell?”

  1. Interesting post, Steve! I don’t really understand the concept of big data and what it has to do with translation, and to be honest I don’t really care. As I work in a relatively creative field (video game localization), these trends don’t affect me much. Well, maybe they will eventually have some effect on my sector, but I’m not sure I’ll be left without work. On the contrary, I see globalization and the increase in the volume of content as an opportunity to present myself as an expert whose role is to work closely with clients and help them translate what truly matters. I hope that the role of professional translators will transform even more in the future and become even more significant, because only humans can make sense of all those Big Datas and Content Tsunamis. You can’t replace the human brain and creativity.

    1. I would like to agree with your conclusion, Dmitry, that you don’t have to worry about Big Data, but let me play Devil’s advocate for a moment here.

      Let’s say that all plots of all the computer games that you are now translating into Russian are automatically “translated” with MT into a whole host of languages, including Russian, and sent to employees in Russia who may then be asked to localize the text into Russian as part of their job. Even though these low-paid employees may not know any English, if they are good enough writers in Russian, and if the MT is good enough, it would probably work. And employees in Russia must be quite a bit cheaper than free birds like you or me.

      If somebody can make money on such a concept, you will be out of business pretty soon, my friend, because the business world is driven basically by one thing and one thing only: greed.

      As the Bible puts it: “The love of money is the root of all evil.” Greed also happens to be the basis on which our economic system is built. There is no such thing as the common good when it comes to business, not anymore.

      Or as Woody Allen put it: “Money is better than poverty, if only for financial reasons.”

      1. It might happen, Steve, but here’s one very important piece of the puzzle that “the translation industry” often forgets about: the end consumers of such translations. And let me tell you this: Russian gamers have an extremely sensitive bullshit radar. In fact, it is so sensitive that they have even created a YouTube show with hundreds of thousands of views (!) in which people who have no ties to the world of translation and localization make fun of the numerous epic fails in the translation of modern video games.

        In this show they dissect the localization of large and popular video games. Those video games have been localized by big companies with hundreds of human translators. They have a very rigorous process with revisions, proofreading, QA, testing, etc. Yet those teams still produce results that cannot completely satisfy the end consumer.

        Now imagine what these consumers will say about post-edited machine translation. They will simply crucify the developers, the localizers and the translators publicly. This will be a PR catastrophe, and it will most likely affect the sales of such video games, and not in a good way (because most Russian gamers don’t speak English very well, and they expect the localization to be of the highest possible quality, to the point where they won’t buy a game unless it’s properly localized).

  2. Dear Steve,

    unfortunately I agree with your concerns. I am working with agencies, in particular a big one that is investing a lot of resources in the implementation of Artificial Intelligence connected with MT and TMs … In the pharmaceutical field, the language is sometimes very technical, so my translations are full of “full matches”, and in these cases my intervention is minimal (as is the payment).
    The situation is different with localization and transcreation, where human intervention is still needed. At least for now.
    Thanks anyway for the interesting insight.



The Open Mic

Where translators share their stories and where clients find professional translators.
