Big Data and Little Translators – A Friendship with Benefits

Greater than 12 minutes, my friend!

In August 2016, @nkatris recommended on Twitter an @OpenMicXL8 blog post by @VitekSteve entitled BigData and Little Translators – a Marriage Made in Heaven or Hell?, which the astute reader will have recognised from the names beginning with an at sign. I hoped it would provide interesting insights on how lower-case translators could benefit from information generated by upper-case Big Data (probably even #BigData) … and was disappointed that it turned out to be a rant against “propagandistic” ATA Chronicle articles by non-translating stakeholders of the translation industry, i.e. larger agencies that often act as mere resellers with significantly more translation project managers than in-house translators.

Aside: The German term is “Umtüter” (m.), almost equal to EN re-packager, but derived from “Tüte” (f.) – the common, cheap plastic shopping bag heavily contributing to Marine Litter. Which evokes just the right subtext, I’d say. Of course, there are both good and bad agencies, but it’s not en vogue to talk about agencies one likes to work with. Might attract competition, y’know?

Back to that Twitter origin story: When I complained, “Insights on how #bigdata could help #xl8 (e.g. corpus > style guide; biz data > prospect)? Got a rant. 🙁”, Dmitry Kornyukhov aptly answered: “Well, that’s Steve’s writing style for you. It’s not for everyone, but I enjoy it 🙂 But it would be awesome if you could publish your own thoughts on the subject, Christopher!”
Challenge accepted.
Which left me to decide which subject to take up – the title topic of “Big Data and Little Translators” or the “Big Data is Just Another Excuse of Large LSPs to Lower Translator Rates” article. Since much has already been written on the them-vs-us debate, including by Steve (on his blog, too, for quite a while now), I will instead have a look at what Big Data applications and use cases exist for freelance translators and “boutique” agencies in late 2016.

From 8 Big Data Solutions for Small Businesses and Power of Big Data for SMEs to 5 Ways for Small Business to Jump on the Big Data Train or Small Business, Big Data: A Practical Approach, the marketing and IT services/cloud sectors are obviously interested in bringing small businesses (down to one-man shows?) into their customer base, concentrating on providing information about prospects and customers either by processing information from readily available online sources or by selling us tracking software for our websites. Big Data for SMEs was even a topic at Germany’s largest IT fair, CeBIT, in 2015. Research, too, has had an eye on how Big Data and small enterprises might come together, for example in this Stanford study.

So, What Does Big Data Even Mean?

Let us – language lovers that we are – begin by agreeing on what the term Big Data actually denotes. The all-encompassing wisdom of the world starts with:

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. The term often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.

However, things are not so clear. In 2014, Forbes asked “12 Big Data Definitions: What’s Yours?” but in my opinion, a shortlist of 5 and a bit will be sufficient:

  1. Any large amount of data, often with the added statement that “its manipulation and management present significant logistical challenges” (OED), that “traditional data processing applications are inadequate” (Wikipedia, see above) or that its “size is beyond the ability of typical database software tools to capture, store, manage, and analyze” (McKinsey) – which is rightly criticised as subjective, imprecise, ambiguous marketing blabber.
    • Also: “The broad range of new and massive data types that have appeared over the last decade or so.” (Tom Davenport in BigData@Work) Big Data platforms will usually try to integrate structured (Databases, data from forms or industrial machines, XML, JSON, …) and unstructured (written text, sound, video, …) data from multiple sources into a uniform, machine-readable format.
  2. The act of collecting, processing and analysing data on a large scale to gain insights and value and to support (or make) decisions.
    • Also: The belief that more data (ideally from multiple sources) automatically produces more insights and answers and leads to better decisions.
  3. The tools to find relevant data and analyse its implications, i.e. software, algorithms, the practice of enriching data with machine-readable metadata such as microdata or of tracking people using social networks, cookies and fingerprinting.
  4. The historic / societal development of consumers giving up privacy for comfort and sharing data with enterprises – and of companies gaining “external data”.
  5. The historic / societal development of individuals creating and publishing information instead of just consuming it, which apparently raises the amount of available data by the seventh power of LOTS.

In my eyes, definitions #2 and #3 are most helpful when asking “What can it do for us translators?” Which leads us to:

Possible Big Data Goals for Small Translators

Freelance translators seldom generate that much data on their own, one might think. Even if we are typing a lot. But think about definition #2 above: It starts with us collecting data that would be of use to answer our questions.

So, if the question is “Do I have an accurate estimate of how many words per hour I can translate for projects of type X? Do my per-word rates thus translate into an acceptable hourly income?”, then I first need to calculate what my “acceptable hourly income” is – the usual maths: (estimated) costs per year, including costs of living, rent, professional and private insurances, retirement schemes, holidays, a little extra to keep our beloved partner happy … divided by, say, 200-250 8-hour “paid work” days per year. For me, here in Germany, anything below EUR 60/h risks becoming unprofitable, which is why all craftsmen from electricians to plumbers that I have ever met charge EUR 80+/h, sometimes hidden in call-out charges or material costs. Having a university degree and the associated self-aggrandising attitude, I wouldn’t demand less.
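As a minimal sketch, the back-of-the-envelope calculation above could look like this in Python – every figure below is an invented assumption for illustration, not a recommendation:

```python
# Minimum profitable hourly rate, back-of-the-envelope style.
# All figures are illustrative assumptions, not recommendations.
annual_costs = 55_000    # living costs, rent, insurances, retirement, holidays (EUR)
partner_extra = 5_000    # "a little extra to keep our beloved partner happy" (EUR)
paid_days = 220          # somewhere in the 200-250 range of "paid work" days
hours_per_day = 8

min_hourly_rate = (annual_costs + partner_extra) / (paid_days * hours_per_day)
print(f"Break-even rate: EUR {min_hourly_rate:.2f}/h")
```

Bear in mind that far from every hour of an 8-hour day is billable – admin, marketing and training all eat time – which is what pushes the real break-even figure well above this naive result.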

And then, we start “amassing large amounts of data whose manipulation and management present significant logistical challenges” (definition #1) to us: In a plain-text file, Excel spreadsheet, time-keeping app, one or another specialised solution or in any other convenient way, we start keeping track of the type of text, its size in characters/words/standard lines, the agreed rate and the actual time taken – perhaps even separately for “time translating”, “time proofreading” and “time administering the project”, if we want to know how much “overhead” we have with each project and if/where we should set a minimum fee.
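To make this concrete, here is a minimal sketch of such a project log in Python – the field names and all log entries are invented for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Project:
    text_type: str        # e.g. "manual", "marketing"
    words: int            # size of the source text
    rate_per_word: float  # agreed rate in EUR
    hours: float          # time translating + proofreading + admin

# A few invented log entries
log = [
    Project("manual", 3200, 0.12, 5.5),
    Project("marketing", 900, 0.15, 3.0),
    Project("manual", 5100, 0.11, 8.0),
]

# Aggregate earnings and hours per text type ...
totals = defaultdict(lambda: [0.0, 0.0])  # text_type -> [earnings, hours]
for p in log:
    totals[p.text_type][0] += p.words * p.rate_per_word
    totals[p.text_type][1] += p.hours

# ... and report the effective hourly rate for each
for text_type, (earnings, hours) in totals.items():
    print(f"{text_type}: EUR {earnings / hours:.2f}/h")
```

Even this toy version immediately shows which project types pay well and which ones quietly eat the margin.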

Voilà: Big Data on project efficiency from hundreds of our projects, possibly stored in a way that ensures we will never look at it again (def. #1). Now, the big promise of Big Data solutions (def. #3) is that they will analyse data that is too big to handle on our own and provide an actionable basis for decisions (def. #2). A true Big Data application would proceed to draw further data from other sources (i.e. the web) to contextualise our own: What rates have other translators published for comparable project specifications? What about income surveys from census data and job portals? Do LSP or trade association websites mention “words per hour” figures in the context of “translation” – and what is their average? The list is non-exhaustive.

Businesses successfully mining Big Data are cross-referencing their internal information – pricing histories, customer traffic patterns – with multiple outside sources to increase revenue by understanding customers’ behavior better, reducing costs by eliminating inefficiencies and human bias, strengthening client bonds by anticipating clients’ needs, enriching service offerings with new knowledge, and giving employees new tools to perform their jobs better.

In fact, providing external data on our customers or the market will probably be the most commonly offered Big Data application for small businesses. Most of us simply don’t use elaborate CRM platforms to manage our customer base – I know many colleagues who do their accounting in a simple Excel spreadsheet instead of an ERP solution – and most Big Data providers will regard our records as “small data” of no concern to them. Furthermore, corporate translation buyers will most likely not provide us with interfaces to their internal information, as is common among the big players. We will thus necessarily look to web or cloud services as available tools to sift and condense “everything available on the web”, aka publicly available information, for us.

The offers might or might not fulfil our expectations. When looking for Big Data applications, I can make out three goals of interest to Small Translators:

Goal 1: Understanding the Market

  • Finding prospects and turning them into customers
  • Gaining information on payment practices, creditworthiness and overall customer reputation (employee satisfaction? working atmosphere?)
  • Gaining information on prices for language combinations, text types, industries, seasonal influences, etc., and determining which projects to accept
  • Keeping tabs on established customers and surveying customer satisfaction

Goal 2: Optimizing Business Operations

  • Assessing our speed with certain types of projects, assessing profitability and effective earnings, and adapting our hourly rate
  • Assessing how well our website works, based on server logs or analytics tools such as Google Analytics and Piwik – and improving it to lure more people into clicking the “contact me” button. Which kind of information do our prospects search for? Which pages do they visit most? Where do they stay to read, and what is clicked away?
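For the server-log side of the last point, a tiny sketch in Python – the log format and the sample lines are invented for illustration – that counts successful page views per URL:

```python
import re
from collections import Counter

# Matches the request path of successful GET requests in an
# Apache/nginx "combined" style log line (format assumed for illustration).
SUCCESSFUL_GET = re.compile(r'"GET (\S+) HTTP/[\d.]+" 200 ')

def top_pages(log_lines, n=3):
    """Return the n most-viewed URLs with their hit counts."""
    hits = Counter()
    for line in log_lines:
        match = SUCCESSFUL_GET.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common(n)

# Invented sample lines in combined log format
sample = [
    '1.2.3.4 - - [01/Aug/2016:10:00:00 +0200] "GET /services HTTP/1.1" 200 5120 "-" "Mozilla"',
    '1.2.3.5 - - [01/Aug/2016:10:01:00 +0200] "GET /contact HTTP/1.1" 200 2048 "-" "Mozilla"',
    '1.2.3.6 - - [01/Aug/2016:10:02:00 +0200] "GET /services HTTP/1.1" 200 5120 "-" "Mozilla"',
]
print(top_pages(sample))  # [('/services', 2), ('/contact', 1)]
```

Analytics suites do the same thing at scale, of course, but even a sketch like this reveals which pages pull visitors towards the “contact me” button.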

Goal 3: Optimizing Our Translation Work

  • Speeding up our work
  • Improving translation quality
  • Gaining relevant contextual information on idiomatic writing, the customer’s style and terminology, or applicable industry standards
  • Leveraging machine translation or including web-based and possibly collaborative translation memories and glossaries (cloud services for CAT tools)

Available Big Data Tools

The Big G – Google – immediately springs to mind when looking for a company offering Big Data information to the whole world “for free” (for ads, that is). Next to its “true” Big Data offering for enterprise customers and its widely known web search engine, which provides a wealth of information when fed the right keywords, operators and options, Google puts a number of more specialised tools at our disposal, among them:

  • Google Alerts to notify us of any new occurrence of our specified keywords on “the Internets” – useful to stay up-to-date on our clients, but also on developments in our target industries. There are free and paid alternatives, for example Talkwalker Alerts from Luxembourg, from Germany, from France, etc.
  • Google Ngrams, which searches large volumes of data from books for chosen collocations – useful to find out which phrasing is more common.
  • Google Trends notifies us of trending topics which fall into our speciality and points us at companies that might need our services right now.
  • Google Public Data offers access to large public databases on economic indicators, which helps in assessing whether a regional market is solid enough, as a whole, to seek out clients there. With the mother-tongue principle in mind, we might start to look more for customers abroad, who are more likely to need high-quality translations into our country’s language than local businesses are. There are, of course, other aggregators of public data which can help you find out more about a topic or target demographic for your advertising; one well-known provider is Statista.
  • Google Translate API must be on the list, since statistical MT is a true Big Data application. Google made this a paid service some time ago, but Microsoft Translator API is still without (monetary) cost even if they are currently migrating from the MS Data Market to the MS Azure platform. And of course, SDL offers its own range of MT products. In all cases, watch out for privacy and confidentiality issues.

There is also the very powerful If This Then That (IFTTT), with which you can fine-tune automatic reactions to any kind of web-based event; this blog post will give you an impression of what is possible.

A powerful desktop tool to analyse large volumes of text is the concordancer AntConc, which can tell you about the frequency of certain collocations in a corpus or compare a text with, say, a Wikipedia dump to find out what the core concepts of your text are. It’s also good at finding term candidates – however, you should know a bit about actual linguistics to understand the program, so that somewhat excludes all those “linguist” translators who aren’t linguists. Duh.
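AntConc itself is a GUI application, but the core idea behind a concordancer’s collocation counts – tallying how often word pairs occur in a corpus – can be sketched in a few lines of Python (with a toy corpus invented for illustration; a real run would use thousands of sentences):

```python
import re
from collections import Counter

def bigram_frequencies(text):
    """Count adjacent word pairs (bigrams), concordancer-style."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    return Counter(zip(words, words[1:]))

# A toy corpus; AntConc would load whole text files or Wikipedia dumps
corpus = (
    "The translation memory stores segments. "
    "The translation memory suggests matches. "
    "A terminology database stores terms."
)

freq = bigram_frequencies(corpus)
print(freq.most_common(2))
# ('the', 'translation') and ('translation', 'memory') each occur twice
```

Real concordancers add statistics such as mutual information on top of raw counts, which is where that bit of actual linguistics comes in handy.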

Unfortunately, the above tools will mostly help with Goal 3, with a sprinkle of Goal 1 from the news aggregators. Goal 2 seems to be served mostly by your good old CAT tool, which has time-keeping built in or available as a plugin, or should be used in tandem with a time-keeping app. I happen to like the Android app Timesheet because it doesn’t clutter my desktop with yet another window, but there are also desktop apps and even web services for time tracking and pay calculation, some of them including reminders to stand up and stretch your legs in compliance with your local occupational health regulations. Just make the effort to log the time you took for each and every project, to amass the project data necessary for truly helpful adjustments. Compensation portals such as PayScale will give you the Big Data side of the equation when setting your rates. Ideally, you will try a local compensation portal to get a feeling for what is paid in your corner of the world – some countries supply such aggregated data through their statistics offices.

There are industry-specific mailing lists and portals on payment practices, namely the BlueBoard and the – make no mistake, these aggregate lots of data and make it accessible to you, so they are Big Data services. Sometimes you will also want to check the trade register entries of the country where your customer is based, or conduct a news search to see whether any red flags or opportunities for further business come up.

All in all, there seem to be ways to access contextual Big Data for almost any business operation. What is lacking today is a service provider who brings all of the above information to translators in one neatly arranged format (a market niche, perhaps!). Which essentially brings us back to definition #1 – it often seems like too much work to find out. For specific questions, however, we can and should take Big Data sources into account when making business decisions.

A Friendship with Benefits

Even if his line of reasoning eventually went in another direction, Steve Vitek concluded in BigData and Little Translators – a Marriage Made in Heaven or Hell? that his “advice to most translators is: you don’t need to be married to Big Data or to ‘the translation industry’. Stay or become single and try to work mostly for direct clients.” I would point out that there are a number of Big Data tools out there that we can use at no cost or at low cost to answer specific questions concerning Small Translators. Those are friends with benefits: we like to spend time with them, sometimes they are helpful, sometimes they are funny, but there are no strings attached. If we need other kinds of answers, we just might consider a serious relationship – and yes, customer lock-in (aka “marriage”) might become a problem then. Or maybe not, if we are satisfied with the value we get.

Steve is certainly right in his assessment that certain kinds of translations will be taken over by pure MT or low-paying PEMT jobs. This will impact the translation market, but I side with the group of translators who think it will mostly thin out the “bottom-feeder” market. Quality content for use cases in which real money hinges on quality translations or transcreations will remain a human domain for quite some time. If the Industry 4.0 gurus are right, the Fourth Industrial Revolution will change all kinds of jobs, but humanity will adapt, as it always does. If most jobs are taken over by machines, this could end in civil distribution wars, or in high corporate taxes paying a basic income to each citizen so that humans can turn to more attractive endeavours … or in a myriad of other scenarios. The future will come early enough. Until then, there’s money to be made in translation.

However, if you’re really married to a Big Data “translation industry” brute, there’s a German saying: “Andere Väter haben auch hübsche Söhne/Töchter” (“Other fathers have cute sons/daughters, too”), which I happen to find more appealing than the English saying featuring cold, slimy, flopping fish polluted with heavy metals and microplastics. Big Data can be “leveraged” by Small Translators, but the latter do not necessarily need to let themselves be abused by it. In this, I’m totally with Steve.

Did you like this article? Then leave a comment, head over to or drop me a line.

Yours truly

Christopher Köbel

About Christopher Köbel

The IT, Automation and Industry 4.0/Smart Industry expert for German, French and English technical and marketing translations and website localization: DeFrEnT …it’s different!

7 thoughts on “Big Data and Little Translators – A Friendship with Benefits”

  1. Awesome post, Christopher! Really loved the detailed overview of Big Data with clear explanations and use cases.

    I love collecting data, but I’m a bit at a loss when it comes to analyzing it and drawing the right conclusions. I think that’s one of the challenges big data presents to us small translators. There’s definitely a gap in the market for a company that can analyze big data for translators and present them with actionable insights. I think we’ll see more development in this field in the coming years.

    It would be great to have a solution that could visualize data for translators and help them with all the areas that you’ve described (analyzing work patterns, adjusting prices, understanding client behavior and how they find/interact with us).

    Do you think we’ll see more development tailored to the needs of individual translators rather than big companies?

  2. At the moment, I see a range of new cloud offers aimed at translators – from CAT to customer management up to invoicing and tax services. Felix1, for example, is a German cloud tax consultancy targeted at SMEs down to freelancers – a young start-up that has partnered with established medium-sized companies to provide its services, for example eurodata, who provide secure data centres in Germany (marketed as a data-security plus) and Industry 4.0 business consulting. So the entrepreneurs rely on available know-how to bring forth new business models.
    For transparency’s sake, I should add that I learned about Felix1 because eurodata is one of my regular clients, but the idea of getting rid of accounting, invoicing and tax tasks is quite compelling.

    So, *if* there is going to be a Big Data analyst offering strategic insights to translators or other solo enterprises, it will probably come from such a team-up between entrepreneurs with a great idea and established companies who can provide technical know-how and infrastructure.

    There would certainly be a market if such a service could prove that its data actually generates additional revenue or business opportunities.

    1. Further transparency & another example:

      I have been made aware that the new Felix1 company is really a subsidiary of the large accounting group ETL, which, as a holding, also controls eurodata. I previously had the impression that they were entrepreneurs partnering with the “smart services” provider eurodata to convert their usually paper-based tax consulting business into a digitally refined, cloud-based form. So the example was good from the Big Data side of things, but not necessarily from the small-company side. Sorry!

      A better example of a true “grassroots” start-up relying on the digital services offered by established companies might be a 20-man show – they offer mounts to convert any bike into a “smart” bike, providing their users’ smartphones with the bike’s GPS / routing / weather / fitness / … data. As one might expect, they have partnered with Asian manufacturers to mass-produce their invention, and with large Big Data/cloud service providers to make their hardware “smart”.

      With us translators, it is both simpler and more difficult: We are offering a service without any tangible product – if you don’t count paper. Mass delivery to lots of customers is out of the question for us freelancers, because we offer very individual, tailor-made solutions. So our relationship to Big Data will probably always be one that puts us as the buyers of Big Data services, not as providers. But if the idea of turning the much-criticised “post-editing machine translation” (PEMT) model upside down and using MT to improve our writing turns out to be a brilliant one, as cloud CAT provider Lilt is promising (see Jost Zetzsche’s demo), we just might ramp up our output to reach more customers without compromising on quality.

      1. “So our relationship to Big Data will probably always be one that puts us as the buyers of Big Data services, not as providers.” – yes, that’s what I thought if we talk about individual freelancers.

        “We just might ramp up our output to reach more customers without compromising on quality” – it sounds enticing, but don’t you think it’s more focused on agencies and teams than on individual freelancers?

        1. If I understand the technology correctly, it uses massive open corpora for basic MT training and then retrains the MT engine with user-specific corpora (e.g. translation memories), gradually adapting to each translator’s style as they use the system. So it *is* suited to individual translators. There’s another catch, though:

          I am more worried about handing my customers’ texts to a third party for processing – this raises concerns of privacy, data security, possibly even copyright, and would likely require a “data processing agreement” (DE: “Auftragsdatenverarbeitungsvertrag” – yes, that is one word!) under German data protection law. So, as much as I find the technology cool, I won’t be able to use it for legal reasons – especially as the “EU-US Privacy Shield” will surely crumble under a new lawsuit just as the “Safe Harbor” agreement did.

  3. Hi. This subject is quite new to me (I am a very small translator), but you have opened up a world for me and I’m going to explore some of the tools and articles you mentioned.
    I am particularly intrigued by “Industry 4.0” and the possible scenarios after machines take over most of the jobs we are now performing. Working for an agency, I have noticed that some of the translations I proofread are made by MT, while other jobs have already disappeared, so I am happy that someone is thinking about what we are going to do and what the possible solutions are.

