Greater than 12 minutes, my friend!
In August 2016, @nkatris recommended an @OpenMicXL8 blog post by @VitekSteve entitled BigData and Little Translators – a Marriage Made in Heaven or Hell? This happened on Twitter, as the astute reader will have recognised from the names beginning with an at sign. I hoped the post would provide interesting insights on how lower-case translators could benefit from information generated by upper-case Big Data (probably even #BigData) … and was disappointed that it turned out to be a rant against “propagandistic” ATA Chronicle articles by non-translating stakeholders of the translation industry, i.e. larger agencies that often act as mere resellers with significantly more translation project managers than in-house translators.
Aside: The German term is “Umtüter” (m.), almost equal to EN re-packager, but derived from “Tüte” (f.) – the common, cheap plastic shopping bag heavily contributing to Marine Litter. Which evokes just the right subtext, I’d say. Of course, there are both good and bad agencies, but it’s not en vogue to talk about agencies one likes to work with. Might attract competition, y’know?
Back to that Twitter origin story: When I complained:
Insights on how #bigdata could help #xl8 (e.g. corpus > style guide; biz data > prospect)? Got a rant.
Dmitry Kornyukhov answered justly:
Well, that’s Steve’s writing style for you. It’s not for everyone, but I enjoy it. But it would be awesome if you could publish your own thoughts on the subject on @OpenMicXL8, Christopher!
Which left me to decide which subject to take up – the title topic of “Big Data and Little Translators” or the “Big Data is Just Another Excuse of Large LSPs to Lower Translator Rates” article. Since I think much has already been written on them-vs-us, including by Steve (also on his blog, for quite a while now), I would rather take a look at what Big Data applications and use cases there are for freelance translators and “boutique” agencies in late 2016.
From 8 Big Data Solutions for Small Businesses and Power of Big Data for SMEs to 5 Ways for Small Business to Jump on the Big Data Train or Small Business, Big Data: A Practical Approach, the marketing and IT services/cloud sectors are obviously interested in adding small businesses (down to one-man shows?) to their customer base. They concentrate on providing information about prospects and customers, either by processing information from readily available online sources or by selling us tracking software for our websites. Big Data for SMEs was even a topic at Germany’s largest IT fair, CeBIT, in 2015. Research, too, has had an eye on how Big Data and small enterprises might come together, for example in this Stanford study.
So, What Does Big Data Even Mean?
Let us – language lovers that we are – begin by agreeing on what the term Big Data actually denotes. The all-encompassing wisdom of the world starts with:
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. The term often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
However, things are not so clear. In 2014, Forbes asked “12 Big Data Definitions: What’s Yours?” but in my opinion, a shortlist of 5 and a bit will be sufficient:
1. Any large amount of data, often with the added statement that “its manipulation and management present significant logistical challenges” (OED), that “traditional data processing applications are inadequate” (Wikipedia, see above) or that its “size is beyond the ability of typical database software tools to capture, store, manage, and analyze” (McKinsey), which is rightly criticised as being subjective, imprecise, ambiguous marketing blabber.
   - Also: “The broad range of new and massive data types that have appeared over the last decade or so.” (Tom Davenport in BigData@Work) Big Data platforms will usually try to integrate structured (databases, data from forms or industrial machines, XML, JSON, …) and unstructured (written text, sound, video, …) data from multiple sources into a uniform, machine-readable format.
2. The act of collecting, processing and analysing data on a large scale to gain insights and value and to support (or make) decisions.
   - Also: The belief that more data (ideally from multiple sources) automatically produces more insights and answers and leads to better decisions.
3. The tools to find relevant data and analyse its implications, i.e. software, algorithms, the practice of enriching data with machine-readable metadata such as microdata or of tracking people using social networks, cookies and fingerprinting.
4. The historic / societal development of consumers giving up privacy for comfort and sharing data with enterprises – and of companies gaining “external data”.
5. The historic / societal development of individuals creating and publishing information instead of just consuming it, which apparently raises the amount of available data by the seventh power of LOTS.
In my eyes, definitions #2 and #3 are most helpful when asking “What can it do for us translators?” Which leads us to:
Possible Big Data Goals for Small Translators
Freelance translators seldom generate that much data on their own, one might think. Even if we are typing a lot. But think about definition #2 above: It starts with us collecting data that would be of use to answer our questions.
So, if the question is “Do I have an accurate estimate of how many words per hour I can translate for projects of type X? Do my per-word rates thus translate to an acceptable hourly income?” then I first need to calculate what my “acceptable hourly income” is – the usual maths stuff: (Estimated) costs per year, including costs of living, rent, professional and private insurance, retirement schemes, holidays, a little extra to keep our beloved partner happy … divided by, say, 200 to 250 8-hour “paid work” days per year … for me, here in Germany, anything below EUR 60/h risks becoming unprofitable, which is why all craftsmen from electricians to plumbers that I have ever met demand EUR 80+/h, sometimes hidden in call-out charges or material costs. Having a university degree and the associated self-aggrandising attitude, I wouldn’t demand less.
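The “usual maths stuff” can be sketched in a few lines of Python. This is only an illustration: the function names and the annual cost figure of EUR 96,000 are made up for the example, so plug in your own numbers.

```python
def required_hourly_rate(annual_costs_eur: float,
                         paid_days_per_year: int,
                         hours_per_day: float = 8.0) -> float:
    """Hourly income needed to cover all yearly costs in the paid working time."""
    if paid_days_per_year <= 0 or hours_per_day <= 0:
        raise ValueError("working time must be positive")
    return annual_costs_eur / (paid_days_per_year * hours_per_day)


def effective_hourly_income(rate_per_word_eur: float,
                            words_per_hour: float) -> float:
    """What a per-word rate actually earns per hour at a given speed."""
    return rate_per_word_eur * words_per_hour


# Hypothetical yearly costs (living, rent, insurance, retirement, holidays, ...)
ANNUAL_COSTS_EUR = 96_000.0

for days in (200, 250):
    needed = required_hourly_rate(ANNUAL_COSTS_EUR, days)
    print(f"{days} paid days/year: need EUR {needed:.2f} per hour")

# Hypothetical rate and speed: does EUR 0.15 per word at 400 words/h suffice?
earned = effective_hourly_income(0.15, 400)
print(f"EUR 0.15/word at 400 words/h earns EUR {earned:.2f} per hour")
```

Comparing the two results for a given project type shows at a glance whether a per-word rate covers the hourly income you need.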
And then, we start “amassing large amounts of data whose manipulation and management present significant logistical challenges” (definition #1) to us: In a plaintext file, Excel spreadsheet, time-keeping app, one or another specialised solution or in any other convenient way, we start keeping track of the type of text, its size in characters/words/standard lines, the agreed rate and the actual time taken – perhaps even separately for “time translating”, “time proofreading” and “time project administration” if we want to know how much “overhead” we have with each project and if/where we should set a minimum wage.
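For those who prefer the plaintext route, here is a minimal sketch of such a project log and its evaluation in Python. The column names, the three sample projects and the `summarise` function are all invented for illustration; a real log would live in a file and contain your own figures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical project log; in practice this would be a CSV file on disk.
LOG = """text_type,words,rate_per_word,hours_translating,hours_proofreading,hours_admin
manual,4000,0.15,8.0,2.0,0.5
manual,2000,0.15,4.5,1.0,0.5
marketing,1000,0.20,4.0,1.0,1.0
"""

HOUR_COLUMNS = ("hours_translating", "hours_proofreading", "hours_admin")


def summarise(log_text: str) -> dict:
    """Per text type: words per hour and fee per hour, overhead included."""
    totals = defaultdict(lambda: {"words": 0, "fee": 0.0, "hours": 0.0})
    for row in csv.DictReader(io.StringIO(log_text)):
        entry = totals[row["text_type"]]
        words = int(row["words"])
        entry["words"] += words
        entry["fee"] += words * float(row["rate_per_word"])
        entry["hours"] += sum(float(row[col]) for col in HOUR_COLUMNS)
    return {
        text_type: {
            "words_per_hour": t["words"] / t["hours"],
            "eur_per_hour": t["fee"] / t["hours"],
        }
        for text_type, t in totals.items()
    }


for text_type, stats in summarise(LOG).items():
    print(f"{text_type}: {stats['words_per_hour']:.0f} words/h, "
          f"EUR {stats['eur_per_hour']:.2f}/h")
```

Because proofreading and administration time are counted as well, the hourly figure reflects the whole project and not just the typing.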
Voilà: Big Data on project efficiency from 100s of our projects, possibly stored in such a way that we will never look at it again (def #1). Now, the big promise of Big Data solutions (def #3) is that they will analyse data that is too big to handle on our own and provide an actionable decision basis (def #2). A true Big Data application would proceed to draw further data from other sources (i.e. the web) to contextualise our own data: What rates have been published by other translators for comparable project specifications? What about income surveys from census data and job portals? Do LSP or trade association websites mention “words per hour” measures within the context of “translation” – and what is their average? This list is non-exhaustive.
Businesses successfully mining Big Data are cross-referencing their internal information – pricing histories, customer traffic patterns – with multiple outside sources to increase revenue by understanding customers’ behavior better, reducing costs by eliminating inefficiencies and human bias, strengthening client bonds by anticipating clients’ needs, enriching service offerings with new knowledge, and giving employees new tools to perform their jobs better.
In fact, providing external data on our customers or the market will probably be the most-offered Big Data application for small businesses. Most of us simply don’t use elaborate CRM platforms to manage our customer base. I know many colleagues who do their accounting work in a simple Excel spreadsheet instead of an ERP solution. Also, most Big Data providers will see those as “small data” of no concern to them. Furthermore, corporate translation buyers will most likely not provide us with interfaces to their internal information – as is common among big players. We will thus necessarily look to web or cloud services as available tools to sift and condense “everything available on the web”, aka publicly available information, for us.
The offers might or might not fulfil our expectations. When looking for Big Data applications, I can perceive 3 goals of interest to Small Translators:
Goal 1: Understanding The Market
- Finding prospects and turning them into customers
- Gaining information on payment practice and credit worthiness and overall customer reputation (employee satisfaction? / working atmosphere)
- Gaining information on prices for language combinations, text types, industries, seasonal influences, etc. and determining which projects to accept
- Keeping tabs on established customers and surveying customer satisfaction
Goal 2: Optimize Business Operations
- Assessing our speed with certain types of projects, assessing profitability and effective earnings / adapting our hourly wage
- Assessing how well our website works based on server logs or analytics tools such as Google Analytics and Piwik – and improving it to lure more people into clicking the “contact me” button. Which kind of information do our prospects search for? Which pages do they visit most? Where do they stay to read and what is being clicked away?
Goal 3: Optimize Our Translation Work
- Speed up our work
- Improve translation quality
- Gaining relevant context information on idiomatic writing, the customer’s style and terminology or applicable industry standards
- Leveraging Machine Translation or including web-based and possibly collaborative translation memories and glossaries (cloud services for CAT tools)
Available Big Data Tools
The Big G – Google immediately springs to mind when looking for a company offering Big Data information to the whole world “for free” (for ads, that is). Next to its “true” Big Data offering for enterprise customers and its widely known WWW search engine, which provides a wealth of information when fed the right keywords, operators and options, Google places a number of more specialised tools at our hands, among them:
- Google Alerts to notify us of any new occurrence of our specified keywords on “the Internets” – useful to stay up-to-date on our clients, but also on developments in our target industries. There are free and paid alternatives, for example Talkwalker Alerts from Luxembourg, kuerzr.com from Germany, Alert.io from France, etc.
- Google Ngrams which searches large volumes of data from books for chosen collocations – useful to find out which phrasing is more common.
- Google Trends notifies us of trending topics which fall into our speciality and points us to companies that might need our services right now.
- Google Public Data offers access to large public databases on economic indicators, which helps to assess whether a regional market is solid enough, as a whole, to seek out clients there. With the mother tongue principle in mind, we might start to look more for customers abroad who are more likely to need high quality translations into our country’s language than local businesses. There are, of course, other aggregators of public data which can help you find out more about a topic or target demographic for your advertising; one well-known provider is Statista.
- Google Translate API must be on the list, since statistical MT is a true Big Data application. Google made this a paid service some time ago, but Microsoft Translator API is still without (monetary) cost even if they are currently migrating from the MS Data Market to the MS Azure platform. And of course, SDL offers its own range of MT products. In all cases, watch out for privacy and confidentiality issues.
There is also the very powerful If This Then That (IFTTT), with which you can fine-tune automatic reactions to any kind of web-based event; this blog post on bufferapp.com will give you an impression of what is possible.
A powerful desktop tool to analyse large volumes of text is the concordancer AntConc, which can tell you about the frequency of certain collocations in a corpus or compare a text with, say, a Wikipedia dump to find out what the core concepts of your text are. It’s also good at finding term candidates – however, you should know a bit about actual linguistics to understand the program, so that somewhat excludes all those “linguist” translators who aren’t linguists. Duh.
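To give an impression of what “frequency of collocations” means, here is a toy sketch in Python that counts adjacent word pairs (bigrams) in a text. It does not show how AntConc works internally and is no substitute for a real concordancer; the function name and the sample text are made up for illustration.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs; crude tokenisation that ignores sentence boundaries."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical mini corpus; a real one would be read from files.
SAMPLE = "Big data is big. Big data tools help small translators use big data."

counts = bigram_counts(SAMPLE)
for pair, n in counts.most_common(3):
    print(" ".join(pair), n)
```

Run over a large customer corpus, the same count would show which of two competing phrasings the customer actually prefers.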
Unfortunately, the above tools will mostly help with Goal 3, with a speckle of Goal 1 from the news aggregator sites. Goal 2 seems to be served mostly by using your good old CAT tool, which either has time-keeping built in or available as a plugin, or should be used in tandem with a time-keeping app. I happen to like the Android app Timesheet because it doesn’t clutter my desktop with yet another window, but there are also desktop apps or even web services for time tracking and pay calculation, some of them including reminders to make you stand up and stretch your legs in compliance with your local occupational health regulations. Just make the effort to log the time(s) you took for each and every project to amass the necessary project data for truly helpful adjustments. Compensation portals such as PayScale will give you the Big Data side of the equation to set your rates. Ideally, you will try a local payment portal to get a feeling for what is paid in your corner of the world – some countries supply the aggregated data through their statistics offices.
There are industry-specific mailing lists and portals on payment practices, namely the ProZ.com BlueBoard and PaymentPractices.net – make no mistake, those aggregate lots of data and make it accessible to you, so they are Big Data services. Sometimes, you will also want to check the trade register entries of the country where your customer is based or conduct a news search to see if any red flags or opportunities for further business come up.
All in all, there seem to be ways to access contextual Big Data for almost any business operation. What is lacking today is a service provider who offers all of the above information to translators in one neatly arranged format (might be a market niche!). Which essentially means we are back to definition #1 – it often seems too much work to find out. For specific questions, however, we can and should take Big Data sources into account when making business decisions.
A Friendship with Benefits
Even if his line of reasoning eventually went in another direction, Steve Vitek concluded in BigData and Little Translators – a Marriage Made in Heaven or Hell? that his “advice to most translators is: you don’t need to be married to Big Data or to ‘the translation industry’. Stay or become single and try to work mostly for direct clients.” I would point out that there are a number of Big Data tools out there that we can use at no cost or at low cost to answer specific questions concerning Small Translators. Those are friends with benefits: we like to spend time with them, sometimes they are helpful, sometimes they are funny, but there are no strings attached. If we need other kinds of answers, we just might consider a serious relationship – and yes, customer lock-in (aka “marriage”) might become a problem then. Or maybe not, if we are satisfied with the value we get.
Steve is certainly right in his assessment that certain kinds of translations will be taken over by pure MT or low-paying PEMT jobs. This will impact the translation market, but I subscribe more to the group of translators who think it will mostly thin out the “bottom-feeder” market. Quality content for use cases in which real money hinges on quality translations or transcreations will continue to remain a human domain for quite some time. If the Industry 4.0 gurus are right, the Fourth Industrial Revolution will change all kinds of jobs, but humanity will adapt, as it always does. If most jobs are taken over by machines, this could end up in civil distribution wars or it could end up in high corporate taxes paying a base salary to each citizen, so that humans can turn to more attractive endeavours… or a myriad of other scenarios. The future will come early enough. Until then, there’s money to be made in translation.
However, if you’re really married to a Big Data “translation industry” brute, there’s a German saying: “Andere Väter haben auch hübsche Söhne/Töchter” (“other fathers have cute sons/daughters, too”), which I happen to find more appealing than the English saying featuring cold, slimy, flopping fish polluted with heavy metals and microplastics. Big Data can be “leveraged” by Small Translators, but the latter do not necessarily need to let themselves get abused by it. In this, I’m totally with Steve.