CAT tools and Microsoft Office documents: how to solve common problems by using TransTools software
Today, documents in Microsoft Office formats – mainly Microsoft Word, Excel and PowerPoint – make up a significant part of the work in the translation market. Because these formats are so common, it is hard to find a CAT program that cannot handle them. Support for basic formatting, translation preview and revision tracking are just some of the capabilities that modern CAT tools implement for Word, Excel and PowerPoint formats.
However, despite the high level of support for Microsoft Office formats, modern CAT tools are still incapable of solving some of the issues that frequently arise during translation of these formats:
- Excessive tags in documents created with OCR (optical character recognition) and PDF conversion software, which make translation with CAT tools significantly more difficult.
- Incorrect paragraph breaks in the middle of sentences, which cause wrong segmentation in a CAT tool.
- Difficulties in selective translation of text when a document needs to be translated partially.
- Partially visible text in the target document after it is exported from a CAT tool.
To address these issues today, translators need to process documents carefully both before and after translation. These tasks can be simplified with the TransTools software suite, a set of plug-ins for Microsoft Word, Excel, PowerPoint and Visio as well as AutoCAD. TransTools contains a number of tools for processing documents before and after translation. Some of these tools are described in this post.
Issue 1 – excessive tags
Excessive tags are such a common issue in dealing with CAT tools that there is even a special term for them in the English language – tag soup. Usually, excessive tags are found in Microsoft Word and Microsoft PowerPoint documents which were converted from PDF files or scanned images using OCR or conversion software (e.g., FineReader, PDF Transformer, OmniPage, etc.). Such documents are often full of minor formatting changes that are practically invisible to the naked eye, but not to a CAT tool.
When working in tag-rich documents, the translator spends time working out the meaning of each tag, transferring the tags into the target segments, and checking their order.
TransTools includes two different tools for cleaning up excessive tags:
- Document Cleaner, a special tool included in TransTools for Microsoft Word. This integrated tool helps to fix various problems that are characteristic of documents converted with OCR programs and PDF converters. The Tag Cleaner command, which is included in the tool, removes excessive formatting from a Word document while keeping all significant formatting (bold, underline, italics, etc.) intact. Once the processed document is imported into the CAT tool, the excessive tags are gone, allowing the translator to concentrate on the core process – translation.
- Tag Cleaner, a special tool included in TransTools for PowerPoint. Similar to Document Cleaner in Word, this tool will help you remove most excessive tags caused by formatting differences or specific issues inherent to the PowerPoint format.
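To illustrate why removing insignificant formatting reduces the number of tags, here is a minimal Python sketch. It is not how TransTools works internally; the `Run` class, the `char_spacing` attribute and the choice of bold, italic and underline as the only "significant" formatting are assumptions made for this example. The idea is that each formatting change between adjacent text runs shows up as a tag in a CAT tool, so merging runs that differ only in invisible details leaves fewer tags.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Run:
    """Hypothetical model of a text run: a stretch of text with uniform formatting."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False
    # Assumed example of a setting that OCR/PDF converters vary by tiny amounts.
    char_spacing: float = 0.0


def significant(run):
    """Formatting the reader can actually see (an assumption for this sketch)."""
    return (run.bold, run.italic, run.underline)


def merge_runs(runs):
    """Merge adjacent runs whose significant formatting is identical."""
    merged = []
    for run in runs:
        if merged and significant(merged[-1]) == significant(run):
            prev = merged[-1]
            # The merged run keeps the minor settings of the first run.
            merged[-1] = replace(prev, text=prev.text + run.text)
        else:
            merged.append(run)
    return merged


runs = [
    Run("Trans", char_spacing=0.1),
    Run("lation ", char_spacing=-0.05),
    Run("memory", bold=True),
    Run(" tools", char_spacing=0.2),
]
cleaned = merge_runs(runs)
print(len(runs), "runs before,", len(cleaned), "runs after")
print([r.text for r in cleaned])
```

In this example the split inside the word "Translation" disappears, while the boundary around the bold word is kept because it carries visible formatting.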
Issue 2 – Incorrect paragraph breaks in the middle of sentences
A common problem in Microsoft Word and PowerPoint documents is paragraph breaks inserted in the middle of sentences. This happens both in documents prepared manually and in documents produced by PDF converters or OCR tools. When such documents are imported into a CAT tool, the affected sentences are split into several segments.
Incorrect segmentation makes it difficult to find matches in a translation memory and contaminates translation memories with incomplete sentences, making it difficult to leverage them later.
Modern CAT tools offer several ways to resolve this issue. Some CAT tools (memoQ, Wordfast Pro) can merge segments that come from different paragraphs. Several common CAT tools (SDL Trados Studio, Déjà Vu X) cannot do this, so translators have to translate individual sentence fragments or use the source segment editing function to re-assemble fragments into a complete sentence. However, in all these cases incorrect paragraph breaks are preserved in the translated document and cause wrong text flow or empty paragraphs. Therefore, after the document is exported from the CAT tool, it is still necessary to locate all incorrect paragraph breaks and remove them manually, which is time-consuming.
While it is possible (although tiresome and time-consuming) to find incorrect paragraph and line breaks manually in Microsoft Word, it is much more difficult to do the same in PowerPoint presentations because Microsoft PowerPoint lacks a special option to display paragraph and line breaks.
The issue of incorrect paragraph breaks can be resolved using Unbreaker for Word and Unbreaker for PowerPoint, which are part of TransTools. These tools automatically find incorrect paragraph breaks and make it easy to remove them from the document before it is imported into the CAT program.
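As a rough illustration of how incorrect paragraph breaks can be detected automatically, the Python sketch below uses one simple heuristic: a break is suspicious if the paragraph does not end with sentence-final punctuation and the next paragraph starts with a lowercase letter. This heuristic and the function names are assumptions for the example, not a description of Unbreaker's actual rules, and a real tool would need to let the user review each candidate.

```python
import re

# Sentence-final punctuation, optionally followed by closing brackets.
SENTENCE_END = re.compile(r"[.!?:;][)\]]*$")


def is_suspicious_break(current, following):
    """Return True if the break between two paragraphs looks like a mid-sentence break."""
    current = current.rstrip()
    following = following.lstrip()
    if not current or not following:
        return False
    ends_open = SENTENCE_END.search(current) is None
    starts_lower = following[0].islower()
    return ends_open and starts_lower


def join_broken_paragraphs(paragraphs):
    """Join paragraphs separated by suspicious breaks; leave the rest untouched."""
    result = []
    for para in paragraphs:
        if result and is_suspicious_break(result[-1], para):
            result[-1] = result[-1].rstrip() + " " + para.lstrip()
        else:
            result.append(para)
    return result


paragraphs = [
    "Incorrect segmentation makes it difficult to find",
    "matches in a translation memory.",
    "Issue 3",
    "Quite frequently documents need to be translated partially.",
]
fixed = join_broken_paragraphs(paragraphs)
for para in fixed:
    print(para)
```

The heading "Issue 3" is left alone because the paragraph after it starts with a capital letter, which shows why the lowercase check matters: headings and list items rarely end with a full stop.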
Issue 3 – Selective translation
Quite frequently, only part of a document needs to be translated. For example, you may need to translate only the text which is marked with a specific color, or do the opposite – translate the entire document except text marked with a certain color.
To hide non-translatable text in a Word document from most CAT tools, this text needs to be formatted with the hidden formatting attribute. “Hiding” text is a manual process which can take a lot of time depending on the document’s complexity and the number of text fragments that need hiding.
TransTools for Word includes a special tool called Hide / Unhide Text, which makes it very easy to hide text from the CAT program. The tool includes two options for hiding text:
- Hide everything except the text marked with a certain highlight color or font color.
- Hide the text marked with a certain highlight color or font color.
After the document is translated, the tool should be used again to unhide all text in the document before submitting it to the client.
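The hide, translate, unhide cycle described above can be sketched in Python as follows. This is only a simplified model: the `TextRun` class, the mode names `"marked"` and `"all_but_marked"` and the helper functions are hypothetical and do not correspond to TransTools or to any Word API. It shows how the two hiding options differ and why unhiding afterwards restores the complete text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRun:
    """Hypothetical model of a run of text in a document."""
    text: str
    highlight: Optional[str] = None  # highlight color name, if any
    hidden: bool = False             # stands in for Word's hidden attribute


def hide_runs(runs, color, mode):
    """Hide runs by highlight color.

    mode="marked"         hides the runs highlighted with `color`
    mode="all_but_marked" hides everything except those runs
    """
    if mode not in ("marked", "all_but_marked"):
        raise ValueError("mode must be 'marked' or 'all_but_marked'")
    for run in runs:
        is_marked = run.highlight == color
        run.hidden = is_marked if mode == "marked" else not is_marked


def unhide_all(runs):
    """Make all text visible again after translation."""
    for run in runs:
        run.hidden = False


def visible_text(runs):
    """Text that a CAT tool skipping hidden text would import."""
    return "".join(run.text for run in runs if not run.hidden)


runs = [
    TextRun("Part number: "),
    TextRun("AB-1234 ", highlight="yellow"),
    TextRun("must be replaced."),
]

hide_runs(runs, "yellow", "marked")
during = visible_text(runs)   # what the translator sees
unhide_all(runs)
after = visible_text(runs)    # what the client receives
print(during)
print(after)
```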
The Hide / Unhide Text tool is especially useful if you need to create a dual-language document. In this case, prior to translation, the original text is duplicated in a second column or after a custom separator. To translate such a document in a CAT program, the most convenient method is to mark one part of the document with a specific color and then hide it using this tool. A single-language document can be converted into a dual-language translation-ready document using another tool called Dual Language Document Assistant, which is also part of TransTools for Microsoft Word.
Issue 4 – Partially visible text after translation
Because translated text is often longer than the source text, some of it may no longer fit inside table cells, frames or text boxes and becomes partially invisible. Manually finding all invisible text is tiresome and time-consuming.
To resolve this issue, TransTools offers several tools:
- Cell Resize Wizard for Microsoft Excel. This tool locates all text that does not fit inside cells and makes it possible to change row height or column width for these cells automatically.
- Autoformat tool included in Document Cleaner (Microsoft Word). This tool helps you apply automatic row height to table rows, remove frames (which often occur in documents produced by PDF converters and OCR tools), and locate text boxes in Word documents.
***
Although CAT programs have a lot of benefits, they have one major flaw: after a document is imported, we can no longer fix any formatting issues we find, as we can during manual translation in a text-editing program. For this reason, proper processing of a document before and after translation in a CAT tool is very important, and even more so if the translator cannot correct the source document, e.g., when a server-based CAT tool is used or when bilingual files such as XLIFF are exchanged. TransTools allows you to streamline this process while addressing the main issues that arise during translation with CAT tools.
Hello Stanislav, glad to have you on TheOpenMic!
Thank you for introducing TransTools to a broader audience.
I’ve been using the tools for more than three years and can recommend them wholeheartedly.
Keep up the great work!
Thank you for introducing me to TheOpenMic, Patrick! After reading lots of insightful posts here, it’s great to finally share my first story with TheOpenMic community.
Stanislav, thanks! I also have been using your tools for a few years, and I recommend Document Cleaner/TransTools to everyone I know who uses Studio. It has saved my sanity more times than I can count.
Hello, Tracy! It’s great to see TransTools users on The Open Mic! Thank you very much for spreading the word about TransTools.
This is great stuff Stanislav!
Thanks for sharing, I will give it a try!
Hi Stanislav,
So you are the face behind TransTools! It’s always nice to see the person working behind such important tools. I’ve been using TransTools for a while now, and recommending to everyone in need. It’s a fantastic tool.
Congratulations!
Hi Thiago. Thanks for your comment about TransTools and for sharing information about it. I really appreciate it. If you have any ideas about making TransTools better, let me know!