CAT tools and Microsoft Office documents: how to solve common problems by using TransTools software





Today, documents in Microsoft Office formats – mainly Microsoft Word, Excel and PowerPoint – make up a significant part of the work in the translation market. Because these formats are so common, it is impossible to find a CAT program which cannot handle them. Support for basic formatting, translation preview and tracked revisions are just some of the capabilities that modern CAT tools implement for Word, Excel and PowerPoint formats.

However, despite the high level of support for Microsoft Office formats, modern CAT tools are still incapable of solving some of the issues that frequently arise during translation of these formats:

  • Excessive tags in documents created with OCR (optical character recognition) and PDF conversion software, which make translation with CAT tools significantly more difficult.
  • Incorrect paragraph breaks in the middle of sentences, which cause wrong segmentation in a CAT tool.
  • Difficulties in selective translation of text when a document needs to be translated partially.
  • Partially visible text in the target document after it is exported from a CAT tool.

To address these issues today, one needs to process documents carefully both before and after translation. These tasks can be simplified with the TransTools software suite, a set of plug-ins for Microsoft Word, Excel, PowerPoint and Visio as well as AutoCAD. TransTools contains a number of tools for processing documents before and after translation. Some of these tools are described in this post.

Issue 1 – excessive tags

Excessive tags are such a common issue in dealing with CAT tools that there is even a special term for them in the English language – tag soup. Usually, excessive tags are found in Microsoft Word and Microsoft PowerPoint documents which were converted from PDF files or scanned images using OCR or conversion software (e.g., FineReader, PDF Transformer, OmniPage, etc.). Such documents are often full of minor formatting changes that are practically invisible to the naked eye, but not to a CAT tool.

Figure: Excessive tags in a Word document converted with FineReader, after importing into memoQ

Figure: Excessive tags in a PowerPoint document converted from a PDF

When working in tag-rich documents, the translator spends time working out the meaning of each tag, transferring the tags into the target segments, and checking their order.

TransTools includes two different tools for cleaning up excessive tags:

  1. Document Cleaner, a special tool included in TransTools for Microsoft Word. This integrated tool helps to fix various problems that are characteristic of documents converted with OCR programs and PDF converters. The Tag Cleaner command, which is included in the tool, removes excessive formatting from a Word document while keeping all significant formatting (bold, underline, italics, etc.) intact. Once the processed document is imported into the CAT tool, excessive tags disappear, allowing the translator to concentrate on the core process – translation.

    Figure: Tag Cleaner command in Document Cleaner

  2. Tag Cleaner, a special tool included in TransTools for PowerPoint. Similar to Document Cleaner in Word, this tool will help you remove most excessive tags caused by formatting differences or specific issues inherent to the PowerPoint format.
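To make the idea behind tag cleaning more concrete, here is a minimal Python sketch. It is not how TransTools is implemented; the `Run` class, the `significant` function and the choice of "significant" attributes are assumptions made purely for illustration. A document is modelled as a list of formatting runs, and a CAT tool is assumed to insert a tag at every run boundary. Neighbouring runs that differ only in minor attributes (here, font size and character spacing) are merged, while bold, italic and underline changes are kept.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Run:
    """Hypothetical model of a formatting run (not a real Word API object)."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False
    # Attributes that converted documents often vary by tiny amounts.
    font_size: float = 11.0
    spacing: float = 0.0


def significant(run):
    """Formatting that must survive cleaning (an assumption for this sketch)."""
    return (run.bold, run.italic, run.underline)


def merge_runs(runs):
    """Merge neighbouring runs whose significant formatting is identical."""
    merged = []
    for run in runs:
        if merged and significant(merged[-1]) == significant(run):
            prev = merged[-1]
            # Keep the first run's minor attributes and append the text.
            merged[-1] = replace(prev, text=prev.text + run.text)
        else:
            merged.append(run)
    return merged


runs = [
    Run("The pump ", font_size=11.0),
    Run("must ", font_size=10.5, spacing=0.1),
    Run("not", bold=True),
    Run(" run ", font_size=11.0),
    Run("dry.", font_size=10.5),
]
cleaned = merge_runs(runs)
print(len(runs), "->", len(cleaned))  # 5 -> 3
```

In this toy sentence, five runs collapse into three, so the translator would only have to place the tags around the bold word.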

Issue 2 – Incorrect paragraph breaks in the middle of sentences

Microsoft Word and PowerPoint documents often contain paragraph breaks inserted in the middle of sentences. This happens both in documents prepared manually and in documents produced by PDF converters or OCR tools. When such documents are imported into a CAT tool, the affected sentences are split into several segments.

Figure: Incorrect paragraph breaks in a Microsoft Word document and segmentation in SDL Trados Studio

Incorrect segmentation makes it difficult to find matches in a translation memory and contaminates translation memories with incomplete sentences, making it difficult to leverage them later.

Modern CAT tools offer several ways to resolve this issue. Some CAT tools (memoQ, Wordfast Pro) can merge segments that come from different paragraphs. Several common CAT tools (SDL Trados Studio, Déjà Vu X) cannot do this, so translators have to translate individual sentence fragments or use the source segment editing function to re-assemble fragments into a complete sentence. However, in all these cases incorrect paragraph breaks are preserved in the translated document and cause wrong text flow or empty paragraphs. Therefore, after the document is exported from the CAT tool, it is still necessary to locate all incorrect paragraph breaks and remove them manually, which is time-consuming.

While it is possible (although tiresome and time-consuming) to find incorrect paragraph and line breaks manually in Microsoft Word, it is much more difficult to do the same in PowerPoint presentations because Microsoft PowerPoint lacks a special option to display paragraph and line breaks.

The issue of incorrect paragraph breaks can be resolved using Unbreaker for Word and Unbreaker for PowerPoint, which are part of TransTools. These tools automatically find incorrect paragraph breaks and make it easy to remove them quickly from the document before it is imported into the CAT program.

Figure: Unbreaker makes it easy to find and remove incorrect paragraph and line breaks in Word and PowerPoint documents
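As a rough illustration of how a paragraph break in the middle of a sentence might be detected, consider the Python sketch below. The heuristic is an assumption chosen for this example (a paragraph that does not end with sentence punctuation, followed by a paragraph that starts with a lowercase letter) and is not a description of how Unbreaker works. All function names are hypothetical, and paragraphs are plain strings.

```python
SENTENCE_ENDINGS = (".", "!", "?", ":", ";")
CLOSERS = "\"')]”’"


def is_suspicious_break(current, following):
    """Guess whether the break between two paragraphs splits a sentence."""
    current = current.rstrip().rstrip(CLOSERS)
    following = following.lstrip()
    if not current or not following:
        return False
    if current.endswith(SENTENCE_ENDINGS):
        return False
    # Headings also lack final punctuation, so require a lowercase start.
    return following[0].islower()


def find_suspicious_breaks(paragraphs):
    """Return indexes i where the break after paragraphs[i] looks wrong."""
    return [
        i
        for i in range(len(paragraphs) - 1)
        if is_suspicious_break(paragraphs[i], paragraphs[i + 1])
    ]


def join_paragraphs(paragraphs, break_indexes):
    """Remove the selected breaks by joining the paragraphs with a space."""
    to_join = set(break_indexes)
    joined = []
    for i, para in enumerate(paragraphs):
        if i > 0 and (i - 1) in to_join:
            joined[-1] = joined[-1].rstrip() + " " + para.lstrip()
        else:
            joined.append(para)
    return joined


paragraphs = [
    "Safety instructions",
    "The valve must be closed before",
    "the pump is started.",
    "Check the pressure gauge.",
]
suspicious = find_suspicious_breaks(paragraphs)
fixed = join_paragraphs(paragraphs, suspicious)
print(suspicious)
print(fixed)
```

A heuristic like this will produce false positives and misses (for example, with list items or sentences that continue with a proper noun), which is why a review step before removing the breaks is useful.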

Issue 3 – Selective translation

Quite frequently documents need to be translated partially. For example, you may need to translate only the text which is marked with a specific color, or do the opposite – translate the entire document except text marked with a certain color.

To hide non-translatable text in a Word document from most CAT tools, this text needs to be formatted with the hidden formatting attribute. Hiding text manually can take a lot of time, depending on the document’s complexity and the number of text fragments that need hiding.

TransTools for Word includes a special tool called Hide / Unhide Text which makes it very easy to hide text from the CAT program. The tool includes two options for hiding text:

  • Hide everything except the text marked with a certain highlight color or font color.
  • Hide the text marked with a certain highlight color or font color.

Figure: Using the Hide / Unhide Text tool to hide everything except text highlighted in yellow

After the document is translated, the tool should be used again to unhide all text in the document before submitting it to the client.
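The two hiding options and the final unhide step can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Run` class is a hypothetical stand-in for a piece of text with a highlight color and a hidden attribute, and the functions `hide_runs`, `unhide_all` and `visible_text` are invented for this example rather than taken from TransTools or Word.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical piece of text with a highlight color and hidden flag."""
    text: str
    highlight: Optional[str] = None
    hidden: bool = False


def hide_runs(runs, color, keep_marked=True):
    """Set the hidden flag based on the highlight color.

    keep_marked=True  -> hide everything except text marked with `color`.
    keep_marked=False -> hide only the text marked with `color`.
    """
    result = []
    for run in runs:
        marked = run.highlight == color
        hide = (not marked) if keep_marked else marked
        result.append(replace(run, hidden=hide))
    return result


def unhide_all(runs):
    """Make all text visible again, e.g. before delivery to the client."""
    return [replace(run, hidden=False) for run in runs]


def visible_text(runs):
    """Text that a CAT tool skipping hidden text would see."""
    return "".join(run.text for run in runs if not run.hidden)


runs = [
    Run("Part number: "),
    Run("Replace the filter.", highlight="yellow"),
    Run(" Rev. 2"),
]
only_marked = hide_runs(runs, "yellow", keep_marked=True)
all_but_marked = hide_runs(runs, "yellow", keep_marked=False)
print(visible_text(only_marked))
print(visible_text(all_but_marked))
print(visible_text(unhide_all(only_marked)))
```

Note that this sketch overwrites the hidden flag on every run, so text that was already hidden in the source document would need separate handling in a real workflow.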

The Hide / Unhide Text tool is especially useful if you need to create a dual-language document. In this case, prior to translation, the original text is duplicated in a second column or after a custom separator. To translate such a document in a CAT program, the most convenient method is to mark one part of the document with a specific color and then hide it using this tool. A single-language document can be converted into a dual-language translation-ready document using another tool called Dual Language Document Assistant, which is also part of TransTools for Microsoft Word.

Issue 4 – Partially visible text after translation

Due to expansion during translation, some text may no longer fit inside table cells, frames or text boxes and becomes partially invisible. Manually finding all invisible text is tiresome and time-consuming.

To resolve this issue, TransTools offers several tools:

  1. Cell Resize Wizard for Microsoft Excel. This tool locates all text that does not fit inside cells and makes it possible to change row height or column width for these cells automatically.

    Figure: Cell Resize Wizard helps to locate and resize cells which do not fit all the text

  2. Autoformat tool included in Document Cleaner (Microsoft Word). This tool helps you apply automatic row height to table rows, remove frames (which often occur in documents produced by PDF converters and OCR tools), and locate textboxes in Word documents.
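To show why translated text can end up partially hidden, here is a deliberately simplified Python sketch that flags cells whose text needs more height than the row provides. It assumes a fixed number of characters per line and a fixed line height, which is a crude approximation: real applications measure text with actual font metrics, and this is not how Cell Resize Wizard works. The function names and the `cells` data layout are hypothetical.

```python
import textwrap


def lines_needed(text, chars_per_line):
    """Estimate how many wrapped lines the text occupies in a cell."""
    total = 0
    for part in text.split("\n"):
        wrapped = textwrap.wrap(part, width=chars_per_line)
        # An empty line still takes up one line of height.
        total += max(1, len(wrapped))
    return total


def find_overflowing_cells(cells, line_height=15.0):
    """Return {address: required_height} for cells that do not fit.

    `cells` maps a cell address to (text, chars_per_line, row_height).
    """
    report = {}
    for address, (text, chars_per_line, row_height) in cells.items():
        needed = lines_needed(text, chars_per_line) * line_height
        if needed > row_height:
            report[address] = needed
    return report


cells = {
    "A1": ("Pump", 20, 15.0),
    "B2": ("Technical specification of the centrifugal pump", 20, 15.0),
}
report = find_overflowing_cells(cells)
for address, needed in report.items():
    print(f"{address}: row height should be at least {needed}")
```

In this example the short source term in A1 fits, while the longer text in B2 is reported with the row height it would need under these assumptions.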

***

Although CAT programs have a lot of benefits, they have one major flaw: after a document is imported, we can no longer fix any formatting issues we find, as is possible during manual translation in a text-editing program. For this reason, proper processing of a document before and after translation in a CAT tool is very important, and even more so if the translator cannot correct the source document, e.g., when a server-based CAT tool is used or when bilingual files such as XLIFF are exchanged. TransTools will allow you to streamline this process while addressing the main issues that arise during translation with CAT tools.

Stanislav Okhvat

About Stanislav Okhvat

I am the developer of TransTools, a set of plug-ins for Microsoft Office, Visio and AutoCAD designed for translators, proofreaders and editors. I am also a technical translator specializing in oil & gas engineering.

7 thoughts on “CAT tools and Microsoft Office documents: how to solve common problems by using TransTools software”

  1. Hello Stanislav, glad to have you on TheOpenMic!
    Thank you for introducing TransTools to a broader audience.
    I’ve been using the tools for more than three years and can recommend them wholeheartedly.
    Keep up the great work!

  2. Stanislav, thanks! I also have been using your tools for a few years, and I recommend Document Cleaner/TransTools to everyone I know who uses Studio. It has saved my sanity more times than I can count.

  3. Hi Stanislav,
    So you are the face behind TransTools! It’s always nice to see the person working behind such important tools. I’ve been using TransTools for a while now, and recommending to everyone in need. It’s a fantastic tool.
    Congratulations!

