DITA 1.2 Feature Article
Using XLIFF to Translate DITA Projects
An OASIS DITA Adoption Technical Committee Publication
On behalf of the DITA Adoption Technical Committee
Rodolfo Raya, Bryan Schnabel, and JoAnn Hackos
21 May 2012
OASIS (Organization for the Advancement of Structured Information
Standards) is a not-for-profit, international consortium that drives the
development, convergence, and adoption of e-business standards. Members
themselves set the OASIS technical agenda, using a lightweight, open process
expressly designed to promote industry consensus and unite disparate efforts.
The consortium produces open standards for Web services, security, e-business,
and standardization efforts in the public sector and for application-specific
markets. OASIS was founded in 1993. More information can be found on the OASIS
website at http://www.oasis-open.org.
The OASIS DITA Adoption Technical Committee members collaborate to
provide expertise and resources to educate the marketplace on the value of the
DITA OASIS standard. By raising awareness of the benefits offered by DITA, the
DITA Adoption Technical Committee expects the demand for, and availability of,
DITA conforming products and services to increase, resulting in a greater
choice of tools and platforms and an expanded DITA community of users,
suppliers, and consultants.
DISCLAIMER: All examples presented in this article were produced
using one or more tools chosen at the author's discretion and in no way reflect
endorsement of the tools by the OASIS DITA Adoption Technical Committee.
This white paper was produced and approved by the OASIS DITA Adoption
Technical Committee as a Committee Draft. It has not been reviewed and/or
approved by the OASIS membership at-large.
Copyright © 2012 OASIS. All rights reserved.
All capitalized terms in the following text have the meanings assigned
to them in the OASIS Intellectual Property Rights Policy (the "OASIS IPR
Policy"). The full Policy may be found at the OASIS website. This document and
translations of it may be copied and furnished to others, and derivative works
that comment on or otherwise explain it or assist in its implementation may be
prepared, copied, published, and distributed, in whole or in part, without
restriction of any kind, provided that the above copyright notice and this
section are included on all such copies and derivative works. However, this
document itself may not be modified in any way, including by removing the
copyright notice or references to OASIS, except as needed for the purpose of
developing any document or deliverable produced by an OASIS Technical Committee
(in which case the rules applicable to copyrights, as set forth in the OASIS
IPR Policy, must be followed) or as required to translate it into languages
other than English. The limited permissions granted above are perpetual and
will not be revoked by OASIS or its successors or assigns. This document and
the information contained herein is provided on an "AS IS" basis and OASIS
DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP
RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Revision | Date | Author | Summary
First Draft | 29 August 2011 | Hackos | Draft of the Committee Note -- Feature Article
Draft | 7 October 2011 | Hackos, Schnabel, Raya | Draft for committee vote
Draft | 7 March 2012 | Raya | Added workflow diagrams
Committee Approved Draft | 21 May 2012 | Hackos (Chair) | Adoption TC approved final draft
Maintenance workflow
As products and processes change, you will revise some of your topics, write new
ones, and need to update your translations. At that point, you will see the benefits
of translation reuse through these techniques:
- reuse In-Context Exact (ICE) matches
- recover translations of similar text from Translation Memory (TM)
- generate updated translations using Example-Based Machine Translation
(EBMT)
The maintenance workflow proceeds as follows:
- Convert the updated DITA map and topics to XLIFF.
- Using your translation tool, compare the new XLIFF file with the previously
translated XLIFF file and recover In-Context Exact (ICE) matches.
This step recovers the translations of text that has not changed since the last
translation cycle.
- After recovering all ICE matches, use your translation tool to mark the
translated segments as "do not translate".
The translations remain visible in the XLIFF file as context for the translator
but do not need to be changed.
- If you have not yet updated your Translation Memory with the translations from
the previous cycle, import the TMX file from that cycle into the Translation
Memory of your translation environment.
- Use your TM engine to retrieve matches for the segments that remain
untranslated.
A TM engine evaluates the similarity between the text that requires translation
and the entries stored in its database. A match is a perfect match when the
source text is identical to the text found in the translation memory; a match is
fuzzy when the source text is similar, but not 100% identical, to the text found
in the database.
- If your translation memory contains only entries from a very similar project,
you may be tempted to accept all perfect matches as final.
However, because there is no guarantee that these matches are the right
translations in the new context, you should let professional translators approve
them. You can still ask your LSP to set a special price for segments with good
matches from your own TM.
- Use EBMT to recover additional matches.
Sometimes the only difference between the old text and the new text is an
updated number. A good CAT (Computer-Aided Translation) tool can correct such a
small change automatically using EBMT techniques. An EBMT engine can also
correct the translation of known terms automatically with the aid of a
terminology database.
You should now have an XLIFF file ready to send to your LSP to complete the
translation cycle.
- Send the XLIFF file, partially translated by the preceding steps, to your LSP
together with an updated PDF rendering of the project.
- When you receive the translated XLIFF file, convert it back to the DITA map and
topics.
- Remember to update your TM with the translated XLIFF file when you receive it.
- Finally, if your translation budget allows it, generate a PDF rendering of the
translated project and send it with a copy of the translated XLIFF file to your
LSP for proofreading.
If the reviewer finds an error, the LSP can correct it in the XLIFF file and send
the file back to you so that you can update your topics and your TM.
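The ICE recovery and "do not translate" steps of the workflow above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not how any particular translation tool works: it assumes XLIFF 1.2 files in which the DITA-to-XLIFF converter gives each trans-unit a stable id, and it treats a unit as an ICE match only when its id, its source text, and the source text of the preceding unit are all unchanged. Commercial tools define "context" in their own ways. Recovered units are locked with the XLIFF 1.2 translate="no" attribute. The function names units_with_context and recover_ice are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; ElementTree needs fully qualified tag names.
NS = "urn:oasis:names:tc:xliff:document:1.2"
Q = "{" + NS + "}"
ET.register_namespace("", NS)  # write files back without an "ns0:" prefix


def units_with_context(root):
    """Return [(key, trans-unit)] in document order.

    Assumption (hypothetical ICE rule): key = (unit id, source text of the
    previous unit, source text of this unit).
    """
    result, prev = [], None
    for tu in root.iter(Q + "trans-unit"):
        src = tu.find(Q + "source")
        text = "".join(src.itertext()) if src is not None else ""
        result.append(((tu.get("id"), prev, text), tu))
        prev = text
    return result


def recover_ice(old_root, new_root):
    """Copy targets from the previously translated XLIFF into ICE-matching
    units of the new XLIFF, mark them translate="no", and return the count."""
    old_targets = {}
    for key, tu in units_with_context(old_root):
        target = tu.find(Q + "target")
        if target is not None:
            old_targets[key] = target

    recovered = 0
    for key, tu in units_with_context(new_root):
        target = old_targets.get(key)
        if target is None:
            continue  # changed text or changed context: not an ICE match
        existing = tu.find(Q + "target")
        if existing is not None:
            tu.remove(existing)
        # XLIFF 1.2 order: <source>, optional <seg-source>, then <target>.
        anchor = tu.find(Q + "seg-source")
        if anchor is None:
            anchor = tu.find(Q + "source")
        tu.insert(list(tu).index(anchor) + 1, copy.deepcopy(target))
        tu.set("translate", "no")  # lock the segment; it stays visible as context
        recovered += 1
    return recovered
```

To apply it to real files, you could parse both with ET.parse(path), pass each tree's getroot() to recover_ice, and save the new file with tree.write(path, encoding="utf-8", xml_declaration=True). Note that a segment whose text is unchanged but whose neighbour changed is deliberately not recovered here; it is left for the TM step, where it would surface as a perfect match.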
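The difference between perfect and fuzzy matches described in the Translation Memory step can be illustrated with a toy lookup. This sketch assumes the TM is a plain dictionary mapping stored source segments to their translations, and it scores similarity with word-level difflib.SequenceMatcher against a hypothetical 70% fuzzy threshold. Real TM engines use their own scoring methods and thresholds, so their percentages will differ from these. The function name best_tm_match is hypothetical.

```python
from difflib import SequenceMatcher


def best_tm_match(source, memory, fuzzy_threshold=0.7):
    """Look up one segment in a toy translation memory.

    memory maps stored source segments to their translations (an assumption;
    real TMs, such as those exchanged as TMX, store much more metadata).
    Returns ("perfect", 1.0, target), ("fuzzy", score, target), or None.
    """
    if source in memory:
        return ("perfect", 1.0, memory[source])  # identical source text

    best = None
    words = source.split()
    for tm_source, tm_target in memory.items():
        # Word-level similarity in [0, 1]; 1.0 means identical word sequences.
        score = SequenceMatcher(None, words, tm_source.split()).ratio()
        if score >= fuzzy_threshold and (best is None or score > best[1]):
            best = ("fuzzy", score, tm_target)
    return best


TM = {
    "Press the OK button.": "Pulse el botón Aceptar.",
    "Close the window.": "Cierre la ventana.",
}
print(best_tm_match("Close the window.", TM))         # ('perfect', 1.0, 'Cierre la ventana.')
print(best_tm_match("Press the Cancel button.", TM))  # ('fuzzy', 0.75, 'Pulse el botón Aceptar.')
```

The fuzzy result still carries the old button name in its translation, which is exactly why fuzzy matches need a translator's review before they are accepted.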
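The "updated number" case from the EBMT step can be approximated with a narrow rule: if the new source text and a TM source text differ only in their numbers, and the stored translation contains the old numbers in the same order, substitute the new numbers into the translation. This is a hypothetical, deliberately conservative stand-in for one EBMT technique, not how any CAT tool actually implements EBMT, and it does not cover the terminology-based corrections also mentioned in that step. The NUMBER pattern and the adapt_numbers function are assumptions made for illustration.

```python
import re

# Integers and decimals such as 3, 2.0, or 1,500 (hypothetical pattern).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def adapt_numbers(new_source, tm_source, tm_target):
    """Return tm_target with its numbers updated to match new_source, or None
    when the rule does not apply and a translator should handle the segment."""
    # The sources must be identical once every number is masked,
    # which also guarantees they contain the same count of numbers.
    if NUMBER.sub("#", new_source) != NUMBER.sub("#", tm_source):
        return None
    old_numbers = NUMBER.findall(tm_source)
    new_numbers = NUMBER.findall(new_source)
    # Each old number must appear, in order, in the stored translation;
    # otherwise (reordered or localized numbers) do not guess.
    if NUMBER.findall(tm_target) != old_numbers:
        return None
    replacements = iter(new_numbers)
    return NUMBER.sub(lambda match: next(replacements), tm_target)


print(adapt_numbers("Insert 4 screws.", "Insert 3 screws.", "Inserte 3 tornillos."))
# Inserte 4 tornillos.
```

Returning None instead of guessing keeps the rule safe: a translation that writes "1,5" for the source "1.5" is left for the translator rather than corrupted.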
The full maintenance workflow is shown in the following diagram: