DITA is an XML vocabulary, but not just any XML. It has particularities that an ordinary XML editor or translation tool cannot handle easily.
Like an XML editor that is good for authoring in DITA, a translation tool capable of properly handling DITA files should:
conref attribute or the
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="task_hdj_drv_bh">
  <title>Applying XSL Transformation</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Open the document to transform.</cmd>
      </step>
      <step>
        <cmd>In <uicontrol conref="ui_reference.dita#uiref/xsl_menu"/> menu, select <uicontrol conref="ui_reference.dita#uiref/xsl_trans"/>.</cmd>
      </step>
      <step>
        <cmd>Select the appropriate XSL Stylesheet</cmd>
      </step>
      <step>
        <cmd>Click the <uicontrol conref="ui_reference.dita#uiref/xsl_apply"/> button.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
Listing 1 - DITA topic that uses conref
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="uiref">
  <title>UI Elements</title>
  <conbody>
    <p><uicontrol id="xsl_menu">Transformation</uicontrol>: program menu that contains all transformation options.</p>
    <p><uicontrol id="xsl_trans">XSL Transformation</uicontrol>: applies an XSL Stylesheet to an XML document.</p>
    <p><uicontrol id="xsl_apply">Apply Transformation</uicontrol>: applies the selected XSL Stylesheet to the current open document.</p>
  </conbody>
</concept>
Listing 2 - DITA topic that contains referenced text
An XML editor able to resolve the conref attributes in Listing 1 would display that file in WYSIWYG mode as shown in Figure 1.
For a technical writer working with DITA, it is important that the chosen XML editor resolves conref attributes and displays the referenced text.
For a translator it is also essential to see the text being translated in a complete representation. If conref content is not resolved when translatable text is extracted from the DITA file, the translator will lack the context needed to perform the translation.
In Figure 2 below you can see translatable text from Listing 1 extracted by a Computer Aided Translation (CAT) tool that supports DITA content referencing. In Figure 3 and Figure 4 you see the same text extracted by two tools that treat DITA documents as regular XML.
The pictures shown above include markers that represent the original DITA markup. In one case (Figure 2) you can see the actual text referenced by conref attributes; in the others (Figure 3 and Figure 4) you see just markers.
By using tools that extract complete sentences from your DITA sources, you give translators the context they need. Although this raises the price if your Localization Service Provider (LSP) charges by the word, the increase should be offset by better translation quality, which in turn requires less review work.
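The conref resolution discussed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the way any particular CAT tool works: resolve_conrefs and load_topic are hypothetical names, the two topics are trimmed copies of Listing 1 and Listing 2 held in memory, and only the file#topicid/elementid form of conref is handled.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed copies of Listing 1 and Listing 2, kept in memory for the sketch.
TASK = """<task id="task_hdj_drv_bh">
  <title>Applying XSL Transformation</title>
  <taskbody><steps>
    <step><cmd>In <uicontrol conref="ui_reference.dita#uiref/xsl_menu"/> menu, select <uicontrol conref="ui_reference.dita#uiref/xsl_trans"/>.</cmd></step>
    <step><cmd>Click the <uicontrol conref="ui_reference.dita#uiref/xsl_apply"/> button.</cmd></step>
  </steps></taskbody>
</task>"""

UI_REFERENCE = """<concept id="uiref">
  <title>UI Elements</title>
  <conbody>
    <p><uicontrol id="xsl_menu">Transformation</uicontrol>: program menu.</p>
    <p><uicontrol id="xsl_trans">XSL Transformation</uicontrol>: applies a stylesheet.</p>
    <p><uicontrol id="xsl_apply">Apply Transformation</uicontrol>: applies the selection.</p>
  </conbody>
</concept>"""

FILES = {"ui_reference.dita": UI_REFERENCE}


def load_topic(path):
    """Hypothetical loader: parses a topic from the in-memory FILES table."""
    return ET.fromstring(FILES[path])


def resolve_conrefs(topic, loader):
    """Copy the referenced content into every element that carries a conref."""
    for elem in list(topic.iter()):
        target = elem.get("conref")
        if target is None:
            continue
        path, _, fragment = target.partition("#")
        _topic_id, _, element_id = fragment.partition("/")
        # An empty path means the reference points into the same file.
        source = loader(path) if path else topic
        referenced = next(
            (e for e in source.iter() if e.get("id") == element_id), None
        )
        if referenced is None:
            continue  # leave unresolved references untouched
        elem.text = referenced.text
        elem.extend([copy.deepcopy(child) for child in referenced])
        del elem.attrib["conref"]
    return topic


task = resolve_conrefs(ET.fromstring(TASK), load_topic)
# Each <cmd> becomes one translatable segment with the referenced text in place.
segments = ["".join(cmd.itertext()) for cmd in task.iter("cmd")]
for segment in segments:
    print(segment)
```

Without the resolution step, the first segment would reach the translator as "In  menu, select ." plus markers, which is the situation the tools in Figure 3 and Figure 4 create.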
DITA includes a set of DTDs and XML Schemas that contain almost all elements and attributes needed in a standard documentation project. Nevertheless, sometimes the standard set of elements and attributes is not enough and custom extensions are needed.
DITA has a standard extension mechanism known as "specialization". DITA users are allowed to modify the default set of DTDs and XML Schemas, following certain rules, to incorporate the pieces they need.
As DITA becomes more popular, many translation tool vendors ship configuration files for their XML filters that facilitate text extraction from standard DITA documents. Unfortunately, not all tools support DITA specializations.
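One way a tool can cope with specialized elements is to look at the DITA class attribute, which lists the element types a given element is derived from, instead of matching element names. The sketch below assumes a hypothetical specialization (the acme-warning element and the acme-d domain are invented for illustration) and writes the class values directly in the sample, because Python's ElementTree does not load the defaults that a DTD would normally supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized topic; class values are written out because
# ElementTree does not read attribute defaults from a DTD.
SPECIALIZED = """<concept id="demo" class="- topic/topic concept/concept ">
  <title class="- topic/title ">Safety</title>
  <conbody class="- topic/body concept/conbody ">
    <acme-warning class="+ topic/p acme-d/acme-warning ">Unplug the device first.</acme-warning>
    <p class="- topic/p ">Then open the cover.</p>
  </conbody>
</concept>"""


def is_kind_of(elem, base):
    """True when the element's class attribute lists `base`, e.g. "topic/p"."""
    return base in elem.get("class", "").split()


root = ET.fromstring(SPECIALIZED)
# Both the standard <p> and the specialized element are treated as paragraphs.
paragraphs = [
    "".join(e.itertext()) for e in root.iter() if is_kind_of(e, "topic/p")
]
print(paragraphs)
```

A filter configured only with the standard element names would miss the specialized paragraph; matching on the class token picks it up without extra configuration.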
If you use specialization in your DITA projects, the translation tool used to process your files should:
Even if you don't use specializations, you may still require customized translations. For example, the standard <draft-comment> element is normally used for internal consumption, and readers of the published documentation almost never see its content. Therefore, <draft-comment> is usually treated as untranslatable by CAT tools. However, you may still need a translation of <draft-comment> for your content reviewers. Only if you or your LSP use customizable CAT tools will you be able to get the translation you need.
Sometimes you will include portions of text in your DITA files that should not be translated. To mark those pieces as untranslatable, you simply set the value of the translate attribute to no, as shown below in Listing 3.
<p translate="no">Warning: this text should not be translated.</p>
Listing 3 - Untranslatable text
Some translation tools simply ignore the translate attribute and extract the text for translation anyway.
Notice that the translate attribute should be used with block-level elements (those that contain full paragraphs or sentences), such as <p>. Setting the translate attribute to no in an element that appears in the middle of a sentence is a bad idea, as the translator working with the surrounding text still needs to see the element content for context. Listing 4 shows how you can safely protect untranslatable text that appears in the middle of a sentence by referencing a copy stored in an untranslatable element.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="locking">
  <title translate="no">Untranslatable Title</title>
  <conbody>
    <p>This sentence contains <ph conref="#locking/lock"/> text.</p>
    <draft-comment translate="no"><ph id="lock">untranslatable</ph></draft-comment>
  </conbody>
</concept>
Listing 4 - Untranslatable inline text protected in <draft-comment>
A translation tool parsing Listing 4 should be able to:
<draft-comment> element with nothing to translate in it.
Make sure your translation tool can ignore block elements that have the translate attribute set to no.
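As a rough check of that behaviour, the following Python sketch walks Listing 4 and keeps only the block elements whose effective translate value is not no. It is a simplified illustration under stated assumptions: translatable_blocks and BLOCK_TAGS are hypothetical names, the set of block elements is limited to the ones used in the listing, and the XML declaration and DOCTYPE are left out of the sample.

```python
import xml.etree.ElementTree as ET

# Listing 4 without the XML declaration and DOCTYPE.
TOPIC = """<concept id="locking">
  <title translate="no">Untranslatable Title</title>
  <conbody>
    <p>This sentence contains <ph conref="#locking/lock"/> text.</p>
    <draft-comment translate="no"><ph id="lock">untranslatable</ph></draft-comment>
  </conbody>
</concept>"""

# Assumption: the only block-level elements this sketch knows about.
BLOCK_TAGS = {"title", "p", "draft-comment"}


def translatable_blocks(elem, inherited=True):
    """Yield block elements whose effective translate value is not "no"."""
    flag = elem.get("translate")
    # An element without the attribute inherits the setting of its parent.
    translate = inherited if flag is None else flag != "no"
    if elem.tag in BLOCK_TAGS:
        if translate:
            yield elem
        return  # blocks are extracted whole, so do not descend further
    for child in elem:
        yield from translatable_blocks(child, translate)


root = ET.fromstring(TOPIC)
to_translate = [block.tag for block in translatable_blocks(root)]
print(to_translate)
```

Only the <p> element is offered for translation; the title and the <draft-comment> that holds the protected copy are skipped.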
A DITA project may contain hundreds of small files. That is not unusual, but it makes file handling cumbersome.
When working with a large number of files, DITA teams may opt for using a Content Management System (CMS) or a version management system like CVS or SVN. A CMS is not really required for working with DITA but it may simplify project management.
A CMS may help you separate the files referenced by a DITA map and prepare a package for translation. If you don't have a CMS, you may use a DITA-enabled translation tool for separating the files that need translation from those that don't.
A DITA-enabled translation tool should be able to parse a DITA map and resolve the references to all topics and subtopics, preparing a unified package that you can send to your LSP.
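The map-walking step can be sketched as follows. This is a minimal Python illustration, not a complete packaging tool: collect_topics is a hypothetical name, the map and its file names are invented for the example, and the sketch only follows topicref elements (it does not open the topics to follow their conref targets, nor does it descend into nested maps).

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical map; the file names are invented for the example.
DITAMAP = """<map>
  <title>User Guide</title>
  <topicref href="intro.dita">
    <topicref href="tasks/transform.dita"/>
    <topicref href="tasks/../ui_reference.dita#uiref"/>
  </topicref>
  <topicref href="ui_reference.dita"/>
  <topicref href="http://www.example.com/" scope="external" format="html"/>
</map>"""


def collect_topics(map_root, base_dir=""):
    """Return the local files referenced by a map, in document order."""
    files = []
    for ref in map_root.iter("topicref"):
        href = ref.get("href")
        if not href or ref.get("scope") == "external":
            continue  # grouping topicrefs and external links carry no local file
        # Drop the "#topicid" fragment and normalize the relative path.
        path = posixpath.normpath(
            posixpath.join(base_dir, href.partition("#")[0])
        )
        if path not in files:
            files.append(path)
    return files


package = collect_topics(ET.fromstring(DITAMAP), base_dir="docs")
print(package)
```

The resulting list is what would go into the translation package; files in the project that the map never references stay behind.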
If your LSP charges you for file management, you can reduce cost by preparing a consolidated translation package in-house.
Rodolfo Raya is Maxprograms' CTO (Chief Technical Officer), where he develops multi-platform translation/localisation and content publishing tools using XML and Java technology. He can be reached at firstname.lastname@example.org.