An introduction to DITA

Author: Andrew Squire, Senior Consultant, Elkera Pty Ltd

Date: 21 July 2006

Presentation by Andrew Squire at Open Publish 2006, Sydney on 27 July 2006, originally under the title "Real world application development with DITA".

Many documentation managers have heard of DITA but find it difficult to work out if it is actually relevant to their needs. In this presentation, Andrew Squire explains the basic features of DITA, the problems it aims to solve, an overview of how it works and its main benefits.

1 Introduction

1.1 What is DITA?

DITA stands for Darwin Information Typing Architecture.
DITA is an XML-based architecture for authoring and publishing technical documentation.
DITA is a topic-based architecture: content is broken up into self-contained chunks, an approach that is familiar to help and web content creators.

1.2 DITA history

Developed by IBM in 2000 to replace IBMIDDoc for their large library of technical documentation.
IBM donated DITA to OASIS in March 2004.
Revised by OASIS and became an OASIS standard in May 2005.

2 What problems does DITA try to solve?

2.1 Content reuse

Some organizations maintain large document sets where multiple documents may contain copies of the same content.
A common approach to content reuse has been to copy the content and paste it into new documents. Multiple copies of the same content increase the cost of document maintenance when the common content needs to be changed, increase the cost of translation and increase the likelihood of copies becoming inconsistent.
XML natively supports content reuse using entities. Shared content can be stored in a separate file and included into the document using an entity reference. This can cause difficulties when editing fragments because they cannot contain an XML declaration or a DOCTYPE declaration.
W3C developed XInclude to enable content reuse. XInclude elements are added to an existing DTD such as DocBook. XInclude processing tools are integrated into publishing systems. Content can only be inserted where the XInclude element is valid. This can be inflexible unless the XInclude element is allowed everywhere. Processing errors may occur because there is no mechanism to control the structure that is inserted by XInclude. Inserted content may not be valid when resolved and this may cause problems with processing applications.
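The XInclude behaviour described above can be sketched with Python's standard `xml.etree.ElementInclude` module. This is only an illustration: the `section`, `title` and `para` elements, the file name `warning.xml` and the in-memory `SHARED_FILES` store are assumptions made for the example, not part of any particular DTD.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared fragment, held in memory instead of in a separate file.
SHARED_FILES = {
    "warning.xml": "<para>Disconnect the power before servicing.</para>",
}

# A document that pulls in the shared fragment with an XInclude element.
DOCUMENT = """<section xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Maintenance</title>
  <xi:include href="warning.xml"/>
</section>"""


def load_shared(href, parse, encoding=None):
    """Loader callback: parsed element for XML includes, raw text otherwise."""
    text = SHARED_FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


root = ET.fromstring(DOCUMENT)
# Replace every xi:include element in place with the content it references.
ElementInclude.include(root, loader=load_shared)
print(ET.tostring(root, encoding="unicode"))
```

Note that the include step would succeed in exactly the same way if `warning.xml` contained some other element that is not allowed inside `section`. Nothing in the mechanism checks the inserted structure, which is the weakness described above.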
Automated content reuse solutions have been attempted with XML and word processor based applications.
XML based applications tend to be complex bespoke applications built around standard and proprietary DTDs. The cost and complexity of developing these applications mean they are not easily accessible to smaller organizations.
Non-XML based applications tend to lock content into proprietary formats that are not accessible to other applications.
Non-XML based applications may have formatting problems because of a lack of context or may place unnecessary restrictions on content sharing to bypass this problem.

2.2 Content sharing and interchange

Organizations may wish to share or interchange content with partners or other organizations.
Organizations may share and interchange content using word processing files. File formats may not be compatible between the organizations. If shared content is to be reused, it may need to be restyled to suit its new use. Restyling may require considerable effort that needs to be repeated each time new content is received.
Standard DTDs such as DocBook enable sharing and interchange to occur without formatting concerns. However, different organizations may need to customize the DTD in different ways. These customizations may not be compatible with processing applications.
Successful interchange of XML data normally requires data to be transformed by the receiving application or modifications to be made to receiving processing applications. Modifications are ongoing as DTDs rarely remain unchanged.

3 How does DITA solve these problems?

3.1 Topic orientated architecture enables content reuse

DITA uses a topic based architecture for content creation, management and publishing. The base information unit is called a 'topic'. A topic is a self-contained, stand-alone unit of information and is the basic unit for reuse in DITA.
Topics are organized into an output such as a document, help system or web site using a map. Topics are included into the map using a topicref element which identifies the topic to insert. The map can be used to specify the order and groupings of topics that form the output.
The map model allows topics to be reused in different documents without needing to make copies of the content.
Topics and maps each have their own DTD. The validation problems that may occur when using XML entities as a method of reuse do not occur.
The topicref element in the map is similar in function to the XInclude element. However, the topicref element can only reference topics. This behaviour is enforced by DITA compliant processing applications. DITA maps will always be resolved into a structure that can be processed by publishing applications.
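The map and topicref model can be illustrated with a minimal sketch in Python. The topic file names, the in-memory `TOPICS` store and the `resolve_map` function are hypothetical. DOCTYPE declarations are omitted, and a real DITA processor does considerably more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store: file name -> topic source. Each topic is written once.
TOPICS = {
    "safety.dita": '<topic id="safety"><title>Safety</title>'
                   '<body><p>Wear gloves.</p></body></topic>',
    "install.dita": '<topic id="install"><title>Installing</title>'
                    '<body><p>Run the installer.</p></body></topic>',
    "usage.dita": '<topic id="usage"><title>Using the product</title>'
                  '<body><p>Press start.</p></body></topic>',
}

# Two maps that reuse the same topics in different outputs.
USER_GUIDE = """<map><title>User guide</title>
  <topicref href="safety.dita"/>
  <topicref href="install.dita">
    <topicref href="usage.dita"/>
  </topicref>
</map>"""

QUICK_START = """<map><title>Quick start</title>
  <topicref href="install.dita"/>
</map>"""


def resolve_map(map_source, topics):
    """Return (depth, topic title) pairs in the order and nesting given by the map."""
    outline = []

    def walk(parent, depth):
        for ref in parent.findall("topicref"):
            topic = ET.fromstring(topics[ref.get("href")])
            # A topicref may only point at a topic; reject anything else.
            if topic.tag != "topic":
                raise ValueError(f"{ref.get('href')} is not a topic")
            outline.append((depth, topic.findtext("title")))
            walk(ref, depth + 1)

    walk(ET.fromstring(map_source), 0)
    return outline


print(resolve_map(USER_GUIDE, TOPICS))
print(resolve_map(QUICK_START, TOPICS))
```

The "Installing" topic appears in both outputs but exists only once in the store, so a correction to it reaches both documents.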
DITA also provides content reuse at a lower level than the topic. All elements have a conref attribute that can be used to reference shared content. Processing applications replace the referencing element with the referenced element. The content referenced by the attribute must be the same type as the referencing element, either the same element or a specialization of the element. This ensures that conref will always be resolved into a valid structure and can be processed by publishing applications.
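A simplified sketch of conref resolution follows. It assumes the referenced content is already loaded in memory, matches elements by their `id` only, and treats "same type" as "same element name". The specialization case and cross-file addressing are left out, and the `resolve_conrefs` function is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical topic that holds shared elements, each identified by an id.
SHARED = ET.fromstring(
    '<topic id="shared"><title>Shared notes</title><body>'
    '<note id="power">Disconnect the power first.</note>'
    '<p id="contact">Call support.</p>'
    '</body></topic>'
)

# A topic that references the shared note instead of copying it.
TOPIC = ET.fromstring(
    '<topic id="service"><title>Servicing</title><body>'
    '<note conref="#shared/power"/>'
    '<p>Open the cover.</p>'
    '</body></topic>'
)


def resolve_conrefs(topic, library):
    """Replace each element that carries a conref with a copy of its target."""
    by_id = {el.get("id"): el for el in library.iter() if el.get("id")}
    for parent in list(topic.iter()):
        for index, child in enumerate(list(parent)):
            reference = child.get("conref")
            if reference is None:
                continue
            # Simplification: use only the element id after the last "/".
            target = by_id[reference.rsplit("/", 1)[-1]]
            # The target must be the same type as the referencing element.
            if target.tag != child.tag:
                raise ValueError(f"<{child.tag}> cannot conref a <{target.tag}>")
            replacement = copy.deepcopy(target)
            replacement.tail = child.tail
            parent[index] = replacement
    return topic


resolved = resolve_conrefs(TOPIC, SHARED)
print(ET.tostring(resolved, encoding="unicode"))
```

Because of the type check, a `p` element that tried to pull in the shared `note` would be rejected rather than producing a structure the DTD does not allow.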
DITA provides a technical framework to support content reuse. DITA compliant applications will provide for content reuse, out-of-the-box. An organization must still manage their content architecture to fit with DITA's topic based architecture.

3.2 Specialization architecture enables content interchange

DITA provides an inheritance based method for extending the DTD called specialization.
Specialization involves creating a new element that is based on an existing element. The "specialized" element must conform to the content constraints of the existing element: its content model may be the same or more restrictive, but not looser. The specialization relationship is retained in the DTD, as the default value of each element's class attribute, for use by processing applications. Specialization results in an inheritance hierarchy.
The inheritance hierarchy allows processing applications to recognize and process new elements based on rules found for existing elements in the inheritance hierarchy. DITA can be customized without breaking processing applications. This enables interchange as customized DTDs can always be processed by DITA processing applications.
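The fallback behaviour can be sketched as follows. In DITA the inheritance hierarchy of an element is recorded in its `class` attribute, which is normally defaulted by the DTD. It is written out explicitly here because Python's ElementTree does not apply DTD defaults. The `RULES` table and the `rule_for` function are hypothetical stand-ins for a style sheet.

```python
import xml.etree.ElementTree as ET

# A "concept" topic, which is a specialization of the base "topic" type.
DOC = ET.fromstring(
    '<concept class="- topic/topic concept/concept ">'
    '<title class="- topic/title ">Widgets</title>'
    '<conbody class="- topic/body concept/conbody ">'
    '<p class="- topic/p ">A widget is a replaceable part.</p>'
    '</conbody></concept>'
)

# Output rules known to a processor that has never seen the concept specialization.
RULES = {
    "topic/topic": "div",
    "topic/title": "h1",
    "topic/body": "div",
    "topic/p": "p",
}


def rule_for(element, rules):
    """Find the most specific rule, falling back along the inheritance hierarchy."""
    # The first token is the "-" or "+" marker; the rest run from general to specific.
    ancestry = element.get("class", "").split()[1:]
    for name in reversed(ancestry):
        if name in rules:
            return rules[name]
    raise LookupError(f"no rule for <{element.tag}>")


print(rule_for(DOC.find("conbody"), RULES))
```

The processor has no rule for `concept/conbody`, so it falls back to the rule for `topic/body`. If a rule for `concept/conbody` is added later, it takes precedence without any change to the existing rules.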
Specialization also helps to reduce the cost of maintenance. Adding new elements to the DTD does not always require changes to processing applications and style sheets.

4 Other benefits of DITA

DITA is a modern DTD with a relatively modest set of elements. Those familiar with XHTML will be familiar with many DITA elements. This should reduce the amount of training required compared with other standard DTDs such as DocBook.
DITA is an OASIS standard. Standards create a larger community of users, which makes it viable for developers and vendors to get behind the standard and produce compliant tools. This reduces the need for bespoke applications, which in turn reduces the cost for smaller organizations.
A greater number of tools means that organizations can change tools as the market matures and better tools become available, without needing to change the content behind the tools.
DITA is well supported by the open source community, which has produced the DITA Open Toolkit (DITA OT). This toolkit provides processing to generate multiple output formats such as PDF (via XSL-FO), HTML and a number of help formats.
It is likely that a community will develop around the standard to make it easier to find writers and developers who are experienced with it.

5 Who should use DITA?

Organizations with product ranges based on standard or common components can significantly reduce the volume of documentation needed.
The DITA topic and map model allows organizations to tailor content to suit the audience or output format. This is much more flexible than document centric applications where it is not easy to restructure content for different output formats without creating redundant content. The DITA Open Toolkit provides processing to generate the different output formats.
Organizations that need to translate their content into different languages can use the component based management and reuse features of DITA to reduce translation costs.

6 Maximising the benefits from DITA

6.1 Information architecture

Moving to DITA is not just a technical change; it may also involve changes to your information architecture and to the processes used to create content. If your existing architecture is document based, moving to a topic based architecture is a substantial change. This change needs to be carefully planned.
Organizations need to develop a content architecture that suits their content and the users of their content. This requires extensive content analysis and identification of content suitable for reuse.
Organizations need to plan how to migrate existing content to the new architecture. An organization can expect to rework existing content so it is suitable for use with the new architecture.
The information architecture must be continually managed as new content is created. An information architect must ensure topics are written to be compatible with the architecture and suitable for reuse.
Writers must be trained in the new information architecture and in creating content that is suitable for reuse. This may be a significant change to the writers' style.

6.2 Consider the writers

Traditional XML developments often overlook the needs of content writers. XML developments often fail when they do not adequately address the needs of writers. Writers should be involved from the planning stages of the development.
Authoring tool selection is critical for writers' acceptance. Writers should be fully involved in the process for selecting tools they will use on a daily basis. Consider customising tools to automate common tasks the writers perform. The move to structured authoring should be a benefit to the writers.

6.3 Off the shelf tools

Care needs to be taken when selecting XML based tools for use with DITA. Specialization relies on features provided by the XPath standard. These features may not be available in some XML tools. Organizations will need to check with vendors to ensure tools support DITA specialization.
The DITA Open Toolkit (DITA OT) is an open source toolkit for processing DITA content. This toolkit provides tools for processing and resolving DITA maps and conref attributes and transforming DITA content into PDF (via XSL-FO) and HTML as well as various Help formats. DITA OT may need to be integrated with other tools to create a production publishing system.
XML authoring tool support for DITA is growing. XMetaL Author, Arbortext Editor (formerly Epic), Adobe FrameMaker and other XML editors now provide specific customizations for DITA. However, DITA is only in its infancy at the moment and some tools may be a bit raw at present.
Rendering applications are providing support for DITA. Commercial XSL-FO renderers can be plugged into the DITA OT. Elkera XML Print provides a DITA customization.
There will always be a need for some custom development of off the shelf tools. At the very least, outputs will need to be customized to achieve the formatting required by an organization.

         Updated: 10-12-2006