Planning a single source publishing application for business documents

Author: Peter Meyer, Managing Director, Elkera Pty Limited

This is the text of a paper presented by Peter Meyer at OpenPublish, Sydney on 29 July 2005.

Introduction

This paper discusses the critical steps in planning a single source publishing application for narrative business documents, concentrating on the steps that most affect how easy XML authoring systems are for content authors to use.

While it is well accepted that the use of XML for single source publishing is an effective solution to the content publishing problem, many organisations have difficulty getting content into XML. The key thesis of this paper is that these difficulties arise from poorly designed DTDs or schema and inadequate planning for author needs in interface design. To overcome these problems, content publishing systems must be designed from the ground up with author usability in mind. If this is done, the switch to XML authoring can be a productivity gain to the enterprise and a further benefit in the business case for the single source publishing system.

What is single source publishing? This is the process that allows content to be maintained in one place and either used in multiple publications or published in multiple outputs (eg, print and web renditions). This might be to permit multiple versions of a publication for a product family with shared and custom content in different languages or it might be simply to allow print and web versions of a complete document.
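The idea of one source and several outputs can be illustrated with a minimal sketch. It assumes a hypothetical XML fragment whose element names (section, title, para) are invented for this example and are not taken from any particular schema. The same content is rendered as simple HTML for the web and as plain text, without being edited twice.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source fragment; the element names are illustrative only.
SOURCE = """<section>
  <title>Installing the product</title>
  <para>Run the installer and follow the prompts.</para>
  <para>Restart the computer when asked.</para>
</section>"""


def to_html(section):
    """Render the section as a small HTML fragment for the web."""
    parts = ["<h1>%s</h1>" % html.escape(section.findtext("title"))]
    parts += ["<p>%s</p>" % html.escape(p.text) for p in section.findall("para")]
    return "\n".join(parts)


def to_text(section):
    """Render the same section as plain text, for example for a print draft."""
    title = section.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text for p in section.findall("para")]
    return "\n".join(lines)


section = ET.fromstring(SOURCE)
html_output = to_html(section)   # web rendition
text_output = to_text(section)   # plain text rendition
```

A correction to the source fragment flows into both renditions the next time they are generated, which is the essential point of single sourcing.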

What are narrative business documents? We are talking about such things as:

• product, process and training manuals
• technical specifications
• contracts
• regulatory policy materials
• simple marketing literature that has to be published in print and on the web
• articles, white papers and consultants' reports.

In virtually all cases, it is possible to define standard publishing layouts for a class of documents and automatically apply those layouts as renditions are generated.

In all these cases, the use of XML for content management can be a useful foundation for a single source publishing strategy. Where documents are held and updated for long periods, the use of XML may also avoid technological obsolescence of the content due to changing software versions and publishing styles.

This paper is not concerned about business documents such as invoices, purchase orders and other similar structured data records. They are not narrative business documents.

Vision statement and business case

Setting up the project

The decision to develop a single source publishing application needs to be based on a clear, comprehensive vision and business case. If this step is not completed correctly, things can go wrong very quickly.

Let's revisit the reasons for wanting to establish a single source publishing application. Assume that XYZ Enterprises develops computer software products and needs to produce software manuals for particular variations in the product. It wants to publish both print and on-line versions. Currently, it uses MS Word for document authoring. Content that is common to multiple publications is separately maintained in each publication. Inevitably, these copies become inconsistent. Considerable manual intervention is required to properly display documents in HTML on the web or else web publications are based on PDF documents. XYZ Enterprises wants to overcome the inaccuracies and delays in the current system. It wants to reduce the costs of manual intervention. It wants the full functionality of browseable, searchable HTML on its web site.

There are at least two ways that XYZ Enterprises might set out to handle this problem. One way might be for someone in IT to say "We need a CMS". Some cost saving assumptions will be developed and various CMS systems costed. On this basis, a business case might be established.

The other way is that someone in IT might say "We need to work out exactly what we want to achieve here and then the best way to go about it. We need to make sure we identify all benefits and see how we can minimise the costs of gaining those benefits. We had better make sure we understand our data and our users' needs." This will set in train a more complex process but it is more likely to produce the best results for the enterprise.

There are many variations on these themes but these two represent the basic issues that need to be considered. The consequences of each approach are considered in the next section.

Don't start at the wrong end

The "we need a CMS" approach leads to serious problems for the project. It is the equivalent of a cargo cult mentality. It is possible that the project can be made to work if XYZ is lucky and has enough money but problems along the way may include:

• XYZ may not actually need an enterprise CMS. It may already have a file system or document management system that can be used as a suitable platform with some creative system design. It is quite possible that money will be spent on systems that are not really needed because of a lack of understanding about how XML publishing systems can be created.
• The features and mode of operation of the chosen CMS will define the way that XYZ's content is organised, rather than the other way around. The process can become one of attempting to push a square peg into a round hole.
• The mind set of starting with a CMS will divert attention from other analysis that should be undertaken.
• The CMS vendor will offer packaged schema and applications that may not be the best tools for XYZ. It may be difficult to obtain the vendor's assistance to look at alternative products.
• Many stakeholder needs may not be properly identified until the system is to be installed, causing delays, disruptions and budget overruns as problems are identified and dealt with.

One way in which enterprises can fall into the "we need a CMS" trap is during preliminary investigation of options. It can be very useful to understand the features provided by various enterprise and web CMS systems and other candidate software applications during early planning. This can help in the development of requirements, if handled carefully. However, it is easy to be captured by vendors at this stage. Ideally, an expert, independent consultant should be employed to assist during the planning stages to help prevent premature capture by a system vendor.

The analytical approach

If XYZ Enterprises starts by identifying all relevant users or stakeholders in the system and then by analysing their needs, there is a good prospect XYZ will be able to identify the widest range of possible benefits. It is also more likely that the true costs of deriving those benefits will be identified. If XYZ does this correctly, it will have a robust vision and business case that will drive the rest of the project.

The initial analytical process should lead through these steps:

• Identify all internal and external system users, interfaces to the system and relevant data flows.
• Identify the high level requirements for each interface (more detailed requirements will be gathered later).
• Understand the problems and needs of each set of users. What problems are going to be solved for those users?

In this process, it is necessary to re-consider work flows and work practices. There is no point in applying new technology to old, manual work practices.

Commonly, benefits will be derived from:

• quicker production time;
• improved accuracy of content; and
• reduced publishing costs, particularly by automation of various tasks.

Speeding up production time may have flow on benefits for other parts of the enterprise, permitting faster delivery of new products, for example.

Once the benefits are identified, these need to be related to real world work flows and work practices. Just what changes must be made to achieve those benefits? Functional estimates must be developed of the likely level of savings that are to be sought. From there, estimates of monetary savings or gains can be calculated.

The benefits side of the business case should define measurable goals that can be tested throughout the project and on completion.

The other side of the business case is to correctly identify the costs. This can only be done properly if all problems have been identified and provision made for their resolution in the overall strategy.

One of the major problems observed with some single source publishing system proposals is a complete failure to appreciate the needs of content authors in the early stages of planning. It seems to be assumed that authors will pick up XML content authoring with minimal change management issues. If this issue is not properly handled, many of the expected benefits will be eroded through delays in project roll out, high training and support costs and from problems dealing with inconsistent data.

There are two sources to this problem:

• a misplaced expectation that authors will become XML geeks and quickly learn a new paradigm; and
• a lack of appreciation of the role of the DTD or schema in making the work of the author simpler and in helping to ensure that data is not only valid but consistently marked up.

If the authoring team is only 2 or 3 technically minded people, these problems might be managed fairly easily. If there are 10, 20 or 30 authors, they can become a major impediment to the project. Both issues are discussed in detail in later sections.

Develop requirements

Once a business case is developed and a project established, it will be necessary to develop more detailed requirements in order to develop a comprehensive system architecture. In reality, a certain amount of high level requirements analysis will have been done for the business case. In this phase, more detail is needed.

It is likely that this requirements phase will be a middle layer in the requirements development process. Detailed requirements for specific components, such as the drafting and rendering applications may be done separately, later in the project.

Requirements must be clearly documented and be related to all the system users and stakeholders identified earlier.

Develop a strategic architecture and implementation plan

It is essential that a detailed strategic architecture is developed to determine the overall scope of the new system and how it will be integrated into existing systems. A certain amount of this is done during the business case development. In this stage, much more precision is required to guide the rest of the project.

This paper does not seek to further investigate the development of a strategic architecture.

Select or develop a schema

Off-the-shelf schema options

In an XML based single source publishing system, the DTD or schema is at the heart of every application component in the system. If the schema is replaced or changed significantly, the costs of changing or replacing software can be very high. It is essential that, before any development takes place, the schema is matched to the system requirements and that it comprehensively models all the enterprise content to produce the desired outputs for all stakeholders.

Ideally, for narrative business documents, it should not be necessary to develop a schema from scratch. The choice of available schema is growing. If an existing schema can be used, there is a good chance that some tools will be available for that schema, thereby reducing the cost of application development.

A comparison of four freely available schema that may be considered for use with narrative business documents is available on Elkera's web site at www.elkera.com/ in the white papers section under “Articles”.

Some CMS vendors provide their own proprietary schema. Use of a proprietary schema that is supported by only one vendor may represent a long term lock in for all dependent software components.

A schema has to provide the representation of data needed to support the content publishing and information management requirements. Those requirements must be carefully worked out before making a schema selection. Schema selection criteria may be broken into these facets:

• capacity to model the required data;
• capacity to provide for consistency of data; and
• ease of use for content authors.

Each of these points is discussed in the following sections. Unfortunately, it is common for system designers to concentrate on the first facet, pay insufficient attention to the second and ignore the third.

Capacity of the schema to model the content

Many off-the-shelf schema provide very flexible content models that can be used for a very wide range of documents. As discussed later, this can be as much a problem as a benefit.

Issues that need to be considered when evaluating schema include:

• Does it provide an accurate representation of the basic document structure? Many narrative business documents contain distinct components, particularly in the front or back. Explicit markup of these components may be necessary for layout or component numbering purposes. If the schema does not provide elements for these structures, it may be necessary to create them. Subverting the use of other containers provided by the schema may be confusing and may create a misleading appearance of compatibility.
• If there is a significant amount of legacy content to be converted to XML markup, this will have a big impact on schema design. Documents created using word processor software are notoriously inconsistent in their structure. Conversion to XML is a difficult exercise. Invariably, there will be anomalous structures that do not suit the schema, unless the content models are very loose. Usually, it is necessary to choose a balance between rectifying the data and allowing a very loose schema. This requires a thorough analysis of the legacy data at the earliest stage of system planning.
• Does the schema provide adequate metadata or provision for adding metadata? It is almost always necessary to devise a metadata structure that meets specific enterprise requirements.
• Does the schema provide adequate granularity for content re-use?
• Does the schema provide a mechanism to either constrict or extend the schema to suit the specific enterprise requirements?
• Once changes are made, will those changes create incompatibilities with off-the-shelf tools for the schema, thereby negating many of the benefits of an off-the-shelf schema?
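The question of granularity for content re-use can be made concrete with a small sketch. It assumes a hypothetical component library in which each reusable unit is a section element carrying an id attribute. The component names, the assemble function and the two manuals are all invented for illustration. Re-use is only possible at the level the schema lets content be addressed, which here is the section.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical library of reusable components, keyed by id.
COMPONENTS = {
    "install": ET.fromstring(
        "<section id='install'><title>Installation</title>"
        "<para>Run the installer.</para></section>"),
    "network": ET.fromstring(
        "<section id='network'><title>Network setup</title>"
        "<para>Enter the server address.</para></section>"),
    "safety": ET.fromstring(
        "<section id='safety'><title>Safety</title>"
        "<para>Disconnect power first.</para></section>"),
}


def assemble(title, component_ids):
    """Build a publication from shared components, in the order given."""
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    for cid in component_ids:
        # Copy so that each publication gets its own tree of the shared content.
        doc.append(copy.deepcopy(COMPONENTS[cid]))
    return doc


basic = assemble("Basic edition manual", ["install", "safety"])
pro = assemble("Pro edition manual", ["install", "network", "safety"])
```

Both manuals draw the safety section from a single maintained copy, so the inconsistency that arises when shared text is maintained separately in each publication cannot occur.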

Capacity of the schema to provide for consistent markup of content

Some off-the-shelf schema provide a vast number of elements to cater for a wide range of possible uses. They may also permit content to be marked up using several different conceptual approaches. Unless the schema is drastically constricted and content models tightened, there are several serious problems with these schema:

• Different authors will almost certainly apply different markup to the same content. Often, one author will apply different markup to the same content within a document. It can be very difficult to determine if the markup differences are based on genuine semantic differences in the content or if they are merely idiosyncratic. This can affect content reuse and publishing consistency.
• A schema with a very large number of elements and with loose content models permits a vast number of element contexts that must be handled by rendering applications. This makes it very difficult to accurately specify publishing layouts and to develop rendering applications. Rendering applications are likely to be expensive to develop and unreliable.
• There is a great deal for authors to understand about the schema. This creates author resistance and requires extensive training. It can frustrate XML content authoring or make it unnecessarily expensive.
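The first two problems above can be illustrated with a rough sketch. It assumes two hypothetical authors who mark up the same warning in different ways, using invented element names. The sketch lists the parent and child element pairs (a simplified notion of "element context") that occur in each version. Every distinct pair is a case that a rendering application must handle.

```python
import xml.etree.ElementTree as ET


def element_contexts(root):
    """Return the set of (parent tag, child tag) pairs found under root."""
    contexts = set()
    for parent in root.iter():
        for child in parent:
            contexts.add((parent.tag, child.tag))
    return contexts


# Two hypothetical authors marking up the same warning differently.
AUTHOR_A = ("<section><title>Care</title>"
            "<warning><para>Keep dry.</para></warning></section>")
AUTHOR_B = ("<section><title>Care</title>"
            "<para><emphasis>Warning:</emphasis> Keep dry.</para></section>")

contexts_a = element_contexts(ET.fromstring(AUTHOR_A))
contexts_b = element_contexts(ET.fromstring(AUTHOR_B))

# Contexts used by one author but not the other.
only_in_one = contexts_a ^ contexts_b
all_contexts = contexts_a | contexts_b
```

Run over a real document collection, a count of this kind gives an early indication of how many contexts a loose schema is actually producing, and therefore how much work the rendering applications face.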

Ease of use for content authors

XML authoring frees authors from many tedious tasks required by word processing software. They do not have to worry about document layouts nor how to create cover pages, contents listings, headers and footers. However, these benefits can be heavily outweighed by an inconvenient interface that requires authors to constantly find the correct element and insert it in the correct location before writing content.

It is possible to conceive of XML content authoring as a substantial productivity gain in itself. Unfortunately, very few system developers approach it in this way. Rather, it is seen as something that must be endured by content authors for the greater good of the enterprise. This is an unacceptable and wasteful approach. This problem is one of the major barriers to the more widespread adoption of single source publishing systems.

XML editing tools take many different approaches to try to assist content authors to create markup as they create content. Out-of-the-box, most XML editors require authors to have a thorough understanding of the schema and to work in a tagged display to insert new elements effectively. Clearly, this is not satisfactory for the majority of content authors. The process of having to find the correct element from a list and insert it in the correct location interferes with the natural authoring process. It creates a barrier to the adoption of XML authoring. Where authors do learn to work with such a tool, productivity can be badly affected.

In any group, it can be expected that around one quarter of the authors will be able to work with a moderately complex schema without extensive training and support. Around half the group may learn to use the application but with varying degrees of efficiency. Around one quarter will likely have severe problems adapting to the new content authoring process.

Various approaches can be taken in XML editor design to minimise these problems. However, unless the schema is reduced to a very simple model that requires authors to understand only a few basic concepts, it will be difficult and costly to implement and support an XML authoring environment.
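The effect of a very simple model can be sketched as follows. The content model below is hypothetical and deliberately small, and it records only which child elements are permitted inside each element. A real DTD or schema also governs order and occurrence, which this sketch ignores. With a model this tight, the list of elements an editor needs to offer at any point is short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tight content model: each element maps to
# the child elements an author may insert inside it.
CONTENT_MODEL = {
    "document": ["title", "section"],
    "section": ["title", "para", "list", "section"],
    "list": ["item"],
    "item": ["para"],
    "para": [],
    "title": [],
}


def insertable(path):
    """Return the elements allowed inside the element at the end of path."""
    return CONTENT_MODEL.get(path[-1], [])


def violations(element):
    """Return (parent, child) pairs that the content model does not allow."""
    found = []
    for child in element:
        if child.tag not in CONTENT_MODEL.get(element.tag, []):
            found.append((element.tag, child.tag))
        found.extend(violations(child))
    return found


choices_in_list = insertable(["document", "section", "list"])
problems = violations(
    ET.fromstring("<section><title>T</title><item/></section>"))
```

An author working inside a list is offered exactly one choice. A schema with hundreds of elements and loose content models would present a far longer list at the same point.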

Planners of XML based single source publishing applications need to consider how to comprehensively overcome these problems and gain the full benefits of a simpler content creation system for their authors. To do so requires careful planning from the start of schema development but it will produce major benefits for the project.

Evaluate & select applications

The need for requirements

Software applications can be evaluated only on the basis of clear requirements. Many of the issues discussed affecting schema design and author usability will have a direct impact on specific requirements for applications.

Editor selection

A critical issue in editor selection will be whether authors are to create content directly in XML or whether content will be created in another environment and converted to XML. If authors are to work natively in XML, it is necessary to determine exactly how authors will operate with the selected schema and how authors will be trained and supported.

It is suggested that all out-of-the-box XML editors require extensive customisation if acceptable usability levels are to be achieved. Schema designs vary considerably. No XML editor can provide a universal approach that works for all schema. Different content demands different approaches to the way authors will work with that content and the schema.

Planning for the XML editor must include a strategy to tailor the editor to the chosen schema and author needs so as to simplify the work of content authors. While this may involve an additional layer of development or expense, it ought to enhance the business case by minimising training effort, reducing ongoing support costs and improving overall author productivity.

The costs of failing to pay proper attention to author convenience are never factored into the business case and represent a time bomb that may only detonate after the system is exposed to authors. By then it is usually too late to properly fix the problem.

Rendering application selection

Rendering application development is directly affected by schema complexity. Simplification of the schema and elimination of unnecessarily loose content models will reduce the cost of application development.

When planning rendering applications it is necessary to develop functional requirements and formal output specifications to ensure that rendering applications are comprehensive and reliable from the outset. Formal specifications will greatly facilitate ongoing application maintenance.

CMS selection

Enterprise CMS selection involves a clear understanding of the application requirements and system architecture. Some of the critical issues that need to be considered include:

• integration with existing document management, web and intranet publishing systems, if any;
• document storage, retrieval and version management;
• records management and archive requirements;
• client and project information storage and relationships with documents;
• user rights and privileges;
• work flow support;
• integration with XML authoring applications and rendering applications;
• the level of granularity of storage of XML documents;
• web publishing support, including automated link creation from XML markup and link validation;
• metadata management and thesaurus or taxonomy support;
• searching and file format support;
• collaborative editing support, particularly involving persons outside the enterprise; and
• technical performance, scalability and security issues.
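Automated link creation and link validation, mentioned in the list above, can be sketched in a few lines. The sketch assumes a hypothetical markup convention in which cross references are xref elements with a target attribute that should match the id attribute of some element in the document. The element and attribute names are invented for illustration and are not taken from any particular schema or CMS.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical document; one cross reference points at a missing target.
DOC = """<document>
  <section id="s1"><title>Overview</title>
    <para>See <xref target="s2"/> and <xref target="s9"/>.</para>
  </section>
  <section id="s2"><title>Details</title></section>
</document>"""


def broken_links(root):
    """Return the targets of xref elements that match no id in the document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [x.get("target") for x in root.iter("xref")
            if x.get("target") not in ids]


def link_html(root, target):
    """Build an HTML link to the element with the given id, or return None."""
    for el in root.iter():
        if el.get("id") == target:
            label = el.findtext("title", default=target)
            return '<a href="#%s">%s</a>' % (target, html.escape(label))
    return None


root = ET.fromstring(DOC)
broken = broken_links(root)
details_link = link_html(root, "s2")
```

Because the link text is generated from the target's title, a change to that title is reflected in every reference to it at the next rendering, and dangling references can be reported before publication.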

Extensive single source publishing functionality can be created without an enterprise CMS. Careful design of the schema and selection of suitable applications can greatly minimise reliance on a complex, expensive enterprise CMS. Where an enterprise CMS is needed, planning for it will benefit from a user and content driven approach to requirements development.

Develop applications

Clearly, application development and configuration is a major part of the project. If the project has been carefully planned, requirements fully developed and the schema carefully designed to model the data, application development should proceed efficiently. If the preparatory steps are incomplete, application development will be severely disrupted. Beyond that observation, application development is not further considered in this paper.

Rollout

One of the objectives of careful planning and attention to user needs in the planning and design stages is that application roll out problems will be greatly reduced. Inadequate attention to author needs has been one of the major failings of many single source publishing projects.

The potential difference between a project that places high importance on author needs compared to one that does not is shown in the following figure.

In this figure, the blue line shows the relative productivity of authors using an author centric XML editing system. The orange line shows that of authors using a conventional XML authoring system.

The key assumptions behind this figure are:

• The site has at least 10 to 15 authors.
• In the author centric system, the schema is designed for author convenience, strictly limiting the options available to authors and the concepts they need to understand.
• In the author centric system, the XML editor interface has been carefully tailored to allow most, if not all, authoring tasks without the need to show XML tags.
• In the conventional system, the application is based on a loose schema with many elements. The XML editor is essentially 'out-of-the-box' and has undergone little, if any tailoring to provide a tags off interface for authors.

In the figure, the region between the two curves is the cost of not taking an author centric approach to the system planning and design. The quantification of this cost will depend on many factors in each case. The costs of a conventional XML authoring system are quite high and require a lengthy period to bring authors up to even the same level as a word processor based system. They cannot achieve the full potential of an XML content authoring system.
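As a rough illustration of how the region between the two curves might be quantified, the sketch below applies the trapezoidal rule to two productivity series. All of the numbers are invented for this example and are not taken from the figure. Productivity is expressed relative to a word processor baseline of 1.0 and sampled monthly.

```python
# Hypothetical relative productivity (word processor baseline = 1.0),
# sampled monthly. These values are invented for illustration only.
MONTHS = [0, 1, 2, 3, 4, 5, 6]
AUTHOR_CENTRIC = [0.8, 1.0, 1.1, 1.2, 1.2, 1.2, 1.2]
CONVENTIONAL = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def area_between(xs, upper, lower):
    """Approximate the area between two curves using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        gap_left = upper[i - 1] - lower[i - 1]
        gap_right = upper[i] - lower[i]
        total += width * (gap_left + gap_right) / 2
    return total


# Lost productivity per author, in months of baseline output.
lost_per_author = area_between(MONTHS, AUTHOR_CENTRIC, CONVENTIONAL)
```

Multiplying the result by the number of authors and by a monthly cost per author would give a first estimate of the cost in money terms. The inputs would need to come from measurements at the site concerned.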

Updated: 10-12-2006