Version 1.0
Last Revised: June 14, 2002

Part I: Introduction to Structured Markup

These guidelines are written to assist producers of talking books. They focus on how the textual content file of a DAISY Digital Talking Book (DTB) is to be marked up or "tagged" to clearly define the structure of the book. They are applicable to DTBs produced in accordance with DAISY 3, the ANSI/NISO Z39.86-2002 standard, "Specifications for the Digital Talking Book". Hereafter, the standard will be referred to as "DAISY 3."

The objective of these guidelines is to recommend how this tagging should be done so that the end user gains an understanding of the structure of the book and has the means to navigate through the DTB.

The guidelines will show how to recognize the structure elements (e.g., prefaces, chapters, sections, sidebars, lists, etc.) in a print or electronic text book, and how to tag those elements using Extensible Markup Language (XML). Throughout these guidelines "book" refers to any type of document, including magazines, workbooks, reference works, etc.

The amount of structure and the number of links set by the DTB producer determine the level of access available to the end user. The more structure and links there are, the greater the access and the more navigation points the user has. These guidelines offer producers great flexibility in selecting the most suitable approach. However, once the degree of markup is chosen, it is strongly recommended that the directions for tagging provided in these guidelines be followed closely to ensure the markup is "valid." Markup is said to be valid when it strictly follows a set of rules called a Document Type Definition (DTD), described below.

The DAISY Structure Guidelines describe how to correctly apply the tags from the DTBook DTD, version 1.1.0 (dtbook110.dtd), developed as part of DAISY 3.

Familiarity with XML markup and with authoring and validation tools is required for those working at a detailed level with structuring according to DAISY 3.

This introduction presents some of the general aspects of structuring a DTB. Examples of various categories of DAISY DTBs are given, from the simplest (audio and title only) to the most complex (audio and full text). Markup is explained, and the order in which content is presented to the end user is discussed. Hierarchy, nesting, and navigation are examined.

The DAISY Digital Talking Book

The DAISY DTB is a collection of digital files (from this point onward referred to simply as "files") which provides an accessible representation of the printed book for blind, visually-impaired, and print-disabled users. These files may contain digital audio recordings of human speech, marked-up text, and a range of machine-readable files.

The structure of the book is designated by the XML tags and is accessible to the reader by use of a browser or a playback device. The DAISY DTB utilizes the technology of the Internet with some specialized applications added to provide greatly improved access to the information.

DAISY 3 supports any of the following classes of DTB:

XML provides the producer with the ability to structure a book in great detail. Compared to HTML markup, XML increases markup options and makes more detailed structure and proper nesting possible.

A DTB produced under DAISY 3 consists of some or all of the following files:

DTBook DTD

The XML Document Type Definition (DTD) used for the textual content files of digital talking books is the DTBook DTD. Its filename is dtbook110.dtd. It is a machine-readable list of allowable tags, the attributes that may be applied to them, and rules on where the tags may be used. For example, sentence tags (<sent>) can be used inside paragraph tags (<p>), but not the other way around. To verify that a document has been marked up in accordance with a DTD, one runs a program called a validating parser that compares the markup with the DTD and lists any errors in applying tags, attributes, etc.
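
As a small illustration of the containment rule just mentioned, the fragment below places <sent> elements inside a <p>. The sentence text is invented for this sketch; only the containment pattern is taken from the description above.

<p><sent>Darwin was born in 1809.</sent> <sent>He sailed on the Beagle in 1831.</sent></p>

Reversing the containment, with a <p> inside a <sent>, is the kind of error a validating parser would report.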

Because the DTD is written to be read by a computer it is not particularly easy for the average person to understand. An HTML "Expanded DTBook DTD" is available, containing the same information as the DTBook DTD but in a more user-friendly format. It contains a discussion of DTDs, an alphabetical list of the elements (tags) included in the DTBook DTD, clear statements of what tags can be used inside a given element, where, in turn, each element can be used, and information on the attributes allowed for each element. Attributes that must be used whenever a specific tag is used are marked as "required" for that tag. Those that are optional are labeled "implied." The latest version of the DTD and its associated Expanded DTD can be found at www.loc.gov/nls/z3986.

The NCX

The NCX (Navigation Control File for XML Applications) is a critical component of the user interface of the book in that it provides a view of all the points in a text to which a user may navigate. Each navigation point in the NCX is linked through the SMIL file to the corresponding location in the audio and XML textual content files, providing direct access to that location. The NCX is not necessarily identical to the table of contents (TOC) of the printed edition. (It will usually contain more elements of the book than the TOC does.) For DTBs containing an XML textual content file, the NCX is generated from the XML markup. The way in which the markup is applied will determine what is contained within the NCX.

Why Mark Up?

An analogue book on cassette without tone indexing does not allow the end user to navigate to points within the book. A digital talking book without markup is equally inaccessible.

When a book is prepared for recording for analogue cassette format, a chapter and an appendix usually fit in the same level of the tone index hierarchy and are therefore treated in the same way. In terms of access, distinguishing these elements as different from each other is unimportant. Each is identified by a tone or a set number of tones.

This is not the case when producing a DTB. In the digital world, distinguishing one structural element from another is of great importance; when an element is identified and marked up, properties special to that element can be assigned to it, resulting in increased flexibility and enhanced navigation for the end user. For example, in an analogue recording the narrator pronounces or spells out an acronym, as appropriate. In a DTB containing a text file that may be accessed by a browser with synthetic speech it is important for the markup to indicate if the acronym should be spelled out or pronounced. Whether the acronym is to be spelled or pronounced is a property assigned to the acronym tag.
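
One possible rendering of this in markup is sketched below. The attribute name "pronounce" and its yes/no values are assumptions here and should be checked against the DTBook DTD; the acronyms themselves are only illustrations.

<acronym pronounce="yes">NATO</acronym>
<acronym pronounce="no">DTB</acronym>

Under these assumptions, synthetic speech would read the first acronym as a word and spell out the second letter by letter.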

Furthermore, when elements are identified they can be displayed according to user needs. A user may not want to hear the sidebars in a book. If the sidebars are identified and marked up with the sidebar tag the end user can choose to skip them, listen to them as they occur, or even listen only to them.
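
As a rough sketch of how a sidebar might be marked so that a player can skip it, the fragment below wraps the sidebar text in its own element. The tag spelling <sidebar>, the use of a "render" attribute on it, and the placeholder text are assumptions for illustration and should be verified against the DTBook DTD.

<p>...main text of the chapter...</p>
<sidebar render="optional">
<p>...text of the sidebar...</p>
</sidebar>
<p>...main text continues...</p>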

In short, markup is the identification and tagging of the components of a text. The more detailed the markup, the greater the access provided to the end user.

Markup Tags

Tags are the elements used to mark up a book. A tag is basically text that a computer program can understand. Tags are surrounded by angle brackets (< and >, the less-than and greater-than signs), which tell the computer that the text within the brackets is information upon which it needs to act. Tags are generally used in pairs to mark the start and end of a tagged item. Note that the end tag contains a slash "/". In the following example, the <q> tag is used to mark a short quotation:

As Yogi Berra said, <q>"It ain't over 'til it's over".</q>

The <q> tag indicates the beginning of the quote and the </q> tag indicates the end of the quote.

Please note that in a DTB the tags are not displayed or rendered in any way.

Attributes

An attribute functions somewhat like an adjective to provide more information about the structure a tag identifies. One of the most commonly used attributes is "class". In the following example, class="chapter" indicates that the "h1" tag is marking a chapter heading: <h1 class="chapter">Darwin's Formative Years</h1>. The attribute "id" is heavily used to uniquely identify each structural element of the book. Other uses of attributes include indicating whether or not an item may be "turned off" as part of a group of items the user wishes to skip, and indicating if an acronym should be pronounced as a word or spelled out letter by letter, as mentioned earlier.

An attribute, if used, must appear in the start tag and the value of the attribute (in the above case, chapter) must be in quotes. In most cases the use of attributes is optional. Tags for which they are required will be clearly identified in Part II of these guidelines.
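
The fragment below sketches a start tag carrying both "class" and "id". The id value "ch01" is hypothetical; any scheme would serve as long as each value is unique within the document.

<h1 class="chapter" id="ch01">Darwin's Formative Years</h1>

Both attributes sit in the start tag and both values are quoted; the end tag </h1> carries no attributes.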

One attribute which requires special mention is "smilref." It is used to synchronize the textual content file and the SMIL file when a user moves between navigation controlled by the SMIL file and navigation controlled by the textual content file. DAISY 3 requires that it be given a value for each element in the textual content file that is referenced by a SMIL file. Both the SMIL file and the textual content file must be present before these attributes can be assigned values, so the values will normally be generated by software reading both files.
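
A minimal sketch of an element with a smilref value follows. The SMIL file name "ch01.smil", the identifiers, and the form of the value (a file name followed by "#" and an identifier) are all assumptions made for illustration; in practice such values would be written by production software, and DAISY 3 itself should be consulted for the exact requirements.

<p id="p0042" smilref="ch01.smil#p0042">...text of the paragraph...</p>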

Required Tags

The following tags are required for a book to be valid to dtbook110.dtd. The complete DAISY Digital Talking Book is surrounded by the <dtbook> and </dtbook> tags. Within these, the <head> and </head> and <book> and </book> tags must also be present. The <head> tags identify information about the book that is separate from the content. The <book> tags enclose the whole of the book. The following example illustrates how these tags are used.

<dtbook>
<head>
Information About the Book
</head>
<book>
The entire content of the book, including cover information, etc.
</book>
</dtbook>

Front Matter, Body Matter, and Rear Matter

Within "<book>" the content should generally be divided into three sections called front matter, body matter and rear matter, presented in that order and tagged with the elements <frontmatter>, <bodymatter>, and <rearmatter>.

Front Matter

The front matter consists of information found in the preliminary pages of a book (e.g., title, author, book jacket material, foreword, acknowledgements, dedication, and table of contents) as well as information added by the talking book producer (e.g., date of recording, narrator, studio, special copyright message). See Information Object: Front Matter in Part II(a): Major Structures.

Body Matter

The body matter of a book consists of the basic content of the document as distinguished from prefatory and supplementary materials. The body matter may be divided into parts, chapters, sections, etc. See Information Object: Body Matter in Part II(a): Major Structures.

Rear Matter

The rear matter consists of material following the main body of the book. Examples are: appendices, bibliographies, alphabetical indexes, etc. These items should be presented in the sequence found in the printed book. See Information Object: Rear Matter in Part II(a): Major Structures.

In summary, the following example shows the use of <frontmatter>, <bodymatter>, and <rearmatter> tags within <book>:

<book>
<frontmatter>
Title, Author, Book jacket information, Dedication, Table of contents, etc.
</frontmatter>
<bodymatter>
Part 1, Chapters 1 - 3, Part 2, Chapters 4 - 6, etc.
</bodymatter>
<rearmatter>
Glossary, Appendices, Bibliography, Index
</rearmatter>
</book>

Structure and Hierarchy

The main elements of a document, such as parts, chapters, sections, stanzas, etc., and their interrelationships, constitute its primary structure. These are ordinarily arranged hierarchically. For example, a novel consisting of an introduction and ten chapters has a very simple structure of eleven elements all at the same hierarchical level. On the other hand, a textbook containing parts, chapters, and sections has a more complex structure with text elements at three hierarchical levels: parts at the highest level, chapters at the middle level, and sections at the lowest level. Appropriate markup is used to identify the proper hierarchical structure of a document.

Levels

Levels describe the relative position of the major structural elements of a book. The hierarchy they define provides the end user with the ability to navigate within the DTB. Therefore it is critical that the markup of levels is correct.

Two methods of marking up levels are allowed by dtbook110.dtd. The first uses six tags: <level1>, <level2>, <level3>, etc., up through <level6>, with the highest level of a book tagged as <level1>. The second method uses a single <level> tag to mark all levels, with differences between the levels defined by the "depth" attribute. (See "Attributes" above; see also Alternative Markup in Part II(a): Major Structures.) In the following examples and discussion, only the level1 through level6 method is described.
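
For comparison only, a rough sketch of the second method is given here, with a part containing a chapter. The depth values and class names are chosen for illustration, headings are omitted, and the exact rules governing <level> should be taken from the DTD rather than from this fragment.

<level depth="1" class="part">
<level depth="2" class="chapter">
...content of chapter...
</level>
</level>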

A level is marked up in the following way:

  1. Determine at which level the structural component (part, chapter, section, etc.) falls in the original document and use the class attribute to name it.

<level1 class="chapter">

If the highest level of the book is "Part", the tag should read <level1 class="part">, and if the next level consists of "Chapter", the second level should read <level2 class="chapter">. If a book is made up of chapters which contain sections, the chapters should be tagged <level1 class="chapter"> and the sections tagged <level2 class="section">. In a book with one level (chapters), only the <level1 class="chapter"> tag would be used.

It is not necessary to use the class attribute values shown in the examples in these guidelines (part, chapter, section, etc.). A level can be called anything that doesn't violate basic naming conventions (spaces, colons, commas, and periods cannot be used in the name). <level1 class="kazong"> is valid, even if the name is not very descriptive. DTB producers should assign names to levels using their local language. For example: <level1 class="kapitel"> ("kapitel" is Swedish and Danish for "chapter").

  2. If the structural component has a heading in the print book, mark it using the tags <h1> through <h6>. The number of the level and the number of the heading must be identical (h1 for level1, h2 for level2, etc.). The class attribute value used in the level tag must also be used within the heading tag. For example:

<level1 class="chapter">
<h1 class="chapter">Darwin's Formative Years</h1>

The level tags are the container for the part, chapter, etc., while the h1 to h6 tags mark the heading for that part, chapter, etc.

  3. At the end of the structural component being contained by the level it is necessary to insert the appropriate end tag: </level2> (end of level 2), </level1> (end of level 1). For example:

<level1 class="chapter">
<h1 class="chapter">Darwin's Formative Years</h1>
...content of chapter...
</level1>
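
Putting the steps above together for a book whose highest level is "part", a nested structure might look like the following sketch. The headings are placeholders and the content lines stand in for the actual text.

<level1 class="part">
<h1 class="part">Part 1</h1>
<level2 class="chapter">
<h2 class="chapter">Chapter 1</h2>
...content of chapter 1...
</level2>
<level2 class="chapter">
<h2 class="chapter">Chapter 2</h2>
...content of chapter 2...
</level2>
</level1>

Each heading number matches the number of its containing level, and each heading repeats the class value of that level.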

For further discussion of levels, see Information Object: Levels in Part II(a): Major Structures.

Nesting

In a DTB produced according to the DAISY Structure Guidelines, components at different levels in the hierarchy must be nested, that is, contained one within the other. This means that a component at a lower level must fit completely inside the higher level. In other words, when a second tag is opened before the previous tag is closed, proper nesting must be observed -- the second tag must be closed before the first is closed.

Valid markup: <level1> <level2> </level2> </level1>

Invalid markup: <level1> <level2> </level1> </level2>

In addition, when marking up levels using the level1 to level6 tags, the tags must be used in sequence. For example, a level 3 must be contained within a level 2, and a level 2 within a level 1. A level 3 element (e.g., section) that is not inside a level 2 element (e.g., chapter) will be invalid to the DTD. If the document is run through a validation process (via a parser), the invalid markup will be flagged.
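
The sequence rule can be sketched in the same way as the nesting rule. The class names are illustrative.

Valid markup: <level1 class="part"> <level2 class="chapter"> <level3 class="section"> </level3> </level2> </level1>

Invalid markup: <level1 class="part"> <level3 class="section"> </level3> </level1>

The second line is properly nested but skips level 2, so a validating parser would still report it.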

Navigation and Hierarchy

The hierarchy in the DTB reflects the hierarchy in the print book. The markup used in the DTB to represent the hierarchy determines the extent of navigation available to the end user.

In most cases, only structural components with headings should be identified using the level1 to level6 tags. Components such as acknowledgements or dedication sometimes appear in the print book without a heading, in which case they should be marked up with the <div> tag (See Major Structures).
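
A dedication printed without a heading might therefore be tagged along these lines. The class value "dedication" and the placeholder text are assumptions for illustration; where a <div> may appear is governed by the DTD.

<div class="dedication">
<p>...text of the dedication...</p>
</div>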

The producer must impose a structural scheme in the digital talking book when it is absent from the print book. In some cases where the structural scheme is unclear, it may be necessary to promote a level, add a level, or flatten the hierarchy. As long as the final result is a well-structured digital talking book the producer has the flexibility to do this. For example, sometimes there is a discrepancy between the appearance of a heading, as indicated by typography, and the apparent hierarchy in the printed book. A subheading printed in the same typeface as level 3 headings in that book may appear as the first heading following a chapter heading at level 1. This could be due to various reasons. First, there may be no true hierarchy in the book and the typography used could reflect an aspect of content rather than hierarchy. In such cases it would be possible to flatten the hierarchy in the DTB, placing such headings at an appropriate superior level. Second, there may be a hierarchy in the book which is not correctly represented by the typography. In this case the actual hierarchy should be reflected in the DTB regardless of the typography.

Sequential Structure

The contents of a DTB should generally be presented to the end user in the order in which they appear in the printed book. That sequence does not necessarily relate to the physical location of the digital information in a DTB (that is, items that follow each other in the book may be located in different files in the DTB), or to the order in which the contents were recorded (that is, a note that is read at the end of a sentence in the DTB may in fact have been recorded on a different day than the sentence was). Proper sequence is especially important for the end user who does not navigate randomly through the DTB, but instead listens to it from beginning to end.

A presentation sequence should be established where none exists in the original document. For example, some material such as pictures, sidebars, boxes etc. "float" within the surrounding text. That is, they do not fall at a clearly identified point within the text. They are positioned on the print page for visual effect and may not be meant to be read at a single specific point within the surrounding text. The talking book producer must establish the sequence in which such elements are presented within the surrounding text.

Such floating information may be vital for the understanding of the continuous text, but text and floating information may function more or less independently of each other. When such material is inserted into the text this should be done as closely as possible to existing reference points or relevant text, without disrupting the flow of the content. Because sighted listeners and low vision customers may use the audio of a DTB as support for visual reading, floating information should be included on the same page as it occurs in the printed book, if possible.

Some books rely strongly on visual presentation and have no continuous text. When there is no apparent order in the printed book an order must be established for the DTB. This is done according to the conventions of the producing country. For example, in the western world a left to right, top to bottom sequence would be appropriate.

In a DTB it may sometimes be beneficial to move selected material (e.g., picture captions) from its location in the print book and gather it into a section created for the DTB. This section should be placed at an appropriate point within the overall sequential structure, often as part of the rear matter.

Page Identification

General Instructions

It is not a requirement that pages be individually tagged in a DTB, but it is strongly recommended for textbooks or books that may be used for study purposes. When tagging the pages of a book, all pages should be included. Pages that are not numbered and are not part of the pagination sequence are tagged without being assigned a page number.

To aid in navigation, each page number should be placed preceding the first text on the page, regardless of its location in the printed book. This allows the "Go To Page" feature of most players to navigate directly to the top of a given page, so that playback begins with the requested page and includes the full content of that page. When a chapter or section, for example, begins at the top of a page, the level tag would occur first, followed by the page number, the heading (if present in the print book), and then the text.
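
The ordering described above could look like the following sketch for a chapter that begins at the top of page 43. The element name <pagenum> and the id value are assumed for illustration; see Inline Elements: Information Object: Page Numbers for the authoritative markup.

<level1 class="chapter">
<pagenum id="page43">43</pagenum>
<h1 class="chapter">Darwin's Formative Years</h1>
...text of the chapter...
</level1>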

Books Containing Blank Pages

Blank pages should be tagged and assigned the appropriate page number regardless of whether that number is printed in the print book. The listener should also receive aural confirmation of the existence of blank pages. For example: "Page 43 -- blank page."

See Inline Elements: Information Object: Page Numbers for a more complete discussion of this topic.

Producer's Notes

Information added to the DTB by the producer should be tagged as producer's notes using the <prodnote> tag. Producer's notes may be used in the front matter, body matter or rear matter of a DTB. In the front matter, producer's notes may contain information such as a special copyright message, differences between the DTB and the print book, narrator's name and production date, etc. In addition, "How to use this DTB" may contain a general description of the structure of the book, number of levels, navigation features, etc., without reference to any specific playback platform. The body matter will often contain producer's notes incorporating descriptions of pictures, diagrams, maps, etc. Producer's notes in the rear matter may contain information about the production of the DTB that is not included in the front matter.

The producer should classify producer's notes as either required or optional via the "render" attribute. The end user may then choose to have the material that is classified as optional turned on or off. "How to use this DTB" would typically be classified as required whereas descriptions of pictures could be classified as optional.
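
A sketch of the two classifications follows, using the "render" attribute named above. The value spellings "required" and "optional" are assumed from the description, and the note text is invented for illustration.

<prodnote render="required">How to use this DTB: this book has two levels, parts and chapters.</prodnote>
<prodnote render="optional">Photograph: a sailing ship at anchor in a bay.</prodnote>

With markup of this kind, a player could always present the first note while letting the end user turn notes like the second on or off.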

Copyright © 2002 DAISY Consortium