DAISY 3 Structure Guidelines
Last Revised: June 4, 2008
These guidelines are written to assist producers of talking books. They focus on how the textual content file of a DAISY Digital Talking Book (DTB) is to be marked up or "tagged" to clearly define the structure and semantics of the book. They apply to DTBs produced in accordance with the DAISY/NISO 2005 standard, "Specifications for the Digital Talking Book", also known as DAISY 3 or ANSI/NISO Z39.86.
The objective of these guidelines is to recommend how this tagging should be done so that the end user understands the structure of the book and has a means of navigating through the DTB.
The guidelines show how to recognize the structural elements (e.g., prefaces, chapters, sections, sidebars, lists, etc.) in a print or electronic book, and how to tag those elements using Extensible Markup Language (XML). Throughout these guidelines "book" refers to any type of document, including magazines, workbooks, reference works, etc.
The DTB structure set by the producer determines the level of navigability available to the end user. The more detailed the structural markup, the greater the number of navigation points available to the reader. This is of course constrained by the structure of the source material and the resources available to produce the DTB. These guidelines offer producers great flexibility in selecting the most suitable approach. However, once the degree of markup is chosen, it is strongly recommended that the directions for tagging provided in these guidelines be followed closely to help ensure the markup is "valid." Markup is said to be valid when it strictly follows a set of rules called a Document Type Definition (DTD), described below. All finished content must be valid according to the DTBook DTD.
The DAISY Structure Guidelines describe how to correctly apply the tags from the current DTBook DTD (found in the Standards area of the DAISY Web site), developed as part of the DAISY Standard.
Familiarity with XML markup and with authoring and validation tools is required for those working at a detailed level with structuring according to the DAISY Standard.
This introduction presents some of the general aspects of structuring a DTB. Examples of various categories of DAISY DTBs are given, from the simplest (audio and title only) to the most complex (audio and full text). Markup is explained, and the order in which content is presented to the end user is discussed. Hierarchy, nesting, and navigation are examined.
The DAISY DTB is a collection of digital files (from this point onward referred to simply as "files") that provides an accessible representation of the printed book for individuals who are blind, visually impaired, or print-disabled. These files may contain digital audio recordings of human or synthetic speech, marked-up text, and a range of machine-readable files.
The structure of the book is designated by the XML tags and is accessible to the reader by use of a browser or a playback device. The DAISY DTB utilizes the technology of the Internet with some specialized applications added to provide greatly improved access to the information.
There are three basic types of DAISY DTBs:
XML provides the producer with the ability to structure a book in great detail. Compared to HTML markup, XML increases markup options and makes more detailed structure and proper nesting possible.
A DTB produced to the DAISY Standard consists of some or all of the following files:
The XML Document Type Definition (DTD) used for the textual content files of DAISY DTBs is the DTBook DTD. It is a machine-readable list of allowable tags, the attributes that may be applied to them, and rules on where the tags may be used. For example, sentence tags (<sent>) can be used inside paragraph tags (<p>), but not the other way around. To verify that a document has been marked up in accordance with a DTD, one runs a program called a validating parser that compares the markup with the DTD and lists any errors that may be present in tags, attributes, etc.
The current version of the DTD can be found at http://www.daisy.org/z3986/2005/. Please note that because DTDs are written to be machine-readable, interpreting one intelligently requires considerable knowledge of DTDs.
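A validating parser checks a document against every rule in the DTD. As a much smaller illustration of the kind of containment rule involved, the Python sketch below (standard library only) checks the <sent>/<p> example. The ALLOWED_CHILDREN table and the content_model_errors function are hypothetical and cover only a handful of elements; they are not the DTBook DTD, and real validation should be done with a validating parser against the DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of parent -> allowed element children. The real DTBook
# DTD has many more elements and rules; this table only mirrors the p/sent example.
ALLOWED_CHILDREN = {
    "p": {"sent", "q", "em", "strong"},
    "sent": {"q", "em", "strong"},
}

def content_model_errors(xml_text):
    """List 'parent > child' pairs that break the toy table above."""
    errors = []
    for parent in ET.fromstring(xml_text).iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # the toy table has no rule for this element
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"{parent.tag} > {child.tag}")
    return errors

good = "<p><sent>First sentence.</sent> <sent>Second sentence.</sent></p>"
bad = "<sent><p>A paragraph inside a sentence.</p></sent>"
print(content_model_errors(good))  # []
print(content_model_errors(bad))   # ['sent > p']
```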
The NCX (Navigation Control File for XML Applications) is a critical component of the user interface of the book in that it provides a view of all the points in a text to which a user may navigate. Each navigation point in the NCX is linked through the SMIL file to the corresponding location in the audio and XML textual content files, providing direct access to that location. The NCX is not necessarily identical to the table of contents (TOC) of the printed edition. (It will usually contain more elements of the book than the TOC does.) For DTBs containing an XML textual content file the NCX is generated from the XML markup. The way in which the markup is applied will determine what is contained within the NCX.
An analogue book on cassette without tone indexing does not allow the end user to navigate to points within the book. A DTB without markup is equally inaccessible.
When a book is prepared for recording for analogue cassette format, a chapter and an appendix usually fit in the same level of the tone index hierarchy and are therefore treated in the same way. In terms of access, distinguishing these elements as different from each other is unimportant. Each is identified by a tone or a set number of tones.
This is not the case when producing a DTB. In the digital world, distinguishing one structural element from another is of great importance; when an element is identified and marked up, properties special to that element can be assigned to it, resulting in increased flexibility and enhanced navigation for the end user. For example, in an analogue recording the narrator pronounces or spells out an acronym, as appropriate. In a DTB containing a text file that may be accessed by a browser with synthetic speech it is important for the markup to indicate if the acronym should be spelled out or pronounced. Whether the acronym is to be spelled or pronounced is a property assigned to the acronym tag.
Furthermore, when elements are identified they can be displayed according to user needs. A user may not want to hear the sidebars in a book. If the sidebars are identified and marked up with the sidebar tag, the end user can choose to skip them, listen to them as they occur, or even listen only to them.
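The sketch below illustrates how a player could act on this markup. It is a toy that assumes a fragment without the DTBook namespace; render_inline and reading_order are hypothetical names, not part of any real player. Acronyms marked pronounce="no" are spelled out letter by letter, and the sidebars argument models the three choices described above: skip sidebars, hear them as they occur, or hear only the sidebars.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter fragment; the DTBook namespace is omitted for brevity.
CHAPTER = """<level1>
  <p>The <acronym pronounce="yes">NASA</acronym> budget grew.</p>
  <sidebar render="optional"><p>A side note about the <acronym pronounce="no">FBI</acronym>.</p></sidebar>
  <p>Back to the main text.</p>
</level1>"""

def render_inline(elem):
    """Flatten one element's text; acronyms with pronounce="no" are spelled out.
    Only direct inline children are handled in this sketch."""
    parts = [elem.text or ""]
    for child in elem:
        text = child.text or ""
        if child.tag == "acronym" and child.get("pronounce") == "no":
            text = " ".join(text)  # "FBI" -> "F B I"
        parts.append(text)
        parts.append(child.tail or "")
    return "".join(parts).strip()

def reading_order(xml_text, sidebars="as_they_occur"):
    """Blocks a toy player would speak; sidebars is 'skip', 'as_they_occur' or 'only'."""
    blocks = []
    for child in ET.fromstring(xml_text):
        is_sidebar = child.tag == "sidebar"
        if (sidebars == "skip" and is_sidebar) or (sidebars == "only" and not is_sidebar):
            continue
        if is_sidebar:
            blocks.extend(render_inline(p) for p in child)
        else:
            blocks.append(render_inline(child))
    return blocks

print(reading_order(CHAPTER, sidebars="skip"))
# ['The NASA budget grew.', 'Back to the main text.']
print(reading_order(CHAPTER, sidebars="only"))
# ['A side note about the F B I.']
```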
In short, markup is the identification and tagging of the components of a text. The more detailed the markup, the greater the access provided to the end user.
XML markup components are variously referred to as elements and tags, although we attempt to maintain a distinction: A tag is XML code, surrounded by angle brackets (< and >). All tags are either opening (as in <p>), closing (as in </p>) or self-closing (as in <br/>). In the following example, the <q> tags are used to mark a short quotation within a paragraph:
<p>As Yogi Berra said, <q>"It ain't over 'til it's over."</q></p>
<q> indicates the beginning of the quote and </q> indicates the end of the quote; <p> and </p> wrap the entire paragraph. Note that to be well-formed XML, tags must be closed in the reverse order in which they are opened.
An element is a matched pair of tags (opening and closing), attributes in the opening tag, and all text and tags contained between the matched tags. An element can also be a self-closing tag and its attributes.
Tags and elements are not normally displayed in a DTB.
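One way to see the difference between well-formed and badly nested tags is to hand the quotation example to an XML parser. This minimal sketch uses Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name. Parsing checks well-formedness only, not validity against the DTBook DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML; crossed (mis-nested) tags fail."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

quote = '''<p>As Yogi Berra said, <q>"It ain't over 'til it's over."</q></p>'''
crossed = '''<p>As Yogi Berra said, <q>"It ain't over 'til it's over."</p></q>'''

print(is_well_formed(quote))    # True
print(is_well_formed(crossed))  # False

# The <q> element holds everything between its opening and closing tags.
p = ET.fromstring(quote)
print(p.find("q").text)  # "It ain't over 'til it's over."
```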
An attribute is a qualifier on an XML tag; it functions somewhat like an adjective, providing more information about the structure the tag identifies. One of the most commonly used attributes is "class". In the following example, class="chapter" indicates that the level1 element marks a chapter:
<level1 class="chapter">...</level1>
The attribute "id" is heavily used to uniquely identify each structural element of the book, and is usually inserted automatically by DTB production software. Other uses of attributes include indicating whether or not an item may be "turned off" as part of a group of items the user wishes to skip, and indicating if an acronym should be pronounced as a word or spelled out letter by letter, as mentioned earlier.
An attribute, if used, must appear in the start tag and the value of the attribute (in the above example, "chapter") must be in quotes. In most cases the use of attributes is optional. Tags for which they are required will be clearly identified in Part II of these guidelines.
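The following sketch, again using Python's xml.etree.ElementTree with a hypothetical helper named class_of, shows that an optional attribute may simply be absent, while an unquoted value or an attribute placed in a closing tag makes the markup fail to parse at all.

```python
import xml.etree.ElementTree as ET

def class_of(xml_text):
    """Class attribute of the root element, or None when it is absent.
    Raises ET.ParseError if the markup is not well-formed."""
    return ET.fromstring(xml_text).get("class")

print(class_of('<level1 class="chapter"><h1>Darwin</h1></level1>'))  # chapter
print(class_of("<level1><h1>Darwin</h1></level1>"))                  # None

# An unquoted value, or an attribute in the closing tag, is rejected by the parser.
rejected = []
for bad in ("<level1 class=chapter></level1>", '<level1></level1 class="chapter">'):
    try:
        class_of(bad)
    except ET.ParseError:
        rejected.append(bad)
print(len(rejected))  # 2
```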
One attribute that requires special mention is "smilref." It is used to synchronize the textual content file and the SMIL file when a user moves between navigation controlled by the SMIL file and navigation controlled by the textual content file. The DAISY Standard requires that it be present and have a value for each element in the textual content file that is referenced by a SMIL file. Normally, both the SMIL files and the smilref attributes will be created by the DTB production software.
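Production software normally creates the smilref values, but a quality check can still confirm that nothing was missed. The sketch below assumes a set of ids known to be referenced by the SMIL files (REFERENCED_IDS is hypothetical, as are all file names and ids) and lists any of those elements that lack a smilref value. The DTBook namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; ids and SMIL file names are made up, namespace omitted.
BODY = """<bodymatter>
  <level1 id="ch1" smilref="ch1.smil#s1">
    <h1 id="h1_1" smilref="ch1.smil#s2">Chapter One</h1>
    <p id="p1" smilref="ch1.smil#s3">Text.</p>
    <p id="p2">More text.</p>
  </level1>
</bodymatter>"""

# Hypothetical: ids that the SMIL files point to.
REFERENCED_IDS = {"h1_1", "p1", "p2"}

def missing_smilrefs(xml_text, referenced_ids):
    """ids referenced from SMIL whose element has no (or an empty) smilref."""
    return [elem.get("id") for elem in ET.fromstring(xml_text).iter()
            if elem.get("id") in referenced_ids and not elem.get("smilref")]

print(missing_smilrefs(BODY, REFERENCED_IDS))  # ['p2']
```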
The following tags are required for a book to be valid to the current DTBook DTD. The complete DAISY DTB is surrounded by the <dtbook> and </dtbook> tags. Within these, the <head> and </head> tags and the <book> and </book> tags must also be present, in the order shown, as required by the DTD. The <head> tags identify information about the book that is separate from the content. The <book> tags enclose the content of the book. The following example illustrates how these tags are used:
<dtbook>
  <head>(Information about the book)</head>
  <book>(The entire content of the book, including cover information, etc.)</book>
</dtbook>
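The outer skeleton can be checked mechanically. This Python sketch (standard library only) confirms just that the root is <dtbook> and that its children are <head> followed by <book>; it is not a substitute for validating against the DTD. It strips the "{namespace}" prefix that ElementTree adds to tag names, so it works with or without the DTBook namespace declaration. The helper names and the uid value are hypothetical.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def skeleton_errors(xml_text):
    """Check only the outer skeleton: <dtbook> containing <head> then <book>."""
    root = ET.fromstring(xml_text)
    errors = []
    if local(root.tag) != "dtbook":
        errors.append(f"root element is <{local(root.tag)}>, not <dtbook>")
    children = [local(child.tag) for child in root]
    if children != ["head", "book"]:
        errors.append(f"children of <dtbook> are {children}, not ['head', 'book']")
    return errors

GOOD = """<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" version="2005-3">
  <head><meta name="dtb:uid" content="example-0001"/></head>
  <book><bodymatter><level1><p>Text.</p></level1></bodymatter></book>
</dtbook>"""
BAD = "<dtbook><book/><head/></dtbook>"

print(skeleton_errors(GOOD))  # []
print(skeleton_errors(BAD))   # one error: <head> and <book> in the wrong order
```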
The <link> element appears in the <head> section of a document. It establishes the relationships between the current document and other documents, which is useful in cases where the content has been divided into separate DTBook documents. The <link> element conveys relationship information (for example, "next" and "previous") that may be rendered by user agents in a variety of ways. <link> is used in the same way as in XHTML; for information on its use, consult sources on "link" within XHTML, such as the W3C tip sheet on link or the link element section in the XHTML 2.0 specification.
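As a small illustration of how a user agent might read these relationships, the sketch below collects each <link> element's rel and href values from a <head> fragment. The file names are hypothetical, the rel values "prev" and "next" follow the XHTML link types, and the DTBook namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

HEAD = """<head>
  <meta name="dtb:uid" content="example-0001"/>
  <link rel="prev" href="volume1.xml"/>
  <link rel="next" href="volume3.xml"/>
</head>"""

def related_documents(head_xml):
    """Map each <link> rel value to its href."""
    head = ET.fromstring(head_xml)
    return {link.get("rel"): link.get("href") for link in head.findall("link")}

print(related_documents(HEAD))  # {'prev': 'volume1.xml', 'next': 'volume3.xml'}
```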
Meta provides information about the book and occurs in <head>. It is not a part of the body or content of the book itself. Meta contains the metadata elements and is the container for the Dublin Core attributes and the additional DTBook attributes. At a minimum, dc:Title and dtb:uid are required. Complete, accurate metadata should be included in all DAISY DTBs.
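A simple pre-delivery check for the two required metadata items might look like the following. It is a sketch that assumes metadata is expressed as <meta name="..." content="..."/> elements, and it tests presence only, not whether the values are complete and accurate. The helper name and the values are hypothetical, and the namespace is omitted.

```python
import xml.etree.ElementTree as ET

REQUIRED_META = ("dc:Title", "dtb:uid")

def missing_metadata(head_xml):
    """Names from REQUIRED_META with no <meta name="..."> element in <head>."""
    head = ET.fromstring(head_xml)
    present = {meta.get("name") for meta in head.iter("meta")}
    return [name for name in REQUIRED_META if name not in present]

HEAD = """<head>
  <meta name="dtb:uid" content="example-0001"/>
  <meta name="dc:Creator" content="Example Author"/>
</head>"""

print(missing_metadata(HEAD))  # ['dc:Title']
```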
Within <book>, the content should generally be divided into three sections called front matter, body matter and rear matter, presented in that order and tagged with the elements <frontmatter>, <bodymatter>, and <rearmatter>.
The front matter consists of information found in the preliminary pages of a book (e.g., title, author, book jacket material, foreword, acknowledgements, dedication, and table of contents) as well as information added by the talking book producer (e.g., date of recording, narrator, studio, special copyright message). See Information Object: Front Matter in Part II(a): Major Structures.
The body matter of a book consists of the basic content of the document as distinguished from prefatory and supplementary materials. The body matter may be divided into parts, chapters, sections, etc. See Information Object: Body Matter in Part II(a): Major Structures.
The rear matter consists of material following the main body of the book. Examples are: appendices, bibliographies, alphabetical indexes, etc. These items should be presented in the sequence found in the printed book. See Information Object: Rear Matter in Part II(a): Major Structures.
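The required sequence of the three sections can be checked with a few lines of Python. This sketch (hypothetical helper name, namespace omitted) reports whether the <frontmatter>, <bodymatter> and <rearmatter> children of <book> appear in that order with no repeats; it does not check that each section contains the right kind of material.

```python
import xml.etree.ElementTree as ET

RANK = {"frontmatter": 0, "bodymatter": 1, "rearmatter": 2}

def matter_in_order(book_xml):
    """True if the matter sections under <book> appear in order, each at most once."""
    book = ET.fromstring(book_xml)
    ranks = [RANK[child.tag] for child in book if child.tag in RANK]
    # Strictly increasing ranks mean correct order and no repeats.
    return all(a < b for a, b in zip(ranks, ranks[1:]))

print(matter_in_order("<book><frontmatter/><bodymatter/><rearmatter/></book>"))  # True
print(matter_in_order("<book><bodymatter/><frontmatter/></book>"))               # False
```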
In summary, the following list shows content belonging to frontmatter, bodymatter, and rearmatter:
The main elements of a document, such as parts, chapters, sections, stanzas, etc., and their interrelationships, constitute its primary structure. These are ordinarily arranged hierarchically. For example, a novel consisting of an introduction and ten chapters has a very simple structure of eleven elements all at the same hierarchical level. On the other hand, a textbook containing parts, chapters, and sections has a more complex structure with text elements at three hierarchical levels: parts at the highest level, chapters at the middle level, and sections at the lowest level. Appropriate markup is used to identify the proper hierarchical structure of a document.
Levels describe the relative position of the major structural elements of a book. The hierarchy they define provides the end user with the ability to navigate within the DTB. Therefore it is critical that the markup of levels be correct.
Two methods of marking up levels are allowed. The first uses six tags: <level1>, <level2>, <level3>, etc., through <level6>, with the highest level of a book tagged as <level1>. The second method uses a single <level> tag to mark all levels, with differences between the levels defined by the nesting hierarchy and, optionally, by the "depth" attribute (see Alternative Markup in Part II(a): Major Structures). In the following examples and discussion, only the level1 through level6 method is described.
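Although the rest of this discussion uses level1 through level6, a short sketch may help show how the second method expresses hierarchy purely through nesting. The function below computes each <level> element's depth by counting enclosing <level> elements, so depth 1 corresponds to level1, depth 2 to level2, and so on. It assumes headings inside <level> are marked with <hd>; the fragment and function name are illustrative only, with the namespace omitted.

```python
import xml.etree.ElementTree as ET

BODY = """<bodymatter>
  <level depth="1"><hd>Part One</hd>
    <level depth="2"><hd>Chapter 1</hd>
      <level depth="3"><hd>Section 1.1</hd><p>Text.</p></level>
    </level>
  </level>
</bodymatter>"""

def level_depths(xml_text):
    """(depth, heading) for every <level>, with depth taken from nesting alone."""
    found = []

    def walk(elem, depth):
        for child in elem:
            if child.tag == "level":
                hd = child.find("hd")
                found.append((depth + 1, hd.text if hd is not None else None))
                walk(child, depth + 1)
            else:
                walk(child, depth)

    walk(ET.fromstring(xml_text), 0)
    return found

for depth, heading in level_depths(BODY):
    print(f"level{depth}: {heading}")
# level1: Part One
# level2: Chapter 1
# level3: Section 1.1
```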
A level is marked up in the following way. Determine at which level the structural component (part, chapter, section, etc.) occurs in the original document. The class attribute may be used to name (identify) it. The use of class attributes is not required; however, in some players they may provide additional information to the user.
<level1 class="chapter">
If the highest level of the book is "Part", the tag might read <level1 class="part">, and if the next level consists of "Chapter", the second level might read <level2 class="chapter">. If a book is made up of chapters which contain sections, the chapters might be tagged <level1 class="chapter"> and the sections <level2 class="section">. In a book with one level (chapters), only the <level1 class="chapter"> tag would be used.
It is not necessary to use the class attribute names shown in the examples in these guidelines (part, chapter, section, etc.). A level can be called anything that doesn't violate basic naming conventions (spaces, colons, commas, and periods cannot be used in class names). <level1 class="kazong"> is valid, even if the name is not very descriptive. DTB producers should assign names to levels using their local language. For example: <level1 class="kapitel"> (Swedish and Danish for "chapter").
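The naming rule stated above is easy to automate. This hypothetical helper applies exactly that rule (not empty, and no spaces, colons, commas, or periods) and nothing more; it does not judge whether a name is descriptive.

```python
FORBIDDEN = set(" :,.")

def is_acceptable_class_name(name):
    """Non-empty and free of spaces, colons, commas and periods."""
    return bool(name) and not (set(name) & FORBIDDEN)

for name in ("chapter", "kazong", "kapitel", "sub section", "ch.1"):
    print(name, is_acceptable_class_name(name))
# chapter True
# kazong True
# kapitel True
# sub section False
# ch.1 False
```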
If the structural component has a heading in the print book, mark it using the tags <h1> through <h6>. The numbers of the level tag and of the heading tag must be identical (h1 for level1, h2 for level2, etc.). The class attribute value used in the level tag may also be used within the heading tag. For example:
<level1 class="chapter"> <h1 class="chapter">Darwin's Formative Years</h1>
In the remaining examples in this document, the class attribute value is not used in the heading tags.
The level tags are the container for the part, chapter, etc., while the h1 to h6 tags mark the heading for that part, chapter, etc.
At the end of the structural component contained by the level, insert the appropriate end tag: </level2> (end of level 2), </level1> (end of level 1). For example:
<level1 class="chapter"> <h1>Darwin's Formative Years</h1> <!-- content of chapter --> </level1>
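A mismatch between a level tag and its heading tag (for example, an <h3> directly inside <level2>) can be caught by a small script. The sketch below assumes namespace-free markup and reports each levelN element that has an hN child with a different number; the function name and the second heading are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"level([1-6])$")
HEADING = re.compile(r"h([1-6])$")

def heading_mismatches(xml_text):
    """(level tag, heading tag) pairs where an hN child does not match its levelN."""
    mismatches = []
    for elem in ET.fromstring(xml_text).iter():
        level = LEVEL.match(elem.tag)
        if not level:
            continue
        for child in elem:
            heading = HEADING.match(child.tag)
            if heading and heading.group(1) != level.group(1):
                mismatches.append((elem.tag, child.tag))
    return mismatches

sample = """<level1 class="chapter"><h1>Darwin's Formative Years</h1>
  <level2><h3>Early Schooling</h3><p>Text.</p></level2>
</level1>"""
print(heading_mismatches(sample))  # [('level2', 'h3')]
```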
For further discussion of levels, see Information Object: Levels in Part II(a) Major Structures.
In a DTB that is valid to the DTD and the DAISY Standard (and thus produced according to the requirements of XML), components at different levels in the hierarchy must be nested, that is, contained one within the other. See the W3C Extensible Markup Language 2004 Recommendation. This means that a component at a lower level must fit completely inside the higher level. In other words, when a second tag is opened before the previous tag is closed, proper nesting must be observed: the second tag must be closed before the first is closed.
Valid markup:
<level1> <level2> </level2> </level1>
Invalid markup:
<level1> <level2> </level1> </level2>
Note that the invalid markup shown above is not even well-formed XML.
In addition, when marking up levels using the level1 to level6 tags, the tags must be used in sequence. For example, a level 3 must be preceded by a level 2, and a level 2 by a level 1. A level 3 element (e.g., section) that is not inside a level 2 element (e.g., chapter) will be invalid to the DTD. If the document is run through a validation process (via a parser) the invalid markup will be flagged.
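The DTD already enforces this sequence, so a validating parser will report it; the sketch below only illustrates the rule. It walks the tree while tracking the number of the nearest enclosing levelN, and reports any level tag whose number is not exactly one greater (a level1 with no enclosing level counts as correct). The helper name and fragments are hypothetical, and the namespace is omitted.

```python
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"level([1-6])$")

def sequence_errors(xml_text):
    """(tag, enclosing level number) for level tags that skip a step.
    An enclosing number of 0 means the tag is not inside any levelN."""
    errors = []

    def walk(elem, enclosing):
        for child in elem:
            match = LEVEL.match(child.tag)
            if match:
                n = int(match.group(1))
                if n != enclosing + 1:
                    errors.append((child.tag, enclosing))
                walk(child, n)
            else:
                walk(child, enclosing)

    walk(ET.fromstring(xml_text), 0)
    return errors

valid = "<bodymatter><level1><level2><level3/></level2></level1></bodymatter>"
skipped = "<bodymatter><level1><level3/></level1></bodymatter>"
print(sequence_errors(valid))    # []
print(sequence_errors(skipped))  # [('level3', 1)]
```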
The hierarchy in the DTB should reflect the hierarchy in the print book. The markup used in the DTB to represent the hierarchy determines the extent of the "global" navigation (from heading to heading) available to the end user.
In most cases, only structural components with headings should be identified using the level1 to level6 tags. Components such as acknowledgements or dedication sometimes appear in the print book without a heading, in which case they should be marked up with the <div> tag (see Major Structures).
The producer should impose a structural scheme in the DAISY DTB when it is absent from the print book. In some cases where the structural scheme is unclear, it may be necessary to promote a level, add a level, or flatten the hierarchy. As long as the final result is a well-structured DAISY DTB the producer has the flexibility to do this. For example, sometimes there is a discrepancy between the appearance of a heading, as indicated by typography, and the apparent hierarchy in the printed book. A subheading printed in the same typeface as level 3 headings in that book may appear as the first heading following a chapter heading at level 1. This could be due to various reasons. First, there may be no true hierarchy in the book and the typography used could reflect an aspect of content rather than hierarchy. In such cases it would be possible to flatten the hierarchy in the DTB, placing such headings at an appropriate superior level. Second, there may be a hierarchy in the book that is not correctly represented by the typography. In this case the actual hierarchy should be reflected in the DTB regardless of the typography.
The contents of a DTB should generally be presented to the end user in the order in which they appear in the printed book. That sequence does not necessarily relate to the physical location of the digital information in a DTB (that is, items that follow each other in the book may be located in different files in the DTB), or to the order in which the contents were recorded (that is, a note that is read at the end of a sentence in the DTB may in fact have been recorded on a different day than the sentence was). Proper sequence is especially important for the end user who does not navigate randomly through the DTB, but instead listens to it from beginning to end.
A presentation sequence should be established where none exists in the original document. For example, some material such as pictures, sidebars, boxes etc. "float" within the surrounding text. That is, they do not fall at a clearly identified point within the text. They are positioned on the print page for visual effect and may not be meant to be read at a single specific point within the surrounding text. The talking book producer must establish the sequence in which such elements are presented within the surrounding text, and presentation should be consistent throughout the DTB.
Such floating information may be vital for the understanding of the continuous text, but text and floating information may function more or less independently of each other. When such material is inserted into the text this should be done as closely as possible to existing reference points or relevant text, without disrupting the flow of the content. Floating information should be placed logically and consistently throughout the book. Wherever possible it should be included on the same page as it occurs in the printed book (some users may use the audio of a DTB as support or reinforcement with visual reading).
Some books rely strongly on visual presentation and have no continuous text. When there is no apparent order in the printed book an order must be established for the DTB. This is done according to the conventions of the producing country. For example, in the western world a left to right, top to bottom sequence would be appropriate.
In a DTB it may sometimes be beneficial to move selected material (e.g., picture captions) from its location in the print book and gather it into a section created for the DTB. This section should be placed at an appropriate point within the overall sequential structure, often as part of the rear matter. Any divergence (such as this) from the print book should always be described in the producer's introduction to the DAISY DTB or in a specific Producer's Note.
It is not a requirement that pages be individually tagged in a DTB, but it is strongly recommended for textbooks or books that may be used for study purposes. When tagging the pages of a book, all pages should be included. Pages that are not numbered and are not part of the pagination sequence are tagged without being assigned a page number.
To aid in navigation, each page number should be placed preceding the first text on the page, regardless of its location in the printed book. This allows the "Go To Page" feature of most players to navigate directly to the top of a given page, so that playback begins with the requested page and includes the full content of that page. When a chapter or section, for example, begins at the top of a page, the level tag would occur first, followed by the page number, the heading (if present in the print book), and then the text.
Sentences often span a page break. Producing organizations should develop a standard procedure for placement of the page tag, and the tagging should be handled consistently in all books produced. Some organizations consider it best practice to place the page tag at a sentence boundary (the beginning or end of the sentence) rather than mid-sentence. For example, if only the first one or two words appear on the previous page, the page tag would be placed at the beginning of the sentence (on the page prior to the actual page break). Other organizations may place the page tag exactly where it appears in the sentence. It would not, however, be appropriate to place a page tag mid-word if a word is broken and hyphenated at the page break.
Blank pages should be tagged and assigned the appropriate page numbers regardless of whether a printed number appears on the pages of the print book. The listener should also receive aural confirmation of the existence of blank pages. For example: "Page 43, blank page."
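Because every page, including blank ones, should be tagged, a gap in the run of page numbers usually signals a missing tag. This sketch collects the <pagenum> elements whose page attribute is "normal" (treated as the default when the attribute is absent) and lists the numbers missing between consecutive ones. It assumes numeric page text; the ids and fragment are hypothetical, with the namespace omitted.

```python
import xml.etree.ElementTree as ET

PAGES = """<level1>
  <pagenum id="page42" page="normal">42</pagenum><p>Text.</p>
  <pagenum id="page43" page="normal">43</pagenum><p>More text.</p>
  <pagenum id="page45" page="normal">45</pagenum><p>Text after a gap.</p>
</level1>"""

def page_gaps(xml_text):
    """Numbers missing between consecutive normal <pagenum> values."""
    numbers = [int(p.text) for p in ET.fromstring(xml_text).iter("pagenum")
               if p.get("page", "normal") == "normal"]
    gaps = []
    for prev, nxt in zip(numbers, numbers[1:]):
        gaps.extend(range(prev + 1, nxt))
    return gaps

print(page_gaps(PAGES))  # [44]
```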
See Inline Elements: Information Object: Page Numbers for a more complete discussion of this topic.
Information added to the DTB by the producer should be tagged as producer's notes using the <prodnote> tag. Producer's notes may be used in the front matter, body matter or rear matter of a DTB. In the front matter, producer's notes may contain information such as a special copyright message, differences between the DTB and the print book, narrator's name and production date, etc.
In addition, the producer may create a whole section, ideally near the beginning of the frontmatter, titled "How to use this DTB," that may contain a general description of the structure of the book, number of levels, navigation features, etc., without reference to any specific playback platform.
The body matter will often contain producer's notes incorporating descriptions of pictures, diagrams, maps, etc. Producer's notes in the rear matter may contain information about the production of the DTB that is not included in the front matter.
The producer must classify producer's notes as either required or optional via the "render" attribute. The end user may choose to have the material that is classified as optional turned on or off. "How to use this DTB" would typically be classified as required whereas descriptions of pictures could be classified as optional.
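How a player handles the render attribute is up to the player; the sketch below merely models the choice described above. Given a namespace-free fragment, it returns the text of each <prodnote> that would be presented: required notes always, optional ones only when the user has turned them on. The function name and note text are hypothetical.

```python
import xml.etree.ElementTree as ET

CHAPTER = """<level1>
  <prodnote render="required">This chapter contains two maps.</prodnote>
  <p>Text.</p>
  <prodnote render="optional">Map: the route of the voyage.</prodnote>
</level1>"""

def prodnotes_to_play(xml_text, play_optional):
    """Text of the producer's notes presented for the user's on/off choice."""
    return [note.text for note in ET.fromstring(xml_text).iter("prodnote")
            if note.get("render") == "required" or play_optional]

print(prodnotes_to_play(CHAPTER, play_optional=False))
# ['This chapter contains two maps.']
print(prodnotes_to_play(CHAPTER, play_optional=True))
# ['This chapter contains two maps.', 'Map: the route of the voyage.']
```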