The Z39.98 SSML Integration Feature recasts a subset of the W3C Speech Synthesis Markup Language (SSML) Version 1.1 as a Feature for incorporation in Z39.98-AI Profiles.
The Z39.98 SSML Integration Feature is designed to be used in authoring contexts where speech output is targeted. The feature's content model definitions are designed so that a processing agent can safely ignore or filter out the SSML fragments when speech-related information is not relevant.
Each element contributed by the feature inherits the corresponding semantics as defined by SSML 1.1, unless specified otherwise.
This feature is maintained by the ANSI/NISO Z39.98 advisory committee under the auspices of NISO.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this section are to be interpreted as described in RFC2119.
This resource directory represents version 1.0 of the SSML Integration feature.
This release may not be the most recently published (current) version of the SSML Integration feature. The current version should always be obtained from the static URI: http://www.daisy.org/z3998/2012/auth/features/ssml/current/
This feature must be identified as ssml in Z39.98-AI document feature declarations.
The canonical identity URI is: http://www.daisy.org/z3998/2012/auth/features/ssml/1.0/
This version of the feature is compliant with the Z39.98-2012 Specification.
The normative RelaxNG schema for version 1.0 of the SSML Integration feature is z3998-feature-ssml.rng.
Note - this feature schema does not represent an entire document model; it is intended for inclusion in host profiles.
The normative schema includes a number of modules and/or subschemas, which are listed in Appendix 1.
This feature makes the following components available for inclusion in host profiles:
This section defines processing agent behaviors that extend the default behaviors defined in Processing agent conformance definition.
If a processing agent supports this feature, it must comply with the following:
phoneme
The processing agent must support the SSML phoneme element, and process it as dictated by the SSML specification.
Upon encountering the ssml:ph and associated ssml:alphabet attributes on a non-SSML namespace element, the processing agent must process that element in the same way as an SSML phoneme element.
token
The processing agent must support the SSML token element, and process it as dictated by the SSML specification.
The w (word) element from the Z39.98-AI default namespace must be regarded as a synonym of the SSML token element.
lexicon
The processing agent must support the SSML lexicon element, and process it as dictated by the SSML specification.
The processing agent must support the PLS (application/pls+xml) media type, and process PLS documents as dictated by PLS.
prosody, say-as, sub and break
The processing agent should support the SSML prosody, say-as, sub and break elements, and process them as dictated by the SSML specification.
If it does not support these elements, the processing agent must employ the behavior defined in Ignore below.
Note that the expansion and name elements, when referenced from an abbr element, provide content that can be used synonymously with the content of the alias attribute on the ssml:sub element.
Processing agents should support the X-SAMPA phonetic alphabet.
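The phoneme and token requirements above can be read as a normalization step that a speech-aware processing agent applies before synthesis. The following is a minimal sketch in Python, not a normative implementation. The helper names `pronunciation_hint` and `is_token` are hypothetical, and the Z39.98-AI namespace URI bound to `Z3998_NS` is an assumption that should be checked against the host profile. The fallback to x-SAMPA follows the implicit value stated for the alphabet attributes in this feature.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Assumption: namespace URI of the Z39.98-AI default vocabulary.
Z3998_NS = "http://www.daisy.org/ns/z3998/authoring/"


def pronunciation_hint(elem):
    """Return (ph, alphabet) for an element, or None if it carries no hint.

    An ssml:phoneme element uses the unqualified ph/alphabet attributes;
    any other element uses the namespace-qualified ssml:ph/ssml:alphabet.
    """
    if elem.tag == f"{{{SSML_NS}}}phoneme":
        ph = elem.get("ph")
        alphabet = elem.get("alphabet")
    else:
        ph = elem.get(f"{{{SSML_NS}}}ph")
        alphabet = elem.get(f"{{{SSML_NS}}}alphabet")
    if ph is None:
        return None
    # The feature states that x-SAMPA is assumed when alphabet is omitted.
    return ph, alphabet or "x-SAMPA"


def is_token(elem):
    """Treat the Z39.98-AI w element as a synonym of ssml:token."""
    return elem.tag in (f"{{{SSML_NS}}}token", f"{{{Z3998_NS}}}w")
```

With this shape, downstream synthesis code only has to ask two questions of each element and never needs to know which of the two spellings the author used.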
If a processing agent recognizes but does not support this feature, it must employ one of the following behaviors:
Abort
Upon encountering a document instance with this feature enabled, the processing agent issues a notification, and then aborts the processing.
Ignore
While traversing the document tree, the processing agent ignores any encountered XML element in the SSML namespace, and continues processing its children.
Any encountered attributes in the SSML namespace occurring on non-SSML namespace elements are also ignored.
Discard
The processing agent discards all elements and attributes contributed by this feature.
The abort behavior is the default; the ignore and discard behaviors must only be employed when the processing agent is explicitly instructed to do so by the client.
Processing agents that employ the ignore or discard behaviors should issue a notification.
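The three behaviors above could be sketched as follows in Python with ElementTree. This is an illustration under stated assumptions rather than a reference implementation: the names `handle_unsupported_ssml` and `FeatureNotSupported` are hypothetical, ignore is read as unwrapping an SSML element while keeping its text and children in place, and discard is read as removing an SSML element together with its content. The second reading is an assumption, since the text above does not say what happens to the content of a discarded element.

```python
import copy
import warnings
import xml.etree.ElementTree as ET

SSML = "{http://www.w3.org/2001/10/synthesis}"


class FeatureNotSupported(Exception):
    """Raised for the default abort behavior."""


def _in_ssml(name):
    return isinstance(name, str) and name.startswith(SSML)


def _add_text(parent, index, text):
    """Append text so that it ends up just before child position index."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _strip(parent, keep_children):
    # SSML-namespace attributes on non-SSML elements are dropped.
    for name in [n for n in parent.attrib if _in_ssml(n)]:
        del parent.attrib[name]
    i = 0
    while i < len(parent):
        child = parent[i]
        if not _in_ssml(child.tag):
            _strip(child, keep_children)
            i += 1
            continue
        hoisted = list(child) if keep_children else []
        lead = child.text if keep_children else None
        tail = child.tail
        del parent[i]
        _add_text(parent, i, lead)
        for offset, node in enumerate(hoisted):
            parent.insert(i + offset, node)
        _add_text(parent, i + len(hoisted), tail)
        # i is not advanced: hoisted children are examined next.


def handle_unsupported_ssml(root, behavior="abort"):
    """Apply abort (default), ignore or discard to a parsed document."""
    if behavior == "abort":
        raise FeatureNotSupported("document uses the ssml feature")
    if behavior not in ("ignore", "discard"):
        raise ValueError(f"unknown behavior: {behavior!r}")
    # Agents using ignore or discard should issue a notification.
    warnings.warn(f"SSML markup handled with the {behavior} behavior")
    result = copy.deepcopy(root)
    _strip(result, keep_children=(behavior == "ignore"))
    return result
```

Making abort the default argument mirrors the rule that ignore and discard are only used when the client explicitly asks for them.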
If a processing agent does not recognize this feature, it must, as dictated in Processing agent conformance definition, abort processing and issue an error message.
The component definitions provided below follow the conventions used in Core Modules.
Provides a subset of the W3C Speech Synthesis Markup Language (SSML) Version 1.1 element set, suitable for integration in Z39.98-2012 Profiles.
Name | Default values | Default usage context |
---|---|---|
strength | 'none' \| 'x-weak' \| 'weak' \| 'medium' \| 'strong' \| 'x-strong' | ssml:break |
time | TimeValue | ssml:break |
ssml:onlangfailure | 'changevoice' \| 'ignoretext' \| 'ignorelang' \| 'processorchoice' | Document.attrib, Phrase.attrib and Text.attrib |
ph | PhoneticExpression | ssml:phoneme |
alphabet | alphabet | ssml:phoneme |
pitch | 'x-low' \| 'low' \| 'medium' \| 'high' \| 'x-high' \| 'default' \| RelativeChange \| PitchExpression | ssml:prosody |
contour | PitchContour | ssml:prosody |
range | 'x-low' \| 'low' \| 'medium' \| 'high' \| 'x-high' \| 'default' \| RelativeChange \| PitchExpression | ssml:prosody |
rate | 'x-slow' \| 'slow' \| 'medium' \| 'fast' \| 'x-fast' \| 'default' \| NonNegativePercentage | ssml:prosody |
duration | TimeValue | ssml:prosody |
volume | 'silent' \| 'x-soft' \| 'soft' \| 'medium' \| 'loud' \| 'x-loud' \| 'default' \| VolumeExpression | ssml:prosody |
interpret-as | 'date' \| 'time' \| 'telephone' \| 'characters' \| 'cardinal' \| 'ordinal' | ssml:say-as |
format | text | ssml:say-as |
detail | text | ssml:say-as |
alias | text | ssml:sub |
uri | URI | ssml:lexicon |
type | MediaType | ssml:lexicon |
break element
Controls the pausing or other prosodic boundaries between tokens.
Refer to SSML 1.1 for further information.
Local name | break |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | strength?, time?, xml:id? |
Default content model | empty |
Optionality | This element must not be omitted when activating this module. |
phoneme element (Phrase)
Provides a phonemic/phonetic pronunciation for the contained text.
The phoneme element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document.
Refer to SSML 1.1 for further information.
Local name | phoneme |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class \| Phrase.class)+ |
Optionality | This element must not be omitted when activating this module. |
The following model restrictions apply to this element:
The ssml:phoneme element must not have ssml namespace element or attribute descendants.
The ssml:phoneme element must neither be empty nor contain only whitespace.
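The two restrictions above lend themselves to a simple tree check. The sketch below is one possible reading, with a hypothetical function name `phoneme_violations`; it assumes that "empty" means an element with no child elements and no non-whitespace character data.

```python
import xml.etree.ElementTree as ET

SSML = "{http://www.w3.org/2001/10/synthesis}"


def phoneme_violations(phoneme):
    """Return messages for violated ssml:phoneme model restrictions."""
    problems = []
    # Assumed reading: empty means no child elements and only whitespace text.
    if len(phoneme) == 0 and not (phoneme.text or "").strip():
        problems.append("ssml:phoneme is empty or whitespace-only")
    for node in phoneme.iter():
        if node is phoneme:
            continue
        if node.tag.startswith(SSML):
            problems.append(f"SSML element descendant: {node.tag[len(SSML):]}")
        for name in node.attrib:
            if name.startswith(SSML):
                problems.append(f"SSML attribute descendant: {name[len(SSML):]}")
    return problems
```

An empty result list means that neither restriction was found to be violated under these assumptions.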
phoneme element (Text)
Provides a phonemic/phonetic pronunciation for the contained text.
The phoneme element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document.
Refer to SSML 1.1 for further information.
Local name | phoneme |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Text.class |
Default attribute model | ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class)+ |
Optionality | This element must not be omitted when activating this module. |
prosody element (Phrase)
Permits control of the pitch, speaking rate and volume of speech output.
Refer to SSML 1.1 for further information.
Local name | prosody |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | |
Optionality | This element must not be omitted when activating this module. |
The following model restrictions apply to this element:
The ssml:prosody element must not have ssml:prosody descendants.
The ssml:prosody element must neither be empty nor contain only whitespace.
prosody element (Text)
Permits control of the pitch, speaking rate and volume of speech output.
Refer to SSML 1.1 for further information.
Local name | prosody |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Text.class |
Default attribute model | pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | |
Optionality | This element must not be omitted when activating this module. |
say-as element (Phrase)
Provides information on the type of text construct contained within the element to help specify the level of detail for rendering the contained text.
Refer to SSML 1.1 for further information.
Local name | say-as |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class \| Phrase.class)+ |
Optionality | This element must not be omitted when activating this module. |
The following model restrictions apply to this element:
The ssml:say-as element must neither be empty nor contain only whitespace.
say-as element (Text)
Provides information on the type of text construct contained within the element to help specify the level of detail for rendering the contained text.
Refer to SSML 1.1 for further information.
Local name | say-as |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Text.extern.class |
Default attribute model | interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class)+ |
Optionality | This element must not be omitted when activating this module. |
sub element (Phrase)
Indicates that the text in the alias attribute value replaces the contained text for pronunciation.
Refer to SSML 1.1 for further information.
Local name | sub |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | alias, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class \| Phrase.class)+ |
Optionality | This element must not be omitted when activating this module. |
The following model restrictions apply to this element:
The ssml:sub element must neither be empty nor contain only whitespace.
sub element (Text)
Indicates that the text in the alias attribute value replaces the contained text for pronunciation.
Refer to SSML 1.1 for further information.
Local name | sub |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Text.class |
Default attribute model | alias, z3998.Core.attrib, z3998.I18n.attrib |
Default content model | (text \| Text.class)+ |
Optionality | This element must not be omitted when activating this module. |
token element (Phrase)
Indicates that the content is a token, in order to eliminate token (word) segmentation ambiguities of a synthesis processor.
Refer to SSML 1.1 for further information.
Local name | token |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.class |
Default attribute model | Phrase.attrib |
Default content model | (text \| Text.class)+ |
Optionality | This element must not be omitted when activating this module. |
The following model restrictions apply to this element:
The ssml:token element must neither be empty nor contain only whitespace.
token element (Text)
Indicates that the content is a token, in order to eliminate token (word) segmentation ambiguities of a synthesis processor.
Refer to SSML 1.1 for further information.
Local name | token |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Text.class |
Default attribute model | Text.attrib |
Default content model | (text \| Text.class)+ |
Optionality | This element must not be omitted when activating this module. |
lexicon element
Specifies a reference to a lexicon document.
Refer to SSML 1.1 for further information.
Local name | lexicon |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | The lexicon element is allowed in the document head. |
Default attribute model | uri, xml:id, type? |
Default content model | empty |
Optionality | This element must not be omitted when activating this module. |
strength attribute
Indicates the prosodic strength of the break in the speech output.
Refer to SSML 1.1 for further information.
Local name | strength |
---|---|
Namespace | None |
Default usage context | ssml:break |
Default value(s) | 'none' \| 'x-weak' \| 'weak' \| 'medium' \| 'strong' \| 'x-strong' |
Optionality | This attribute must not be omitted when activating this module. |
time attribute
Indicates the duration of a pause to be inserted in the output in seconds or milliseconds.
Refer to SSML 1.1 for further information.
Local name | time |
---|---|
Namespace | None |
Default usage context | ssml:break |
Default value(s) | TimeValue |
Optionality | This attribute must not be omitted when activating this module. |
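The text above states only that the pause is given in seconds or milliseconds. As a hedged illustration, the sketch below assumes the time designation used by SSML 1.1, namely a non-negative number immediately followed by s or ms (for example 250ms or 3s), and converts it to milliseconds. The function name `time_to_ms` is hypothetical.

```python
import re

# Assumption: a non-negative decimal number directly followed by "s" or "ms".
_TIME = re.compile(r"(\d+(?:\.\d+)?|\.\d+)(ms|s)")


def time_to_ms(value):
    """Convert a time designation such as '250ms' or '3s' to milliseconds."""
    match = _TIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a time designation: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number * (1000.0 if unit == "s" else 1.0)
```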
onlangfailure attribute
Describes the desired behavior of a synthesis processor upon language speaking failure. The value of this attribute is inherited by descendants.
Refer to SSML 1.1 for further information.
Local name | ssml:onlangfailure |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Document.attrib, Phrase.attrib and Text.attrib |
Default value(s) | 'changevoice' \| 'ignoretext' \| 'ignorelang' \| 'processorchoice' |
Optionality | This attribute must not be omitted when activating this module. |
ph attribute
Specifies a phonemic/phonetic pronunciation for the text contained in the current element.
Refer to SSML 1.1 for further information.
Local name | ph |
---|---|
Namespace | None |
Default usage context | ssml:phoneme |
Default value(s) | PhoneticExpression |
Optionality | This attribute must not be omitted when activating this module. |
alphabet attribute
Specifies which phonemic/phonetic pronunciation alphabet is used in the ph attribute.
If omitted, the implicit value x-SAMPA is assumed.
Refer to SSML 1.1 for further information.
Local name | alphabet |
---|---|
Namespace | None |
Default usage context | ssml:phoneme |
Default value(s) | alphabet |
Optionality | This attribute must not be omitted when activating this module. |
pitch attribute
Specifies the baseline pitch for the contained text.
The labels x-low through x-high represent a sequence of monotonically non-decreasing pitch levels.
Refer to SSML 1.1 for further information.
Local name | pitch |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | 'x-low' \| 'low' \| 'medium' \| 'high' \| 'x-high' \| 'default' \| RelativeChange \| PitchExpression |
Optionality | This attribute must not be omitted when activating this module. |
contour attribute
Sets the pitch contour for the contained text.
Refer to SSML 1.1 for further information.
Local name | contour |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | PitchContour |
Optionality | This attribute must not be omitted when activating this module. |
range attribute
Specifies the pitch range (variability) for the contained text.
Refer to SSML 1.1 for further information.
Local name | range |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | 'x-low' \| 'low' \| 'medium' \| 'high' \| 'x-high' \| 'default' \| RelativeChange \| PitchExpression |
Optionality | This attribute must not be omitted when activating this module. |
rate attribute
Specifies a change in the speaking rate for the contained text.
The values x-slow through x-fast represent a sequence of monotonically non-decreasing speaking rates.
Refer to SSML 1.1 for further information.
Local name | rate |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | 'x-slow' \| 'slow' \| 'medium' \| 'fast' \| 'x-fast' \| 'default' \| NonNegativePercentage |
Optionality | This attribute must not be omitted when activating this module. |
duration attribute
Specifies, in seconds or milliseconds, the desired time to take to read the contained text.
Refer to SSML 1.1 for further information.
Local name | duration |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | TimeValue |
Optionality | This attribute must not be omitted when activating this module. |
volume attribute
Specifies the volume for the contained text.
If omitted, the implicit value +0.0dB is assumed.
Refer to SSML 1.1 for further information.
Local name | volume |
---|---|
Namespace | None |
Default usage context | ssml:prosody |
Default value(s) | 'silent' \| 'x-soft' \| 'soft' \| 'medium' \| 'loud' \| 'x-loud' \| 'default' \| VolumeExpression |
Optionality | This attribute must not be omitted when activating this module. |
interpret-as attribute
Indicates the content type of the contained text construct.
Refer to SSML 1.1 for further information.
Local name | interpret-as |
---|---|
Namespace | None |
Default usage context | ssml:say-as |
Default value(s) | 'date' \| 'time' \| 'telephone' \| 'characters' \| 'cardinal' \| 'ordinal' |
Optionality | This attribute must not be omitted when activating this module. |
format attribute
In addition to interpret-as, provides further hints on the precise formatting of the contained text for content types that may have ambiguous formats.
Refer to SSML 1.1 for further information.
Local name | format |
---|---|
Namespace | None |
Default usage context | ssml:say-as |
Default value(s) | text |
Optionality | This attribute must not be omitted when activating this module. |
detail attribute
Indicates the level of detail to be read aloud or rendered.
Refer to SSML 1.1 for further information.
Local name | detail |
---|---|
Namespace | None |
Default usage context | ssml:say-as |
Default value(s) | text |
Optionality | This attribute must not be omitted when activating this module. |
alias attribute
Specifies the string to be spoken instead of the string in the sub element.
Refer to SSML 1.1 for further information.
Local name | alias |
---|---|
Namespace | None |
Default usage context | ssml:sub |
Default value(s) | text |
Optionality | This attribute must not be omitted when activating this module. |
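To illustrate the substitution semantics of alias, the following sketch derives the text that would be handed to a synthesizer from a parsed fragment, replacing the content of each ssml:sub element with its alias value. The function name `spoken_text` is hypothetical, and the sketch deliberately ignores every other SSML element.

```python
import xml.etree.ElementTree as ET

SSML_SUB = "{http://www.w3.org/2001/10/synthesis}sub"


def spoken_text(elem):
    """Return the text to speak for elem, honoring ssml:sub aliases."""
    parts = []
    if elem.tag == SSML_SUB:
        # The alias replaces the contained text for pronunciation.
        parts.append(elem.get("alias", ""))
    else:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(spoken_text(child))
            parts.append(child.tail or "")
    return "".join(parts)
```

The written form stays in the document for visual rendering, while only the derived string changes.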
uri attribute
Identifies the location of the lexicon document.
Refer to SSML 1.1 for further information.
Local name | uri |
---|---|
Namespace | None |
Default usage context | ssml:lexicon |
Default value(s) | URI |
Optionality | This attribute must not be omitted when activating this module. |
type attribute
Specifies the media type of the lexicon document. The implicit value of this attribute is application/pls+xml, the media type associated with the Pronunciation Lexicon Specification.
Refer to SSML 1.1 for further information.
Local name | type |
---|---|
Namespace | None |
Default usage context | ssml:lexicon |
Default value(s) | MediaType |
Optionality | This attribute must not be omitted when activating this module. |
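The interplay of uri and type on ssml:lexicon can be sketched as a small lookup that applies the implicit media type when type is omitted. The function name `lexicon_references` and the example file names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML = "{http://www.w3.org/2001/10/synthesis}"
PLS_MEDIA_TYPE = "application/pls+xml"  # implicit value of the type attribute


def lexicon_references(head):
    """Return (uri, media type) pairs for each ssml:lexicon under head."""
    refs = []
    for lexicon in head.iter(SSML + "lexicon"):
        refs.append((lexicon.get("uri"), lexicon.get("type", PLS_MEDIA_TYPE)))
    return refs
```

For a head containing one lexicon with only uri="lex/en.pls", the function reports that reference with the PLS media type.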
Schema | Language |
---|---|
ssml-11.rng | RelaxNG |
Activation of this module depends on the Core, datatypes, global-classes, I18n and ssml-datatypes modules also being activated.
The SSML Feature module depends on this module being activated.
Defines an adaptation of the SSML phoneme element as an attribute, enabling the provision of pronunciation information on elements that are not in the SSML namespace.
Name | Default values | Default usage context |
---|---|---|
ph | PhoneticExpression | Phrase.attrib and Text.attrib |
alphabet | alphabet | On elements where ssml:ph occurs. |
Specifies a phonemic/phonetic pronunciation for the text contained in the current element.
This attribute inherits the semantics of the ph attribute on the SSML ssml:phoneme element.
Note that this attribute is namespace qualified and intended for use on non-SSML namespace elements, as opposed to the default (non-qualified) ph attribute, which is only allowed on the ssml:phoneme element.
Consult Speech Synthesis Markup Language (SSML) Version 1.1 for further information.
Local name | ph |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | Phrase.attrib and Text.attrib |
Value(s) | PhoneticExpression |
Value alterability | The defined value(s) or datatype(s) are fixed, and must not be altered when activating this module. |
Optionality | This attribute must not be omitted when activating this module. |
The following model restrictions apply to this attribute:
Elements with the ssml:ph attribute must not have ssml:phoneme descendants, nor descendants with the ssml:ph attribute.
The ssml:ph attribute must neither be empty nor contain only whitespace.
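The nesting restriction on ssml:ph could be checked along the following lines. This is a sketch with a hypothetical function name, `ph_nesting_violations`, which reports each pair of an element carrying ssml:ph and an offending descendant.

```python
import xml.etree.ElementTree as ET

SSML = "{http://www.w3.org/2001/10/synthesis}"
PH = SSML + "ph"
PHONEME = SSML + "phoneme"


def ph_nesting_violations(root):
    """Return (ancestor tag, descendant tag) pairs that break the rule."""
    offenders = []
    for elem in root.iter():
        if PH not in elem.attrib:
            continue
        for desc in elem.iter():
            if desc is elem:
                continue
            # Neither ssml:phoneme nor a nested ssml:ph is allowed below elem.
            if desc.tag == PHONEME or PH in desc.attrib:
                offenders.append((elem.tag, desc.tag))
    return offenders
```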
Specifies which phonemic/phonetic pronunciation alphabet is used in the value of the ssml:ph attribute.
Note that this attribute is namespace qualified and intended for use on non-SSML namespace elements in conjunction with the ssml:ph attribute.
If omitted, the implicit value x-SAMPA is assumed.
Consult Speech Synthesis Markup Language (SSML) Version 1.1 for further information.
Local name | alphabet |
---|---|
Namespace | http://www.w3.org/2001/10/synthesis |
Default usage context | On elements where ssml:ph occurs. |
Value(s) | alphabet |
Value alterability | The defined value(s) or datatype(s) are fixed, and must not be altered when activating this module. |
Optionality | This attribute must not be omitted when activating this module. |
Schema | Language |
---|---|
ssml-phoneme-attrib.rng | RelaxNG |
Activation of this module depends on the ssml-datatypes module also being activated.
The SSML Feature module depends on this module being activated.
This module defines a set of datatypes related to SSML.
Name | Definition |
---|---|
PhoneticExpression | A phonetic or phonemic expression. |
alphabet | The name of a pronunciation alphabet. |
RelativeChange | A relative change expression, as defined in relative change. |
PitchExpression | A number followed by the string 'Hz'. |
PitchContour | A pitch contour expression, as defined in pitch contour. |
NonNegativePercentage | An unsigned number immediately followed by "%", as defined in Non-negative percentage. |
VolumeExpression | A number preceded by "+" or "-" and immediately followed by "dB", as defined in prosody Element. |
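Three of the lexical definitions above are concrete enough to express as regular expressions. The sketch below is illustrative only and is not derived from ssml-datatypes.rng. It assumes that "number" means an unsigned decimal in one of the forms n, n., .n or n.n, and the names `DATATYPE_PATTERNS` and `matches_datatype` are hypothetical.

```python
import re

# Assumption: an unsigned decimal number written as "n", "n.", ".n" or "n.n".
_NUMBER = r"(?:\d+\.?\d*|\.\d+)"

DATATYPE_PATTERNS = {
    # A number followed by the string 'Hz'.
    "PitchExpression": re.compile(_NUMBER + r"Hz"),
    # An unsigned number immediately followed by "%".
    "NonNegativePercentage": re.compile(_NUMBER + r"%"),
    # A number preceded by "+" or "-" and immediately followed by "dB".
    "VolumeExpression": re.compile(r"[+-]" + _NUMBER + r"dB"),
}


def matches_datatype(name, value):
    """Return True if value matches the named datatype in full."""
    return DATATYPE_PATTERNS[name].fullmatch(value) is not None
```

Using a full match rather than a search keeps values such as "120 Hz" or "6dB" from being accepted by accident.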
Schema | Language |
---|---|
ssml-datatypes.rng | RelaxNG |
The ssml-elements and ssml-ph-attribs modules depend on this module being activated.
Refer to the Z39.98-AI community portal for information on available software tools.
The list below represents the modules at the time of version 1.0 of this feature.
Occurrences of the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in documentation fields embedded in these modules are to be interpreted as described in RFC2119.