SIPPING A. Saleem
Internet Draft Y. Xin
Expires: April 25, 2007 G. Sharratt
Radisys
October 22, 2006
Media Server Markup Language (MSML)
draft-saleem-msml-02
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 25, 2007.
Copyright Notice
Copyright (C) The Internet Society (2006). All Rights Reserved.
Abstract
The Media Server Markup Language (MSML) is used to control and invoke
many different types of services on IP Media Servers. Clients can use
it to define how multimedia sessions interact on a Media Server and
to apply services to individuals or groups of users. MSML can be
used, for example, to control Media Server conferencing features such
as video layout and audio mixing, create sidebar conferences or
Saleem, et al. Expires - April 2007 [Page 1]
Internet-draft Media Server Markup Language (MSML) October 2006
personal mixes, and set the properties of media streams. As well,
clients can use MSML to define media processing dialogs, which may be
used as parts of application interactions with users or conferences.
Transformation of media streams to and from users or conferences as
well as IVR dialogs are examples of such interactions, which are
specified using MSML. MSML clients may also invoke dialogs with
individual users or with groups of conference participants using
VoiceXML.
Table of Contents
1. Introduction
2. Conventions used in this document
3. Glossary
4. MSML SIP Usage
   4.1 SIP INFO
   4.2 SIP Control Framework
      4.2.1 Control Framework Package Names
      4.2.2 Control Framework Messages
      4.2.3 Common XML Support
      4.2.4 Control Message Body
      4.2.5 REPORT Message Body
5. Language Structure
   5.1 Package Scheme
   5.2 Profile Scheme
6. Execution Flow
7. Media Server Object Model
   7.1 Objects
   7.2 Identifiers
8. MSML Core Package
   8.1 <msml>
   8.2 <send>
   8.3 <result>
   8.4 <event>
9. MSML Conference Core Package
   9.1 Conferences
   9.2 Media Streams
   9.3 <createconference>
      9.3.1 <reserve>
         9.3.1.1 <resource>
   9.4 <modifyconference>
   9.5 <destroyconference>
   9.6 <audiomix>
      9.6.1 <n-loudest>
      9.6.2 <asn>
   9.7 <videolayout>
      9.7.1 <root>
      9.7.2 <region>
      9.7.3 <selector>
         9.7.3.1 Voice Activate Switching (vas)
   9.8 <join>
   9.9 <modifystream>
   9.10 <unjoin>
   9.11 <monitor>
   9.12 <stream>
      9.12.1 Audio Stream Properties
         9.12.1.1 <gain>
         9.12.1.2 <clamp>
      9.12.2 Video Stream Properties
         9.12.2.1 <visual>
10. MSML Dialog Packages
   10.1 Overview
   10.2 Primitives
   10.3 Events
   10.4 MSML Dialog Usage with SIP
   10.5 MSML Dialog Structure and Modularity
   10.6 MSML Dialog Core Package
      10.6.1 <dialogstart>
      10.6.2 <dialogend>
      10.6.3 <send>
      10.6.4 <exit>
      10.6.5 <disconnect>
   10.7 MSML Dialog Base Package
      10.7.1 <play>
         10.7.1.1 <audio>
         10.7.1.2 <video>
         10.7.1.3 <media>
         10.7.1.4 <var>
         10.7.1.5 <playexit>
      10.7.2 <dtmfgen>
         10.7.2.1 <dtmfgenexit>
      10.7.3 <tonegen>
         10.7.3.1 <tone>
         10.7.3.2 <silence>
         10.7.3.3 <tonegenexit>
      10.7.4 <record>
         10.7.4.1 <play>
         10.7.4.2 <tonegen>
         10.7.4.3 <recordexit>
      10.7.5 <dtmf> or <collect>
         10.7.5.1 <play>
         10.7.5.2 <pattern>
         10.7.5.3 <detect>
         10.7.5.4 <noinput>
         10.7.5.5 <nomatch>
         10.7.5.6 <dtmfexit>
      10.7.6 <moml>
   10.8 MSML Dialog Group Package
      10.8.1 <group>
      10.8.2 <groupexit>
   10.9 MSML Dialog Transform Package
      10.9.1 <vad>
         10.9.1.1 <voice>, <silence>, <tvoice>, <tsilence>
      10.9.2 <gain>
      10.9.3 <agc>
      10.9.4 <gate>
      10.9.5 <clamp>
      10.9.6 <relay>
   10.10 MSML Dialog Speech Package
      10.10.1 <speech>
         10.10.1.1 <grammar>
         10.10.1.2 <match>
         10.10.1.3 <noinput>
         10.10.1.4 <nomatch>
         10.10.1.5 <speechexit>
      10.10.2 <play>
         10.10.2.1 <tts>
   10.11 MSML Dialog Fax Detection Package
      10.11.1 <faxdetect>
   10.12 MSML Dialog Fax Send/Receive Package
      10.12.1 <faxsend>
         10.12.1.1 <sendobj>
         10.12.1.2 <hdrfooter>
         10.12.1.3 <rxpoll>
         10.12.1.4 <faxstart>
         10.12.1.5 <faxnegotiate>
         10.12.1.6 <faxpagedone>
         10.12.1.7 <faxobjectdone>
         10.12.1.8 <faxopcomplete>
         10.12.1.9 <faxpollstarted>
      10.12.2 <faxrcv>
         10.12.2.1 <rcvobj>
         10.12.2.2 <txpoll>
11. MSML Audit Package
   11.1 MSML Audit Core Package
      11.1.1 <audit>
      11.1.2 <auditresult>
   11.2 MSML Audit Conference Package
      11.2.1 State Parameters
      11.2.2 <auditresult>
         11.2.2.1 confconfig
         11.2.2.2 confconfig.audiomix.asn
         11.2.2.3 confconfig.audiomix.n-loudest
         11.2.2.4 confconfig.videolayout
         11.2.2.5 confconfig.videolayout.root
         11.2.2.6 confconfig.videolayout.selector
         11.2.2.7 confconfig.controller
         11.2.2.8 dialog
         11.2.2.9 stream
   11.3 MSML Audit Connection Package
      11.3.1 State Parameters
      11.3.2 <auditresult>
         11.3.2.1 sipdialog
         11.3.2.2 sipdialog.localseq
         11.3.2.3 sipdialog.remoteseq
         11.3.2.4 sipdialog.localuri
         11.3.2.5 sipdialog.remoteuri
         11.3.2.6 sipdialog.remotetarget
         11.3.2.7 sipdialog.routeset
         11.3.2.8 localsdp
         11.3.2.9 remotesdp
         11.3.2.10 dialog
         11.3.2.11 stream
   11.4 MSML Audit Dialog Package
      11.4.1 State Parameters
      11.4.2 <dialog>
         11.4.2.1 <duration>
         11.4.2.2 <primitive>
         11.4.2.3 <controller>
   11.5 MSML Audit Stream Package
      11.5.1 State Parameters
      11.5.2 <stream>
         11.5.2.1 <clamp>
         11.5.2.2 <gain>
         11.5.2.3 <visual>
12. Response Codes
13. MSML Conference Examples
   13.1 Establishing a Dial-in Conference
   13.2 Example of a Sidebar Audio Conference
   13.3 Example of Removing a Conference
   13.4 Example of Modifying Video Layout
14. MSML Dialog Examples
   14.1 Announcement
   14.2 Voice Mail Retrieval
   14.3 Play and Record
   14.4 Speech Recognition
   14.5 Play and Collect
   14.6 User Controlled Gain
15. MSML Audit Examples
   15.1 Audit All Conferences
   15.2 Audit Conference Dialogs
   15.3 Audit Conference Streams
   15.4 Audit All Connections
   15.5 Audit Connection Dialogs
   15.6 Audit Connection Streams
   15.7 Audit Connection With Selective States
16. Change Summary
17. Future Work
18. XML Schema
   18.1 MSML Core
      18.1.1 msml-core.xsd
      18.1.2 msml-core-datatypes.xsd
   18.2 MSML Conference Core Package
      18.2.1 msml-conf-core.xsd
      18.2.2 msml-conf-core-datatypes.xsd
   18.3 MSML Dialog Packages
      18.3.1 msml-dialog-core.xsd
      18.3.2 msml-dialog-core-datatypes.xsd
      18.3.3 msml-dialog-base.xsd
      18.3.4 msml-dialog-base-datatypes.xsd
      18.3.5 msml-dialog-transform.xsd
      18.3.6 msml-dialog-transform-datatypes.xsd
      18.3.7 msml-dialog-group.xsd
      18.3.8 msml-dialog-group-datatypes.xsd
      18.3.9 msml-dialog-speech.xsd
      18.3.10 msml-dialog-speech-datatypes.xsd
      18.3.11 msml-dialog-fax-detect.xsd
      18.3.12 msml-dialog-fax-detect-datatypes.xsd
      18.3.13 msml-dialog-fax-sendrecv.xsd
      18.3.14 msml-dialog-fax-sendrecv-datatypes.xsd
   18.4 MSML Audit Packages
      18.4.1 msml-audit-core.xsd
      18.4.2 msml-audit-core-datatypes.xsd
      18.4.3 msml-audit-conf.xsd
      18.4.4 msml-audit-conf-datatypes.xsd
      18.4.5 msml-audit-conn.xsd
      18.4.6 msml-audit-conn-datatypes.xsd
      18.4.7 msml-audit-dialog-datatypes.xsd
      18.4.8 msml-audit-stream-datatypes.xsd
19. Security Considerations
20. IANA Considerations
21. URN Sub-Namespace Registration
22. XML Schema Registration
23. References
Acknowledgments
Authors' Addresses
Intellectual Property Statement
Copyright Statement
Disclaimer of Validity
Acknowledgement
1. Introduction
Media servers contain dynamic pools of media resources. Control
Agents and other users of media servers (called media server clients)
can define and create many different services based on how they
configure and use those resources. Often, that configuration and the
ways in which those resources interact will be changed dynamically
over the course of a call, to reflect changes in the way that an
application interacts with a user.
For example, a call may undergo an initial IVR dialog before being
placed into a conference. Calls may be moved from a main conference
to a sidebar conference and then back again. Individual calls may be
directly bridged to create small n-way calls or simple sidebars. None
of these change the SIP [1] dialog or RTP [15] session. Yet these do
affect the media flow and processing internal to the media server.
The Media Server Markup Language (MSML) is an XML [4] language used
to change the flow of and services on media streams within a media
server. It is used to invoke many different types of services on
individual sessions, groups of sessions, and conferences. MSML allows
the creation of conferences, bridging different sessions together,
and bridging sessions into conferences.
MSML may also be used to create user interaction dialogs and allows
the application of media transforms to media streams. Media
interaction dialogs created using MSML allow construction of IVR
dialog sessions to individual users as well as to groups of users
participating in a conference. Dialogs may also be specified using
other languages, such as VoiceXML [7], which support complete
single-party application logic to be executed on the Media Server.
MSML is a transport-independent language: it does not rely on any
particular underlying transport mechanism, and its semantics are
independent of the transport. However, SIP is a typical and commonly
used transport mechanism for MSML, invoked using the SIP URI scheme.
This specification defines the use of MSML Dialogs with SIP as the
transport mechanism.
A network connection may be established with the media server using
SIP. Media received and transmitted on that connection will flow
through different media resources on the media server depending on
the requested service. Basic Network Media Services with SIP [9]
defines conventions for associating a basic service with a SIP
Request-URI. MSML allows services to be dynamically applied and
changed by a Control Agent during the lifetime of the SIP dialog.
MSML has been designed to address the control and manipulation of
media processing operations (e.g., announcement, IVR, play and
record, ASR/TTS, fax, video), as well as control and relationships of
media streams (e.g., simple and advanced conferencing). It provides a
general-purpose media server control architecture. MSML can
additionally be used to invoke other more complex IVR languages such
as VoiceXML.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [7].
3. Glossary
Media Server: a general-purpose platform for executing real-time
media processing tasks. This is a logical function that maps either
to a single physical device or to a portion of a physical device.
Media Server Client: an application that originates MSML requests to
a media server. It is also referred to as a Control Agent in this
specification.
Network Connection: a participant that represents the termination on
a media server of one or more RTP [15] sessions (for example audio
and video) associated with a call. Network connections are
established and removed using a session establishment protocol such
as SIP. An instance of a network connection is independent of MSML
processing instructions applied to it.
Dialog: an automated IVR participant. Examples of dialogs may be
announcement players, IVR interfaces, or voice recorders. Dialogs may
be defined in MSML or using VoiceXML [7].
Conference: an intermediary function that provides multimedia mixing
and other advanced conferencing services. This specification
currently considers conferences with audio and/or video media types,
but is extensible to other media types.
Identifier: a name that is used to refer to a specific instance of an
object on the media server, such as a conference or a dialog.
Identifiers are composed of one or more terms where each term
identifies an object class and instance.
Object: the generic term for a media server entity that terminates,
originates, or processes media. This specification defines four
classes of objects and specifies mechanisms to create them, join them
together, and destroy them.
Participant Object: an object in a media server that sources original
media in a call and/or receives and terminates media in a call.
Intermediary Object: an object in a media server that acts on media
within a call for the benefit of the participants.
Independent Object: an object that can exist on a media server
independent of other objects.
Operator: an intermediary object class that modifies or transforms a
media stream. Examples of operators may be audio gain controls, video
scaling, or voice masking. MSML defines operators as implicit
objects, which transform media when operations, such as gain control,
are applied to media streams.
Media Stream: a single media flow between two objects. A media stream
has a media type and may be unidirectional or bidirectional.
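As an illustration of the Identifier and Object definitions above, the
following minimal Python sketch splits an identifier into its (object
class, instance) terms. The textual form used here, with terms written
as class:instance and separated by "/", is an assumption made for
illustration only; this glossary does not define identifier syntax.
The function name parse_identifier is hypothetical.

```python
from typing import List, Tuple


def parse_identifier(identifier: str) -> List[Tuple[str, str]]:
    """Split an identifier into (object class, instance) terms.

    Assumed syntax (not taken from this glossary): terms are separated
    by "/" and each term is written as "class:instance".
    """
    terms = []
    for term in identifier.split("/"):
        object_class, sep, instance = term.partition(":")
        # Every term must name both a class and an instance.
        if not sep or not object_class or not instance:
            raise ValueError(f"malformed term: {term!r}")
        terms.append((object_class, instance))
    return terms
```

Under that assumed syntax, an identifier with two terms would yield two
(class, instance) pairs, one per object in the chain.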
4. MSML SIP Usage
SIP is used to create and modify media sessions with a media server
according to the procedures defined in RFC 3261 [1]. Often, SIP third
party call control [16] will be used to create sessions to a media
server on behalf of end users. MSML is used to define and change the
service which a user connected to a media server will receive. MSML
clients are application servers, softswitches, or other forms of
control agents, and SHOULD have an authorized security relationship
with the media server. MSML itself does not define authorization
mechanisms.
MSML transactions are originated based upon events that occur in the
application domain. These events may be independent from any media or
user interaction. For example, an application may wish to play an
announcement to a conference warning that its scheduled completion
time is approaching. Applications themselves are structured in many
different ways. Their structure and requirements contribute to their
selection of protocols and languages. To accommodate differing
application needs, MSML has been designed to be neutral to other
languages and independent of the transport used to carry it.
MSML language is purposely designed to be transport independent. In
this release of the specification, SIP INFO [17] and SIP Control
Framework [28] have been chosen for transport mechanisms for MSML, as
described in the following sections.
4.1 SIP INFO
SIP INVITE and INFO [17] requests and responses MAY be used to carry
MSML. INFO requests allow asynchronous mid-call messages within SIP
with few additional semantics. In addition, there are existing widely
deployed implementations of that method, it aids in initial
developments which are closely coupled with SIP session
establishment, and it allows MSML to be directly associated with user
dialogs when third party call control is used.
Although INFO is sometimes considered to not be a suitable general-
purpose transport mechanism for messages within SIP, there have been
proposals to make it more acceptable. MSML may evolve to include
other SIP usage and/or to work with other protocols or as a stand-
alone protocol established through SIP, in future releases of this
document.
MSML supports several models for client interaction. When clients use
3PCC to establish media sessions on behalf of end users, clients will
have a SIP dialog for each media session. MSML MAY be sent on these
dialogs. However the targets of MSML actions are not inferred from
the session associated with the SIP dialog. The targets of MSML
actions are always explicitly specified using identifiers as
previously defined.
An application, after interacting with a user, may want to affect
multiple objects within a media server. For example, tones or
messages are often played to a conference when connections are added
or removed. A separate message may also be played to a participant as
they are joined, or to moderators. Explicit identifiers, that is,
identifiers not inferred from a transport mechanism, allow these
multiple actions to be easily grouped into a single transaction sent
on any SIP dialog.
MSML also supports a model of dedicated control associations. This
supports decoupled application architectures where a client can
control media server services without also establishing all of the
media sessions itself. Control associations are created using SIP but
they do not have any associated media session. Although initially
INFO messages will be sent on this SIP dialog, just as with dialogs
associated with media sessions, it is possible that in the future,
the SIP dialog will be used to establish a separate control session
(defined in SDP [18]) that does not use SIP as the transport for MSML
messages.
A media server using MSML also sends asynchronous events to a client
using MSML scripts in SIP INFO. Events are sent based on previous
MSML requests and are sent within the SIP dialog on which the MSML
request that caused the event to be generated was received. If this
dialog no longer exists when the event is generated, the event is
discarded.
Events may be generated during the execution of a dialog created by a
<dialogstart> element. For example, dialogs can send events based on
user input. VoiceXML dialogs, on the other hand, generally interact
with other servers outside of MSML using HTTP.
An event is also generated when the execution of a dialog terminates,
either because of completion or failure. The exact information
returned is dependent on the dialog language, the capabilities of the
dialog execution environment, and what was requested by the dialog.
Both MSML and VoiceXML [7] allow information to be returned when they
exit. These events may be sent in a SIP INFO or a SIP BYE. SIP BYE is
used when the dialog itself specifies that the connection should be
disconnected, for example through the use of the <disconnect>
element.
Conferences may also generate events based upon their configuration.
An example of this is the notification of the set of active speakers.
4.2 SIP Control Framework

The SIP Control Framework [28] MAY be used as a transport mechanism
for MSML.

The Control Framework provides a generic approach for establishment
and reporting capabilities of remotely initiated commands. The
framework utilizes many functions provided by the Session Initiation
Protocol [1] (SIP) for the rendezvous and establishment of a reliable
channel for control interactions. Compared to SIP INFO, the SIP
Control Framework is a more general purpose transport mechanism and
one which is not constrained by limitations of the SIP INFO
mechanism.

The Control Framework also introduces the concept of a Control
Package, which is an explicit usage of the Control Framework for a
particular interaction set. This specification has already specified
a list of packages for MSML to control the Media Server in many
aspects, including basic dialog, advanced conferencing, advanced
dialog and audit service. Each of these packages has a unique Control
Package name assigned in order for MSML to be used with the Control
Framework.

This section fulfills the mandatory requirement for information that
MUST be specified during the definition of a Control Framework
Package, as detailed in SIP Control Framework [28].
4.2.1 Control Framework Package Names

The Control Framework [28] requires a Control Package definition to
specify and register a unique name.

MSML specification defines Control Package names using a hierarchical
scheme to indicate the inherited relationship across packages. For
example, package "msml-x" is derived from package "msml", and package
"msml-x-y" is derived from package "msml-x".

The following is a list of Control Package names reserved by the MSML
specification.

"msml": this Control Package supports MSML Core package as
   specified in section 8.

"msml-conf": this Control Package supports MSML Conference Core
   package as specified in section 9.

"msml-dialog": this Control Package supports MSML Dialog Core
   package as specified in section 10.6.

"msml-dialog-base": this Control Package supports MSML Dialog Base
   package as specified in section 10.7.

"msml-dialog-transform": this Control Package supports MSML Dialog
   Transform package as specified in section 10.9.

"msml-dialog-group": this Control Package supports MSML Dialog
   Group package as specified in section 10.8.

"msml-dialog-speech": this Control Package supports MSML Dialog
   Speech package as specified in section 10.10.

"msml-dialog-fax-detect": this Control Package supports MSML Dialog
   Fax Detection package as specified in section 10.11.

"msml-dialog-fax-sendrecv": this Control Package supports MSML
   Dialog Fax Send/Receive package as specified in section 10.12.

"msml-audit": this Control Package supports MSML Audit Core Package
   as specified in section 11.1.

"msml-audit-conf": this Control Package supports MSML Audit
   Conference Package as specified in section 11.2.

"msml-audit-conn": this Control Package supports MSML Audit
   Connection Package as specified in section 11.3.

"msml-audit-dialog": this Control Package supports MSML Audit
   Dialog Package as specified in section 11.4.

"msml-audit-stream": this Control Package supports MSML Audit
   Stream Package as specified in section 11.5.

An Application Server using the Control Framework as transport for
MSML, MUST use one or multiple package names, depending on the
service required from the Media Server. The package name(s) are
identified in the "Control-Packages" SIP header that is present in
the SIP INVITE dialog request that creates the control channel, as
specified in [28]. The "Control-Packages" value MAY be re-negotiated
via the SIP re-INVITE mechanism.
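The hierarchical naming scheme above can be illustrated with a short sketch that recovers the package a reserved name is derived from. The rule used here, taking the longest reserved name that is a hyphen-delimited proper prefix, is one reading of the scheme and not something the specification states; the function names are hypothetical. Plain hyphen stripping would not work, because "msml-dialog-fax-detect" would map to "msml-dialog-fax", which is not a reserved name.

```python
from typing import List, Optional

# Control Package names reserved by the MSML specification (from the list above).
RESERVED = (
    "msml",
    "msml-conf",
    "msml-dialog",
    "msml-dialog-base",
    "msml-dialog-transform",
    "msml-dialog-group",
    "msml-dialog-speech",
    "msml-dialog-fax-detect",
    "msml-dialog-fax-sendrecv",
    "msml-audit",
    "msml-audit-conf",
    "msml-audit-conn",
    "msml-audit-dialog",
    "msml-audit-stream",
)


def parent_package(name: str) -> Optional[str]:
    """Return the longest reserved name that prefixes `name` up to a hyphen."""
    candidates = [p for p in RESERVED if name.startswith(p + "-")]
    return max(candidates, key=len) if candidates else None


def ancestry(name: str) -> List[str]:
    """Return the chain of packages `name` derives from, nearest first."""
    chain = []
    parent = parent_package(name)
    while parent is not None:
        chain.append(parent)
        parent = parent_package(parent)
    return chain
```

The names only encode a single line of derivation. A package that builds on several others, such as the Dialog Group package, still maps to one parent ("msml-dialog") under this reading.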
4.2.2 Control Framework Messages

The usage of CONTROL, response and REPORT messages, as defined in
[28], by each Control Package defined in MSML is different and
described separately in the following sections.

MSML Core Package "msml"

The Application Server may send a CONTROL message to the Media Server
with a body containing an MSML request that uses the following
elements:

<msml>: the root element that may contain a list of child elements
   which request a specific operation. The child elements are
   defined in extended packages (e.g. "msml-conf" and
   "msml-dialog"). This element is also the root element which
   contains MSML result and event.

<send>: sends an event to the specified recipient within the Media
   Server. Specific event types are defined within the extended
   packages.

The Media Server replies with a response message containing an MSML
result using the following elements:

<result>: reports the results of an MSML transaction.

The Media Server MAY send an MSML event to the Application Server, in
a REPORT or CONTROL message, using the element <event>. The actual
content of the <event> and which Control Framework message to use is
defined within the extended packages.
MSML Conference Core Package "msml-conf"

This package extends the MSML Core Package to define a framework for
creation, manipulation and deletion of a conference.

The Application Server can send a CONTROL message to the Media Server
with a body containing an MSML request which contains one or multiple
conference related commands. The Media Server then replies with a
response message with a body of MSML result to indicate if the
request has been fulfilled or not.

During the lifetime of a conference, whenever an event occurs, the
Media Server MAY send CONTROL messages containing MSML events to
notify the Application Server. The Application Server SHOULD reply
with a response message with no MSML body to acknowledge the event
has been received.

This package does NOT use the REPORT message.
Dialog Core Package "msml-dialog"

This package extends the MSML Core Package to define the structural
framework and abstractions for MSML dialogs.

The Application Server MAY send CONTROL messages containing an MSML
request using the following elements:

<dialogstart>: instantiates an MSML media dialog on a connection or a
   conference.

<dialogend>: terminates an MSML dialog.

<send>: sends an event and an optional namelist to a dialog, dialog
   group, or dialog primitive.

<exit>: used by the dialog description language to cause the
   execution of the MSML dialog to terminate.

For the <dialogstart> command, the response message MUST contain an
MSML result which indicates that the dialog has been started
successfully. The MSML result MAY contain <dialogid> to return the
dialog identifier, if the identifier was assigned by the Media
Server. Subsequently, zero or more MSML events MAY be initiated by
the Media Server in (update) REPORT messages to report information
gathered during the dialog. Finally, an MSML event "msml.dialog.exit"
SHOULD be generated in a (terminate) REPORT message when the dialog
terminates (e.g. MSML execution of <exit>).

For the <dialogend> and <send> commands, the response message
contains the final MSML result which indicates that the request has
either been fulfilled or rejected.
Dialog Base Package "msml-dialog-base"

This package extends the MSML Dialog Core Package to define a set of
base functionality for MSML dialogs. The extension defines individual
media primitives, including <play>, <dtmfgen>, <tonegen>, <record>,
<dtmf> and <collect>, to be used as child element of <dialogstart>.
This package does not change the framework message usage as defined
by the MSML Dialog Core Package.

Dialog Transform Package "msml-dialog-transform"

This package extends the MSML Dialog Core Package to define a set of
transform primitives which works as filter on half duplex media
streams. The extension defines transform primitives, including <vad>,
<gain>, <agc>, <gate>, <clamp> and <relay>, which MAY be used as
child elements of <dialogstart>. This package does not change the
framework message usage as defined by the MSML Dialog Core Package.

Dialog Group Package "msml-dialog-group"

This package extends the MSML Dialog Core, Base and Transform
Packages to define a single control flow construct that specifies
concurrent execution of multiple media primitives. The extension
defines the <group> element which MAY be used as a child element of
<dialogstart> to enclose multiple media primitives, such that they
can be executed concurrently. This package does not change the
framework message usage as defined by the MSML Dialog Core Package.
Dialog Speech Package "msml-dialog-speech"

This package extends the MSML Dialog Core and Base Packages to define
functionality which MAY be used for automatic speech recognition and
text-to-speech. The extension extends the <dialogstart> and <play>
elements.

For <dialogstart>, it defines a new child element <speech> to
activate grammars or user input rules associated with speech
recognition. For <play>, it defines a new child element <tts> to
initiate the text-to-speech service.

This package does not change the framework message usage as defined
by the MSML Dialog Core Package.

Dialog Fax Detection Package "msml-dialog-fax-detect"

This package extends the MSML Dialog Core Package to define
primitives which provide fax detection service. The extension defines
a primitive <faxdetect> to be used as a child element of
<dialogstart>. This package does not change the framework message
usage as defined by the MSML Dialog Core Package.

Dialog Fax Send/Receive Package "msml-dialog-fax-sendrecv"

This package extends the MSML Dialog Core Package to define
primitives which allow a media server to provide fax send or receive
service. The extension defines new primitives <faxsend> and <faxrcv>,
to be used as child elements of <dialogstart>. This package does not
change the framework message usage as defined by the MSML Dialog Core
Package.
Audit Core Package "msml-audit"

This package extends the MSML Core Package to define a framework for
auditing media resource(s) allocated on the Media Server.

This package follows a simple request/response transaction, allowing
the Application Server to send CONTROL messages containing MSML
<audit> requests. The Media Server MUST reply with a response message
containing the result. The result is contained within the
<auditresult> element, returning the queried state information.

This package does NOT use the REPORT message.

Audit Conference Package "msml-audit-conf"

This package extends the MSML Audit Core Package to define conference
specific states which MAY be queried via the <audit> command and the
corresponding response MUST be returned by the <auditresult> element.
This package does not change the framework message usage as defined
by the MSML Audit Core Package.

Audit Connection Package "msml-audit-conn"

This package extends the MSML Audit Core Package to define connection
specific states which MAY be queried via the <audit> command and the
corresponding response MUST be returned by the <auditresult> element.
This package does not change the framework message usage as defined
by the MSML Audit Core Package.

Audit Dialog Package "msml-audit-dialog"

This package extends the MSML Audit Core Package to define dialog
specific states which MAY be queried via the <audit> command and the
corresponding response MUST be returned by the <auditresult> element.
This package does not change the framework message usage as defined
by the MSML Audit Core Package.

Audit Stream Package "msml-audit-stream"

This package extends the MSML Audit Core Package to define stream
specific states which MAY be queried via the <audit> command and the
corresponding response MUST be returned by the <auditresult> element.
This package does not change the framework message usage as defined
by the MSML Audit Core Package.
4.2.3 Common XML Support

The XML schema described in [28] MUST be supported by all Control
Packages defined by MSML. However, the "connection-id" value MUST be
constructed as defined by MSML (i.e. the identifier MUST contain
local dialog tag only, while the SIP Control Framework [28] requires
that the "connection-id" contain both local and remote dialog tags).

4.2.4 Control Message Body

A valid CONTROL body message MUST conform to the MSML schema, as
included in this specification, for the MSML package(s) used.

4.2.5 REPORT Message Body

A valid REPORT body message MUST conform to the MSML schema, as
included in this specification, for the MSML package(s) used.
5. Language Structure

5.1 Package Scheme

The primary mechanism for extending MSML is the "package". A package
is an integrated set of one or more XML schemas that define
additional features and functions via new or extended use of elements
and attributes. Each package, except for those defined in the current
document, is defined in a separate standards document, e.g., an
Internet Draft or an RFC. All packages that extend the base MSML
functionality MUST include references to the MSML base set of schemas
provided in the Internet drafts. A schema in a package MUST only
extend MSML; that is, it must not alter the existing specification.

A particular MSML script will include references to all the schemas
defining the packages whose elements and attributes it makes use of.
A particular script MUST reference MSML base and optionally extension
package(s). See IANA Considerations section.

Each package MUST define its own namespace so that elements or
attributes with the same name in different packages do not conflict.
A script using a particular element or attribute MUST prefix the
namespace name on that element or attribute's name if it is defined
in a package (as opposed to being defined in the base).

MSML consists of a core package which provides structure without
support for any specific feature set. Additional packages, relying on
the core package, provide functional features. Any combination of
additional packages may be used along with the core package. The
following describes the set of MSML packages defined in this
document.
     +--------------------------------------------------------+
     |                       MSML Core                        |
     +--------------------------------------------------------+
             /                    \                     \
        +--------+            +--------+            +-------+
        | Dialog |            |  Conf  |            | Audit |
        |  Core  |            |  Core  |            | Core  |
        +--------+            +--------+            +-------+
    ________\_______________________________________       |
   /          \          \        \        \        \      |
+------+ +---------+ +------+ +------+ +------+ +-------+  |
|Dialog| |Dialog   | |Dialog| |Dialog| |Dialog| |Dialog |  |
|Base  | |Transform| |Group | |Speech| |Fax   | |Fax    |  |
+------+ +---------+ +------+ +------+ |Detect| |Send/  |  |
                                       +------+ |Receive|  |
                                                +-------+  |
                                   ________________________|
                                  /       \        \        \
                               +-----+ +-----+ +------+ +------+
                               |Audit| |Audit| |Audit | |Audit |
                               |Conf | |Conn | |Dialog| |Stream|
                               +-----+ +-----+ +------+ +------+
o MSML Core package (Mandatory)
Describes the minimum base framework which MUST be implemented
to support additional core packages.
o MSML Conference Core package (Conditionally Mandatory, for
Conferencing)
Describes the audio and multimedia basic and advanced
conferencing package, which MAY be implemented.
o MSML Dialog Core package (Conditionally Mandatory, for Dialogs)
Describes the dialog core package which MUST be implemented for
any dialog services. However, systems supporting conferencing
only MAY omit support for MSML dialogs. The MSML dialog core
package specifies the framework within which additional dialog
packages are supported. The MSML dialog base package MUST be
supported, while all other dialog packages MAY be supported.
o MSML Dialog Base package (Conditionally Mandatory, for
Dialogs)
o MSML Dialog Group package (Optional)
o MSML Dialog Transform package (Optional)
o MSML Dialog Fax Detection package (Optional)
o MSML Dialog Fax Send/Receive package (Optional)
o MSML Dialog Speech package (Optional)
o MSML Audit Core package (Conditionally Mandatory, for Auditing)
Describes the audit core package which MUST be implemented to
support auditing services. The MSML audit core package
specifies the framework within which additional audit packages
are supported.
o MSML Audit Conference package (Conditionally Mandatory, for
Auditing Conference, Conference Dialog and Conference Stream)
o MSML Audit Connection package (Conditionally Mandatory, for
Auditing Connection, Connection Dialog and Connection Stream)
o MSML Audit Dialog package (Conditionally Mandatory, for
Auditing Dialog, and MUST be used with either MSML Audit
Conference Package or MSML Audit Connection Package)
o MSML Audit Stream package (Conditionally Mandatory, for
Auditing Stream, and MUST be used with either MSML Audit
Conference Package or MSML Audit Connection Package)
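The conditional requirements in the package list above can be read as
dependency rules between packages. The following Python sketch is one
hypothetical way to check a proposed package set against those rules.
The short package keys (such as "dialog-core") and the function name
are illustrative assumptions; they are not names defined by MSML.

```python
# Hypothetical short keys for the packages listed above (not MSML names).
DIALOG_PACKAGES = {
    "dialog-core", "dialog-base", "dialog-group", "dialog-transform",
    "dialog-fax-detect", "dialog-fax-sendrecv", "dialog-speech",
}
AUDIT_PACKAGES = {
    "audit-core", "audit-conf", "audit-conn", "audit-dialog", "audit-stream",
}


def check_package_set(packages):
    """Return a list of problems; an empty list means the set is consistent."""
    selected = set(packages)
    problems = []

    # The MSML Core package is mandatory in every configuration.
    if "core" not in selected:
        problems.append("MSML Core package is mandatory")

    # Any dialog service needs the Dialog Core and Dialog Base packages.
    if selected & DIALOG_PACKAGES:
        for required in ("dialog-core", "dialog-base"):
            if required not in selected:
                problems.append(f"dialog support requires {required}")

    # Any auditing service needs the Audit Core package.
    if selected & AUDIT_PACKAGES and "audit-core" not in selected:
        problems.append("auditing requires audit-core")

    # Audit Dialog and Audit Stream must accompany Audit Conference
    # or Audit Connection.
    for dependent in ("audit-dialog", "audit-stream"):
        if dependent in selected and not selected & {"audit-conf", "audit-conn"}:
            problems.append(f"{dependent} requires audit-conf or audit-conn")

    return problems
```

For example, a conferencing-only system would pass with just the core
and conference core keys, while selecting a speech dialog package
without the dialog core and base packages would be reported.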
The formal process for defining extensions to MSML Dialogs is to
define a new package. The new package MUST provide a text description
of what extensions are included and how they work. It MUST also
define an XML schema file (if applicable) that defines the new
package (which may be through extension, restriction of an existing
package, or profile of an existing package). Dependencies
upon other packages MUST be stated. For example, a package that
extends or restricts another package has a dependency on the original
package specification. Finally, the new package MUST be assigned a
unique name and version.
The types of things which can be defined in new packages are:
o new primitives
o extensions to existing primitives (events, shadow variables,
attributes, content)
o new recognition grammars for existing primitives
o new markup languages for speech generation
o languages for specifying a topology schema
o new pre-defined topology schemas
o new variables / segment types (sets & languages)
o new control flow elements
MSML Packages are assembled together to form a specific MSML profile
that is shared between different implementations. The base MSML
Dialog profiles which are defined in this document consist of the
MSML Core package, MSML Dialog Core package, MSML Dialog Base
package, MSML Dialog Group package, MSML Transform package, MSML Fax
packages, and the MSML Speech package.
MSML extension packages, which define primitives, MUST define the
following for each primitive within the package:
o the function which the primitive performs
o the attributes which may be used to tailor its behavior
o the events which it is capable of understanding
o the shadow variables which provide access to information
determined as a result of the primitive's operation.
The mechanism used to ensure that a media server and its client share
a compatible set of packages is not defined. Currently it is expected
that provisioning will be used, possibly coupled with a future
auditing capability. Additionally, when used in SIP networks,
packages could be defined using feature tags and the procedures
defined for Indicating User Agent Capabilities in SIP [2] used to
allow a media server to describe its capabilities to other user
agents and its domain registrar.
5.2 Profile Scheme
Not all devices and applications using MSML will need to support the
entire MSML schema. For example, a media processing device might
support only audio announcements, only audio simple conferencing, or
only multimedia IVR. It is highly desirable to have a system for
describing what portion of MSML a particular media processing device
or Control Agent supports.
The Package scheme described earlier allows MSML functionality to be
functionally grouped, relying on the MSML core package. This scheme
allows a portion of the complete MSML specification to be
implemented, on a per package basis and also creates a framework for
future extension packages. However, within a given package, in some
cases, only a subset of the package functionality may be required. In
order to support subsets of packages, with greater degree of
granularity than at the package level, a profile scheme is required.
MSML package profiles would identify a subset of a given MSML package
with specific definitions of elements and attributes. Each MSML
package profile MUST be accompanied by one or more corresponding
schemas. To use the examples above, there could be an audio
announcements profile of the MSML Dialog Base package, an audio
simple conferencing profile of the MSML Conference Core package, and
a multimedia IVR profile of the MSML Dialog Base package.
MSML package profiles MUST be published separately from the MSML
specification, in one or more standards documents (e.g., Internet
Drafts or RFCs) dedicated to MSML package profiles. Profiles would
not be registered with IANA and any organization would additionally
be free to create its own profile(s) if required.
6. Execution Flow
MSML assumes a model where there is a single control context within a
media server for MSML processing. That context may have one or many
SIP [1] dialogs associated with it. It is assumed that any SIP
dialogs associated with the same MSML control context have been
authorized, as appropriate, by mechanisms outside the scope of MSML.
A media server control context maintains information about the state
of all media objects and media streams within a media server. It
receives and processes all MSML requests from authorized SIP dialogs
and receives all events generated internally by media objects and
sends them on the appropriate SIP dialog. An MSML request is able to
create new media objects and streams, and to modify or destroy any
existing media objects and streams.
An MSML request may simply specify a single action for a media server
to undertake. In this case, the document is very similar to a simple
command request. Often, though, it may be more natural for a client
to request multiple actions at one time, or the client would like
several actions to be closely coordinated by the media server.
Multiple MSML elements received in a single request MUST be processed
sequentially in document order.
An example of the first scenario would be to create a conference and
join it with an initial participant. An example of the second case
would be to unjoin one or more participants from a main conference
and join them to a sidebar conference. In the first scenario, network
latencies may not be an issue, but it is simpler for the client to
combine the requests. In the second case, the added network latency
between separate requests could mean perceptible audio loss to the
participant.
Each MSML request is processed as a single transaction. A media
server MUST ensure that it has the necessary resources available to
carry out the complete transaction before executing any elements of
the request. If it does not have sufficient resources, it MUST return
a 520 response and MUST NOT execute the transaction.
The MSML request MUST be checked for well-formedness and validated
against the schema prior to executing any elements. This allows XML
[4] errors to be reported immediately and minimizes failures within a
transaction and the corresponding execution of only part of the
transaction.
Each element is expected to execute immediately. Elements such as
<dialogstart>, which take an unpredictable amount of time, are
"forked" and executed in a separate thread (see MSML Dialog
packages). Once successfully forked, execution continues with the
element following the </dialogstart>. As such, MSML does not provide
mechanisms to sequence or coordinate other operations with dialog
elements.
Processing within a transaction MUST stop if any errors occur.
Elements that were executed prior to the error are not rolled back.
It is the responsibility of the client to determine appropriate
actions based upon the results indicated in the response. Most
elements MAY contain an optional "mark" attribute. The value of that
attribute from the last successfully executed element MUST be
returned in an error response. Note that errors that occur during the
execution of a dialog occur outside the context of an MSML
transaction. These errors will be indicated in an asynchronous event.
Transaction results are returned as part of the SIP request response.
The transaction results indicate the success or failure of the
transaction. The result MUST also include identifiers for any objects
created by a media server for which the client did not provide an
instance name. Additionally, if the transaction fails, the reason for
the failure MUST be returned, and an indication of how much of the
transaction was executed before the failure occurred SHOULD be
returned.
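The transaction rules above (a resource check before execution,
sequential processing in document order, stopping at the first error
without rollback, and returning the last successful "mark") can be
sketched as follows. This is a minimal Python illustration under
stated assumptions, not an implementation taken from the draft: the
Element class, the ElementError exception, the execute callback, the
resources_available flag, and the use of 200 as a success code are all
hypothetical. Only the 520 response for insufficient resources comes
from the text. The sketch also assumes that an element without a
"mark" does not reset the last recorded mark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """One child element of an MSML request (illustrative model only)."""
    name: str
    mark: Optional[str] = None  # optional "mark" attribute


class ElementError(Exception):
    """Raised by the execute callback when an element fails (hypothetical)."""

    def __init__(self, code, description):
        super().__init__(description)
        self.code = code
        self.description = description


def run_transaction(elements, execute, resources_available=True):
    """Process elements in document order and return a result dictionary."""
    if not resources_available:
        # Insufficient resources: respond 520 and execute nothing.
        return {"response": 520, "executed": 0, "mark": None}

    last_mark = None
    executed = 0
    for element in elements:
        try:
            execute(element)
        except ElementError as err:
            # Stop at the first error; earlier elements are not rolled back.
            return {
                "response": err.code,
                "description": err.description,
                "executed": executed,
                "mark": last_mark,
            }
        executed += 1
        if element.mark is not None:
            last_mark = element.mark

    # 200 is an assumed success code for this sketch.
    return {"response": 200, "executed": executed, "mark": None}
```

The "executed" count stands in for the indication of how much of the
transaction ran before the failure; a client could combine it with the
returned mark to decide which actions to retry or undo.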
7. Media Server Object Model
Media servers are general-purpose platforms for executing real-time
media processing tasks. These tasks range in complexity from simple
ones such as serving announcements, to complex ones, such as speech
interfaces, centralized multimedia conferencing, and sophisticated
gaming applications.
Calls are established to a media server using SIP. Clients will often
use SIP third party call control (3PCC) [16] to establish calls to a
media server on behalf of end users. However MSML does not require
that 3PCC be used; only that the client and the media server share a
common identifier for the call and its associated RTP [15] sessions.
Objects represent entities which source, sink, or modify media
streams. A media stream is a bidirectional or unidirectional media
flow between objects on a media server. The following subsections
define the classes of objects that exist on a media server and the
way these are identified in MSML.
7.1 Objects
A media object is an endpoint of one or more media streams. It may be
a connection that terminates RTP sessions from the network or a
resource that transforms or manipulates media. MSML defines four
classes of media objects. Each class defines the basic properties of
how object instances are used within a media server. However most
classes require that the function of specific instances be defined by
the client, using MSML or other languages such as VoiceXML.
The following classes of media processing objects are defined. The
class names are given in parentheses:
o network connection (conn)
o conference (conf)
o dialog (dialog)
o operator (oper)
Network connection is an abstraction for the media processing
resources involved in terminating the RTP session(s) of a call. For
audio services a connection instance presents a full-duplex audio
stream interface within a media server. Multimedia connections have
multiple media streams of different media types, each corresponding
to an RTP session. Network connections get instantiated through SIP
[1].
A conference represents the meaning media resources and state information
required for a single logical mix of error responses. Response codes are defined each media type in section 11
(Response Codes).
attributes:
response: a numeric code indicating the overall success or
failure of the transaction,
conference (e.g. audio and in the case of failure, an
indication of the reason. Mandatory.
mark: in the case of an error, the value of the mark attribute
from the last successfully executed element that included the
mark attribute.
In the case of failure, a description video). MSML models multiple mixes/views
of the reason SHOULD same media type as separate conferences. Each conference has
multiple inputs. Inputs may be
provided using the child element <description>.
Three other child elements divided into classes that allow the response an
application to include identifiers
for objects created by the request but which did not have instance
names specified by different media treatment for different
participants. For example, the client. Those elements are <confid> and
<dialogid>, video streams for objects created though a <createconference> and
<dialogstart> respectively.
8.4 <event>
The <event> element is used to notify an event some participants
may be assigned to fixed regions of the screen while those for other
participants may only be shown when they are speaking.
A conference has a single logical output per media server
client. Three types type. For each
participant, it consists of events are defined by MSML Core package;
"msml.dialog.exit", "msml.conf.nomedia", and "msml.conf.asn". These
correspond to the termination of an executing dialog, a audio conference
being automatically deleted when mix, less any
contributed audio of the last participant has left, participant, and the notification of the current set of active speakers for a
conference, respectively. Events may also be generated video mix shared by all
conference participants. Video conferences using voice activated
switching have an
executing dialog. In this case the event type is specified by optional ability to show the
dialog. (see MSML Dialog Core Package <send>).
attributes:
name: previous speaker to
the type of event. If current speaker.
Conferences are instantiated using the event is generated because <createconference> element.
The content of the execution MSML Dialog <send>, the value MUST be <createconference> element specifies the value
parameters of the "event" attribute audio and/or video mixes.
Dialogs are a class of objects that represent automated participants.
They are similar to network connections from a media flow perspective
and may have one or more media streams as the abstraction for their
interface within a media server. Unlike connections however, dialogs
are created and destroyed through MSML, and the media server itself
implements the dialog participant. Dialogs are instantiated through
the <dialogstart> element. Contents of the <dialogstart> element
define the desired or expected dialog behavior. Dialogs may also be
invoked by referencing VoiceXML as the dialog description language.
Operators are implicit functions that are used to filter or transform
a media stream. The function that an instance of an operator fulfills
is defined as a property of the media stream. Operators may be
unidirectional or bidirectional and have a media type. Unidirectional
operators reflect simple atomic functions such as automatic gain
control, filtering tones from conferences, or applying specific gain
values to a stream. Unidirectional operators have a single media
input, which is connected to the media stream from one object, and a
single media output, which is connected to the media stream of a
different object.
Bidirectional operators have two media inputs and two media outputs.
One media input and output is associated with the stream to one
object and the other input and output is associated with a stream to
a different object. Bidirectional operators may treat the media
differently in each direction. For example, an operator could be
defined which changed the media sent to a connection based upon
recognized speech or DTMF received from the connection. Operators are
implicitly instantiated when streams are created or modified using
the <join> and <modifystream> elements respectively.
The relationships between the different object classes are shown in
the figure below.
           +--------------------------------------+
           |             Media Server             |
           |                                      |
           |------+                       ,---.   |
           |      |      +------+        /     \  |
<== RTP ==>| conn |<---->| oper |<----->( conf  ) |
           |      |      +------+        \     /  |
           |------+                       `---'   |
           |   ^                            ^     |
           |   |                            |     |
           |   |   +------+     +------+    |     |
           |   |   |      |     |      |    |     |
           |   +-->|dialog|     |dialog|<---+     |
           |       |      |     |      |          |
           |       +------+     +------+          |
           +--------------------------------------+
A single, full-duplex instance of each object class is shown together
with common relationships between them. An operator (such as gain) is
shown between a connection and a conference and dialogs are shown
participating both with an individual connection and with a
conference. The figure is not meant to imply only one to one
relationships. Conferences will often have hundreds of participants,
and either connections or conferences may be interacting with more
than one dialog. For example, one dialog may be recording a
conference while other dialogs announce participants joining or
leaving the conference.
7.2 Identifiers
Objects are referenced using identifiers that are composed of one or
more terms. Each term specifies an object class and names a specific
instance within that class. The object class and instance are
separated by a colon ":" in an identifier term.
Identifiers are assigned to objects when they are first created. In
general, either the MSML client or a media server may specify the
instance name for an object. Objects for which a client does not
assign an instance name will be assigned one by a media server. Media
server assigned instance names are returned to the client as a
complete object identifier in the response to the request that
created the object.
It is meaningful for some classes of objects to exist independently
on a media server. Network connections may be created through SIP at
any time. MSML can then be used to associate their media with other
objects as required to create services. Conferences may be created
and have specific resources reserved waiting for participant
connections.
Objects from these two classes, connections and conferences, are
considered independent objects since they can exist on a standalone
basis. Identifiers for independent objects consist of a single term
as defined above. For example, identifiers for a conference and
connection could be "conf:abc" or "conn:1234" respectively. Clients
which choose to assign instance names to independent objects must use
globally unique instance names. One way to create globally unique
names is to include the domain name of the client as part of the
name.
Dialogs are created to bridge two streams allows for the easy
creation of simple three-way calls or provide a service to bridge private announcements
with independent objects.
Dialogs may act as a participant in a [whispered] conference mix for an individual participant. In or interact with a
connection similar to a two participant call. Dialogs depend upon the case
existence of general conferences however, an MSML client SHOULD create
an audio conference independent objects and then join participants to the conference.
Conference mixers SHOULD subtract this is reflected in the audio
composition of each participant from their identifiers. Operators modify the mix so that they do not hear themselves.
A media server that receives a request that requires joining an audio
stream to the single audio input flow
between other objects, such as application of an object that already has an
audio stream connected, SHOULD automatically bridge the new stream
with the existing stream, creating gain between a mix of the two audio streams.
The maximum number of streams that may be bridged in this manner is
implementation-specific. It is RECOMMENDED that
connection and a conference. As operators are merely media server
support bridging at least two streams. A media server that cannot
bridge a new stream with any existing streams MUST fail the operation
requesting transform
primitives defined as properties of the join.
Unlike audio mixing, there are many different ways that two video
streams may be combined and presented. For example, media stream, they may be
presented side are not
represented by side in separate panes, picture in picture, or in a
Saleem & Sharratt identifiers and created implicitly.
Saleem, et al. Expires - December 2006 April 2007 [Page 23] 27]
Internet-draft Media Server Markup Language June October 2006
(MSML)
Identifiers for dialogs are composed of a structured list of slash
('/') separated terms. The left-most term of the identifier must
specify a conference or connection. This serves as the root for the
identifier. An example of an identifier for a dialog acting as a
conference participant could be:
conf:abc/dialog:recorder
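
The identifier structure described above can be illustrated with a
small parser. This is a minimal sketch in Python and is not part of
MSML: the names parse_identifier, Term, and ROOT_CLASSES are
hypothetical, and the class names "conf" and "conn" are assumed from
the examples in this section.

```python
from typing import List, NamedTuple


class Term(NamedTuple):
    """One identifier term: an object class and an instance name."""
    object_class: str
    instance: str


# Classes of independent objects, which may serve as an identifier root.
# Assumption: "conf" and "conn" are the class names, as in the examples.
ROOT_CLASSES = {"conf", "conn"}


def parse_identifier(identifier: str) -> List[Term]:
    """Split an identifier into its slash-separated class:instance terms."""
    terms = []
    for raw in identifier.split("/"):
        # Class and instance are separated by the first colon in the term.
        object_class, sep, instance = raw.partition(":")
        if not sep or not object_class or not instance:
            raise ValueError(f"malformed term: {raw!r}")
        terms.append(Term(object_class, instance))
    # The left-most term must specify a conference or a connection.
    if terms[0].object_class not in ROOT_CLASSES:
        raise ValueError(f"root must be a conference or connection: {identifier!r}")
    return terms
```

Under these assumptions, "conf:abc/dialog:recorder" parses into a
conference root term followed by a dialog term, while an identifier
whose left-most term is a dialog is rejected.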

All objects except connections are created using MSML. Connections
are created when media sessions get established through SIP. There
are several options clients and media servers can use to establish a
shared instance name for a connection and its media streams.

When media servers support multiple media types, the instance name
SHOULD be a call identifier that can be used to identify the
collection of RTP sessions associated with a call. When MSML is used
in conjunction with SIP and third party call control, the call
identifier MUST be the same as the local tag assigned by the media
server to identify the SIP dialog. This will be the tag the media
server adds to the "To" header in its response to an initial invite
transaction. RFC 3261 requires the tag values to be globally unique.

An example of a connection identifier is: conn:74jgd63956ts.

With third party call control, the MSML client acts as a back to back
user agent (B2BUA) to establish the media sessions. SIP dialogs are
established between the client and the media server allowing the use
of the media server local tag as a connection identifier. If third
party call control is not used, a SIP event package MAY be used to
allow a media server to notify new sessions to a client that has
subscribed to this information.

Identifiers as described above allow every object in a media server
to be uniquely addressed. They can also be used to refer to multiple
objects. There are two ways in which this can currently be done:

wildcards
common instance names

An identifier can reference multiple objects when a wildcard is used
as an instance name. MSML reserves the instance name comprised of a
single asterisk ('*') to mean all objects that have the same
identifier root and class. Instance names containing an asterisk
cannot be created. Wildcards MUST only be used as the right most term
of an identifier and MUST NOT be used as part of the root for dialog
identifiers. Wildcards are only allowed where explicitly indicated
below.

The following are examples of valid wildcards:
conf:abc/dialog:*
conn:*
Examples of illegal wildcard usage are:
conf:*/dialog:73849
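
The wildcard rules above can be expressed as a short check. The
following Python sketch is illustrative only: the function name
is_valid_wildcard_usage is hypothetical, and the check covers only
wildcard placement, not the general syntax of identifiers or whether
a given element permits wildcards at all.

```python
def is_valid_wildcard_usage(identifier: str) -> bool:
    """Return True if every asterisk in the identifier is placed legally."""
    terms = identifier.split("/")
    for position, term in enumerate(terms):
        _, _, instance = term.partition(":")
        if "*" not in instance:
            continue
        # An asterisk is reserved: it may only be the whole instance name.
        if instance != "*":
            return False
        # A wildcard may only be the right-most term, so it can never
        # form part of the root of a dialog identifier.
        if position != len(terms) - 1:
            return False
    return True
```

With this check, "conf:abc/dialog:*" and "conn:*" are accepted and
"conf:*/dialog:73849" is rejected, matching the examples above.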

Although identifiers share a common syntax, MSML elements restrict
the class of objects which are valid in a given context. As an
example, although it is valid to join two connections together, it is
not valid to join two IVR dialogs.

8. MSML Core Package

This section describes the core MSML package which MUST be supported
in order to use any other MSML packages. The core MSML package
defines a framework, without explicit functionality, over which
functional packages are used.

8.1 <msml>

<msml> is the root element. When received by a media server, it
defines the set of operations that form a single MSML request.
Operations are requested by the contents of the element. Each
operation MAY appear zero or more times as children of <msml>.
Specific operations are defined within the Conference package and in
the set of Dialog packages.

The results of a request or the contents of events sent by a media
server are also enclosed within the <msml> element. The results of
the transaction are included as a body in the response to the SIP
request that contained the transaction. This response will contain
any identifiers that the media server assigned to newly created
objects. All messages that a media server generates are correlated to
an object identifier. Objects and identifiers are discussed in
section 7 (Media Server Object Model).

Attributes:

version: "1.1" Mandatory

8.2 <send>

Events are used to affect the behavior of different objects within a
media server. The <send> element is used to send an event to the
specified recipient within the Media Server.

Attributes:

event: the name of an event. Mandatory.

target: an object identifier. When the identifier is for a
dialog, it may optionally be appended with a slash "/" followed
by the target to be included in a MSML Dialog <send>.
Mandatory.

valuelist: a list of zero or more parameters that are included
with the event.

mark: a token that can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document should be unique.
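
As an illustration of the attributes listed above, the following
Python sketch builds an <msml> request containing a single <send>
element. It is a sketch under stated assumptions rather than a
reference implementation: the function name build_send_request and
the event name used in the usage note are hypothetical, and the
"valuelist" attribute is left out because its syntax is not given in
this section.

```python
import xml.etree.ElementTree as ET


def build_send_request(event: str, target: str, mark: str = "") -> str:
    """Serialize an <msml> document that carries one <send> element."""
    # <msml> is the root element; "version" is its mandatory attribute.
    msml = ET.Element("msml", {"version": "1.1"})
    # "event" and "target" are mandatory on <send>; "mark" is optional.
    attributes = {"event": event, "target": target}
    if mark:
        attributes["mark"] = mark
    ET.SubElement(msml, "send", attributes)
    return ET.tostring(msml, encoding="unicode")
```

For example, build_send_request("example.event",
"conf:abc/dialog:recorder", mark="1") yields a document whose <send>
element targets the dialog identified earlier in section 7.2.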

8.3 <result>

The <result> element is used to report the results of an MSML
transaction. It is included as a body in the final response to the
SIP request which initiated the transaction. An optional child
element <description> may include text which expands on the meaning
of error responses. Response codes are defined in section 11
(Response Codes).

attributes:

response: a numeric code indicating the overall success or
failure of the transaction, and in the case of failure, an
indication of the reason. Mandatory.

mark: in the case of an error, the value of the mark attribute
from the last successfully executed element that included the
mark attribute.

In the case of failure, a description of the reason SHOULD be
provided using the child element <description>.

Three other child elements allow the response to include identifiers
for objects created by the request but which did not have instance
names specified by the client. Those elements are <confid> and
<dialogid>, for objects created through a <createconference> and
<dialogstart> respectively.
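
A client might read a <result> along the following lines. This Python
sketch is illustrative: the names Result and parse_result are
hypothetical, the response codes in the usage note are sample values
only (the actual codes are defined in section 11), and the sketch
assumes that <confid> and <dialogid> carry the identifier as element
text.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    response: int                 # numeric response code (mandatory)
    mark: Optional[str]           # last successfully executed mark, if any
    description: Optional[str]    # text of <description>, if present
    object_ids: List[str]         # identifiers from <confid> / <dialogid>


def parse_result(document: str) -> Result:
    """Extract the <result> element from an MSML response body."""
    root = ET.fromstring(document)
    result = root if root.tag == "result" else root.find("result")
    if result is None:
        raise ValueError("no <result> element found")
    code = result.get("response")
    if code is None:
        raise ValueError("<result> lacks the mandatory response attribute")
    desc = result.find("description")
    description = desc.text.strip() if desc is not None and desc.text else None
    # Assumption: the identifier is the text content of these children.
    object_ids = [(child.text or "").strip()
                  for child in result
                  if child.tag in ("confid", "dialogid")]
    return Result(int(code), result.get("mark"), description, object_ids)
```

Given a body such as <msml version="1.1"><result response="200">
<confid>conf:abc</confid></result></msml>, the sketch returns the
code together with the media server assigned identifier "conf:abc".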

8.4 <event>

The <event> element is used to notify an event to a media server
client. Three types of events are defined by MSML Core package:
"msml.dialog.exit", "msml.conf.nomedia", and "msml.conf.asn". These
correspond to the termination of an executing dialog, a conference
being automatically deleted when the last participant has left, and
the notification of the current set of active speakers for a
conference, respectively. Events may also be generated by an
executing dialog. In this case the event type is specified by the
dialog (see MSML Dialog Core Package <send>).

attributes:

name: the type of event. If the event is generated because of
the execution of an MSML Dialog <send>, the value MUST be the
value of the "event" attribute from the <send> element within the
MSML Dialog Core package. If the event is generated because of
the execution of an <exit>, the value MUST be "moml.exit". If
the event is generated because of the execution of a
<disconnect>, the value MUST be "moml.disconnect". If the event
is generated because of an error, the value must be
"moml.error". Mandatory.

id: the identifier of the conference or dialog that generated
the event or caused the event to be generated. Mandatory.

<event> has two children, <name> and <value>, which contain the
name and value respectively of each namelist item associated
with the event.
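
The <name> and <value> children can be collected into pairs as shown
in the following Python sketch. The function name parse_event is
hypothetical, the connection identifiers in the usage note are sample
values, and the sketch assumes that the n-th <name> child belongs
with the n-th <value> child.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def parse_event(document: str) -> Tuple[Optional[str], Optional[str], List[Tuple[str, str]]]:
    """Return the event name, the object id, and the namelist items."""
    root = ET.fromstring(document)
    event = root if root.tag == "event" else root.find("event")
    if event is None:
        raise ValueError("no <event> element found")
    names = [(el.text or "").strip() for el in event.findall("name")]
    values = [(el.text or "").strip() for el in event.findall("value")]
    # Assumption: names and values are paired by their order of appearance.
    return event.get("name"), event.get("id"), list(zip(names, values))
```

For an "msml.conf.asn" event on "conf:example" that carries two
"speaker" items, the returned namelist is a list of two
("speaker", connection identifier) pairs.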

9. MSML Conference Core Package

9.1 Conferences

A conference has a mixer for each type of media that the conference
supports. Each mix has a corresponding description that defines how
the media from participants contributes to that mix. A mixer has
multiple inputs that are combined in a media specific way to create a
single logical output.

The elements that describe the mix for each media type are called
mixer description elements. They are:

<audiomix> defines the parameters for mixing audio media.

<videolayout> defines the composition of a video window.

These elements, defined in sections 9.6 (Audio Mix) and 9.7 (Video
Layout) respectively, are used as content of the <createconference>
element to establish the initial properties of a conference. The
elements are used within the <modifyconference> element to change the
properties of a conference once it has been created, or within the
<destroyconference> element to remove individual mixes from the
conference.

Conferences may be terminated by an MSML client using the
<destroyconference> element to remove the entire conference or by
removing the last mixer(s) associated with the conference.

Conferences can also be terminated automatically by a media server
based on criteria specified when the conference is created. When the
conference is deleted, any remaining participants will have their
associated SIP dialogs left unchanged or deleted based on the value
of the "term" attribute specified when the conference was created.

9.2 Media Streams

Objects have at least one media input and output for each type of
media that they support. Each object class defines the number of
inputs and outputs objects of that class support. Media streams are
created when objects are joined, either explicitly using <join>, or
implicitly when dialogs are created using <dialogstart>. Dialog
creation has two stages, allocating and configuring the resources
required for the dialog instance, and implicitly joining those
resources to the dialog target during the dialog execution. Refer to
the MSML Dialog Base package.

A join operation by default creates a bidirectional audio stream
between two objects. Video and unidirectional streams may also be
created. A media stream is created by connecting the output from one
object to the input of another object and vice versa (assuming a
bidirectional or full-duplex join).

Many objects may only support a single input for each type of media.
Within this specification, only the conference object class supports
an arbitrary number of inputs. When a stream is requested to be
created to an object that already has a stream of the same type
connected to its single input, the result of the request depends upon
the type of the media stream.

Audio mixing is done by summing audio signals. Automatically mixing
audio streams has common and straightforward applications. For
example, the ability to bridge two streams allows for the easy
creation of simple three-way calls or to bridge private announcements
with a [whispered] conference mix for an individual participant. In
the case of general conferences however, an MSML client SHOULD create
an audio conference and then join participants to the conference.
Conference mixers SHOULD subtract the audio of each participant from
the mix so that they do not hear themselves.

A media server that receives a request that requires joining an audio
stream to the single audio input of an object that already has an
audio stream connected, SHOULD automatically bridge the new stream
with the existing stream, creating a mix of the two audio streams.
The maximum number of streams that may be bridged in this manner is
implementation-specific. It is RECOMMENDED that a media server
support bridging at least two streams. A media server that cannot
bridge a new stream with any existing streams MUST fail the operation
requesting the join.

Unlike audio mixing, there are many different ways that two video
streams may be combined and presented. For example, they may be
presented side by side in separate panes, picture in picture, or in a
single pane which displays only a single stream at a time based on a
heuristic such as active speaker. Each of these options creates a
very different presentation and requires significantly different
media resources.

A join operation does not describe how a new stream can be combined
with an existing stream. Therefore automatic bridging of video is not
supported. A media server MUST fail requests to join a new video
stream to an object that only supports a single video input and
already has a video stream connected to that input. For an object to
have multiple video streams joined to it, the object itself must be
capable of supporting multiple video streams. Conference objects can
support multiple video streams and provide a way to specify the
mixing presentation for the video streams.

A media server MUST NOT establish any streams unless the media server
is able to create all the streams requested by an operation. Streams
are only able to be created if both objects support a media type and
at least one of the following conditions is true:

1. each object that is to receive media is not already receiving a
stream of that type.

2. any object that is to receive media and is already receiving a
stream of that type supports receiving an additional stream of
that type. The only class of objects defined in this
specification that directly support receiving multiple streams
of the same type are conferences.

3. the media server is able to automatically bridge media streams
for an object that is to receive media and that is already
receiving a stream of the requested type. The only type of
media defined in this specification that MAY be automatically
bridged is audio.
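
The three conditions above, together with the all-or-nothing rule,
can be sketched as a simple admission check. The Python below is a
simplified model and not a definitive implementation: the names
MediaObject, can_receive, and can_join are hypothetical, "conf" is
assumed to be the class name for conferences, and the
implementation-specific limit on the number of bridged audio streams
is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MediaObject:
    object_class: str                  # assumed class names, e.g. "conf", "conn"
    media_types: Set[str]              # media types the object supports
    inbound: Dict[str, int] = field(default_factory=dict)  # streams received per type


def can_receive(obj: MediaObject, media_type: str,
                server_bridges_audio: bool = True) -> bool:
    """Check whether obj may receive one more stream of media_type."""
    if media_type not in obj.media_types:
        return False
    if obj.inbound.get(media_type, 0) == 0:
        return True                    # condition 1: no stream of that type yet
    if obj.object_class == "conf":
        return True                    # condition 2: conferences take many inputs
    # Condition 3: only audio may be bridged automatically.
    return media_type == "audio" and server_bridges_audio


def can_join(a: MediaObject, b: MediaObject, media_type: str,
             bidirectional: bool = True,
             server_bridges_audio: bool = True) -> bool:
    """All-or-nothing: every requested stream must be creatable."""
    if media_type not in a.media_types or media_type not in b.media_types:
        return False
    # For a unidirectional join this sketch assumes media flows from a to b.
    receivers = [b, a] if bidirectional else [b]
    return all(can_receive(r, media_type, server_bridges_audio)
               for r in receivers)
```

In this model, joining a second audio stream to a connection succeeds
only when the media server bridges audio, joining a second video
stream to a connection always fails, and joining to a conference
succeeds for any media type that both objects support.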
The directionality of media streams associated with a connection are
modeled independently from what SDP [18] allows for the video layout.
9.7.1 <root>
The <root> element describes corresponding
RTP [15] sessions. Media servers MUST respect the root window or virtual screen SDP in
which the conference video mix will be displayed. Simple conferences
can display participant video directly within the root window what they
actually transmit but
more complex conferences will use regions for this purpose. Areas of MUST NOT allow the window which are not used SDP to display video will show affect the root
window background.
All video presentations require a root window. It MUST be present
directionality when a video mix joining streams internal to the media server.
9.3
<createconference>
<createconference> is created used to allocate and it cannot be deleted, however its
attributes MAY be changed using the <modifyconference> element.
Attributes:
size: configure the size media mixing
resources for conferences. A description of the root window specified as one properties for each
type of the five
standard common intermediate formats (e.g. CIF, QCIF, etc.).
backgroundcolor: the color media mix required for the root window background conference is defined using the values for the "background-color" property of
the CSS2 specification [20].
backgroundimage: the URI for an image to be displayed as within the
root window background. Transparent portions
content of the image allow
the background color to show through.
9.7.2 <region>
<region> elements define video panes that <createconference> element. Mixer descriptions are used to display
participant video streams. Regions
described in Audio Mix and Video Layout sections. When no mixer
descriptions are rendered on top of specified, the root
window.
The size default behavior MUST be equivalent
to inclusion of a region is single <audiomix>.
Clients can request that a media server automatically delete a
conference when a specified condition occurs by using the
"deletewhen" attribute. A value of "nomedia" indicates that the
conference MUST be deleted when no participants remain in the
conference. When this occurs, an "msml.conf.nomedia" event MUST be
notified to the MSML client. A value of "nocontrol" indicates the
conference MUST be deleted when the SIP [1] dialog that carries the
<createconference> element is terminated. When this occurs, a media
server MUST terminate all participant dialogs by sending a BYE for
their associated SIP dialog. A value of "never" MUST leave the
ability to delete a conference under the control of the MSML client.
attributes:
name: the instance name of the conference. If the attribute is
not present, the media server MUST assign a globally unique
name for the conference. If the attribute is present but the
name is already in use, an error (432) will result and MSML
document execution MUST stop. Events which the conference
generates use this name as the value of their "id" attribute
(see section 5.6.2 (<event>)).
deletewhen: defines whether a media server should automatically
delete the conference. Possible values are "nomedia",
"nocontrol", and "never". Default is "nomedia".
term: when true, the media server MUST send a BYE request on
all SIP dialogs still associated with the conference when the
conference is deleted. Setting term equal to false allows
clients to start dialogs on connections once the conference has
completed. Default true.
mark: a token which MAY be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document should be unique.
An example of creating an audio conference is shown below. This
conference allows at most three participants to contend to be heard
and reports the set of active speakers no more frequently than every
ten seconds.
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
<createconference name="example">
<audiomix>
<n-loudest n="3"/>
<asn ri="10s"/>
</audiomix>
</createconference>
</msml>
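As a further illustration, the following sketch combines the
"deletewhen", "term", and "mark" attributes of <createconference> in
a single request. The conference name "sketch" and the mark value
"create-1" are hypothetical and chosen only for this example; the
attribute values are taken from the attribute descriptions for this
element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <!-- "sketch" and "create-1" are illustrative values only -->
  <!-- deletewhen="nocontrol": delete when the controlling SIP dialog ends -->
  <!-- term="false": participant SIP dialogs are kept after deletion -->
  <createconference name="sketch" deletewhen="nocontrol"
                    term="false" mark="create-1">
    <audiomix>
      <n-loudest n="3"/>
    </audiomix>
  </createconference>
</msml>
```

With these settings a client could continue to start dialogs on the
remaining connections after the conference has been deleted.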
9.3.1 <reserve>
Conference resources may be reserved by including the <reserve>
element as a child of <createconference>. <reserve> allows the
specification of a set of resources which a media server will reserve
for the conference. Any requests for resources beyond those that have
been reserved should be honored on a best-effort basis by a media
server.
attributes:
required: boolean that specifies whether <createconference>
should fail if the requested resources are not available. When
set to false, the conference will be created, with no reserved
resources, if the complete reservation cannot be honored.
Default true.
9.3.1.1 <resource>
The resources to be reserved are defined using <resource>. The
contents of these elements describe a resource that is to be
reserved. Descriptions are implementation-dependent. Media servers
that support MSML Dialogs may use the elements from that package as
the basis for resource descriptions. Each resource element may use
the attribute "n" to define the quantity of the resource to reserve.
attributes:
n: number of resources to be reserved. Default 1.
type: specifies whether the resource is to be reserved by each
individual participant or reserved as a shared conference
resource. Valid values for this attribute are "individual" or
"shared". Default "individual".
For example, the following creates a conference and reserves two
types of resources. One resource element may represent resources that
are shared by all participants of the conference while the other may
represent resources that are reserved for each of the expected
participants.
<createconference>
  <reserve>
    <resource n="20">
      <!--description of resources used by each participant-->
    </resource>
    <resource n="2" type="shared">
      <!--description of shared conference resources-->
    </resource>
  </reserve>
</createconference>
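The "required" attribute of <reserve> can be sketched in the same
way. In the hypothetical request below, the conference would still be
created, without reserved resources, if the media server could not
honor the complete reservation. The conference name and the quantity
are assumptions made for illustration, and the resource description
is left as a comment because its content is implementation-dependent.

```xml
<createconference name="sketch">
  <!-- required="false": do not fail creation if reservation fails -->
  <reserve required="false">
    <resource n="4" type="individual">
      <!-- implementation-dependent resource description -->
    </resource>
  </reserve>
</createconference>
```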
9.4 <modifyconference>
All of the properties of an audio mix or the presentation of a video
mix may be changed during the life of a conference using the
<modifyconference> element. Changes to an audio mix are requested by
including an <audiomix> element as a child of <modifyconference>.
This may also be used to add an audio mixer to the conference if none
was previously allocated. Changes to a video presentation are
requested by including a <videolayout> element as a child of
<modifyconference>. Similar to an audio mixer, this may be used to
add a video mixer if none was previously allocated.
Mixers are removed by including a mixer description element within
<destroyconference/>.
Features and presentation aspects are enabled/added or modified by
including the element(s) that define the feature or presentation
aspect within a mixer description. The complete specification of the
element must be included just as it would be included when the
conference is created. The new definition completely replaces any
previous definition that existed. Only things that are defined by
elements included in the mixer descriptions are affected. Any
existing configuration aspects of a conference, which are not
specified within the <modifyconference/> element, MUST maintain their
current state in the media server.
For example, if an MSML client wanted to change the minimum reporting
interval for active speaker notification from that shown in the
Conference Examples section (<createconference>) it would send the
following to the media server:
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
<modifyconference id="conf:example">
<audiomix>
<asn ri="4"/>
</audiomix>
</modifyconference>
</msml>
This would also enable active speaker notification if it had not
previously been enabled. The N-loudest mixing is unaffected.
Multiple elements MAY be included in the mixer descriptions similar
to when conferences are created. For example, in a video conference,
the video mix description (<videolayout>) could specify that the
layout of the video being displayed should change such that the
regions currently displaying participants get smaller and new
region(s) are created to support additional participants. A media
server MUST make all of the requested changes or none of the
requested changes.
Additional examples of modifying conferences are presented in the
Conference Examples section.
attributes:
id: the identifier for a conference. Wildcards MUST NOT be
used. Mandatory.
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all "mark" attributes within
an MSML document SHOULD be unique.
9.5 <destroyconference>
Destroy conference is used to delete mixers or to delete the entire
conference and all state and shared resources. When a mixer is
removed, all of the streams joined to that mixer are unjoined. When a
conference is destroyed, SIP dialogs for any remaining participants
MUST be maintained or removed based on the value of the "term"
attribute when the conference was created.
When there is no element content, <destroyconference/> deletes the
entire conference. Individual mixer(s) are removed by including a
mixer description element identifying the mix(es) to be removed as
content to <destroyconference/>. <audiomix/> is used to remove audio
mixers and <videolayout/> is used to remove video mixers. When one or
more mixer descriptions are specified, the media server MUST only
delete the specified mixer and MUST NOT affect any other existing
mixers. When <audiomix/> or <videolayout/> are identified for
individual removal, other feature aspects of the mix MUST NOT be
included. If specified, the media server MUST ignore any such
elements. When the last mixer is removed from a conference, a media
server MUST remove all conference state, leaving or removing any
remaining SIP dialogs as described above.
attributes:
id: the identifier for a conference. Mandatory.
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all "mark" attributes within
an MSML document SHOULD be unique.
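The two uses of <destroyconference> can be sketched as follows. Both
requests assume a conference identified as "conf:example", as in the
earlier examples. The first request has no element content and so
would delete the entire conference:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <!-- no content: the whole conference is deleted -->
  <destroyconference id="conf:example"/>
</msml>
```

The second names only the audio mixer, so a video mixer, if one
exists, would be left untouched:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <destroyconference id="conf:example">
    <!-- empty mixer description: remove the audio mixer only -->
    <audiomix/>
  </destroyconference>
</msml>
```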
9.6 <audiomix>
The properties of the overall audio mix are specified using the
<audiomix> element.
Attributes:
id: an optional identifier for the audio mix.
An example of the description for an audio mix is:
<audiomix id="mix1">
  <asn ri="10s"/>
  <n-loudest n="3"/>
</audiomix>
9.6.1 <n-loudest>
The <n-loudest> element defines that participants contend to be
included in the conference mix based upon their audio energy. When
the element is not present, all participants are mixed.
Attributes:
n: the number of participants that will be included in the
audio mix based upon having the greatest audio energy.
9.6.2 <asn>
The <asn> element enables notification of active speakers. Active
speakers MUST be notified using the <event> element with an event
name of "msml.conf.asn". The namelist of the event consists of the
set of active speakers. The name of each item is the string "speaker"
with a value of the connection identifier for the connection.
Attributes:
ri: the minimum reporting interval defines the minimum duration
of time which must pass before changes to the active speakers
will be reported. A value of zero disables active speaker
notification.
An example of an active speaker notification is:
<event name="msml.conf.asn" id="conf:example">
<name>speaker</name>
<value>conn:hd93tg5hdf</value>
<name>speaker</name>
<value>conn:w8cn59vei7</value>
<name>speaker</name>
<value>conn:p78fnh6sek47fg</value>
</event>
9.7 <videolayout>
A video layout is specified using the <videolayout> element. It is
used as a container to hold elements that describe all of the
properties of a video mix. The parameters of the window that displays
the video mix are defined by the <root> element. When the video mix
is composed of multiple panes, the location and characteristics of
the panes are defined by one or more <region> elements. A <region>
element is not required when only a single video stream is displayed
at one time and none of the visual attributes of regions are
required.
Some regions may display a video stream based on a selection
criterion rather than having a video stream of a single participant
continuously presented in the region. One such example is a distance
learning lecture where the instructor sees each of the students
periodically displayed in a region. When a region is used to display
one of a number of streams, it is placed as a child of a <selector>
element.
Attributes:
type: specifies the language used to define the layout. Layouts
defined using MSML MUST use the value "text/msml-basic-layout".
This is the same convention as defined for the layout package
from the W3C SMIL 2.0 specification [19]. The default when
omitted is "text/msml-basic-layout".
id: an optional identifier for the video layout.
9.7.1 <root>
The <root> element describes the root window or virtual screen in
which the conference video mix will be displayed. Simple conferences
can display participant video directly within the root window but
more complex conferences will use regions for this purpose. Areas of
the window which are not used to display video will show the root
window background.
All video presentations require a root window. It MUST be present
when a video mix is created and it cannot be deleted, however its
attributes MAY be changed using the <modifyconference> element.
Attributes:
size: the size of the root window specified as one of the five
standard common intermediate formats (e.g. CIF, QCIF, etc.).
backgroundcolor: the color for the root window background
defined using the values for the "background-color" property of
the CSS2 specification [20].
backgroundimage: the URI for an image to be displayed as the
root window background. Transparent portions of the image allow
the background color to show through.
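A minimal sketch of a layout that sets all three <root> attributes is
shown below. The color keyword "black" is a CSS2 color value; the
layout identifier and the image URI are hypothetical and serve only
to illustrate the attribute syntax.

```xml
<videolayout type="text/msml-basic-layout" id="layout1">
  <!-- "layout1" and the image URI are illustrative values only -->
  <root size="CIF" backgroundcolor="black"
        backgroundimage="file://background.png"/>
</videolayout>
```

Where the image has transparent portions, the background color would
show through them.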
9.7.2 <region>
<region> elements define video panes that are used to display
participant video streams. Regions are rendered on top of the root
window.
The size of a region is specified relative to the size of the root
window using the "relativesize" attribute. Relative sizes are
expressed as fractions (e.g. 1/4, 1/3) that preserve the aspect ratio
of the original video stream while allowing for efficient scaling
implementations.
Regions are located on the root window based on the value of the
position attributes "top" and "left". These attributes define the
position of the top left corner of the region as an offset from the
top left corner of the root window. Their values may be expressed
either as a number of pixels or as a percent of the vertical or
horizontal dimension of the root window. Percent values are appended
with a percent ('%') character. Percent values of "33%" and "67%"
should be interpreted as "1/3" and "2/3" to allow easy alignment of
regions whose size is expressed relative to the size of the root
window.
An example of a video layout with six regions is:
+-------+---+
|       | 2 |
|   1   +---+
|       | 3 |
+---+---+---+
| 6 | 5 | 4 |
+---+---+---+
<videolayout type="text/msml-basic-layout">
  <root size="CIF"/>
  <region id="1" left="0" top="0" relativesize="2/3"/>
  <region id="2" left="67%" top="0" relativesize="1/3"/>
  <region id="3" left="67%" top="33%" relativesize="1/3"/>
  <region id="4" left="67%" top="67%" relativesize="1/3"/>
  <region id="5" left="33%" top="67%" relativesize="1/3"/>
  <region id="6" left="0" top="67%" relativesize="1/3"/>
</videolayout>
The media stream may be removed using the <unjoin> element (see
section <unjoin>).
attributes:
id1: an identifier of the connection to be monitored. Any other
object class results in a 440 error. Wildcards MUST NOT be used.
id2: an identifier of the object which is to receive the copy of the
media destined to id1. id2 may be a connection or a conference. Any
other object class results in a 440 error. Wildcards MUST NOT be
used.
compressed: "true" or "false". Specifies whether the join should
occur before or after compression. When "true", id2 must be a
connection using the same media format as id1 or an error response
(450) is generated. Default is "false".
mark: a token which can be used to identify execution progress in the
case of errors. The value of the mark attribute from the last
successfully executed MSML element is returned in an error response.
Therefore the value of all mark attributes within an MSML document
SHOULD be unique.
9.12 <stream>
Individual streams are specified using the <stream> element. They MAY
be included as a child element in any of the stream manipulation
elements <join>, <modifystream>, or <unjoin>.
The type of the stream is specified using a "media" attribute that
uses values corresponding to the top-level MIME media types as
defined in RFC 2046 [21]. This specification only addresses audio and
video media. Other specifications may define procedures for
additional types.
A bidirectional stream is identified when no direction attribute
"dir" is present. A unidirectional stream is identified when a
direction attribute is present. The "dir" attribute MUST have a value
of "from-id1" or "to-id1" depending on the required direction. These
values are relative to the identifier attributes of the parent
element.
The compressed attribute is used to distinguish the compressed nature
of the stream when necessary. It is implementation specific what is
used when the attribute is not present. Joining compressed streams
acts much like an RTP [15] relay.
The properties of the media streams are specified as the content of
<stream> elements when the element is used as a child of <join> or
<modifystream>. Stream elements MUST NOT have any content when they
are used as a child of <unjoin> to identify specific streams to
remove.
Some properties are defined within MSML as additional attributes or
child elements of <stream> that are media type specific. Ones for
audio streams and video streams are defined in the following two
sub-sections. Operators, viewed as properties of the media stream,
MAY be specified as child elements of the <stream> element.
attributes:
media: "audio" or "video". Mandatory
dir: "from-id1" or "to-id1".
compressed: "true" or "false". Specifies whether the stream
uses compressed media. Default is implementation specific.
9.12.1 Audio Stream Properties
Audio mixes can be specified to only mix the N-loudest participants.
However there may be some "preferred" participants that are always
able to contribute. When audio streams are joined to a conference
that uses N-loudest audio mixing, preferred streams need to be
identified.
A preferred audio stream is identified using the "preferred"
attribute. The "preferred" attribute MAY be used for an audio stream
that is input to a conference and MUST NOT be used for other streams.
Additional attributes of the <stream> element for audio streams are:
preferred: a boolean value that defines whether the stream does
not contend for N-loudest mixing. A value of "true" means that the
stream MUST always be mixed while a value of "false" means that the
stream MAY contend for mixing into a conference when N-loudest mixing
is enabled. Default "false".
There are two elements that can be used to change the characteristics
of an audio stream as defined below.
The area of the root window covered by the region is a function of
the region's position and its size. When areas of different regions
overlap, they are layered in order of their "priority" attribute. The
region with the highest value for the "priority" attribute is below
all other regions and will be hidden by overlapping regions. The
region with the lowest non-zero value for the "priority" attribute is
on top of all other regions and will not be hidden by overlapping
regions. The priority attribute may be assigned values between 0 and
1. A value of zero disables the region, freeing any resources
associated with the region, and unjoining any video stream displayed
in the region.
Regions that do not specify a priority will be assigned a priority by
a media server when a conference is created. The first region within
the <videolayout> element that does not specify a priority will be
assigned a priority of one, the second a priority of two, etc. In
this way, all regions that do not explicitly specify a priority will
be underneath all regions that do specify a priority. As well, within
those regions that do not specify a priority, they will be layered
from top to bottom, in the order they appear within the <videolayout>
element.
For example, if a layout was specified as follows:
<videolayout>
<root size="CIF"/>
<region id="a" ... priority=".3" .../>
<region id="b" ... />
<region id="c" ... priority=".2" .../>
<region id="d" ... />
</videolayout>
Then the regions would be layered, from top to bottom, c,a,b,d.
Portions of regions that extend beyond the root window will be
cropped. For example, a layout specified as:
<videolayout>
<root size="CIF"/>
<region id="foo" left="50%" top="50%" size="2/3"/>
</videolayout>
would appear similar to:
+-----------+
| root      |
|background |
|     +-----+--
|     |     |//
|     | foo |//
+-----+-----+//
      |////////
Visual attributes are used to define aspects of the visual appearance
of individual regions. A border may be defined together with a title
and/or logo. Text and logos are displayed as images on top of the
region's video, below all regions with a lower priority. The visual
attributes are "title", "titletextcolor", "titlebackgroundcolor",
"bordercolor", "borderwidth", and "logo".
Visual attributes can also be defined for individual streams (Video
Stream Properties). When visual attributes are specified as part of
both a region and a stream, those associated with the stream take
precedence. This allows streams that are chosen for display
automatically (Stream Selection) to have proper text and logos
displayed. The region visual attributes are displayed when no stream
is associated with the region.
Two other attributes associated with a region, "blank" and "freeze",
define the state of the video displayed in the region. When the blank
or freeze attribute is assigned the value "true", then the Media
Server MUST display the region either as a blank region, or the video
image frozen at the last received frame.
Open Issue: these attributes are specified for a region and not
allowed for streams because that appears to be the common use case.
Applying them to streams would allow only that stream to be affected
within a selector while other streams continue to display normally.
Except for personal mixing scenarios, the same effect can be achieved
by having the participant mute their own transmission to the media
server.
Attributes associated with each region are:
id: a name that can be used to refer to the region.
left: the position of the region from the left side of the root
window.
top: the position of the region from the top of the root
window.
relativesize: the size of the region expressed as a fraction of
the root window size.
priority: a number between 0 and 1 that is used to define the
precedence when rendering overlapping regions. A value of zero
disables the region.
title: text to be displayed as the title for the region
titletextcolor: the color of the text
titlebackgroundcolor: the color of the text background
bordercolor: the color of the region border
borderwidth: the width of the region border
logo: the URI of an image file to be displayed
freeze: a boolean value, with a default of false, that defines
whether the video image should be frozen at the currently
displayed frame
blank: a boolean value, with a default of false, that defines
whether the region should display black instead of the
associated video stream
9.12.1.1 <gain>
The <gain> element may be used to adjust the volume of an audio media
stream. It may be set to a specific gain amount, to automatically
adjust the gain to a desired target level, or to mute the stream.
Attributes:
id: an optional identifier which may be referenced elsewhere for
sending events to the gain primitive.
amt: a specific gain to apply specified in dB or the string
"mute" indicating that the stream should be muted. This
attribute MUST NOT be used if "agc" is present.
agc: boolean indicating whether automatic gain control is to be
used. This attribute MUST NOT be used if "amt" is present.
tgtlvl: the desired target level for AGC specified in dBm0.
This attribute MUST be specified if "agc" is set to "true".
This attribute MUST NOT be specified if "agc" is not present.
maxgain: the maximum gain that AGC may apply. Maxgain is
specified in dB. This attribute MUST be used if "agc" is
present and MUST NOT be used when "agc" is not present.
9.12.1.2 <clamp>
The <clamp> element is used to filter tones and/or audio-band dtmf
from a media stream.
Attributes of the <clamp> element are:
dtmf: boolean indicating whether DTMF tones should be removed.
tone: boolean indicating whether other tones should be removed.
9.12.2 Video Stream Properties
Video mixes define a presentation that may have multiple regions,
such as a quad-split. Each region displays the video from one or more
participants. When video streams are joined to such a conference, the
region that will display the video needs to be specified as part of
the join operation.
The region that will display the video is specified using the
"display" attribute. The "display" attribute MUST be used for a video
stream that is input to a conference and MUST NOT be used for other
streams. The value of the attribute MUST identify a <region> (see
section <region>) or a <selector> (see section <selector>) that is
defined for the conference. A stream MUST NOT be directly joined to a
region that is defined within a selector. Changing the value of the
"display" attribute can be used to change where in a video
presentation layout a video stream is displayed.
Additional attributes of the <stream> element for video streams are:
display: the identifier of a video layout region or selector
that is to be used to display the video stream.
9.12.2.1 <visual>
Some regions of video conferences may display different streams
automatically, such as when voice activated switching is used.
Connections MAY also be joined directly without the use of video
mixing. In these cases, the <visual> element may be used to define
visual display properties for a stream.
The <visual> element MAY use any of the visual attributes defined for
regions (see section <region>). This allows the visual aspects of
regions within a <selector> to be tailored to the selected video
stream, or for streams that are directly joined to display a name or
logo.
10. MSML Dialog Packages
10.1 Overview
MSML Dialog Packages define an XML [4] language for composing complex
media objects from a vocabulary of simple media resource objects
called primitives. It is primarily a descriptive or declarative
language to describe media processing objects. MSML dialogs operate
on a single or multiple streams which are identified by the MSML
document outside the scope of the MSML dialog package.
MSML Dialogs are intended
to be used in different environments. As
such, the language itself does not define how an MSML Dialog is used.
Each environment in which MSML Dialog is used must define how it is
used, the set of services provided and the mechanism for passing
information between the environment and MSML Dialog. The specific
mechanisms used to realize the interface between MSML Dialog and its
environment are platform specific.
MSML Dialog packages provide two models for access to media resources
and service creation building blocks. Both models MAY be used in
conjunction with each other in a complementary manner. The first
model (referred to as "Media Primitives and Composites", part of the
mandatory MSML Dialog Base package) contains media primitives (such
as digit collection and announcements) and composite functions (such
as play and collect combined as a single operation). The second model
(referred to as "Media Groups", part of the optional MSML Dialog
Group package) allows the ability to define complex customized
interactions, via event passing mechanisms, between media primitives,
if required.
MSML Dialog Core Package
Defines core framework over which all MSML dialog packages
operate.
MSML Dialog Base Package
Media Primitives
<dtmf> or <collect>
DTMF digit collection
<play>
Playing of Announcements
<dtmfgen>
Generation of DTMF digits
<tonegen>
Tone generation
<record>
Media recording
Media Composites
<collect>
Supports play and collect operation.
Composite function with inclusion of play.
<record>
Supports play and record operation.
Composite function with inclusion of play.
MSML Dialog Group Package
<group>
Allows grouping of media primitives for parallel
execution, with an event exchange mechanism
between the media primitives to achieve
customized media operations. All the above media
primitive elements are accepted within the
group.
9.7.3 <selector>
It is often desired that one of several video streams be
automatically selected to be displayed. The <selector> element is
used to define the selection criteria and its associated parameters.
The selection algorithm is specified by the "method" attribute.
Currently defined selection methods allow for voice activated
switching and to iterate sequentially through the set of associated
video streams.
The regions that will display the selected video stream are placed as
child elements of the <selector> element. Including regions within a
<selector> element does not affect their layout with respect to
regions not subject to the selection. For simple video conferences
that display the video directly in the root window, the <root>
element can be placed as a child of the <selector>. Region elements
MUST NOT be used in this case.
For example, below is a common video layout that allows the video
stream from the currently active speaker to be displayed in the large
region ("1") at the top left of the layout while the streams from
five other participants are displayed in regions located at the
layout periphery.
+-------+---+
|       | 2 |
|   1   +---+
|       | 3 |
+---+---+---+
| 6 | 5 | 4 |
+---+---+---+
<videolayout type="text/msml-basic-layout">
<root size="CIF"/>
<selector id="switch" method="vas">
<region id="1" left="0" top="0" size="2/3"/>
</selector>
<region id="2" left="67%" top="0" size="1/3"/>
<region id="3" left="67%" top="33%" size="1/3"/>
<region id="4" left="67%" top="67%" size="1/3"/>
<region id="5" left="33%" top="67%" size="1/3"/>
<region id="6" left="0" top="67%" size="1/3"/>
</videolayout>
All selector methods must be defined so that they work if only a
single region is a child of the selector. Selector methods that
support more than one child region MUST specify how the method works
across multiple regions. Media server implementations MAY support
only a single region for methods that are defined to allow multiple
regions.
The selector or region for a participant's video is defined using the
"display" attribute of <stream> during a join operation. Specifying a
selector allows the stream to be displayed according to the criteria
defined by the selector method. Specifying a region supports
continuous presence display of participants. Some streams may be
joined with both a selector and a region. In this case, the value of
the "blankothers" attribute defines whether the streams associated
with a continuous presence region should be blanked when the same
stream is selected for display in one of the selector regions.
Attributes common to all selector methods are:
id: a name that can be used to refer to the selector.
method: the name of the method used to select the video stream.
A value of "vas" (see section on Voice Activated Switching) MAY
be specified.
status: specifies whether the selector is "active" or
"disabled".
blankothers: when "true", video streams that are also displayed
in continuous presence regions will have the continuous
presence regions blanked when the stream is displayed in a
selection region.
Following operations MUST be supported using elements described above
using either the MSML Dialog Base Package or MSML Dialog Group
Package.
Announcement only
<play>
Collection only
<dtmf> or <collect>
Recording only
<record>
Play and Collect
<collect>
<play/>
</collect>
Play and Record
<record>
<play/>
</record>
Additional MSML Dialog packages are:
o MSML Dialog Transform Package
o MSML Dialog Speech Package
o MSML Fax Detection Package
o MSML Fax Send/Receive Package
MSML Dialogs MAY be used to simply expose primitive media resource
objects but will be used more often to describe dialog operations and
media transformation objects which can be controlled via user
interaction.
MSML Dialogs do not contain any computation or flow control
constructs. There are no results automatically generated when media
operations complete. Results MUST be explicitly requested using a
<send> or <exit> element within the definition of the MSML Dialog.
10.2 Primitives
Primitives perform a single function on a media stream or multiple
streams such as generating audio/video, recognizing speech or DTMF,
or adjusting the gain. They may be composed so that primitives
execute concurrently. Primitives not composed for concurrent
execution MUST simply execute sequentially in the order they occur in
a MSML document. All concurrently executing primitives in the same
MSML object (defined in one MSML document) MAY interact with each
other through events (see MSML Dialog Group package).
Primitives are categorized into one of the following descriptive
categories.
o recognizers have a media input but no output. They allow
different things within a media stream to be recognized or
detected and for events to be generated based upon received
media.
o transformers have one media input and output and may send and
receive events;
o sources and sinks generate or consume media. They will have
either a media input or a media output but not both. They may
receive and generate events.
o composites combine underlying primitives
to provide higher-level user interaction, without the need for
specific event based exchange between the primitives. The composite
elements provide a simpler mechanism for more commonly used services,
such as play and collect or play and record.
Primitives may define different media processing behavior (states)
based upon the events which they receive. Primitives which support
different processing states must define their default starting state
and should support the "initial" attribute to allow that state to be
specified when the primitive
9.7.3.1 Voice Activated Switching (vas)
Voice activated switching (VAS) is used to display the video stream
that correlates with the participant who is currently speaking. It is
specified using a selector method value of "vas".
If the video stream associated with the active speaker is not
currently displayed in a selection region, then it replaces the video
in the region that is displaying the video of the speaker that was
least recently active. If the video of the active speaker is
currently displayed in a selection region, then there is no change to
any region. When VAS is applied to a single region, this has the
effect that the current speaker is displayed in that region.
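As an illustration of VAS applied to a single region, the fragment below is a minimal sketch of a layout in which one region always shows the current speaker. The identifiers "speaker" and "main" are hypothetical, and the region uses the "size" attribute spelling that appears in the layout examples of this document rather than anything confirmed beyond them.

```xml
<videolayout type="text/msml-basic-layout">
  <root size="CIF"/>
  <!-- hypothetical selector: the active speaker is switched into "main" -->
  <selector id="speaker" method="vas" status="active">
    <region id="main" left="0" top="0" size="2/3"/>
  </selector>
</videolayout>
```

A participant's video would be offered to this selector by naming it in the "display" attribute of a video <stream> during a join, for example display="speaker".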
Attributes associated with voice activated switching are:
si: switching interval is the minimum period of time that must
elapse before allowing the video to switch to the active
speaker.
speakersees: defines whether the active speaker sees the
"current" speaker (themselves) or the "previous" speaker.
9.8 <join>
<join> is used to create one or more streams between two independent
objects. Streams may be audio or video and may be bidirectional or
unidirectional. A bidirectional stream is implicitly composed of two
unidirectional streams that can be manipulated independently. The
streams to be established are specified by <stream> elements (section
<stream>) as the content of <join>.
Without any content, <join> by default establishes a bidirectional
audio stream. When only a stream of a single type has previously been
created between two objects, or when only a unidirectional stream
exists, <join> can be used to add a stream of another media type or
make the stream bidirectional by including the necessary <stream>
elements. Bidirectional streams are made unidirectional by using
<unjoin> (section <unjoin>) to remove the unidirectional stream for
the direction that is no longer required.
In addition to defining the media type and direction of streams,
<stream> elements are also used to establish the properties of
streams, such as gain, voice masking, or tone clamping of audio
streams, or labels and other visual characteristics of video streams.
Properties are often defined asymmetrically for a single direction of
a stream. Creating a bidirectional stream requires two <stream>
elements within the <join>, one for each direction, if one direction
is to have different properties from the other direction.
If a media server can provide services using both compressed and
uncompressed media, the MSML client may need to distinguish within
requests which format is to be used. When compressed streams are
created, both objects must use the same media format or an error
response (450) is generated.
attributes:
id1: an identifier of either a connection or conference.
Wildcards MUST NOT be used. Any other object class results in a
440 error.
id2: an identifier of either a connection or conference.
Wildcards MUST NOT be used. Any other object class results in a
440 error.
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document SHOULD be unique.
is instantiated. All primitives must
support the "terminate" event class.
The following types of primitives are defined within this
specification:
Recognizers   Transformers  Source/Sink  Composites
------------------------------------------------------
dtmf/collect  agc           play         dtmf/collect
faxtone       clamp         record       record
speech        gain          dtmfgen
vad           gate          tonegen
              relay         faxsend
                            faxrcv
Primitives have shadow variables, similar to those within VoiceXML
[7], which are automatically assigned values when the primitives are
used. Upon initialization of an MSML Dialog context, all shadow
variables have the string value "undefined". Each primitive has its
own instance of shadow variables which are global in scope to the
entire MSML Dialog context.
Names SHOULD be assigned to individual primitives when more than one
primitive of the same type is used within one MSML document. Shadow
variables are overwritten if the primitive has not been named and is
instantiated a second time.
Shadow variables cannot be modified under user control. They may be
returned from the MSML Dialog context using the <send> element.
10.3 Events
Events provide the mechanism for primitives to interact with each
other and for a MSML context to interact with its external
environment. The external environment is defined by the way in which
a MSML context has been invoked. This will often be through MSML but
other languages and protocols such as SIP may also be used.
Every primitive and group conceptually implements their own event
queue. Events sent to them get placed into their associated queue.
Events are removed from their queues and processed in order.
Primitives within a group conceptually have their own thread of
execution. Due to the asynchronous nature of servicing events from
multiple queues, it cannot be assumed that several events sent in
sequence to different queues, will be processed in the order in which
they were sent. For example, if recognition of something led to
sending events to both a <play> and a <record> in that order, it is
possible that the <record> may process its event before the <play>.
Primitives each define the set of events which they support and the
behavior associated with their handling of each event. This allows
many types of behaviors to be defined. For example, VCR type controls
can be constructed by defining primitives which support events
corresponding to each control. Media recognition/detection can be
used to cause those events to be generated.
Alternatively, events can be originated elsewhere, such as from a
Control Agent, and simply received by the primitive implementing the
control. Examples of
the use of events include adjusting volume
(gain) and pause and resume of both announcement playout and record
creation.
Primitives act on events based upon the longest match of an event
name. Event names are a period '.' delimited sequence of tokens. The
first token, or the root of the name, can be considered an event
class. Matching allows a standard meaning to be defined and then
extended based upon what triggers an event's generation. For example,
a record primitive has different behavior depending upon whether it
completed because a user stopped speaking or because it was
cancelled. The recording is retained in the first case but not the
second.
Longest match allows new recognizers to be created and used without
changing how existing primitives are defined. For example, a face
recognition capability could be created which generates a
terminate.frowning event when a user looks puzzled. Although no
primitive directly defines this event, it will still effect a generic
terminate action. Primitives which require specialized behavior based
upon frowning may be extended to support this. As well, the event can
still be exported from the MSML context without requiring that
primitives receiving the event understand facial expressions.
10.4 MSML Dialog Usage with SIP
MSML Dialogs MAY be used directly with SIP for dialog interactions
(e.g., IVR or fax). It can be initially invoked as part of the
"Prompt and Collect" service described in "Basic Network Media
Services with SIP" [9]. That defines service indicators for a small
number of well defined services using the user part of the SIP
Request-URI (R-URI).
The prompt and collect service uses "dialog" as the service
indicator. URI parameters further refine the specific IVR request.
This document defines an additional parameter "msml-param" for the
dialog service indicator as follows:
dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
| moml-param
dialog-param = "voicexml=" dialog-url
moml-param = "moml=" moml-url
There are no additional URI parameters when MSML is used as the
dialog language.
MSML Dialogs defines discrete IVR dialog commands. These commands MAY
be included directly in the body of the INVITE to the "dialog"
service indicator by using the "cid" [12] URL scheme. This scheme
identifies a message body part which in this case would contain the
MSML Dialog request. Note that
For example, consider a call center coaching scenario where a
supervisor can listen to the conversation between an agent and a
customer, and provide hints to the agent, which are not heard by the
customer. One join establishes a stream between the agent and the
customer and another join establishes a stream between the agent and
the supervisor. A third join is used to establish a half-duplex
stream from the customer to the supervisor. The media server
automatically bridges the media streams from the customer and the
supervisor for the agent, and from the customer and agent for the
supervisor.
Assuming the following connections, each with a single audio stream:
conn:supervisor
conn:agent
conn:customer
The following would create the media flows previously described:
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
<join id1="conn:supervisor" id2="conn:agent"/>
<join id1="conn:agent" id2="conn:customer"/>
<join id1="conn:supervisor" id2="conn:customer">
<stream media="audio" dir="to-id1"/>
</join>
</msml>
a multipart message body, containing a
single part, MUST be present even if the INVITE does not contain an
SDP offer. Subsequent MSML Dialog requests are sent in the body of
SIP INFO messages
The following example shows joining a participant to a multimedia
conference. It assumes that the conference has a video presentation
region named "topright". The "display" attribute is explained in
section Video Stream Properties.
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
<join id1="conn:hd83t5hf7g3" id2="conf:example">
<stream media="audio"/>
<stream media="video" dir="from-id1" display="topright"/>
<stream media="video" dir="to-id1"/>
</join>
</msml>
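To illustrate stream properties that differ per direction, the following is a minimal sketch in which a bidirectional audio stream is written as two <stream> elements so that a gain is applied in one direction only. The identifiers conn:example1 and conf:example and the gain value are hypothetical, and the sketch assumes that a <gain> child element with an "amt" attribute in dB is available as an audio stream property.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <join id1="conn:example1" id2="conf:example">
    <!-- audio from the connection into the conference, with an
         illustrative gain value (assumed to be expressed in dB) -->
    <stream media="audio" dir="from-id1">
      <gain amt="-3"/>
    </stream>
    <!-- audio from the conference to the connection, no properties set -->
    <stream media="audio" dir="to-id1"/>
  </join>
</msml>
```

Writing both directions explicitly keeps the request unambiguous about which direction carries the property.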
as are all messages from the media server.
An example of SIP URI as described above is:
sip:dialog@mediaserver.example.net;\
moml=cid:14864099865376@appserver.example.net
The body part that contained the MSML Dialog referenced by the URL
would have a Content-Id header of:
Content-Id: <14864099865376@appserver.example.net>
The results of executing an <exit> or <disconnect>, or of executing a
<send> which has a "target" attribute value equal to "source", are
notified in SIP INFO messages using the <event> element from MSML
Core package. No messages are sent if execution completes normally
without executing one of these elements.
If there is an error during validation or execution, then a media
server MUST notify the error as described above and must include the
namelist items "moml.error.status" and "moml.error.description". The
values for these items are defined in section 11.
A restricted subset of MSML Dialogs can also be used with the
"Announcement" service defined in [9]. This service uses "annc" as
the service indicator and defines parameters that describe an
announcement. The "play=" parameter identifies the URL of either a
prompt or a provisioned announcement sequence. The value of the
"play=" parameter can refer to a MSML Dialog body part using a "cid"
URL as described above. That body part must only contain the <play>
primitive.
Using MSML Dialogs enhances the announcement service by allowing the
client to specify a sequence of audio segments rather than requiring
each sequence to be provisioned as well as support for video.
Moreover, MSML Dialogs define a standard set
9.9 <modifystream>
Media streams can have different properties such as the gain for an
audio stream or a visual label for a video stream. These properties
are specified as the content of <stream> elements (section <stream>).
<modifystream> is used to change the properties of a stream by
including one or more <stream> elements that are to have their
properties changed.
Stream properties MUST be set as specified by the <stream> element,
used as a child element of the <modifystream> element. Any properties
not included in the <stream> element when modifying a stream MUST
remain unchanged. Setting a property for only one direction of a
bidirectional stream MUST NOT affect the other direction. The
directionality of streams can be changed by issuing an <unjoin>
followed by a <join>. Any streams that exist between the two objects
that are not included within <modifystream> MUST NOT be affected.
attributes:
id1: an identifier of either a conference or a connection. The
instance name MUST NOT contain a wildcard if "id2" contains a
wildcard. Mandatory.
id2: an identifier of either a conference or a connection. The
instance name MUST NOT contain a wildcard if "id1" contains a
wildcard. Mandatory.
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document are RECOMMENDED to be unique.
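As a sketch of how <modifystream> might be used, the request below mutes only the audio flowing from a connection into a conference. The identifiers are hypothetical, and the example assumes a <gain> stream property whose "amt" attribute accepts the string "mute".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <modifystream id1="conn:example1" id2="conf:example">
    <!-- only this direction of the audio stream is named, so the
         opposite direction and any video streams are left untouched -->
    <stream media="audio" dir="from-id1">
      <gain amt="mute"/>
    </stream>
  </modifystream>
</msml>
```

Because only one direction is listed, the participant would continue to hear the conference while contributing no audio to it.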
of variables in contrast
to [9] which defines a parameterization mechanism but does not
formally specify any semantics.
If a media server does not understand the "cid" scheme or does not
understand MSML Dialogs, it must respond with the SIP response code
"488 - not acceptable here". If the MSML Dialog body contains
elements other than the <play> primitive, or there are errors during
validation, a media server must respond with a SIP response code "400
- bad request". Finally, if there is a discrepancy between parameters
specified in the Request-URI and corresponding attributes defined in
the MSML Dialog body, the Request-URI parameters must be silently
ignored.
9.10 <unjoin>
Unjoin removes one or more media streams between two objects. In the
absence of <stream> element content, all media streams between the
objects MUST be removed. Individual streams may be removed by
specifying them using <stream> elements, while the unspecified
streams MUST NOT be removed. A bidirectional stream is changed to a
unidirectional stream by unjoining the direction that is no longer
required, using the <unjoin> element. Operator elements MUST NOT be
specified within <stream> elements when streams are being unjoined
using the <unjoin> element. Any specified stream operators MUST be
silently ignored.
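For instance, a bidirectional audio stream could be reduced to a listen-only stream along the following lines. This is a minimal sketch: the identifiers are hypothetical, and it assumes the "dir" values "from-id1" and "to-id1" defined for <stream>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <unjoin id1="conn:example1" id2="conf:example">
    <!-- remove only the direction from the connection into the
         conference; the element is empty, as required for <unjoin> -->
    <stream media="audio" dir="from-id1"/>
  </unjoin>
</msml>
```

The audio stream from the conference to the connection is not named, so it would remain in place.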
<unjoin> and <join> may be used together to move a media stream, such
as from a main conference to a sidebar conference.
attributes:
id1: an identifier of either a conference or a connection. The
instance name MUST NOT contain a wildcard if "id2" contains a
wildcard. Mandatory.
id2: an identifier of either a conference or a connection. The
instance name MUST NOT contain a wildcard if "id1" contains a
wildcard. Mandatory.
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document SHOULD be unique.
The following removes a participant from a conference and plays a
leave tone for the remaining participants in the conference.
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
   <unjoin id1="conn:jd73ht89sf489f" id2="conf:1"/>
   <dialogstart target="conf:1" type="application/moml+xml">
      <play>
         <audio uri="file://leave_tone.wav"/>
      </play>
   </dialogstart>
</msml>
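As a further illustration only, the two sketches below show how the
other uses of <unjoin> described in this section might look. The
first moves a participant from a main conference to a sidebar; the
second turns a bidirectional audio stream into a unidirectional one.
The sidebar identifier "conf:sidebar1" is hypothetical, and both
conferences are assumed to exist already.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: move a participant from the main conference to a sidebar.
     "conf:sidebar1" is an assumed identifier, not defined by this
     document. -->
<msml version="1.1">
   <unjoin id1="conn:jd73ht89sf489f" id2="conf:1"/>
   <join id1="conn:jd73ht89sf489f" id2="conf:sidebar1"/>
</msml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: remove only the audio flowing from the connection into the
     conference. The audio flowing to the connection is left in place,
     so the participant can still listen. The <stream> element is empty
     because it is a child of <unjoin>. -->
<msml version="1.1">
   <unjoin id1="conn:jd73ht89sf489f" id2="conf:1">
      <stream media="audio" dir="from-id1"/>
   </unjoin>
</msml>
```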
9.11 <monitor>
Monitor is a specialized unidirectional join that copies the media
that is destined for a connection object. One example of the use for
<monitor> may be to provide quality monitoring within a conference.
The media stream may be removed using the <unjoin> element (see
section <unjoin>).
attributes:
id1: an identifier of the connection to be monitored. Any other
object class results in a 440 error. Wildcards MUST NOT be
used.
id2: an identifier of the object which is to receive the copy
of the media destined to id1. id2 may be a connection or a
conference. Any other object class results in a 440 error.
Wildcards MUST NOT be used.
compressed: "true" or "false". Specifies whether the join
should occur before or after compression. When "true", id2 must
be a connection using the same media format as id1 or an error
response (450) is generated. Default is "false".
mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all mark attributes within an
MSML document SHOULD be unique.
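A minimal sketch of a <monitor> request is shown below, assuming a
supervisor connection that listens to the media being sent to a
monitored participant. Both connection identifiers are hypothetical
and are used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: copy the media destined for conn:agent01 (id1) to
     conn:supervisor01 (id2). Both identifiers are assumed names.
     "compressed" is left at its default of "false". -->
<msml version="1.1">
   <monitor id1="conn:agent01" id2="conn:supervisor01" mark="1"/>
</msml>
```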
9.12 <stream>
Individual streams are specified using the <stream> element. They MAY
be included as a child element in any of the stream manipulation
elements <join>, <modifystream>, or <unjoin>.
The type of the stream is specified using a "media" attribute that
uses values corresponding to the top-level MIME media types as
defined in RFC 2046 [21]. This specification only addresses audio and
video media. Other specifications may define procedures for
additional types.
A bidirectional stream is identified when no direction attribute
"dir" is present. A unidirectional stream is identified when a
direction attribute is present. The "dir" attribute MUST have a value
of "from-id1" or "to-id1" depending on the required direction. These
values are relative to the identifier attributes of the parent
element.
The compressed attribute is used to distinguish the compressed nature
of the stream when necessary. It is implementation specific what is
used when the attribute is not present. Joining compressed streams
acts much like an RTP [15] relay.
The properties of the media streams are specified as the content of
<stream> elements when the element is used as a child of <join> or
<modifystream>. Stream elements MUST NOT have any content when they
are used as a child of <unjoin> to identify specific streams to
remove.
Some properties are defined within MSML as additional attributes or
child elements of <stream> that are media type specific. Ones for
audio streams and video streams are defined in the following two
sub-sections. Operators, viewed as properties of the media stream,
MAY be specified as child elements of the <stream> element.
attributes:
media: "audio" or "video". Mandatory.
dir: "from-id1" or "to-id1".
compressed: "true" or "false". Specifies whether the stream
uses compressed media. Default is implementation specific.
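The attributes above can be sketched as follows. This example
assumes a listen-and-watch participant whose audio is bidirectional
but whose video flows only from the conference to the connection; the
identifiers are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: no "dir" on the audio stream, so it is bidirectional.
     The video stream uses dir="to-id1", so video flows only toward
     id1 (the connection). Identifiers are assumed for illustration. -->
<msml version="1.1">
   <join id1="conn:jd73ht89sf489f" id2="conf:1">
      <stream media="audio"/>
      <stream media="video" dir="to-id1"/>
   </join>
</msml>
```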
9.12.1 Audio Stream Properties
Audio mixes can be specified to only mix the N-loudest participants.
However there may be some "preferred" participants that are always
able to contribute. When audio streams are joined to a conference
that uses N-loudest audio mixing, preferred streams need to be
identified.
A preferred audio stream is identified using the "preferred"
attribute. The "preferred" attribute MAY be used for an audio stream
that is input to a conference and MUST NOT be used for other streams.
Additional attributes of the <stream> element for audio streams are:
preferred: a boolean value that defines whether the stream does
not contend for N-loudest mixing. A value of "true" means that
the stream MUST always be mixed while a value of "false" means
that the stream MAY contend for mixing into a conference when
N-loudest mixing is enabled. Default "false".
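For illustration, a preferred participant might be joined as
sketched below. The audio is split into two unidirectional streams so
that "preferred" appears only on the stream that is input to the
conference. The connection identifier is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: the chairperson's audio is always mixed, even when the
     conference uses N-loudest mixing. "conn:chair01" is an assumed
     identifier. -->
<msml version="1.1">
   <join id1="conn:chair01" id2="conf:1">
      <stream media="audio" dir="from-id1" preferred="true"/>
      <stream media="audio" dir="to-id1"/>
   </join>
</msml>
```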
There are two elements that can be used to change the characteristics
of an audio stream as defined below.
9.12.1.1 <gain>
The <gain> element may be used to adjust the volume of an audio media
stream. It may be set to a specific gain amount, to automatically
adjust the gain to a desired target level, or to mute the stream.
Attributes:
id: an optional identifier which may be referenced elsewhere
for sending events to the gain primitive.
amt: a specific gain to apply specified in dB or the string
"mute" indicating that the stream should be muted. This
attribute MUST NOT be used if "agc" is present.
agc: boolean indicating whether automatic gain control is to be
used. This attribute MUST NOT be used if "amt" is present.
tgtlvl: the desired target level for AGC specified in dBm0.
This attribute MUST be specified if "agc" is set to "true".
This attribute MUST NOT be specified if "agc" is not present.
maxgain: the maximum gain that AGC may apply. Maxgain is
specified in dB. This attribute MUST be used if "agc" is
present and MUST NOT be used when "agc" is not present.
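The mutual exclusion between "amt" and "agc" can be sketched as
follows. The numeric values are arbitrary and chosen only for
illustration; the identifiers are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: automatic gain control on the audio entering the
     conference, and a fixed attenuation on the audio returned to the
     participant. "agc" is accompanied by "tgtlvl" and "maxgain";
     "amt" is used alone. All values are assumed examples. -->
<msml version="1.1">
   <join id1="conn:jd73ht89sf489f" id2="conf:1">
      <stream media="audio" dir="from-id1">
         <gain agc="true" tgtlvl="-20" maxgain="10"/>
      </stream>
      <stream media="audio" dir="to-id1">
         <gain amt="-3"/>
      </stream>
   </join>
</msml>
```

A stream would be muted by using amt="mute" in place of a dB value.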
9.12.1.2 <clamp>
The <clamp> element is used to filter tones and/or audio-band DTMF
from a media stream.
Attributes of the <clamp> element are:
dtmf: boolean indicating whether DTMF tones should be removed.
tone: boolean indicating whether other tones should be removed.
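As a sketch, DTMF entered by a participant might be kept out of a
conference mix as shown below. The identifiers are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: remove audio-band DTMF, but not other tones, from the
     audio that the connection contributes to the conference. -->
<msml version="1.1">
   <join id1="conn:jd73ht89sf489f" id2="conf:1">
      <stream media="audio" dir="from-id1">
         <clamp dtmf="true" tone="false"/>
      </stream>
      <stream media="audio" dir="to-id1"/>
   </join>
</msml>
```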
9.12.2 Video Stream Properties
Video mixes define a presentation that may have multiple regions,
such as a quad-split. Each region displays the video from one or more
participants. When video streams are joined to such a conference, the
region that will display the video needs to be specified as part of
the join operation.
The region that will display the video is specified using the
"display" attribute. The "display" attribute MUST be used for a video
stream that is input to a conference and MUST NOT be used for other
streams. The value of the attribute MUST identify a <region> (see
section <region>) or a <selector> (see section <selector>) that is
defined for the conference. A stream MUST NOT be directly joined to a
region that is defined within a selector. Changing the value of the
"display" attribute can be used to change where in a video
presentation layout a video stream is displayed.
Additional attributes of the <stream> element for video streams are:
display: the identifier of a video layout region or selector
that is to be used to display the video stream.
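The use of "display" can be sketched as follows. The region
identifier "topleft" is hypothetical and is assumed to have been
defined as a <region> of the conference layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: the participant's video is shown in an assumed region
     named "topleft". "display" appears only on the video stream that
     is input to the conference. -->
<msml version="1.1">
   <join id1="conn:jd73ht89sf489f" id2="conf:1">
      <stream media="audio"/>
      <stream media="video" dir="from-id1" display="topleft"/>
      <stream media="video" dir="to-id1"/>
   </join>
</msml>
```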
9.12.2.1 <visual>
Some regions of video conferences may display different streams
automatically, such as when voice activated switching is used.
Connections MAY also be joined directly without the use of video
mixing. In these cases, the <visual> element may be used to define
visual display properties for a stream.
The <visual> element MAY use any of the visual attributes defined for
regions (see section <region>). This allows the visual aspects of
regions within a <selector> to be tailored to the selected video
stream, or for streams that are directly joined to display a name or
logo.
10. MSML Dialog Packages
10.1 Overview
MSML Dialog Packages define an XML [4] language for composing complex
media objects from a vocabulary of simple media resource objects
called primitives. It is primarily a descriptive or declarative
language to describe media processing objects. MSML dialogs operate
on a single or multiple streams which are identified by the MSML
document outside the scope of the MSML dialog package.
MSML Dialogs are intended to be used in different environments. As
such, the language itself does not define how an MSML Dialog is used.
Each environment in which MSML Dialog is used must define how it is
used, the set of services provided and the mechanism for passing
information between the environment and MSML Dialog. The specific
mechanisms used to realize the interface between MSML Dialog and its
environment are platform specific.
MSML Dialog packages provide two models for access to media resources
and service creation building blocks. Both models MAY be used in
conjunction with each other in a complementary manner. The first
model (referred to as "Media Primitives and Composites", part of the
mandatory MSML Dialog Base package) contains media primitives (such
as digit collection and announcements) and composite functions (such
as play and collect combined as a single operation). The second model
(referred to as "Media Groups", part of the optional MSML Dialog
Group package) allows the ability to define complex customized
interactions, via event passing mechanisms, between media primitives,
if required.
MSML Dialog Core Package
   Defines core framework over which all MSML dialog packages
   operate.
MSML Dialog Base Package
   Media Primitives
      <dtmf> or <collect>
         DTMF digit collection
      <play>
         Playing of Announcements
      <dtmfgen>
         Generation of DTMF digits
      <tonegen>
         Tone generation
      <record>
         Media recording
   Media Composites
      <collect>
         Supports play and collect operation.
         Composite function with inclusion of play.
      <record>
         Supports play and record operation.
         Composite function with inclusion of play.
MSML Dialog Group Package
   <group>
      Allows grouping of media primitives for parallel
      execution, with an event exchange mechanism
      between the media primitives to achieve
      customized media operations. All the above media
      primitive elements are accepted within the
      group.
The following operations MUST be supported, using the elements
described above, with either the MSML Dialog Base or MSML Dialog
Group Package.
Announcement only
   <play>
Collection only
   <dtmf> or <collect>
Recording only
   <record>
Play and Collect
   <collect>
      <play/>
   </collect>
Play and Record
   <record>
      <play/>
   </record>
Additional MSML Dialog packages are:
o MSML Dialog Transform Package
o MSML Dialog Speech Package
o MSML Fax Detection Package
o MSML Fax Send/Receive Package
MSML Dialogs MAY be used to simply expose primitive media resource
objects but will be used more often to describe dialog operations and
media transformation objects which can be controlled via user
interaction.
MSML Dialogs do not contain any computation or flow control
constructs. There are no results automatically generated when media
operations complete. Results MUST be explicitly requested using a
<send> or <exit> element within the definition of the MSML Dialog.
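The requirement to request results explicitly can be sketched as
follows. The example plays one clip and then uses <send> to return
the play shadow variables to the invoking environment. The event
name "done", the dialog name, and the clip URI are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: without the <send> element, completion of <play> would
     not by itself report any result to the invoking environment. -->
<msml version="1.1">
   <dialogstart target="conn:abcd1234"
                type="application/moml+xml"
                name="sample">
      <play>
         <audio uri="file://clip1.wav"/>
      </play>
      <send target="source"
            event="done"
            namelist="play.amt play.end"/>
   </dialogstart>
</msml>
```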
10.2 Primitives
Primitives perform a single function on a media stream or multiple
streams such as generating audio/video, recognizing speech or DTMF,
or adjusting the gain. They may be composed so that primitives
execute concurrently. Primitives not composed for concurrent
execution MUST simply execute sequentially in the order they occur in
a MSML document. All concurrently executing primitives in the same
MSML object (defined in one MSML document) MAY interact with each
other through events (see MSML Dialog Group package).
Primitives are categorized into one of the following descriptive
categories.
o recognizers have a media input but no output. They allow
different things within a media stream to be recognized or
detected and for events to be generated based upon the received
media.
o transformers have one media input and output and may send and
receive events.
o sources and sinks generate or consume media. They have either a
media input or a media output but not both. They may receive
and generate events.
o composites combine underlying primitives to provide higher-level
user interaction, without the need for specific event based
exchange between the primitives. The composite elements provide
a simpler mechanism for more commonly used services, such as
play and collect or play and record.
Primitives may define different media processing behavior (states)
based upon the events which they receive. Primitives which support
different processing states must define their default starting state
and should support the "initial" attribute to allow that state to be
specified when the primitive is instantiated. All primitives must
support the "terminate" event class.
The following types of primitives are defined within this
specification:
Recognizers    Transformers   Source/Sink   Composites
------------------------------------------------------
dtmf/collect   agc            play          dtmf/collect
faxtone        clamp          record        record
speech         gain           dtmfgen
vad            gate           tonegen
               relay          faxsend
                              faxrcv
Primitives have shadow variables, similar to those within VoiceXML
[7], which are automatically assigned values when the primitives are
used. Upon initialization of an MSML Dialog context, all shadow
variables have the string value "undefined". Each primitive has its
own instance of shadow variables which are global in scope to the
entire MSML Dialog context.
Names SHOULD be assigned to individual primitives when more than one
primitive of the same type is used within one MSML document. Shadow
variables are overwritten if the primitive has not been named and is
instantiated a second time.
Shadow variables cannot be modified under user control. They may be
returned from the MSML Dialog context using the <send> element.
10.3 Events
Events provide the mechanism for primitives to interact with each
other and for an MSML context to interact with its external
environment. The external environment is defined by the way in which
an MSML context has been invoked. This will often be MSML but
other languages and protocols such as SIP may also be used.
Each primitive and group conceptually implements its own event
queue. Events sent to a primitive or group are placed into its
queue. Events are removed from the queue and processed in order.
Primitives within a group conceptually have their own thread of
execution. Due to the asynchronous nature of servicing events from
multiple queues, it cannot be assumed that several events sent in
sequence to different queues will be processed in the order in which
they were sent. For example, if recognition of something led to
sending events to both a <play> and a <record> in that order, it is
possible that the <record> may process its event before the <play>.
Primitives each define the set of events which they support and the
behavior associated with their handling of each event. This allows
many types of behaviors to be defined. For example, VCR type controls
can be constructed by defining primitives which support events
corresponding to each control. Media recognition/detection can be
used to cause those events to be generated.
Alternatively, events can be originated elsewhere, such as from a
Control Agent, and simply received by the primitive implementing the
control. Examples of the use of events include adjusting volume
(gain) and pause and resume of both announcement playout and record
creation.
Primitives act on events based upon the longest match of an event
name. Event names are a period '.' delimited sequence of tokens. The
first token, or the root of the name, can be considered an event
class. Matching allows a standard meaning to be defined and then
extended based upon what triggers an event's generation. For example,
a record primitive has different behavior depending upon whether it
completed because a user stopped speaking or because it was
cancelled. The recording is retained in the first case but not the
second.
Longest match allows new recognizers to be created and used without
changing how existing primitives are defined. For example, a face
recognition capability could be created which generates a
terminate.frowning event when a user looks puzzled. Although no
primitive directly defines this event, it will still effect a generic
terminate action. Primitives which require specialized behavior based
upon frowning may be extended to support this. As well, the event can
still be exported from the MSML context without requiring that
primitives receiving the event understand facial expressions.
10.4 MSML Dialog Usage with SIP
MSML Dialogs MAY be used directly with SIP for dialog interactions
(e.g., IVR or fax). It can be initially invoked as part of the
"Prompt and Collect" service described in "Basic Network Media
Services with SIP" [9]. That defines service indicators for a small
number of well defined services using the user part of the SIP
Request-URI (R-URI).
The file and http schemes are supported.

format: defines the encoding and file type of the audio
resource. The format attribute is defined as a string type of
form "audio/<filetype>;codecs=<codec>". The keyword 'audio'
identifies an audio content. The codecs field identifies the
audio file's codec to be used for decoding the audio content.
If format attribute is not specified, the filetype MUST be
determined from the URI and the codec information MUST be
determined from the media resource.

audiosamplerate: Identifies audio sample rate in kHz. If not
specified, the sample rate SHOULD be determined from the media
resource.

audiosamplesize: Identifies audio sample size in bits. If not
specified, the sample size SHOULD be determined from the media
resource.

iterate: specifies the number of times the audio is to be
played. Defaults to once '1'.

xml:lang: specifies the language to use when the URI identifies
a logical clip, either directly, or as part of a sequence.

10.7.1.2 <video>

Identifies pre-recorded multimedia to play. Contents identified by
the URI may contain audio only, video only, or both audio and video.
Media Server SHOULD attempt to play both audio and video from the
identified URI, if both are available in the content.

Attributes:

uri: Identifies the location of the video or multimedia to be
played. The file and http schemes are supported.

format: defines the encoding and file type of the video or
multimedia resource. The format attribute is defined as a
string type of form
"video/<filetype>;codecs=<codecx>,<codecy>". The keyword
'video' identifies video only media or media containing audio
and video. The "codecs" field identifies the audio and/or video
codecs to be used for decoding the file content, where the
order of the codec values is not significant. In the event of
audio and video content, using 'video' keyword, the
codecs=<codecx>,<codecy> field MAY be used to identify the
audio codec and the video codec.

The prompt and collect service uses "dialog" as the service
indicator. URI parameters further refine the specific IVR request.

This document defines an additional parameter "msml-param" for the
dialog service indicator as follows:

dialog-parameters = ";" ( dialog-param [ vxml-parameters ] )
                  | moml-param
dialog-param = "voicexml=" dialog-url
moml-param = "moml=" moml-url

There are no additional URI parameters when MSML is used as the
dialog language.

MSML Dialogs defines discrete IVR dialog commands. These commands MAY
be included directly in the body of the INVITE to the "dialog"
service indicator by using the "cid" [12] URL scheme. This scheme
identifies a message body part which in this case would contain the
MSML Dialog request. Note that a multipart message body, containing a
single part, MUST be present even if the INVITE does not contain an
SDP offer. Subsequent MSML Dialog requests are sent in the body of
SIP INFO messages as are all messages from a media server.

An example of SIP URI as described above is:

sip:dialog@mediaserver.example.net;\
moml=cid:14864099865376@appserver.example.net

The body part that contained the MSML Dialog referenced by the URL
would have a Content-Id header of:

Content-Id: <14864099865376@appserver.example.net>

The results of executing an <exit> or <disconnect>, or executing a
<send> which has a "target" attribute value equal to "source", are
notified in SIP INFO messages using the <event> element from MSML
Core package. No messages are sent if execution completes normally
without executing one of these elements.

If there is an error during validation or execution, then a media
server MUST notify the error as described above and must include the
namelist items "moml.error.status" and "moml.error.description". The
values for these items are defined in section 12.

A restricted subset of MSML Dialogs can also be used with the
"Announcement" service defined in [9]. This service uses "annc" as
the service indicator and defines parameters that describe an
announcement. The "play=" parameter identifies the URL of a prompt or
a provisioned announcement sequence. The value of the "play="
parameter can refer to a MSML Dialog body part using a "cid" URL as
described above. That body part must only contain the <play>
primitive.

Using MSML Dialogs enhances the announcement service by allowing the
client to specify a sequence of audio and video segments rather than
requiring each sequence to be provisioned as well as support for
video. Moreover, MSML Dialogs define a standard set of variables in
contrast to [9] which defines a parameterization mechanism but does
not formally specify any semantics.
If a media server does not understand the "cid" scheme or does not
understand MSML Dialogs, it must respond with the SIP response code
"488 - not acceptable here". If the MSML Dialog body contains
elements other than the <play> primitive, or there are errors during
validation, a media server must respond with a SIP response code "400
- bad request". Finally, if there is a discrepancy between parameters
specified in the Request-URI and corresponding attributes defined in
the MSML Dialog body, the Request-URI parameters must be silently
ignored.

MSML Dialogs MUST NOT change the operation of the announcement
service from that defined in [9]. When the announcement completes, a
media server issues a SIP BYE request. The INFO method MUST NOT be
used with the announcement service.

10.5 MSML Dialog Structure and Modularity

MSML is structured as a set of packages. Only the core and base
packages are required. The Dialog Core package defines the framework
for MSML requests to a media server, without specific functionality.
It consists of the "primitive" abstraction, an abstract element for
control flow, the sequential execution model, and the <send> element.
That is, the MSML Dialog Core package allows for the execution of a
sequence of one or more media processing primitives with the ability
to notify events to the invocation environment.

Primitives are contained within the MSML Dialog Base package, which
defines the basic <play>, <record>, <dtmf>, <dtmfgen>, <tonegen> and
<collect> elements. Another package, the MSML Dialog Transform
package, defines the simple half duplex filters. More advanced
primitives are defined in speech and fax packages. The MSML
speech package depends on the MSML Dialog base package as it extends
the capability of <play> by adding synthesized speech. Finally, the
group execution model, which is currently the only element which
changes the flow of control, is defined in a separate MSML Dialog
Group package. All of these packages are optional with the exception
that MSML Dialog Core and MSML Dialog Base packages MUST be
implemented to provide the minimal functionality.

10.6 MSML Dialog Core Package

The MSML Dialog Core package defines the structural framework and
abstractions for MSML Dialogs (via its schema). It defines the
basic elements which are part of the core primitive or control
abstractions. This package is dependent on the MSML Core package.

Events generated by MSML Dialogs, such as prompt completion, digits
collected, or dialog termination, etc, are communicated by the Media
Server via the MSML Core Package (see MSML Core Package <event>).

MSML Dialogs are executed independently from the MSML core context.
When an MSML Dialog is started, MSML allocates the dialog control
resources, and if successful, starts those resources executing. MSML
core execution then continues without waiting for the MSML dialog to
complete. This forking of MSML dialog invocation from the MSML core
context is done via the <dialogstart> element. Media streams are
created between the MSML dialog target and other internal media
server resources as part of dialog execution. Stream creation is
subject to the requirements defined in MSML Core package and media
streams as defined in MSML Conference Core package.

10.6.1 <dialogstart>

The <dialogstart> element is used to instantiate an MSML media dialog
on connections or conferences. The dialog is specified either inline
or by a URI [8]. Inline dialogs MUST be composed of any of the MSML
Dialog packages. MSML dialogs MAY be defined externally as VoiceXML
[7]. The MSML dialog description MUST NOT be inline if the src
attribute, containing a URI, is present.

The originator of the MSML dialog is notified using a
"msml.dialog.exit" event when the dialog completes. Any results
returned by the dialog when it exits are sent as a namelist to the
event.

The "msml.dialog.exit" event is also used when dialogs fail due to
errors encountered fetching external documents or errors that occur
within the dialog execution thread. In this case, a namelist
containing the items "dialog.exit.status" and
"dialog.exit.description" is returned with the event to inform the
client of the failure and the failure reason. The values of these
items are defined within this package and the MSML Core package.

If not specified, the codec information SHOULD be determined from
the media file.

audiosamplerate: Identifies audio sample rate in kHz. If not
specified, the sample rate SHOULD be determined from the media
file.

audiosamplesize: Identifies audio sample size in bits. If not
specified, the sample size SHOULD be determined from the media
file.

codecconfig: Identifies an optional special instruction string
for codec configuration. Default is to send no special
configuration string to the codec.

profile: Identifies a video profile name specific to the codec.
If not specified, default video profile of the codec SHOULD be
selected.

level: Identifies a video profile level to the codec. Default
is to send no profile information to the codec and allow the
codec to select an internal default.

imagewidth: Identifies the width of video image in pixels.
Default is to use image width information from media file.

imageheight: Identifies the height of video image in pixels.
Default is to use image height information from media file.

maxbitrate: Identifies the bitrate of the video signal in kbps.
Default is to use maximum bitrate information from the media
file.

framerate: Identifies the video frame rate in frames per
second. Default is to use frame rate information from the media
file.

iterate: specifies the number of times the audio is to be
played. Defaults to once '1'.

10.7.1.3 <media>

Identifies multimedia content for play. All content of <media>
element MUST start to play concurrently. This element may be used to
generate a multi-media stream from two independent media resources,
one identifying audio and the other identifying video.

The <media> element MUST contain at least one child element. Valid
child elements of <media> are <audio> and <video>, as described
earlier. <media> element MUST contain at most one <audio> element or
at most one <video> element.

10.7.1.4 <var>

Specifies the generation of audio from a variable using prerecorded
audio segments. A variable represents a semantic concept (such as
date or number) and dynamically produces the appropriate speech.
Prerecorded audio allows an application vendor or service provider to
choose the exact voice for their audio and therefore completely
control the "sound and feel" of the service provided to end users. It
provides very high audio quality and allows the variables to blend
seamlessly into the surrounding audio segments.

Text to speech (TTS) using SSML [27] may also be used to render
variables, but may not provide as good quality, or allow as complete
control of the "sound and feel" or user experience. TTS is normally
used for reading text such as emails and for very large vocabularies
such as stock names. TTS results in a very clear difference between
the variables and the surrounding audio segments. (See MSML Dialog
Speech package).

Attributes:

type: specifies the type of variable. Mandatory. Variable type
must be one of "date", "digits", "duration", "month", "money",
"number", "silence", "time", or "weekday".

subtype: specifies an optional clarification of type. Specific
values depend upon the type.

value: text which should be rendered appropriate to the type
and subtype attributes.

xml:lang: specifies the language to use when rendering the
variable.

10.7.1.5 <playexit>

The <playexit> element MUST be invoked when generation of all content
of the <play> has come to completion. The contents of this element
MAY be used to send events.

Attributes:

none

10.7.2 <dtmfgen>

DTMF generator originates one or more DTMF digits in sequence.

Attributes:

id: an optional identifier which may be referenced elsewhere
for sending events to the dtmfgen primitive.

digits: A string of characters from the alphabet "0-9a-d#*"
which correspond to a sequence of DTMF tones. Mandatory.

level: used to define the power level for which the tones will
be generated. Expressed in dBm0 in a range of 0 to -96 dBm0.
Larger negative values express lower power levels. Note that
values lower than -55 dBm0 will be rejected by most receivers
(TR-TSY-000181, ITU-T Q.24A). Default is -6 dBm0.

dur: the duration in milliseconds for which each tone should be
generated. Implementations may round the value if they only
support discrete durations. Default 100 ms.

interval: the duration in milliseconds of a silence interval
following each generated tone. Implementations may round the
value if they only support discrete durations. Default 100 ms.

Events:

terminate: terminates DTMF generation and assigns values to the
shadow variables.

Shadow Variables:

dtmfgen.end: contains the event which caused DTMF generation to
stop.

The following sections describe the child elements of <dtmfgen>.
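As an illustration of the <dtmfgen> attributes listed above, the
following sketch generates a digit string and reports the terminating
event to the invoking environment. It is modeled on the <dialogstart>
examples in this document and is not taken from the draft itself: the
connection identifier, dialog name, digit string, and event name
"done" are placeholders. The level, dur, and interval attributes are
omitted so that the stated defaults apply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234" name="dial">
    <!-- digits is the only mandatory attribute; the characters are
         taken from the alphabet "0-9a-d#*" -->
    <dtmfgen digits="5551212#">
      <dtmfgenexit>
        <!-- assumed usage: report the dtmfgen.end shadow variable -->
        <send target="source"
              event="done"
              namelist="dtmfgen.end"/>
      </dtmfgenexit>
    </dtmfgen>
  </dialogstart>
</msml>
```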
10.7.2.1 <dtmfgenexit>

The <dtmfgenexit> element MUST be invoked when the DTMF generation
operation completes or is terminated as a result of receiving the
terminate event. The <dtmfgenexit> element MAY be used to send events
when the DTMF generation has completed.

Attributes:

None

10.7.3 <tonegen>

Tone generator allows customized tone generation. A sequence of
varying tones with optional silence intervals can be composed using
the <tonegen> element. Child elements of <tonegen>, namely <tone> and
<silence> specify a single tone or sequence of tones.

Attributes:

id: an optional identifier which may be referenced elsewhere
for sending events to the tonegen primitive.

iterate: A numeric value specifying the total number of
iterations. A value of 'forever' represents infinite
repetitions. Optional. Default 1.

Events:

terminate: terminates tone generation and assigns values to the
shadow variables.

Shadow Variables:

tonegen.end: contains the event which caused tone generation to
stop.

Information from the failed dialog may be returned as additional
namelist items.

attributes:

target: an identifier of a connection or a conference which
will interact with the dialog. The identifier must not contain
wildcards. Mandatory.

src: the URL of the dialog description. MUST NOT be used if the
MSML dialog description is inline. Otherwise an error (422)
will result and MSML document execution will stop.

type: a MIME type which identifies the type of language used to
describe the dialog. application/moml+xml and
application/vxml+xml are used to identify MSML Dialogs and
VoiceXML [7] respectively.

name: an instance name for the dialog. If the attribute is not
present, the media server will assign an identifier to the
dialog. If the attribute is present but the name is already
associated with the target, an error (431) will result and MSML
document execution will stop. Any results that a dialog
generates will be correlated to its identifier.

mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML element is returned in an error
response. Therefore the value of all "mark" attributes within
an MSML document should be unique.
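To make the role of the "mark" attribute concrete, the following
sketch starts the same external dialog on two connections and gives
each <dialogstart> a distinct mark. The connection identifiers,
dialog name, script URL, and mark values are placeholders chosen for
illustration. Under the rule stated above, if the second element
failed, the error response would be expected to carry mark "1", the
mark of the last element that executed successfully.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <!-- each mark value is unique within the document -->
  <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="greeting"
               mark="1"
               src="http://server.example.com/scripts/greeting.moml"/>
  <!-- the same name is reused here on a different target -->
  <dialogstart target="conn:efgh5678"
               type="application/moml+xml"
               name="greeting"
               mark="2"
               src="http://server.example.com/scripts/greeting.moml"/>
</msml>
```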
The following sections show examples of initiating an external MSML
dialog, an in-line embedded MSML dialog, and an MSML initiated
VoiceXML dialog.

The following example starts a MSML dialog on a connection.

<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234"
               type="application/moml+xml"
               name="sample"
               src="http://server.example.com/scripts/foo.moml"/>
</msml>

The following sections describe the child elements of <tonegen>.

10.7.3.1 <tone>

The <tone> element specifies a single tone with an optional silence
interval.
The tone specification consists of two tone frequencies,
their attenuation values, a duration of the tone, and the number of
times to repeat the tone.

Attributes:

duration: time duration or length of the individual tone,
specified in "ms" or "s" in increments of 10ms. A value of 0
represents an infinite duration. Mandatory.

iterate: specifies the number of times to execute the contents
of <tone> element. A value of 'forever' represents infinite
repetitions. Optional. Default 1.

Events:

none

Child Elements: The child elements of <tone> element specify a
single tone and an optional silence interval to be inserted at the
end of tone generation. A tone is defined by <tone1> and <tone2>
elements. Each <tone> element MUST contain at least one of <tone1> or
<tone2>, or MAY contain <tone1> and <tone2> exactly once.

<tone1>

Attributes:

freq: specifies the frequency of the first tone in "hz",
ranging from 0 - 3999 hz. Mandatory.

atten: specifies the attenuation level expressed in dBm0,
ranging from 0 to -96 dBm0. Mandatory.

<tone2>

Attributes:

freq: specifies the frequency of the second tone in "hz",
ranging from 0 - 3999 hz. Mandatory.

atten: specifies the attenuation level expressed in dBm0,
ranging from 0 to -96 dBm0. Mandatory.

<silence> - Refer to the silence element definition below.

10.7.3.2 <silence>

The <silence> element inserts a silence interval as content of
<tonegen> or <tone> elements.

The following example starts an in-line embedded MSML dialog on a
connection.

<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234" name="sample">
    <play>
      <audio uri="file://clip1.wav"/>
      <audio uri="http://host1/clip2.wav"/>
      <tts uri="http://host2/text.ssml"/>
      <var type="date" subtype="mdy" value="20030601"/>
    </play>
    <send target="source"
          event="done"
          namelist="play.amt play.end"/>
  </dialogstart>
</msml>

The following example starts a VoiceXML dialog on a connection.

<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234"
               type="application/vxml+xml"
               name="sample"
               src="http://server.example.com/scripts/foo.vxml"/>
</msml>

If this dialog fails once its execution thread had begun, for example
the fetch of the VoiceXML document failed, an example of the event
which would be returned would be:

<?xml version="1.0" encoding="UTF-8"?>
<event name="msml.dialog.exit"
       id="conn:abcd1234/dialog:sample">
  <name>dialog.exit.status</name>
  <value>423</value>
  <name>dialog.exit.description</name>
  <value>External document fetch error</value>
</event>

10.6.2 <dialogend>

Dialog end is used to terminate a MSML dialog created through
<dialogstart> before it completes of its own accord. The operation of
<dialogend> depends on the dialog language being used by the
executing context. When that context is VoiceXML, a
"connection.disconnected" event will be thrown to the VoiceXML
application. When that context is MSML Dialog, a "terminate" event
will be sent to the MSML core context.

<dialogend> allows the executing dialog the opportunity to gracefully
complete before generating a "msml.dialog.exit" event. Dialog results
may be returned and will be contained as a namelist to that event.

attributes:

id: the identifier of a dialog. Mandatory.

mark: a token which can be used to identify execution progress
in the case of errors. The value of the mark attribute from the
last successfully executed MSML Dialog element is returned in
an error response. Therefore the value of all "mark" attributes
within an MSML document should be unique.

For example, if the dialog from the previous example was still
executing, the following would terminate the dialog and generate a
"msml.dialog.exit" event.

<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogend id="conn:abcd1234/dialog:sample"/>
</msml>

10.6.3 <send>

Sends an event and optional namelist to the recipient identified by
the target attribute. Event names are defined by the recipient. In
the case where the recipient is an MSML Dialog group or primitive,
the events are defined within this document. Other recipients MAY use
names that are suitable for their environment.

The "target" attribute specifies the recipient of the event.
Recipients MAY be other MSML Dialog primitives or groups executing
within the object, the object itself, or the environment which
invoked the MSML Dialog. Sending events to media primitives or groups
is supported by the MSML Dialog Group package. Any target which is
unknown within the object is assumed to be destined to the external
environment. By convention, the string "source" SHOULD be used to
address that environment but any target name distinct from the MSML
Dialog namespace MAY be used.
Attributes:

duration: specifies the amount of silence interval in "ms" or
"s", in increments of 10ms. Mandatory.

Events:

none

10.7.3.3 <tonegenexit>

The <tonegenexit> element MUST be invoked when the tone generation
operation completes or is terminated as a result of receiving the
terminate event. The <tonegenexit> element MAY be used to send events
when the tone generation has completed.

Attributes:

none

10.7.4 <record>

Record creates a recording. Similar to play, <record> supports two
states; create and suspend. Received media becomes part of the
recording when <record> is in the create state and is discarded when
it is in the suspend state.

Recording MUST be terminated when a terminate event is received or
when a nospeech event is received and no audio has yet been recorded.
<record> differentiates different types of terminate events.

An optional <play> element MAY be specified as a child element of
<record>. This mechanism provides a complete play-record operation,
where the prompt(s) specified within the <play> element are played in
advance of start of recording.

Note: Attributes prespeech, postspeech, and termkey provide a
simplified mechanism for controlling record operations using implicit
DTMF and VAD, without the use of <group> and event exchange
mechanism.

Attributes:

id: an optional identifier which may be referenced elsewhere
for sending events to the record primitive.

append: a boolean which defines whether the recording is
allowed to be appended to an existing file if dest already
exists. Default is "false".

Attributes:

event: the name of an event.

target: the recipient of the event. The recipient MUST be a
MSML Dialog primitive, the currently executing group, or the
MSML Dialog environment. A primitive is specified by a
primitive type, optionally appended by a period '.' followed by
the identifier of a primitive. Identifiers are only needed when
more than one primitive of the same type exists in the object.
The executing group is specified using the token "group". The
environment is specified using the token "source", optionally
appended by a period '.' followed by any environment specific
target.

namelist: a list of zero or more shadow variables which are
included with the event.

10.6.4 <exit>

Exit causes execution of the MSML Dialog to terminate.

Attributes:

namelist: a list of one or more shadow variables which MAY
optionally be sent to the context which invoked the MSML Dialog
object.

10.6.5 <disconnect>

Disconnect is similar to <exit> but has the additional semantics of
indicating to the context which invoked the MSML Dialog, that it
should disconnect from a media server, the media stream associated
with the object. The method of disconnection depends upon how the
media stream was initially established. If SIP was used, a
<disconnect> would cause a media server to issue a BYE request. The
request would be sent for the SIP dialog associated with media
session on which the MSML Dialog was operating.

Attributes:

namelist: a list of one or more shadow variables which MAY
optionally be sent to the context which invoked the MSML Dialog
object.
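The <send>, <exit>, and <disconnect> elements can be combined in a
single inline dialog. The following sketch, modeled on the in-line
<dialogstart> example in section 10.6.1, plays a closing prompt,
notifies the invoking environment, and then asks that environment to
disconnect the media stream. The connection identifier, dialog name,
clip URI, and event name "done" are placeholders, and placing
<disconnect> directly after <send> is an assumption made for
illustration rather than a pattern the draft prescribes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234" name="goodbye">
    <play barge="false" cleardb="true">
      <audio uri="file://goodbye.wav"/>
    </play>
    <!-- "source" addresses the environment that invoked the dialog -->
    <send target="source"
          event="done"
          namelist="play.amt play.end"/>
    <!-- like <exit>, but also asks the invoking context to
         disconnect; with SIP this would result in a BYE request -->
    <disconnect namelist="play.end"/>
  </dialogstart>
</msml>
```

Replacing <disconnect> with <exit namelist="play.end"/> would end the
dialog in the same way without requesting disconnection.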
The attribute is ignored if the scheme is http.

dest: the destination for the recording, which will contain
either audio only, video only, or both audio and video
depending on the stream(s) being recorded. Recording MAY be
either local or external based upon the attribute value. File
and http schemes are supported.

audiodest: the destination for the audio only recording.
Recording MAY be either local or external based upon the
attribute value. All combinations of dest, audiodest, and
videodest are valid. File and http schemes are supported.

videodest: the destination for the video only recording.
Recording MAY be either local or external based upon the
attribute value. All combinations of dest, audiodest, and
videodest are valid. File and http schemes are supported.

format: defines the encoding and file type of the recording.
The format attribute is defined as a string type of form
"audio|video/filetype;codecs=x,y". The keyword 'audio'
identifies an audio only recording, while the keyword 'video'
identifies video only recording or an audio plus video
recording. The codecs field identifies the audio and/or video
codecs to be used for the recording, where the order of the
codec values is not significant. In the event of audio and
video recording, using 'video' keyword, the codecs=x,y field
MAY be used to identify the audio codec and the video codec.

codecconfig: Identifies an optional special instruction string
for codec configuration. Default is to send no special
configuration string to the codec.

audiosamplerate: Identifies audio sample rate in kHz. If not
specified, the sample rate SHOULD be determined from the media
source.

audiosamplesize: Identifies audio sample size in bits. If not
specified, the sample size SHOULD be determined from the media
source.

profile: Identifies a video profile name specific to the codec.
If not specified, default video profile of the codec SHOULD be
selected for the recording.

level: Identifies a video profile level to the codec. Default
is to send no profile information to the codec and allow the
codec to select an internal default.

imagewidth: Identifies the width of video image in pixels.
Default is to use image width information from the media
source.

imageheight: Identifies the height of video image in pixels.
Default is to use image height information from the media
source.

maxbitrate: Identifies the bitrate of the video signal in kbps.
Default is to use maximum bitrate information from the media
source.

framerate: Identifies the video frame rate in frames per
second. Default is to use frame rate information from the media
source.

initial: defines the initial state for the record element.
Default is "create", which starts the recording as soon as the
<record> element is executed. The "initial" attribute is
applicable only when <record> is used within the <group>
structure.

maxtime: defines the maximum length of the recording in units
of time.

prespeech: defines a timer value, in seconds, for detection of
absence of audio energy at the start of the record operation.
If no audio energy is detected for the amount of time
specified by prespeech, the recording is terminated. Default is
"0s", which does not activate the prespeech timer.

postspeech: defines a timer value, in seconds, for detection of
absence of audio energy while the recording is in progress.
During in progress recording, if absence of audio energy is
detected as specified by the postspeech timer, the recording is
terminated. Default is "0s", which disables the ability to
terminate a recording due to postspeech silence.

termkey: defines a single DTMF key which, when detected,
terminates the recording. Absence of this attribute prevents
the recording from being terminated due to detection of DTMF
digits. When termkey is specified, the detected DTMF digit
terminates the recording and the DTMF digit is not entered in
the digit buffer.

10.7 MSML Dialog Base Package

The MSML Dialog Base package defines a required set of base
functionality for Media Server. It supports individual media
primitives, such as playing an announcement or collecting digits, as
well as composite operations such as play and collect. When this
package is used in conjunction with MSML Dialog Group package the
event mechanism is used to control primitives. This package may
also be used in conjunction with MSML Speech package to extend the
functionality of prompts to include TTS and user input collection to
include ASR.

In the following sections, subsections of a primitive define child
elements of that primitive and are not themselves considered
primitives. They do not receive events or populate shadow variables.

10.7.1 <play>

Play is used to generate an audio or video stream. It MUST play in
sequence the media created by the child media elements <audio>,
<video>, <media>, <tts>, and <var>. When the play stops, either
because the terminate event is received or all media generation has
completed, the <playexit> element, if present, is executed. At least
one media generation element must be present.

Play supports two states; generate and suspend. Media generation
occurs in the generate state and is suspended in the suspend state.
Once in the suspend state, media generation continues upon receiving
the generate event. The default initial state is generate.

Audio MAY be generated in different languages by specifying the
xml:lang attribute for <play> and/or the child elements of <play>.
The language is inherited by the child elements but each child MAY
specify its own language. Except for physical audio clips, it is an
error if a language is specified but the media server can not render
the audio in the requested language.

Attributes:

id: an optional identifier which may be referenced elsewhere
for sending events to the play primitive.

interval: specifies the delay between stopping one iteration
and beginning another. The attribute has no effect if
iterations is not also specified. Default is no interval.

iterate: specifies the number of times the media specified by
the child media elements should be played. Each iteration is a
complete play of each of the child media elements in document
order. Defaults to once '1'.

initial: defines the initial state for the play element.
Default is "generate".

maxtime: defines the maximum allowed time for the <play> to
complete.

barge: defines whether or not audio announcements MAY be
interrupted by DTMF detection during play-out. The DTMF digit
barging the announcement is stored in the digit buffer. Valid
values for barge are "true" or "false", and the attribute is
mandatory.

cleardb: defines whether the digit buffer is cleared or not,
prior to starting the announcement. Valid values for cleardb
are "true" or "false", and the attribute is mandatory.

offset: defines an offset, measured in units of time, where the
<play> is to begin media generation. Offset is only valid when
all child media elements are <audio>.

skip: an amount, expressed in time, which will be used to skip
through the media when "forward" and "backward" events are
received. Default is 3s (three seconds).

xml:lang: specifies the language to use for content which can
be rendered in different languages.
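The <play> attributes described above can be combined as in the
following sketch, which plays a prompt and a date variable twice with
a pause between iterations and allows DTMF barge-in. It is modeled on
the in-line <dialogstart> example in section 10.6.1. The identifiers,
clip URI, and event name "done" are placeholders, and the time values
"1s" and "30s" and the language tag "en" are assumed value formats
based on the "3s" and "0s" defaults quoted in this document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<msml version="1.1">
  <dialogstart target="conn:abcd1234" name="menu">
    <!-- barge and cleardb are listed as mandatory attributes -->
    <play id="menu"
          iterate="2"
          interval="1s"
          maxtime="30s"
          barge="true"
          cleardb="true"
          xml:lang="en">
      <audio uri="file://menu.wav"/>
      <var type="date" subtype="mdy" value="20030601"/>
      <playexit>
        <!-- executed when the play stops, whether through a
             terminate event or normal completion -->
        <send target="source"
              event="done"
              namelist="play.amt play.end"/>
      </playexit>
    </play>
  </dialogstart>
</msml>
```

The offset and skip attributes are left out because they only apply
when every child media element is <audio>.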
Events:
Following describes input events to the media primitive object. The
MSML Dialog Group package allows an event exchange mechanism between
primitives.
draft-saleem-msml-01:
pause: causes the record to enter the suspend state. Received media
is discarded.
resume: causes record to resume if it was suspended. It has no effect
otherwise.
toggle-state: causes the suspend / create state to toggle.
terminate: terminates the recording and assigns values to the shadow
variables.
terminate.cancelled: terminates the recording and assigns values to
the shadow variables. If the dest attribute used the file scheme, the
local recording is deleted. Applications are responsible for removing
external files created using the http scheme.
terminate.finalsilence: terminates the recording and assigns values
to the shadow variables. If the dest attribute used the file scheme,
the final silence is removed from the recording.
nospeech: terminates the recording and assigns values to the shadow
variables if it is received and no recording has yet been created.
The "nospeech" event is ignored if audio has already been recorded.
draft-saleem-msml-02:
pause: causes the play to enter the suspend state.
resume: causes play to enter the generate state.
forward: skips forward through the media. Only has effect when all
child media elements are <audio>.
backward: skips backward through the media. Only has effect when all
child media elements are <audio>.
restart: skips to the beginning of the media. Only has effect when
all child media elements are <audio>.
toggle-state: causes the suspend / generate state to toggle.
terminate: terminates the play and assigns values to the shadow
variables.
Shadow Variables:
draft-saleem-msml-01:
record.len: identifies the actual length of the recording measured in
units of time. This does not include time which may have elapsed
while the record was in the suspend state.
record.end: contains the event which caused the record to terminate.
When the record terminates because maxtime is exceeded, end is
assigned the value "record.complete.maxlength".
Record termination due to prespeech silence results in an assigned
value of "record.failed.prespeech".
Record termination due to postspeech silence results in an assigned
value of "record.complete.postspeech".
Record termination due to DTMF detection results in an assigned value
of "record.complete.termkey".
The following sections describe the child elements of <record>.
10.7.4.1 <play>
The optional <play> element as a child element of <record> allows a
prompt to be played prior to start of recording. The record operation
starts at the end of the play sequence or if the play is barged by
DTMF, assuming that barge=true is specified for <play>. For a
complete description, refer to <play> element.
10.7.4.2 <tonegen>
The optional <tonegen> element as a child element of <record> allows
a tone or a sequence of tones to be played prior to start of
recording. The record operation starts at the end of the tone
generation. For a complete description, refer to <tonegen> element.
10.7.4.3 <recordexit>
The <recordexit> element MUST be invoked when the record operation
completes or when the recording is terminated as a result of
receiving the terminate event. The <recordexit> element MAY be used
to send events when the recording has completed.
Attributes:
none
10.7.5 <dtmf> or <collect>
DTMF input fulfils several roles within MSML Dialogs. It is used to
trigger events which will affect the media processing operation of
other primitives. It is also used to collect DTMF digits from a media
stream which are to be reported back to the user of MSML Dialog.
Often DTMF detection is used for both purposes. Barge is the most
common example, where a prompt is stopped based upon DTMF input but
more digits may remain to be collected.
DTMF detection supports multiple simultaneous recognition patterns.
Different patterns can be used to trigger sending different events in
order to implement DTMF controls. Alternatively one pattern may be
used to represent a collection and another pattern, a substring of
the first, used as a barge indication.
draft-saleem-msml-02:
play.amt: identifies the length of time for which media was generated
before the play was stopped. This does not include time which may
have elapsed while the play was in the suspend state.
play.end: contains the event which caused the play to stop. When the
play stops because all media generation has completed, end is
assigned the value "play.complete".
Note: Attributes barge and cleardb provide a simplified mechanism for
controlling play operations with implicit DTMF detection, without the
use of <group> and event exchange mechanism. When using the <play>
element within the group framework and barge is specified, detection
of barge condition generates an implicit terminate event to the play
primitive.
Following sections describe the child elements of <play>.
10.7.1.1 <audio>
Identifies pre-recorded audio to play. Local URI references may
resolve to a single physical audio clip, a logical clip, or a
provisioned sequence of clips (physical or logical). A logical clip
is one which can be rendered differently based on the language
attribute. Logical clips are provisioned for each of the languages
that a media server supports. Remote URI references are resolved
according to the capabilities of the remote server.
Attributes:
uri: Identifies the location of the audio to be played. The file and
http schemes are supported.
format: defines the encoding and file type of the audio resource. The
format attribute is defined as a string type of form
"audio/<filetype>;codecs=<codec>". The keyword 'audio' identifies an
audio content. The codecs field identifies the audio file's codec to
be used for decoding the audio content. If format attribute is not
specified, the filetype MUST be determined from the URI and the codec
information MUST be determined from the media resource.
audiosamplerate: Identifies audio sample rate in kHz. If not
specified, the sample rate SHOULD be determined from the media
resource.
draft-saleem-msml-01:
An optional <play> element MAY be specified as a child element of
<dtmf> or <collect>. This mechanism provides a complete play-collect
operation, where the prompt(s) specified within the <play> element
are played in advance of DTMF digit collection.
Note that all patterns share the same digit collection buffer,
inter-digit timing, a single <nomatch> element, and a single
<noinput> element. As such, multiple patterns may not be suitable to
support simultaneous collections for different purposes. When this is
required, separate <dtmf> elements should be used instead.
<dtmf> terminates if any of the <pattern>, <noinput>, or <nomatch>
elements are matched the maximum number of times that they are
allowed. The number of times they may match may be specified as an
attribute of <dtmf> or of the individual child elements.
Element identifier <dtmf> is equivalent to <collect>. However,
<collect> is the preferred name. MSML clients SHOULD use <collect>,
while MSML servers SHOULD support both.
Attributes:
id: an optional identifier which may be referenced elsewhere for
sending events to this primitive.
cleardb: a boolean indication of whether the buffer for digit
collection should be cleared of any collected digits when the element
is instantiated. If set to false, any digits currently in the buffer
MUST be immediately compared against the pattern elements.
fdt: defines the first-digit timer value. The first-digit timer is
started when DTMF detection is initially invoked. If no DTMF digits
are detected during this initial interval, the <noinput> element MUST
be invoked.
idt: defines the inter-digit timer to be used when digits are being
collected. When specified, the timer is started when the first digit
is detected and restarted on each subsequent digit. Timer expiration
is applied to all patterns. After that, if any patterns remain active
and a nomatch element is specified, the nomatch is executed and DTMF
input MUST terminate. The idt attribute should only be used when
digit collection is being performed. No default.
starttimer: boolean value which defines whether the first digit timer
(fdt) is started initially. When set to false, the starttimer event
must be received for it to start. Default false.
iterate: specifies the number of times the <pattern>, <noinput>, and
<nomatch> elements may be executed unless those elements specify
differently. The value "forever" MAY be used to indicate that these
may be executed any number of times. Default is once '1'.
Events:
Following describes input events to the media primitive object. The
MSML Dialog Group package allows an event exchange mechanism between
primitives.
starttimer: starts the first digit timer (fdt) if it has not already
been started. Has no effect otherwise.
terminate: terminates the DTMF input and assigns values to the shadow
variables.
Shadow Variables:
dtmf.digits: the string of DTMF digits which have been received (the
contents of the digit buffer).
dtmf.len: the number of digits in the digit buffer.
dtmf.last: the last digit in the digit buffer.
dtmf.end: contains the event which caused the <dtmf> to terminate or
is assigned one of "dtmf.match", "dtmf.noinput", or "dtmf.nomatch"
depending upon which of the corresponding elements reached its
maximum.
draft-saleem-msml-02:
audiosamplesize: Identifies audio sample size in bits. If not
specified, the sample size SHOULD be determined from the media
resource.
iterate: specifies the number of times the audio is to be played.
Defaults to once '1'.
xml:lang: specifies the language to use when the URI identifies a
logical clip, either directly, or as part of a sequence.
10.7.1.2 <video>
Identifies pre-recorded multimedia to play. Contents identified by
the URI attribute may contain audio only, video only, or both audio
and video. Media Server SHOULD attempt to play both audio and video
from the identified URI, if both are available in the content.
Attributes:
uri: Identifies the location of the video or multimedia to be played.
The file and http schemes are supported.
format: defines the encoding and file type of the video or multimedia
resource. The format attribute is defined as a string type of form
"video/<filetype>;codecs=<codecx>,<codecy>". The keyword 'video'
identifies video only media or media containing audio and video. The
"codecs" field identifies the audio and/or video codecs to be used
for decoding the file content, where the order of codec values is not
significant. In the event of audio and video content, using 'video'
keyword, the codecs=<codecx>,<codecy> field MAY be used to identify
the audio codec and the video codec. If not specified, the codec
information SHOULD be determined from the media file.
audiosamplerate: Identifies audio sample rate in kHz. If not
specified, the sample rate SHOULD be determined from the media file.
audiosamplesize: Identifies audio sample size in bits. If not
specified, the sample size SHOULD be determined from the media file.
codecconfig: Identifies an optional special instruction string for
codec configuration. Default is to send no special configuration
string to the codec.
profile: Identifies a video profile name specific to the codec. If
not specified, default video profile of the codec SHOULD be selected.
level: Identifies a video profile level to the codec. Default is to
send no profile information to the codec and allow the codec to
select an internal default.
imagewidth: Identifies the width of video image in pixels. Default is
to use image width information from media file.
imageheight: Identifies the height of video image in pixels. Default
is to use image height information from media file.
maxbitrate: Identifies the bitrate of the video signal in kbps.
Default is to use maximum bitrate information from the media file.
framerate: Identifies the video frame rate in frames per second.
Default is to use frame rate information from the media file.
iterate: specifies the number of times the audio is to be played.
Defaults to once '1'.
10.7.1.3 <media>
Identifies multimedia content for play. All content of <media>
element MUST start to play concurrently. This element may be used to
generate a multi-media stream from two independent media resources,
one identifying audio and the other identifying video.
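To illustrate the draft-saleem-msml-02 <media> element, the sketch
below pairs one audio resource with one video resource so that both
start playing concurrently. The URIs, file types, and codec names in
the format strings are hypothetical placeholders that only follow the
"audio/<filetype>;codecs=<codec>" and
"video/<filetype>;codecs=<codecx>,<codecy>" forms; the barge and
cleardb values on <play> are likewise assumptions.

```xml
<!-- Hypothetical URIs, file types, and codec names throughout. -->
<play barge="false" cleardb="false">
  <media>
    <!-- audio-only resource -->
    <audio uri="file://media/greeting.wav"
           format="audio/wav;codecs=mulaw"/>
    <!-- video-only resource, so a single codec is listed -->
    <video uri="file://media/greeting.3gp"
           format="video/3gpp;codecs=h263"/>
  </media>
</play>
```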
draft-saleem-msml-01:
The following sections describe the child elements of <dtmf> or
<collect>.
10.7.5.1 <play>
The optional <play> element as a child element of <dtmf> or <collect>
allows a prompt to be played prior to DTMF digit collection. DTMF
digit collection starts at the end of the play sequence or if the
play is barged by DTMF, assuming that barge=true is specified for
<play>. For a complete description, refer to <play> element.
draft-saleem-msml-02:
The <media> element MUST contain at least one child element. Valid
child elements of <media> are <audio> and <video>, as described
earlier. <media> element MUST contain at most one <audio> element and
at most one <video> element.
10.7.1.4 <var>
Specifies the generation of audio from a variable using prerecorded
audio segments. A variable represents a semantic concept (such as
date or number) and dynamically produces the appropriate speech.
Prerecorded audio allows an application vendor or service provider to
choose the exact voice for their audio and therefore completely
control the "sound and feel" of the service provided to end users. It
provides very high audio quality and allows the variables to blend
seamlessly into the surrounding audio segments.
draft-saleem-msml-01:
10.7.5.2 <pattern>
The pattern element describes one or more DTMF digits that are to be
recognized. When the pattern is matched, the child elements MUST be
executed.
Attributes:
digits: The digit pattern which should be matched.
format: an enumerated value which defines the format used to express
the digit pattern. The format may be "mgcp" or "megaco" for patterns
expressed as digit map from those specifications, or as one of the
simple built-in formats defined within this specification. Currently,
a single built-in format "moml+digits" is defined which allows a
match based on either one or more specific digits, or based upon a
specific length specification with an optional return key.
"moml+digits" is the default.
iterate: specifies the number of times the <pattern> may be matched.
The value "forever" may be used to indicate that <pattern> may be
matched any number of times. This value overrides any specified in
<dtmf>. Default is once '1'.
10.7.5.3 <detect>
The contents of the <detect> element MUST be executed whenever any
DTMF is first detected. It MUST be matched at most once.
Attributes:
none
draft-saleem-msml-02:
Text to speech (TTS) using SSML [27] may also be used to render
variables, but may not provide as good quality, or allow as complete
control of the "sound and feel" or user experience. TTS is normally
used for reading text such as emails and for very large vocabularies
such as stock names. TTS results in a very clear difference between
the variables and the surrounding audio segments. (See MSML Dialog
Speech package).
Attributes:
type: specifies the type of variable. Mandatory. Variable type must
be one of "date", "digits", "duration", "month", "money", "number",
"silence", "time", or "weekday".
subtype: specifies an optional clarification of type. Specific values
depend upon the type.
value: text which should be rendered appropriate to the type and
subtype attributes.
xml:lang: specifies the language to use when rendering the variable.
10.7.1.5 <playexit>
The <playexit> element MUST be invoked when generation of all content
of the <play> has come to completion. The contents of this element
MAY be used to send events.
Attributes:
none
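The draft-saleem-msml-02 <var> and <playexit> elements can be
sketched together as follows: a prerecorded prompt is followed by a
date rendered from a variable, and <playexit> marks where actions
taken on completion would be placed. The prompt URI, the subtype
"mdy", the value encoding "20030601", and the barge and cleardb
values are assumptions made for illustration; this section does not
define the subtype values or the value syntax.

```xml
<!-- Hypothetical URI, subtype, and value encoding. -->
<play barge="false" cleardb="false">
  <audio uri="file://prompts/appointment-date.wav"/>
  <!-- type is mandatory and "date" is one of the listed types -->
  <var type="date" subtype="mdy" value="20030601"
       xml:lang="en-US"/>
  <playexit>
    <!-- event-sending children would go here; they are defined
         outside this section, so none are shown -->
  </playexit>
</play>
```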
draft-saleem-msml-01:
10.7.5.4 <noinput>
The <noinput> element is used when DTMF is being collected. Children
of the <noinput> element MUST be executed when DTMF has not been
detected and the first digit timeout occurs.
Attributes:
iterate: specifies the number of times the <noinput> may be
triggered. The value "forever" may be used to indicate that <noinput>
may be triggered any number of times. This value overrides any
specified in <dtmf>. Default is once '1'.
10.7.5.5 <nomatch>
The <nomatch> element is used when DTMF is being collected. Children
of the <nomatch> element MUST be executed when it is determined that
none of the individual patterns can be matched.
Attributes:
iterate: specifies the number of times the <nomatch> may be
triggered. The value "forever" may be used to indicate that <nomatch>
may be triggered any number of times. This value overrides any
specified in <dtmf>. Default is once '1'.
10.7.5.6 <dtmfexit>
The <dtmfexit> element MUST be invoked when the dtmf input completes
because one of <pattern>, <noinput>, or <nomatch> occurred its
ma
draft-saleem-msml-02:
10.7.2 <dtmfgen>
DTMF generator originates one or more DTMF digits in sequence.
Attributes:
id: an optional identifier which may be referenced elsewhere for
sending events to the dtmfgen primitive.
digits: A string of characters from the alphabet "0-9a-d#*" which
correspond to a sequence of DTMF tones. Mandatory.
level: used to define the power level for which the tones will be
generated. Expressed in dBm0 in a range of 0 to -96 dBm0. Larger
negative values express lower power levels. Note that values lower
than -55 dBm0 will be rejected by most receivers (TR-TSY-000181,
ITU-T Q.24A). Default is -6 dBm0.
dur: the duration in milliseconds for which each tone should be
generated. Implementations may round the value if they only support
discrete durations. Default is 100 ms.
interval: the duration in milliseconds of