INTERNET-DRAFT                                              J. Lazzaro
April 25, 2005                                            J. Wawrzynek
Expires: October 25, 2005                                  UC Berkeley

RTP Payload Format for MIDI

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 25, 2005.

Copyright Notice

Copyright (C) The Internet Society (2005). All Rights Reserved.

Lazzaro/Wawrzynek [Page 1]
INTERNET-DRAFT 25 April 2005

Abstract

This memo describes an RTP payload format for the MIDI command language. The format encodes all commands that may legally appear on a MIDI 1.0 DIN cable. The format is suitable for interactive applications (such as the remote operation of musical instruments) and content-delivery applications (such as file streaming). The format may be used over unicast and multicast UDP as well as TCP, and defines tools for graceful recovery from packet loss. Stream behavior, including the MIDI rendering method, may be customized during session setup. The format also serves as a mode for the mpeg4-generic format, to support the MPEG 4 Audio Object Types for General MIDI, Downloadable Sounds Level 2, and Structured Audio.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Bitfield Conventions
   2. Packet Format
      2.1 RTP Header
      2.2 MIDI Payload
   3. MIDI Command Section
      3.1 Timestamps
      3.2 Command Coding
   4. The Recovery Journal System
   5. Recovery Journal Format
   6. Session Description Protocol
      6.1 Session Descriptions for Native Streams
      6.2 Session Descriptions for mpeg4-generic Streams
      6.3 Parameters
   7. Extensibility
   8. Congestion Control
   A. The Recovery Journal Channel Chapters
      A.1 Recovery Journal Definitions
      A.2 Chapter P: MIDI Program Change
      A.3 Chapter C: MIDI Control Change
         A.3.1 Log Inclusion Rules
         A.3.2 Controller Log Format
         A.3.3 Log List Coding Rules
         A.3.4 The Parameter System
      A.4 Chapter M: MIDI Parameter System
         A.4.1 Log Inclusion Rules
         A.4.2 Log Coding Rules
            A.4.2.1 The Value Tool
            A.4.2.2 The Count Tool
      A.5 Chapter W: MIDI Pitch Wheel
      A.6 Chapter N: MIDI NoteOff and NoteOn
         A.6.1 Header Structure
         A.6.2 Note Structures
      A.7 Chapter E: MIDI Note Command Extras
         A.7.1 Note Log Format
         A.7.2 Log Inclusion Rules
      A.8 Chapter T: MIDI Channel Aftertouch
      A.9 Chapter A: MIDI Poly Aftertouch
   B. The Recovery Journal System Chapters
      B.1 System Chapter D: Simple System Commands
         B.1.1 Undefined System Commands
      B.2 System Chapter V: Active Sense Command
      B.3 System Chapter Q: Sequencer State Commands
         B.3.1 Non-compliant Sequencers
      B.4 System Chapter F: MIDI Time Code
         B.4.1 Partial Frames
      B.5 System Chapter X: System Exclusive
         B.5.1 Chapter Format
         B.5.2 Log Inclusion Semantics
         B.5.3 TCOUNT and COUNT fields
   C. Session Configuration Tools
      C.1 The Journalling System
         C.1.1 The j_sec Parameter
         C.1.2 The j_update Parameter
            C.1.2.1 The anchor Sending Policy
            C.1.2.2 The closed-loop Sending Policy
            C.1.2.3 The open-loop Sending Policy
         C.1.3 Chapter Inclusion Parameters
      C.2 Timestamp Semantics
         C.2.1 The comex Algorithm
         C.2.2 The async Algorithm
         C.2.3 The buffer Algorithm
      C.3 Packet Timing Tools
         C.3.1 Packet Duration Tools
         C.3.2 The guardtime Parameter
         C.3.3 MIDI Time Code Issues
      C.4 Stream Description
         C.4.1 The musicport Parameter
         C.4.2 Multi-stream examples using musicport
      C.5 MIDI Rendering
         C.5.1 The multimode Parameter
         C.5.2 The rinit Parameter
         C.5.3 Encoding rinit Data Objects
         C.5.4 MIDI Channel Mapping
            C.5.4.1 smf_info
            C.5.4.2 smf_inline, smf_url, smf_cid
            C.5.4.3 chanmask
         C.5.5 The audio/asc MIME Type
      C.6 Interoperability
         C.6.1 Content streaming
         C.6.2 Stage and studio
            C.6.2.1 Capabilities
               C.6.2.1.1 MIDI media lines
               C.6.2.1.2 MIDI rendering
               C.6.2.1.3 Audio media lines
            C.6.2.2 Baseline session
            C.6.2.3 Examples
         C.6.3 Network musical performance
   D. Parameter Syntax Definitions
   E. A MIDI Overview for Networking Specialists
      E.1 Commands Types
      E.2 Running Status
      E.3 Command Timing
      E.4 AudioSpecificConfig templates for MMA renderers
   F. Acknowledgements
   G. Security Considerations
   H. IANA Considerations
      H.1 rtp-midi MIME Registration
         H.1.1 Repository request
      H.2 mpeg4-generic MIME Registration
         H.2.1 Repository request
      H.3 asc MIME Registration
   I. References
      I.1 Normative References
      I.2 Informative References
   J. Authors' Addresses
   K. Intellectual Property Rights Statement
   L. Full Copyright Statement
   M. Change Log for

1. Introduction

The Internet Engineering Task Force (IETF) has developed a set of focused tools for multimedia networking ([2] [6] [19] [20]). These tools can be combined in different ways to support a variety of real-time applications over Internet Protocol (IP) networks.

For example, a telephony application might use the Session Initiation Protocol (SIP, [19]) to set up a phone call. Call setup would include negotiations to agree on a common audio codec [13]. Negotiations would use the Session Description Protocol (SDP, [6]) to describe candidate codecs. After a call is set up, audio data would flow between the parties using the Real-time Transport Protocol (RTP, [2]) under any applicable profile (for example, the Audio/Visual Profile (AVP, [3])).
The tools used in this telephony example (SIP, SDP, RTP) might be combined in a different way to support a content streaming application, perhaps in conjunction with other tools (such as the Real Time Streaming Protocol (RTSP, [20])).

The MIDI command language [1] is widely used in musical applications that are analogous to the examples described above. On stage and in the recording studio, MIDI is used for the interactive remote control of musical instruments, an application similar in spirit to telephony. On web pages, Standard MIDI Files (SMFs, [1]) rendered using the General MIDI standard [1] provide a low-bandwidth substitute for audio streaming.

This memo is motivated by a simple premise: if MIDI performances could be sent as RTP streams that are managed by IETF session tools, a hybridization of the MIDI and IETF application domains may occur.

For example, interoperable MIDI networking may foster network musical performance applications, in which a group of musicians, located at different physical locations, interact over a network to perform as they would if located in the same room [17]. As another example, the streaming community may begin to use MIDI for low-bitrate audio coding, perhaps in conjunction with normative sound synthesis methods [5]. Finally, manufacturers of professional audio equipment and electronic musical instruments may consider adopting the IETF multimedia stack (IP, RTP, RTSP) as the networking layer for a MIDI control plane.

To enable MIDI applications using RTP, this memo defines an RTP payload format and its media type. Sections 2-5 and Appendices A-B define the RTP payload format. Section 6 and Appendices C-D define the media types identifying the payload format, the parameters needed for configuration, and how the parameters are utilized in SDP.
Appendix C also includes interoperability guidelines for the three example applications described above: network musical performance using SIP (Appendix C.6.3), content streaming using RTSP (Appendix C.6.1), and RTSP-based stage and studio devices (Appendix C.6.2).

Some applications may require MIDI media delivery at a certain service quality level (latency, jitter, packet loss, etc.). RTP itself does not provide service guarantees. However, applications may use lower-layer network protocols to configure the quality of the transport services that RTP uses. These protocols may act to reserve network resources for RTP flows [23], or may simply direct RTP traffic onto a dedicated "media network" in a local installation. Note that RTP and the MIDI payload format DO provide tools that applications may use to achieve the best possible real-time performance at a given service level.

This memo normatively defines the syntax and semantics of the MIDI payload format. However, this memo does not define algorithms for sending and receiving packets. An ancillary document [22] provides informative guidance on algorithms. Supplemental information may be found in related conference publications [17] [18].

Throughout this memo, the phrase "native stream" refers to a stream that uses the rtp-midi MIME type. The phrase "mpeg4-generic stream" refers to a stream that uses the mpeg4-generic MIME type (in mode rtp-midi) to operate in an MPEG 4 environment [4]. Section 6 describes this distinction in detail.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [11].

1.2 Bitfield Conventions

The packet bitfields in this document that share a common name often have identical semantics.
As most of these bitfields appear in Appendices A-B, we define the common bitfield names in Appendix A.1. However, a few of these common names also appear in the main text of this document. For convenience, we list these definitions below:

   o R flag bit. R flag bits are reserved for future use. Senders MUST set R bits to 0. Receivers MUST ignore R bit values.

   o LENGTH field. All fields named LENGTH (as distinct from LEN) code the number of octets in the structure that contains the field, including the header it resides in and all hierarchical levels below it. If a structure contains a LENGTH field, a receiver MUST use the LENGTH field value to advance past the structure during parsing, rather than use knowledge about the internal format of the structure.

2. Packet Format

In this section, we introduce the format of RTP MIDI packets. The description includes some background information on RTP, for the benefit of MIDI implementors new to IETF tools. Implementors should consult [2] for an authoritative description of RTP.

This memo assumes the reader is familiar with MIDI syntax and semantics. Appendix E provides a MIDI overview, at a level of detail sufficient to understand most of this memo. Implementors should consult [1] for an authoritative description of MIDI.

The MIDI payload format maps a MIDI command stream (16 voice channels + systems) onto an RTP stream. An RTP media stream is a sequence of logical packets that share a common format. Each packet consists of two parts: the RTP header and the MIDI payload. Figure 1 shows this format (vertical space delineates the header and payload).

We describe RTP packets as "logical" packets to highlight the fact that RTP itself is not a network-layer protocol. Instead, RTP packets are mapped onto network protocols (such as unicast UDP, multicast UDP, or TCP) by an application [21].
The interleaved mode of the Real Time Streaming Protocol (RTSP, [20]) is an example of an RTP mapping to TCP transport, as is [25].

2.1 RTP Header

[2] provides a complete description of the RTP header fields. In this section, we clarify the role of a few RTP header fields for MIDI applications. All fields are coded in network byte order (big-endian).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |        Sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MIDI command section ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Journal section ...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 1 -- Packet format

The behavior of the 1-bit M field depends on the MIME type of the stream. For native streams, the M bit MUST be set to 1 if the MIDI command section has a non-zero LEN field, and MUST be set to 0 otherwise. For mpeg4-generic streams, the M bit MUST be set to 1 for all packets (to conform to [4]).

In an RTP MIDI stream, the 16-bit sequence number field is initialized to a randomly chosen value, and is incremented by one (modulo 2^16) for each packet sent in the stream. A related quantity, the 32-bit extended packet sequence number, may be computed by tracking rollovers of the 16-bit sequence number. Note that different receivers of the same stream may compute different extended packet sequence numbers, depending on when the receiver joined the session.

The 32-bit timestamp field sets the base timestamp value for the packet. The payload codes MIDI command timing relative to this value.
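Two of the RTP header rules above, the M-bit rule and the extended sequence number, can be sketched in a few lines of Python. This is a minimal, non-normative sketch: the names `m_bit` and `ExtendedSeq` are hypothetical, and the rollover heuristic (a step of less than half the 16-bit space is forward progress, a larger step is a late packet) is an assumption taken from common RTP receiver practice, not a rule of this memo. Because the tracker starts counting rollovers at the first packet it sees, receivers that join at different times obtain different extended numbers, consistent with the note above.

```python
SEQ_MOD = 1 << 16  # the RTP sequence number field is 16 bits wide


def m_bit(midi_list_len: int, mpeg4_generic: bool) -> int:
    """M bit a sender sets, given the MIDI command section LEN value."""
    if mpeg4_generic:
        return 1  # mpeg4-generic streams: M = 1 on every packet
    return 1 if midi_list_len != 0 else 0  # native streams: M = 1 iff LEN != 0


class ExtendedSeq:
    """Derive 32-bit extended sequence numbers from 16-bit RTP values.

    Assumption: a step of less than half the 16-bit space is forward
    progress (possibly wrapping past 0xFFFF); a larger step is a late,
    reordered packet from before the newest one seen.
    """

    def __init__(self, first_seq: int) -> None:
        self.cycles = 0          # multiple of 2^16 added to the 16-bit value
        self.last = first_seq    # newest 16-bit sequence number seen

    def update(self, seq: int) -> int:
        delta = (seq - self.last) % SEQ_MOD
        if delta < SEQ_MOD // 2:
            if seq < self.last:  # forward step that wrapped past 0xFFFF
                self.cycles += SEQ_MOD
            self.last = seq
            return (self.cycles + seq) & 0xFFFFFFFF
        # Late packet: a numerically larger value predates the last wrap.
        cycles = self.cycles - SEQ_MOD if seq > self.last else self.cycles
        return (cycles + seq) & 0xFFFFFFFF
```

For example, a tracker created at sequence number 65534 maps the following values 65535, 0, and 1 to 65535, 65536, and 65537.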
The timestamp units are set by the clock rate parameter. For example, if the clock rate has a value of 44100 Hz, two packets whose base timestamp values differ by 2 seconds have RTP timestamp fields that differ by 88200.

Note that the clock rate parameter is not encoded within each RTP MIDI packet. A receiver of an RTP MIDI stream becomes aware of the clock rate as part of the session setup process. For example, if a session management tool uses the Session Description Protocol (SDP, [6]) to describe a media session, the clock rate parameter is set using the rtpmap attribute. We show examples of session setup in Section 6.

We now address the subject of the resolution of the clock rate. For RTP MIDI streams destined to be rendered into audio, the clock rate SHOULD be an audio sample rate of 32 kHz or higher. This recommendation is due to the sensitivity of human musical perception to small timing errors in musical note sequences, and due to the timbral changes that occur when two near-simultaneous MIDI NoteOns are rendered with a different timing than desired by the content author due to clock rate quantization. RTP MIDI streams that are not destined for audio rendering (such as MIDI streams that control stage lighting) MAY use a lower clock rate, but SHOULD use a clock rate high enough to avoid timing artifacts in the application.

For RTP MIDI streams destined to be rendered into audio, the clock rate SHOULD be chosen from rates in common use in professional audio applications or in consumer audio distribution. At the time of this writing, these rates include 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2 kHz, 96 kHz, 176.4 kHz, and 192 kHz. If the RTP MIDI session is a part of a synchronized media session that includes another (non-RTP MIDI) RTP audio stream with a clock rate of 32 kHz or higher, the RTP MIDI stream SHOULD use a clock rate that matches the clock rate of the other audio stream.
However, if the RTP MIDI stream is destined to be rendered into audio, the RTP MIDI stream SHOULD NOT use a clock rate lower than 32 kHz, even if this second stream has a clock rate less than 32 kHz.

Timestamps of consecutive packets do not necessarily increment at a fixed rate, because RTP MIDI packets are not necessarily sent at a fixed rate. The degree of packet transmission regularity reflects the underlying application dynamics. Interactive applications may vary the packet sending rate to track the gestural rate of a human performer, whereas content-streaming applications may send packets at a fixed rate.

Therefore, the timestamps for two sequential RTP packets may be identical, or the second packet may have a timestamp arbitrarily larger than the first packet (modulo 2^32). Section 3 places additional restrictions on the RTP timestamps for two sequential RTP packets, as does the guardtime MIME parameter (Appendix C.3.2).

We use the term "media time" to denote the temporal duration of the media coded by an RTP packet. The media time coded by a packet is computed by subtracting the RTP timestamp from the last command timestamp in the MIDI command section (modulo 2^32). If the MIDI list of the MIDI command section of a packet is empty, the media time coded by the packet is 0 ms. Appendix C.3.1 discusses media time issues in detail.

All RTP streams from all parties in a multimedia session whose payload types (coded by the PT header field) are mapped to the rtp-midi media type share a single RTP session, and thus a common SSRC identifier space (as defined in [2]). Likewise, all RTP streams from all parties in a multimedia session whose payload types are mapped to the mpeg4-generic media type in mode rtp-midi share an (independent) single RTP session.
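The clock-rate scaling and the media-time definition in this section reduce to a little modular arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it computes media time as the last command timestamp minus the RTP header timestamp, modulo 2^32, so that the result is a non-negative duration even when the 32-bit timestamp wraps between the header and the last command.

```python
from typing import Optional

MOD32 = 1 << 32  # RTP timestamps are 32-bit and wrap modulo 2^32


def seconds_to_ticks(seconds: float, clock_rate: int) -> int:
    """Duration in seconds -> RTP timestamp units at the given clock rate."""
    return round(seconds * clock_rate)


def ticks_to_seconds(ticks: int, clock_rate: int) -> float:
    """RTP timestamp units -> seconds at the given clock rate."""
    return ticks / clock_rate


def media_time(rtp_timestamp: int, last_command_timestamp: Optional[int]) -> int:
    """Media time of a packet, in RTP timestamp units.

    last_command_timestamp is None when the packet's MIDI list is empty,
    in which case the media time is 0.
    """
    if last_command_timestamp is None:
        return 0
    return (last_command_timestamp - rtp_timestamp) % MOD32
```

With a 44100 Hz clock rate, `seconds_to_ticks(2, 44100)` yields 88200, matching the example earlier in this section.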
If a media line contains an RTP MIDI payload type, the media line payload type list MUST consist entirely of payload types mapped to the rtp-midi media type, or entirely of payload types mapped to the mpeg4-generic media type in mode rtp-midi.

All RTP MIDI streams generated by a party using a single timing synchronization source share a single SSRC header field value, distinct from the SSRC values of other synchronization sources in the session. The SSRC value is chosen and updated using the methods described in [2]. For example, in the common case of a party sending one 16 voice channel + systems MIDI name space on a single RTP MIDI stream sent over a single network transport, the packets that make up the RTP MIDI stream are identifiable by a unique SSRC value.

More complex uses of RTP MIDI may involve a single party sending several RTP MIDI streams. In many of these uses, the party uses a single timing synchronization clock, and thus all packets sent by the party share the same SSRC header field value. In these uses, a different payload type is assigned to each RTP MIDI stream. A receiver uses the PT field to identify which packets from an SSRC belong to each of the RTP MIDI streams the party sends.

For example, a hardware "breakout box" that transcodes the data on its MIDI 1.0 DIN input jack onto an RTP MIDI stream would use the same SSRC header value for all packets it sends in a multimedia session. If the breakout box had two MIDI 1.0 DIN input jacks, and used a common clock to timestamp incoming data on both jacks, the box would send two RTP MIDI streams, with each stream using the same SSRC field value with a different PT field value.

In other cases, a party does not use the same timing synchronization clock for all of the RTP MIDI streams it sends. This architecture usually reflects the underlying hardware of the party, such as two MIDI piano keyboards, each generating an RTP MIDI stream from a distinct Internet address. Each piano would have its own clock. In this case, each RTP MIDI stream MUST use a distinct SSRC header field value.
Although these streams MAY use the same payload type, doing so would make it impossible, in the general case, for a session description to assign different properties to each stream (such as how the receiver should render the stream into audio, or how the receiver should present the stream to software applications via an operating system API). Thus, in practice, each of these streams usually has a distinct PT field value as well.

In general, the network transport choices for the RTP MIDI streams sent by a party are independent of the uses of the SSRC and PT values described above; indeed, the SSRC and PT packet identification fields permit this freedom of transport assignment.

So, for example, a party may split a single 16-channel + Systems MIDI name space into two RTP MIDI streams, each containing a subset of MIDI commands. One stream may be sent over UDP transport (perhaps this stream contains real-time Note commands), and the other stream may be sent over TCP (perhaps this stream contains only bulk-data System Exclusive commands, unsuitable for UDP). Each stream uses the same SSRC. The session description maps a different payload type onto each stream, and via this payload type describes the nature of the MIDI command split, and perhaps the rendering method for the stream.

As a second example, a party may generate two RTP MIDI streams, each coding a different 16-channel + Systems MIDI name space whose timestamps are derived from different synchronization timing sources, and send the streams to the same unicast network address and port pair (RTP port + RTCP port). In this case, two RTP MIDI streams are sent over a single logical transport (the receiver unicast address/port pair), and a receiver uses the different SSRC values (and probably, different PT values) to demultiplex the two streams of packets arriving at the same port.
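A receiver that shares one transport among several RTP MIDI streams can key each stream on the pair (SSRC, PT), following the rules above. The Python sketch below is a hypothetical illustration: it parses the fixed RTP header layout of [2] (Figure 1) with the standard `struct` module and groups packets by that pair. The names `RtpPacket`, `parse_rtp`, and `demux` are invented here, and the sketch omits RTCP processing and the timestamp-based merging of streams that share an SSRC.

```python
import struct
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class RtpPacket(NamedTuple):
    marker: int
    pt: int
    seq: int
    timestamp: int
    ssrc: int
    payload: bytes  # the MIDI payload: command section, then optional journal


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse the fixed RTP header (network byte order) and locate the payload."""
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (b0 & 0x0F)        # skip the CSRC list (CC field)
    if b0 & 0x10:                         # X bit: skip the header extension
        (ext_words,) = struct.unpack("!H", data[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if b0 & 0x20:                         # P bit: last octet counts padding
        end -= data[-1]
    return RtpPacket(b1 >> 7, b1 & 0x7F, seq, ts, ssrc, data[offset:end])


def demux(packets: List[RtpPacket]) -> Dict[Tuple[int, int], List[RtpPacket]]:
    """Group packets into RTP MIDI streams keyed by (SSRC, PT)."""
    streams: Dict[Tuple[int, int], List[RtpPacket]] = defaultdict(list)
    for p in packets:
        streams[(p.ssrc, p.pt)].append(p)
    return dict(streams)
```

For example, the two-jack breakout box described above would yield two keys with the same SSRC and different PT values, while a second piano with its own clock would add a key with a different SSRC.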
In an RTP MIDI session, each synchronization source (identified by its SSRC field value) MUST randomly choose a value to initialize its 32-bit timestamp clock, and MUST use readings from this clock to generate the timestamp field of all packets it creates for the RTP MIDI session.

Note that in the case of multiple RTP MIDI streams sent by a single SSRC, initializing each RTP MIDI stream with a unique random initialization value would not work, because the RTCP synchronization mechanism [2] maps the timebase of each SSRC (NOT each RTP MIDI stream) to a wall clock time. Instead, receivers of multiple RTP MIDI streams from a single SSRC merge-sort the streams using the RTP timestamp field, which is based on a common clock and a common random initialization value. In contrast, receivers DO use the wall clock time coded by RTCP to synchronize two RTP MIDI streams sent by the same party using different clock synchronization sources, and thus, different SSRC values.

On a final note, in some uses of MIDI, parties send bidirectional traffic to conduct transactions (such as file exchange). These commands were designed to work over MIDI 1.0 DIN cable networks that may be configured in a multicast topology and that use pure "party-line" signalling. Thus, if a multimedia session ensures a multicast connection between all parties, bidirectional MIDI commands will work without additional support from the RTP MIDI payload format.

2.2 MIDI Payload

The payload (Figure 1) MUST begin with the MIDI command section. The MIDI command section codes a (possibly empty) list of timestamped MIDI commands, and provides the essential service of the payload format.

The payload MAY also contain a journal section. The journal section provides resiliency by coding the recent history of the stream. A flag in the MIDI command section codes the presence of a journal section in the payload.

Section 3 defines the MIDI command section.
Sections 4-5 and Appendices A-B define the recovery journal, the default format for the journal section. Here, we describe how these payload sections operate in a stream.

The journalling method for a stream is set at the start of a session and MUST NOT be changed thereafter. A stream may be set to use the recovery journal, to use an alternative journal format (none are defined in this memo), or to not use a journal.

The default journalling method of a stream is inferred from its transport type. Streams that use unreliable transport (such as UDP) default to using the recovery journal. Streams that use reliable transport (such as TCP) default to not using a journal. Appendix C.1.1 defines session configuration tools for overriding these defaults. For all types of transport, a sender MUST transmit an RTP packet stream with consecutive sequence numbers (modulo 2^16).

If a stream uses the recovery journal, every payload in the stream MUST include a journal section. If a stream does not use journalling, a journal section MUST NOT appear in a stream payload. If a stream uses an alternative journal format, the specification for the journal format defines an inclusion policy.

The payload of a stream encodes data for a single MIDI command name space (16 voice channels + Systems). Applications may use several streams in a session. Session configuration tools for multi-stream sessions are defined in Appendix C.4.

In some applications, a receiver renders MIDI commands into audio (or into control actions, such as the rewind of a tape deck or the dimming of stage lights). In other applications, a receiver presents a MIDI stream to software programs via an Application Programming Interface (API). Appendix C.5 defines session configuration tools to specify what receivers should do with a MIDI command stream.
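The journalling defaults and inclusion rules above can be expressed as simple predicates. This Python sketch is an illustration under stated assumptions: the transport labels, the `override` argument (a hypothetical stand-in for the configuration tools of Appendix C.1.1), and all function names are invented here, and alternative journal formats, whose inclusion policies are defined by their own specifications, are not modelled.

```python
from typing import Iterable, Optional

RELIABLE_TRANSPORTS = {"tcp"}  # assumption: simplified transport labels
SEQ_MOD = 1 << 16


def uses_recovery_journal(transport: str, override: Optional[bool] = None) -> bool:
    """Journalling choice for a stream, fixed at session start.

    override (hypothetical) models an explicit session configuration;
    when None, unreliable transports default to the recovery journal.
    """
    if override is not None:
        return override
    return transport.lower() not in RELIABLE_TRANSPORTS


def j_bit_ok(stream_journalled: bool, j_bit: int) -> bool:
    """Every payload of a journalled stream has J = 1; otherwise J = 0."""
    return j_bit == (1 if stream_journalled else 0)


def sender_seqs_consecutive(seqs: Iterable[int]) -> bool:
    """A sender's packets carry consecutive sequence numbers modulo 2^16."""
    seqs = list(seqs)
    return all((b - a) % SEQ_MOD == 1 for a, b in zip(seqs, seqs[1:]))
```

Because the journalling method MUST NOT change during a session, a sender would evaluate `uses_recovery_journal` once at setup and then apply `j_bit_ok` to every payload it builds.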
For applications where predictable and minimal packet transmission latency is critical, the Maximum Transmission Unit (MTU) of the underlying network limits the practical size of the payload section of a stream sent over UDP transport (for example, an Ethernet MTU is 1500 octets). A sender SHOULD NOT create RTP MIDI UDP packets whose size exceeds the MTU of the underlying network. Instead, the sender SHOULD take steps to keep the maximum packet size under the MTU limit.

These steps may take many forms. The default closed-loop recovery journal sending policy (defined in Appendix C.1.2.2) uses RTP Control Protocol (RTCP, [2]) feedback to manage the RTP MIDI packet size. In addition, Section 3.2 and Appendix B.5.2 provide specific tools for managing the size of packets that code MIDI System Exclusive (0xF0) commands. Appendix C.4 defines session configuration tools that may be used to split a dense MIDI name space into several UDP streams, so that the payload fits comfortably into an MTU. Another option is to use TCP. Section 4.3 of [22] provides non-normative advice for packet size management.

3. MIDI Command Section

Figure 2 shows the format of the MIDI command section.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|J|Z|P|LEN... |  MIDI list ...                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2 -- MIDI command section

The MIDI command section begins with a variable-length header.

The header field LEN codes the number of octets in the MIDI list that follows the header. If the header flag B is 0, the header is one octet long, and LEN is a 4-bit field, supporting a maximum MIDI list length of 15 octets. If B is 1, the header is two octets long, and LEN is a 12-bit field, supporting a maximum MIDI list length of 4095 octets.
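The header layout of Figure 2 can be decoded and encoded with a few bit operations. The Python sketch below is illustrative and non-normative: the names are hypothetical, it takes the 4 LEN bits in the first octet as the most significant bits of a 12-bit LEN (as this section specifies for B = 1), and its encoder picks the one-octet header whenever LEN fits in 4 bits, a choice made here for compactness.

```python
from typing import NamedTuple


class CommandSectionHeader(NamedTuple):
    b: int
    j: int
    z: int
    p: int
    length: int        # LEN: number of octets in the MIDI list
    header_size: int   # 1 octet if B = 0, 2 octets if B = 1


def parse_command_header(payload: bytes) -> CommandSectionHeader:
    """Decode B, J, Z, P and LEN from the start of an RTP MIDI payload."""
    first = payload[0]
    b, j, z, p = first >> 7, (first >> 6) & 1, (first >> 5) & 1, (first >> 4) & 1
    if b == 0:
        length, size = first & 0x0F, 1                        # 4-bit LEN
    else:
        length, size = ((first & 0x0F) << 8) | payload[1], 2  # 12-bit LEN
    if len(payload) < size + length:
        raise ValueError("payload shorter than LEN indicates")
    return CommandSectionHeader(b, j, z, p, length, size)


def encode_command_header(length: int, j: int = 0, z: int = 0, p: int = 0) -> bytes:
    """Build a header for a MIDI list of `length` octets."""
    flags = (j << 6) | (z << 5) | (p << 4)
    if length <= 0x0F:
        return bytes([flags | length])
    if length <= 0x0FFF:
        return bytes([0x80 | flags | (length >> 8), length & 0xFF])
    raise ValueError("MIDI list longer than 4095 octets")
```

A receiver would read the MIDI list from `payload[header_size:header_size + length]` and, if J is 1, expect the journal section to start right after it.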
LEN is coded in network byte order (big-endian): the 4 bits of LEN that appear in the first header octet code the most significant 4 bits of the 12-bit LEN value. A LEN value of 0 is legal, and codes an empty MIDI list.

If the J header bit is set to 1, a journal section MUST appear after the MIDI command section in the payload. If the J header bit is set to 0, the payload MUST NOT contain a journal section.

We define the semantics of the P header bit in Section 3.2.

If the LEN header field is nonzero, the MIDI list has the structure shown in Figure 3.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Delta Time 0     (1-4 octets long, or 0 octets if Z = 0)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     MIDI Command 0   (1 or more octets long)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Delta Time 1     (1-4 octets long)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     MIDI Command 1   (1 or more octets long)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Delta Time N     (1-4 octets long)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     MIDI Command N   (0 or more octets long)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3 -- MIDI list structure

If the header flag Z is 1, the MIDI list begins with a complete MIDI command (coded in the MIDI Command 0 field in Figure 3) preceded by a delta time (coded in the Delta Time 0 field). If Z is 0, the Delta Time 0 field is not present in the MIDI list, and the command coded in the MIDI Command 0 field has an implicit delta time of 0.

The MIDI list structure may also optionally encode a list of N additional complete MIDI commands, each coded in a MIDI Command K field.
   Each additional command MUST be preceded by a Delta Time K field, which codes the command's delta time. We discuss exceptions to the "command fields code complete MIDI commands" rule in Section 3.2.

   The final MIDI command field (i.e., the MIDI Command N field shown in Figure 3) in the MIDI list MAY be empty. Moreover, a MIDI list MAY consist of a single delta time (encoded in the Delta Time 0 field) without an associated command (which would have been encoded in the MIDI Command 0 field). These rules enable MIDI coding features that are explained in Section 3.1. We delay the explanations because an understanding of RTP MIDI timestamps is necessary to describe the features.

3.1 Timestamps

   In this section, we describe how RTP MIDI encodes a timestamp for each MIDI list command. Command timestamps have the same units as RTP packet header timestamps (described in Section 2.1 and [2]). Recall that RTP timestamps count ticks of a clock whose rate is set during session configuration (see Section 6.1 and [6]); one timestamp unit therefore lasts 1/(clock rate) seconds.

   As shown in Figure 3, the MIDI list encodes time using a compact delta-time format. The RTP MIDI delta time syntax is a modified form of the MIDI File delta time syntax [1]. RTP MIDI delta times use 1-4 octet fields to encode unsigned integers of up to 28 bits, which decode into the 32-bit forms shown in Figure 4. Note that delta time values may be legally encoded in multiple formats; for example, there are four legal ways to encode the zero delta time (0x00, 0x8000, 0x808000, 0x80808000).

   RTP MIDI uses delta times to encode a timestamp for each MIDI command. The timestamp for MIDI Command K is the summation (modulo 2^32) of the RTP timestamp and decoded delta times 0 through K. This cumulative coding technique, borrowed from MIDI File delta time coding, is efficient because it reduces the number of multi-octet delta times.
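
   A minimal Python sketch of Figure 4 decoding and of the cumulative timestamp rule above follows. The function names are hypothetical; when the Z header bit is 0, a caller would supply an implicit Delta Time 0 of 0, as Section 3 specifies.

```python
from typing import List, Tuple


def decode_delta_time(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one 1-4 octet delta time (Figure 4).

    Returns (value, offset of the first octet after the delta time).
    """
    value = 0
    for i in range(4):
        if offset + i >= len(data):
            raise ValueError("truncated delta time")
        octet = data[offset + i]
        value = (value << 7) | (octet & 0x7F)  # append the low 7 bits
        if not octet & 0x80:                   # high bit clear: last octet
            return value, offset + i + 1
    raise ValueError("delta time longer than four octets")


def command_timestamps(rtp_timestamp: int, deltas: List[int]) -> List[int]:
    """Timestamp of command K = RTP timestamp + deltas 0..K, modulo 2**32."""
    stamps = []
    total = rtp_timestamp
    for delta in deltas:
        total = (total + delta) % 2**32
        stamps.append(total)
    return stamps
```

   Each of the four encodings of zero listed above decodes to 0; for instance, decode_delta_time(bytes([0x80, 0x80, 0x00]), 0) returns (0, 3).
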
   All command timestamps in a packet MUST be less than or equal to the RTP timestamp of the next packet in the stream (modulo 2^32).

   This restriction ensures that a particular RTP MIDI packet in a stream is uniquely responsible for encoding time starting at the moment after the RTP timestamp encoded in the RTP packet header, and ending at the moment before the final command timestamp encoded in the MIDI list. The "moment before" and "moment after" text acknowledges the "less than or equal" semantics (as opposed to "strictly less than") in the sentence above this paragraph.

   Thus, it is possible to "pad" the end of an RTP MIDI packet with time that is guaranteed to be void of MIDI commands, by setting the "Delta Time N" field of the MIDI list to the end of the void time, and by omitting its corresponding "MIDI Command N" field (a syntactic construction the preamble of Section 3 expressly made legal).

   In addition, it is possible to code an RTP MIDI packet to express that a period of time in the stream is void of MIDI commands, without sending any other information in the packet's MIDI list. The RTP timestamp in the header would code the start of the void time. The MIDI list of this packet would consist of a "Delta Time 0" field (so the Z header bit is set to 1) that coded the end of the void time. No other fields would be present in the MIDI list (a syntactic construction the preamble of Section 3 also expressly made legal).

   By default, a command timestamp indicates the execution time for the command. The difference between two timestamps indicates the time delay between the execution of the commands. This difference may be zero, coding simultaneous execution. In this memo, we refer to this interpretation of timestamps as "comex" (COMmand EXecution) semantics. We formally define comex semantics in Appendix C.2.
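
   The void-time construction described above can be sketched in Python as follows. encode_delta_time emits the shortest Figure 4 encoding (the longer, equally legal encodings would also work), and void_time_command_section builds a MIDI command section whose MIDI list is a lone Delta Time 0 field, with Z set to 1. Both names are hypothetical. The sketch leaves J at 0 and emits no journal section, which only matches reliable transport; over UDP, a sender using the recovery journal would set J and append a journal section.

```python
def encode_delta_time(value: int) -> bytes:
    """Shortest 1-4 octet encoding of a delta time (Figure 4)."""
    if not 0 <= value < 1 << 28:
        raise ValueError("delta time must fit in 28 bits")
    groups = [value & 0x7F]  # least significant 7 bits first
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # most significant group first
    # Set the high bit on every octet except the last one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def void_time_command_section(void_ticks: int) -> bytes:
    """MIDI command section coding void time after the RTP timestamp."""
    delta = encode_delta_time(void_ticks)
    # One-octet header: B=0, J=0, Z=1 (Delta Time 0 present), P=0, LEN.
    header = 0x20 | len(delta)
    return bytes([header]) + delta
```

   With a 44100 Hz clock, void_time_command_section(44100) codes one second of void time as the four octets 0x23 0x82 0xD8 0x44.
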
   The comex interpretation of timestamps works well for transcoding a Standard MIDI File (SMF) into an RTP MIDI stream, as SMFs code a timestamp for each MIDI command stored in the file. To transcode an SMF that uses metric time markers, use the SMF tempo map (encoded in the SMF as meta-events) to convert metric SMF timestamp units into seconds-based RTP timestamp units. The comex interpretation also works well for MIDI controllers that are implementing RTP MIDI natively (i.e. NOT by transcoding a MIDI 1.0 DIN serial cable).

   Other interpretations of timestamps may work better for transcoding a MIDI source that uses implicit command timing (such as MIDI 1.0 DIN cables) into an RTP MIDI stream. Appendix C.2 defines alternatives to comex semantics, and describes session configuration tools for selecting the timestamp interpretation semantics for a stream.

      One-Octet Delta Time:

         Encoded form: 0ddddddd
         Decoded form: 00000000 00000000 00000000 0ddddddd

      Two-Octet Delta Time:

         Encoded form: 1ccccccc 0ddddddd
         Decoded form: 00000000 00000000 00cccccc cddddddd

      Three-Octet Delta Time:

         Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
         Decoded form: 00000000 000bbbbb bbcccccc cddddddd

      Four-Octet Delta Time:

         Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
         Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

                Figure 4 -- Decoding delta time formats

3.2 Command Coding

   Each non-empty MIDI Command field in the MIDI list codes one of the MIDI command types that may legally appear on a MIDI 1.0 DIN cable. Standard MIDI File meta-events do not fit this definition and MUST NOT appear in the MIDI list. As a rule, each MIDI Command field codes a complete command, in the binary command format defined in [1]. In the remainder of this section, we describe exceptions to this rule.

   The first MIDI channel command in the MIDI list MUST include a status octet.
   Running status coding, as defined in [1], MAY be used for all subsequent MIDI channel commands in the list. As in [1], System Common and System Exclusive messages (0xF0 ... 0xF7) cancel the running status state, but System Real-time messages (0xF8 ... 0xFF) do not affect the running status state. All System commands in the MIDI list MUST include a status octet.

   As we note above, the first channel command in the MIDI list MUST include a status octet. However, the corresponding command in the original MIDI source data stream might not have a status octet (in this case, the source would be coding the command using running status). If the status octet of the first channel command in the MIDI list does not appear in the source data stream, the P (phantom) header bit MUST be set to 1. In all other cases, the P bit MUST be set to 0.

   Note that the P bit describes the MIDI source data stream, not the MIDI list encoding; regardless of the state of the P bit, the MIDI list MUST include the status octet.

   As receivers MUST be able to decode running status, sender implementors should feel free to use running status to improve bandwidth efficiency. However, senders SHOULD NOT introduce timing jitter into an existing MIDI command stream through an inappropriate use or removal of running status coding. This warning primarily applies to senders whose RTP MIDI streams may be transcoded onto a MIDI 1.0 DIN cable [1] by the receiver: both the timestamps and the command coding (running status or not) must comply with the physical restrictions of implicit time coding over a slow serial line.

   On a MIDI 1.0 DIN cable [1], a System Real-time command may be embedded inside of another "host" MIDI command. This syntactic construction is not supported in the payload format: a MIDI Command field in the MIDI list codes exactly one MIDI command (partially or completely).
   To encode an embedded System Real-time command, senders MUST extract the command from its host, and code it in the MIDI list as a separate command. The host command and System Real-time command SHOULD appear in the same MIDI list. The delta time of the System Real-time command SHOULD result in a command timestamp that encodes the System Real-time command placement in its original embedded position.

   Two methods are provided for encoding MIDI System Exclusive (SysEx) commands in the MIDI list. A SysEx command may be encoded in a MIDI Command field verbatim: a 0xF0 octet, followed by an arbitrary number of data octets, followed by a 0xF7 octet.

   Alternatively, a SysEx command may be encoded as multiple segments. The command is divided into two or more SysEx command segments; each segment is encoded in its own MIDI Command field in the MIDI list.

   The payload format supports segmentation in order to encode SysEx commands that encode information in the temporal pattern of data octets. By encoding these commands as a series of segments, each data octet may be associated with a distinct delta time. Segmentation also supports the coding of large SysEx commands across several packets.

   To segment a SysEx command, first partition its data octet list into two or more sublists. The last sublist MAY be empty (i.e., contain no octets); all other sublists MUST contain at least one data octet. To complete the segmentation, add the status octets defined in Figure 5 to the head and tail of the first, last, and any "middle" sublists. Figure 6 shows example segmentations of a SysEx command.

   A sender MAY cancel a segmented SysEx command transmission that is in progress, by sending the "cancel" sublist shown in Figure 5. A "cancel" sublist MAY follow a "first" or "middle" sublist in the transmission, but MUST NOT follow a "last" sublist. The cancel MUST be empty (thus, 0xF7 0xF4 is the only legal cancel sublist).
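
   The segmentation procedure above can be sketched in Python, using the head and tail status octets of Figure 5. The function name segment_sysex and its cut_points parameter (offsets into the data octet list at which each new sublist begins) are hypothetical conveniences of this sketch.

```python
from typing import List, Sequence


def segment_sysex(command: bytes, cut_points: Sequence[int]) -> List[bytes]:
    """Split a verbatim SysEx command (0xF0 ... 0xF7) into segments."""
    if len(command) < 2 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("not a verbatim SysEx command")
    data = command[1:-1]
    if not all(0 <= c <= len(data) for c in cut_points):
        raise ValueError("cut point outside the data octet list")
    bounds = [0, *cut_points, len(data)]
    sublists = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    if len(sublists) < 2:
        raise ValueError("segmentation needs two or more sublists")
    if any(len(s) == 0 for s in sublists[:-1]):
        raise ValueError("only the last sublist may be empty")
    segments = []
    last = len(sublists) - 1
    for i, sub in enumerate(sublists):
        head = 0xF0 if i == 0 else 0xF7     # "first" vs. "middle"/"last"
        tail = 0xF7 if i == last else 0xF0  # "last" vs. "first"/"middle"
        segments.append(bytes([head]) + sub + bytes([tail]))
    return segments
```

   Cutting the Figure 6 command at data offsets 2 and 4 reproduces its three-segment example, and cutting after every data octet reproduces the segmentation with the largest number of segments (ending in the empty "last" sublist 0xF7 0xF7).
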
   The cancellation feature is needed because Appendix C.1.3 defines configuration tools that let session parties exclude certain SysEx commands from the stream. Senders that transcode a MIDI source onto an RTP MIDI stream under these constraints have the responsibility of excluding undesired commands from the RTP MIDI stream.

   The cancellation feature lets a sender start the transmission of a command before the MIDI source has sent the entire command. If a sender determines that the command whose transmission is in progress should not appear on the RTP stream, it cancels the command. Without a method for cancelling a SysEx command transmission, senders would be forced to use a high-latency store-and-forward approach to transcoding SysEx commands onto RTP MIDI packets, in order to validate each SysEx command before transmission.

   The RECOMMENDED receiver reaction to a cancellation depends on the capabilities of the receiver. For example, a sound synthesizer that is directly parsing RTP MIDI packets and rendering them to audio will be aware of the fact that SysEx commands may be cancelled in RTP MIDI. These receivers SHOULD detect a SysEx cancellation in the MIDI list, and act as if they had never received the SysEx command.

   As a second example, a synthesizer may be receiving MIDI data from an RTP MIDI stream via a MIDI DIN cable (or a software API emulation of a MIDI DIN cable). In this case, an RTP-MIDI-aware system receives the RTP MIDI stream, and transcodes it onto the MIDI DIN cable (or its emulation). Upon the receipt of the cancel sublist, the RTP-MIDI-aware transcoder might have already sent the first part of the SysEx command on the MIDI DIN cable to the receiver.

   Unfortunately, the MIDI DIN cable protocol cannot directly code "cancel SysEx in progress" semantics. However, MIDI DIN cable receivers begin SysEx processing after the complete command arrives.
   The receiver checks to see if it recognizes the command (coded in the first few octets) and then checks to see if the command is the correct length. Thus, in practice, a transcoder can cancel a SysEx command by sending an 0xF7 to (prematurely) end the SysEx command -- the receiver will detect the incorrect command length, and discard the command.

   Appendix C.1.3 defines configuration tools that may be used to prohibit SysEx command cancellation.

   The relative ordering of SysEx command segments in a MIDI list must match the relative ordering of the sublists in the original SysEx command. By default, commands other than System Real-time MIDI commands MUST NOT appear between SysEx command segments (Appendix C.1.3 defines configuration tools to change this default, to let other command types appear between segments). If the command segments of a SysEx command are placed in the MIDI lists of two or more RTP packets, the segment ordering rules apply to the concatenation of all affected MIDI lists.

       ----------------------------------------------------------
      | Sublist Position | Head Status Octet | Tail Status Octet |
      |----------------------------------------------------------|
      |     first        |       0xF0        |       0xF0        |
      |----------------------------------------------------------|
      |     middle       |       0xF7        |       0xF0        |
      |----------------------------------------------------------|
      |     last         |       0xF7        |       0xF7        |
      |----------------------------------------------------------|
      |     cancel       |       0xF7        |       0xF4        |
       ----------------------------------------------------------

             Figure 5 -- Command segmentation status octets

   [1] permits 0xF7 octets that are not part of a (0xF0, 0xF7) pair to appear on a MIDI 1.0 DIN cable. Unpaired 0xF7 octets have no semantic meaning in MIDI, apart from cancelling running status. Unpaired 0xF7 octets MUST NOT appear in the MIDI list of the MIDI Command section. We impose this restriction to avoid interference with the command segmentation coding defined in Figure 5.
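
   On the receiving side, the Figure 5 table can drive a small reassembler. The Python sketch below is hypothetical (SEGMENT_KINDS, SysExReassembler, and feed are not names from this memo). It assumes that segments arrive in order, treats a "middle", "last", or "cancel" segment whose "first" segment was never seen (for example, after packet loss) as unusable, and does not handle the 0xF5 "dropped 0xF7" terminator or recovery journal repair.

```python
from typing import Optional

# (head, tail) status octets from Figure 5, plus the verbatim encoding.
SEGMENT_KINDS = {
    (0xF0, 0xF7): "complete",
    (0xF0, 0xF0): "first",
    (0xF7, 0xF0): "middle",
    (0xF7, 0xF7): "last",
    (0xF7, 0xF4): "cancel",
}


class SysExReassembler:
    """Rebuild SysEx commands from segments, honoring "cancel" sublists."""

    def __init__(self) -> None:
        self.pending: Optional[bytearray] = None  # open segmented command

    def feed(self, segment: bytes) -> Optional[bytes]:
        """Return a complete SysEx command, or None if none completed."""
        if len(segment) < 2:
            raise ValueError("segment too short")
        kind = SEGMENT_KINDS.get((segment[0], segment[-1]))
        if kind is None:
            raise ValueError("unrecognized SysEx segment")
        if kind == "complete":
            return bytes(segment)
        if kind == "first":
            self.pending = bytearray(segment[:-1])  # keep 0xF0 and the data
            return None
        if self.pending is None:
            return None  # no open command to extend or cancel
        if kind == "cancel":
            if len(segment) != 2:
                raise ValueError("a cancel sublist must be empty")
            self.pending = None  # act as if the command never arrived
            return None
        self.pending += segment[1:-1]  # middle or last: append data octets
        if kind == "middle":
            return None
        done = bytes(self.pending) + b"\xf7"
        self.pending = None
        return done
```
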
   SysEx commands carried on a MIDI 1.0 DIN cable may use the "dropped 0xF7" construction [1]. In this coding method, the 0xF7 octet is dropped from the end of the SysEx command, and the status octet of the next MIDI command acts both to terminate the SysEx command and start the next command. To encode this construction in the payload format, follow these steps:

   o  Determine the appropriate delta times for the SysEx command and the command that follows the SysEx command.

   o  Insert the "dropped" 0xF7 octet at the end of the SysEx command, to form the standard SysEx syntax.

   o  Code both commands into the MIDI list using the rules above.

   o  Replace the 0xF7 octet that terminates the verbatim SysEx encoding or the last segment of the segmented SysEx encoding with a 0xF5 octet. This substitution informs the receiver of the original dropped 0xF7 coding.

   [1] reserves the undefined System Common commands 0xF4 and 0xF5 and the undefined System Real-time commands 0xF9 and 0xFD for future use. By default, undefined commands MUST NOT appear in a MIDI Command field in the MIDI list, with the exception of the 0xF5 octets used to code the "dropped 0xF7" construction and the 0xF4 octets used by SysEx "cancel" sublists.

   During session configuration, a stream may be customized to transport undefined commands (Appendix C.1.3). For this case, we now define how senders encode undefined commands in the MIDI list.

   An undefined System Real-time command MUST be coded using the System Real-time rules.

   If the undefined System Common commands are put to use in a future version of [1], the command will begin with an 0xF4 or 0xF5 status octet, followed by zero, one, or two data octets. To encode these commands, senders MUST terminate the command with an 0xF7 octet, and place the modified command into the MIDI Command field.

   Unfortunately, non-compliant uses of the undefined System Common commands may appear in MIDI implementations.
   To model these commands, we assume that the command begins with an 0xF4 or 0xF5 status octet, followed by zero or more data octets, followed by zero or more trailing 0xF7 status octets. To encode the command, senders MUST first remove all trailing 0xF7 status octets from the command. Then, senders MUST terminate the command with an 0xF7 octet, and place the modified command into the MIDI Command field.

   Note that we include the trailing octets in our model as a cautionary measure: if such commands appeared in a non-compliant use of an undefined System Common command, an RTP MIDI encoding of the command that did not remove trailing octets could be mistaken for an encoding of a "middle" or "last" sublist of a segmented SysEx command (Figure 5) under certain packet loss conditions.

      Original SysEx command:

      0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

      A two-segment segmentation:

      0xF0 0x01 0x02 0x03 0x04 0xF0
      0xF7 0x05 0x06 0x07 0x08 0xF7

      A different two-segment segmentation:

      0xF0 0x01 0xF0
      0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

      A three-segment segmentation:

      0xF0 0x01 0x02 0xF0
      0xF7 0x03 0x04 0xF0
      0xF7 0x05 0x06 0x07 0x08 0xF7

      The segmentation with the largest number of segments:

      0xF0 0x01 0xF0
      0xF7 0x02 0xF0
      0xF7 0x03 0xF0
      0xF7 0x04 0xF0
      0xF7 0x05 0xF0
      0xF7 0x06 0xF0
      0xF7 0x07 0xF0
      0xF7 0x08 0xF0
      0xF7 0xF7

                   Figure 6 -- Example segmentations

4. The Recovery Journal System

   The recovery journal is the default resiliency tool for unreliable transport. In this section, we normatively define the roles that senders and receivers play in the recovery journal system.

   MIDI is a fragile code. A single lost command in a MIDI command stream may produce an artifact in the rendered performance. We normatively classify rendering artifacts into two categories:

   o  Transient artifacts.
      Transient artifacts produce immediate but short-term glitches in the performance. For example, a lost NoteOn (0x9) command produces a transient artifact: one note fails to play, but the artifact does not extend beyond the end of that note.

   o  Indefinite artifacts.
      Indefinite artifacts produce long-lasting errors in the rendered performance. For example, a lost NoteOff (0x8) command may produce an indefinite artifact: the note that should have been ended by the lost NoteOff command may sustain indefinitely. As a second example, the loss of a Control Change (0xB) command for controller number 7 (Channel Volume) may produce an indefinite artifact: after the loss, all notes on the channel may play too softly or too loudly.

   The purpose of the recovery journal system is to satisfy the recovery journal mandate: the MIDI performance rendered from an RTP MIDI stream sent over unreliable transport MUST NOT contain indefinite artifacts.

   The recovery journal system does not use packet retransmission to satisfy this mandate. Instead, each packet includes a special section, called the recovery journal.

   The recovery journal codes the history of the stream, back to an earlier packet called the checkpoint packet. The range of coverage for the journal is called the checkpoint history. The recovery journal codes the information necessary to recover from the loss of an arbitrary number of packets in the checkpoint history. Appendix A.1 normatively defines the checkpoint packet and the checkpoint history.

   When a receiver detects a packet loss, it compares its own knowledge about the history of the stream with the history information coded in the recovery journal of the packet that ends the loss event. By noting the differences in these two versions of the past, a receiver is able to transform all indefinite artifacts in the rendered performance into transient artifacts, by executing MIDI commands to repair the stream.
   We now state the normative role for senders in the recovery journal system.

   Senders prepare a recovery journal for every packet in the stream. In doing so, senders choose the checkpoint packet identity for the journal. Senders make this choice by applying a sending policy. Appendix C.1.2 normatively defines three sending policies: "closed-loop", "open-loop", and "anchor".

   By default, senders MUST use the closed-loop sending policy. If the session description overrides this default policy by using the MIME parameter j_update defined in Appendix C.1.2, senders MUST use the specified policy.

   After choosing the checkpoint packet identity for a packet, the sender creates the recovery journal. By default, this journal MUST conform to the normative semantics in Section 5 and Appendices A-B in this memo. In Appendix C.1.3, we define MIME parameters that modify the normative semantics for recovery journals. If the session description uses these parameters, the journal created by the sender MUST conform to the modified semantics.

   Next, we state the normative role for receivers in the recovery journal system.

   A receiver MUST detect each RTP sequence number break in a stream. If the sequence number break is due to a packet loss event (as defined in [2]), the receiver MUST repair all indefinite artifacts in the rendered MIDI performance caused by the loss. If the sequence number break is due to an out-of-order packet (as defined in [2]), the receiver MUST NOT take actions that introduce indefinite artifacts (ignoring the out-of-order packet is a safe option).

   Receivers take special precautions when entering or exiting a session. A receiver MUST process the first received packet in a stream as if it were a packet that ends a loss event. Upon exiting a session, a receiver MUST ensure that the rendered MIDI performance does not end with indefinite artifacts.
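
   A receiver might classify arriving sequence numbers with 16-bit serial-number arithmetic, as in the hypothetical Python sketch below. Treating a forward jump of less than 2^15 as a loss, and anything further "behind" as an out-of-order packet, is an assumption of this sketch rather than a rule from this memo or [2]. A receiver would run its journal-based repair when on_packet reports "first" or "loss".

```python
class SeqnumTracker:
    """Detect RTP sequence number breaks (hypothetical receiver helper)."""

    def __init__(self) -> None:
        self.highest = None  # highest sequence number accepted so far

    def on_packet(self, seq: int) -> str:
        if self.highest is None:
            self.highest = seq
            return "first"  # process as if it ended a loss event
        diff = (seq - self.highest) % 0x10000
        if diff == 0:
            return "duplicate"
        if diff >= 0x8000:
            return "out-of-order"  # late packet: ignoring it is safe
        self.highest = seq
        return "in-order" if diff == 1 else "loss"
```
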
   Receivers are under no obligation to perform indefinite artifact repairs at the moment a packet arrives. A receiver that uses a playout buffer may choose to wait until the moment of rendering before processing the recovery journal, as the "lost" packet may be a late packet that arrives in time to use.

   Next, we state the normative role for the creator of the session description in the recovery journal system. Depending on the application, the sender, the receivers, and other parties may take part in creating or approving the session description.

   A session description that specifies the default closed-loop sending policy and the default recovery journal semantics satisfies the recovery journal mandate. However, these default behaviors may not be appropriate for all sessions. If the creators of a session description use the parameters defined in Appendix C.1 to override these defaults, the creators MUST ensure that the parameters define a system that satisfies the recovery journal mandate.

   Finally, we note that this memo does not specify sender or receiver recovery journal algorithms. Implementations are free to use any algorithm that conforms to the requirements in this section. The non-normative [22] discusses sender and receiver algorithm design.

5. Recovery Journal Format

   This section introduces the structure of the recovery journal, and defines the bitfields of recovery journal headers. Appendices A-B complete the bitfield definition of the recovery journal.

   The recovery journal has a three-level structure:

   o  Top-level header.

   o  Channel and system journal headers. These headers encode recovery information for a single voice channel (channel journal) or for all System commands (system journal).

   o  Chapters. Each chapter describes recovery information for a single MIDI command type.

   Figure 7 shows the top-level structure of the recovery journal.
   The recovery journal consists of a 3-octet header, followed by an optional system journal (labeled S-journal in Figure 7) and an optional list of channel journals. Figure 8 shows the recovery journal header format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Recovery journal header            | S-journal ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Channel journals ...                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 7 -- Top-level recovery journal format

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|Y|A|R|TOTCHAN|   Checkpoint Packet Seqnum    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 8 -- Recovery journal header

   If the Y header bit is set to 1, the system journal appears in the recovery journal, directly following the recovery journal header.

   If the A header bit is set to 1, the recovery journal ends with a list of (TOTCHAN + 1) channel journals (the 4-bit TOTCHAN header field is interpreted as an unsigned integer).

   A MIDI channel MAY be represented by (at most) one channel journal in a recovery journal. Channel journals MUST appear in the recovery journal in ascending channel-number order.

   If A and Y are both zero, the recovery journal only contains its 3-octet header, and is considered to be an "empty" journal.

   The S (single-packet loss) bit appears in most recovery journal structures, including the recovery journal header. The S bit helps receivers efficiently parse the recovery journal in the common case of the loss of a single packet. Appendix A.1 defines S bit semantics.

   The R header bit is reserved. The semantics for R bits are uniform throughout the recovery journal, and are defined in Appendix A.1.
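
   For illustration, a hypothetical Python decoder for the Figure 8 header follows, together with the checkpoint coverage test given in the next paragraph. It assumes that bit 0 of Figure 8 is the most significant bit of the first octet, and it reads the modulo-2^16 "less than or equal" comparison as serial-number arithmetic (a difference below 2^15); that reading is an assumption of this sketch.

```python
from typing import NamedTuple


class JournalHeader(NamedTuple):
    s: bool
    y: bool
    a: bool
    r: bool
    totchan: int
    checkpoint_seqnum: int

    @property
    def channel_journal_count(self) -> int:
        # A = 1 means (TOTCHAN + 1) channel journals end the journal.
        return self.totchan + 1 if self.a else 0

    @property
    def is_empty(self) -> bool:
        return not (self.a or self.y)


def parse_journal_header(data: bytes) -> JournalHeader:
    """Decode the 3-octet recovery journal header of Figure 8."""
    if len(data) < 3:
        raise ValueError("the recovery journal header is 3 octets long")
    first = data[0]
    return JournalHeader(
        s=bool(first & 0x80),
        y=bool(first & 0x40),
        a=bool(first & 0x20),
        r=bool(first & 0x10),
        totchan=first & 0x0F,
        checkpoint_seqnum=int.from_bytes(data[1:3], "big"),
    )


def history_covers_loss(checkpoint_seqnum: int, highest_received: int) -> bool:
    """Is checkpoint_seqnum <= highest_received + 1 (modulo 2**16)?"""
    return (highest_received + 1 - checkpoint_seqnum) % 0x10000 < 0x8000
```
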
   The 16-bit Checkpoint Packet Seqnum header field codes the sequence number of the checkpoint packet for this journal, in network byte order (big-endian). The choice of the checkpoint packet sets the depth of the checkpoint history for the journal (defined in Appendix A.1).

   Receivers may use the Checkpoint Packet Seqnum field of the packet that ends a loss event to verify that the journal checkpoint history covers the entire loss event. The checkpoint history covers the loss event if the Checkpoint Packet Seqnum field is less than or equal to one plus the highest RTP sequence number previously received on the stream (modulo 2^16).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S| CHAN  |R|      LENGTH       |P|C|M|W|N|E|T|A|  Chapters ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 9 -- Channel journal format

   Figure 9 shows the structure of a channel journal: a 3-octet header, followed by a list of leaf elements called channel chapters. A channel journal encodes information about MIDI commands on the MIDI channel coded by the 4-bit CHAN header field. Note that CHAN uses the same bit encoding as the channel nibble in MIDI Channel Messages (the cccc field in Figure E.1 of Appendix E).

   The 10-bit LENGTH field codes the length of the channel journal. The semantics for LENGTH fields are uniform throughout the recovery journal, and are defined in Appendix A.1.

   The third octet of the channel journal header is the Table of Contents (TOC) of the channel journal. The TOC is a set of bits that encode the presence of a chapter in the journal.
   Each chapter contains information about a certain class of MIDI channel command:

   o  Chapter P: MIDI Program Change (0xC)
   o  Chapter C: MIDI Control Change (0xB)
   o  Chapter M: MIDI Parameter System (part of 0xB)
   o  Chapter W: MIDI Pitch Wheel (0xE)
   o  Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
   o  Chapter E: MIDI Note Command Extras (0x8, 0x9)
   o  Chapter T: MIDI Channel Aftertouch (0xD)
   o  Chapter A: MIDI Poly Aftertouch (0xA)

   Chapters appear in a list following the header, in order of their appearance in the TOC. Appendices A.2 to A.9 describe the bitfield format for each chapter, and define the conditions under which a chapter type MUST appear in the recovery journal. If any chapter types are required for a channel, an associated channel journal MUST appear in the recovery journal.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|D|V|Q|F|X|      LENGTH       |  System chapters ...          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 10 -- System journal format

   Figure 10 shows the structure of the system journal: a 2-octet header, followed by a list of system chapters. Each chapter codes information about a specific class of MIDI System command:

   o  Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset (0xFF), undefined System commands (0xF4, 0xF5, 0xF9, 0xFD)
   o  Chapter V: Active Sense (0xFE)
   o  Chapter Q: Sequencer State (0xF2, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC)
   o  Chapter F: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
   o  Chapter X: System Exclusive (all other 0xF0)

   The 10-bit LENGTH field codes the size of the system journal, and conforms to the semantics described in Appendix A.1.

   The D, V, Q, F, and X header bits form a Table of Contents (TOC) for the system journal. A TOC bit that is set to 1 codes the presence of a chapter in the journal.
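
   The two header layouts in Figures 9 and 10 can be decoded as in the hypothetical Python sketch below, assuming that bit 0 of each figure is the most significant bit of the first octet. The sketch reports only the raw LENGTH value, since its interpretation is defined in Appendix A.1, and it ignores the reserved R bit of the channel journal header.

```python
from typing import List, NamedTuple, Optional

CHANNEL_TOC = "PCMWNETA"  # channel journal TOC, most significant bit first
SYSTEM_TOC = "DVQFX"      # system journal TOC bits that follow the S bit


class SectionHeader(NamedTuple):
    s: bool
    chan: Optional[int]  # CHAN for channel journals; None for the system journal
    length: int          # raw LENGTH value (semantics in Appendix A.1)
    chapters: List[str]  # chapter letters whose TOC bit is 1, in TOC order


def parse_channel_journal_header(data: bytes) -> SectionHeader:
    """Decode the 3-octet channel journal header of Figure 9."""
    if len(data) < 3:
        raise ValueError("the channel journal header is 3 octets long")
    word = int.from_bytes(data[0:2], "big")
    toc = data[2]
    chapters = [c for i, c in enumerate(CHANNEL_TOC) if toc & (0x80 >> i)]
    # Bit 5 (R, mask 0x0400) is reserved and ignored here.
    return SectionHeader(bool(word & 0x8000), (word >> 11) & 0x0F,
                         word & 0x03FF, chapters)


def parse_system_journal_header(data: bytes) -> SectionHeader:
    """Decode the 2-octet system journal header of Figure 10."""
    if len(data) < 2:
        raise ValueError("the system journal header is 2 octets long")
    word = int.from_bytes(data[0:2], "big")
    chapters = [c for i, c in enumerate(SYSTEM_TOC) if word & (0x4000 >> i)]
    return SectionHeader(bool(word & 0x8000), None, word & 0x03FF, chapters)
```
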
   Chapters appear in a list following the header, in the order of their appearance in the TOC. Appendix B describes the bitfield format for the system chapters, and defines the conditions under which a chapter type MUST appear in the recovery journal. If any system chapter type is required to appear in the recovery journal, the system journal MUST appear in the recovery journal.

6. Session Description Protocol

   RTP does not perform session management. Instead, RTP works together with session management tools, such as the Session Initiation Protocol (SIP, [19]) and the Real Time Streaming Protocol (RTSP, [20]). RTP payload formats interact with session management tools by defining media type parameters.

   In many cases, session management tools use the media type parameters via another standard, the Session Description Protocol (SDP, [6]). SDP is a textual format for specifying session descriptions. Session descriptions specify the network transport and media encoding for RTP sessions. Session management tools coordinate the exchange of session descriptions between participants.

   Some session management tools use SDP to negotiate details of media transport (network addresses, ports, etc.). We refer to this use of SDP as "negotiated usage". One example of negotiated usage is the Offer/Answer protocol ([13], and Appendix C.6.3 in this memo) as used by SIP.

   Other session management tools use SDP to declare the media encoding for the session, but use other techniques to negotiate network transport. We refer to this use of SDP as "declarative usage". One example of declarative usage is RTSP ([20], and Appendices C.6.1-2 in this memo).

   Below, we show session description examples for native (Section 6.1) and mpeg4-generic (Section 6.2) streams. In Section 6.3, we introduce session configuration tools that may be used to customize streams.
6.1 Session Descriptions for Native Streams

   The session description below shows a minimal native RTP MIDI stream sent over unicast UDP transport.

      v=0
      o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
      s=Example
      t=0 0
      m=audio 5004 RTP/AVP 96
      c=IN IP4 192.0.2.94
      a=rtpmap:96 rtp-midi/44100

   The rtpmap attribute line uses the rtp-midi MIME type to specify a native stream. The clock rate specified on the rtpmap line (in the example above, 44100 Hz) sets the scaling for the RTP timestamp header field (see Section 2.1, and also [2]).

   Note that this document does not specify a default clock rate value for RTP MIDI. When RTP MIDI is used with SDP, parties MUST use the rtpmap line to communicate the clock rate.

   We consider the RTP MIDI stream shown above to be "minimal" because the session description does not customize the stream with parameters. Without such customization, a native RTP MIDI stream has these characteristics:

   1. If the stream uses unreliable transport (unicast UDP, multicast UDP, etc.), the recovery journal system is in use, and the RTP payload contains both the MIDI command section and the journal section. If the stream uses reliable transport (such as TCP), the stream does not use journalling, and the payload contains only the MIDI command section (Section 2.2).

   2. If the stream uses the recovery journal system, the recovery journal system uses the default sending policy and the default journal semantics (Section 4).

   3. In the MIDI command section of the payload, command timestamps use the default semantics (Section 3).

   4. The recommended temporal duration ("media time") of an RTP packet ranges from 0 to 200 ms, and the RTP timestamp difference between sequential packets in the stream may be arbitrarily large (Section 2.1).

   5. For each party, only one media description ("m=") that contains a payload type mapped to the rtp-midi media type may appear in the rtp-midi RTP session for the multimedia session. As described in Section 2.1, only one such RTP session may appear in the multimedia session. In this media description, only one payload type mapped to the rtp-midi media type may appear.

   6. The rendering method for the stream is not specified.

   As is standard in RTP, RTP sessions managed by SIP are sendrecv by default (parties send and receive MIDI), and RTP sessions managed by RTSP are recvonly by default (server sends and client receives).

   In sendrecv RTP MIDI sessions, the 16 voice channel + systems MIDI name space is unique for each sender. Thus, in a two-party session, the voice channel 0 sent by one party is distinct from the voice channel 0 sent by the other party. This behavior corresponds to what occurs when two MIDI 1.0 DIN devices are cross-connected with two MIDI cables (one cable routing MIDI Out from the first device into MIDI In of the second device, a second cable routing MIDI Out from the second device into MIDI In of the first device).

   MIDI 1.0 DIN networks may be configured in a "party-line" multicast topology. For these networks, the MIDI protocol itself provides tools for addressing specific devices in transactions on a multicast network, and for device discovery. Thus, apart from providing a 1-to-many forward path and a many-to-1 reverse path, IETF protocols do not need to provide any special support for MIDI multicast networking.

6.2 Session Descriptions for mpeg4-generic Streams

   An mpeg4-generic [4] RTP MIDI stream uses an MPEG 4 Audio Object Type to render MIDI into audio. Three Audio Object Types accept MIDI input:

   o  General MIDI (Audio Object Type ID 15), based on the General MIDI rendering standard [1].
o Wavetable Synthesis (Audio Object Type ID 14), based on the Downloadable Sounds Level 2 (DLS 2) rendering standard [9].

o Main Synthetic (Audio Object Type ID 13), based on Structured Audio and the programming language SAOL [5].

The primary service of an mpeg4-generic stream is to code Access Units (AUs). We define the mpeg4-generic RTP MIDI AU as the MIDI payload shown in Figure 1 of Section 2.1 of this memo: a MIDI command section optionally followed by a journal section.

Exactly one RTP MIDI AU MUST be mapped to one mpeg4-generic RTP MIDI packet. The mpeg4-generic options for placing several AUs in an RTP packet MUST NOT be used with RTP MIDI. The mpeg4-generic options for fragmenting and interleaving AUs MUST NOT be used with RTP MIDI. The mpeg4-generic RTP packet payload (Figure 1 in [4]) MUST contain empty AU Header and Auxiliary sections. These rules yield mpeg4-generic packets that are structurally identical to native RTP MIDI packets, an essential property for the correct operation of the payload format.

The session description below shows a minimal mpeg4-generic RTP MIDI stream sent over multicast UDP transport (the connection address FF1E:03AD::7F2E:172A:1E24 is an IPv6 multicast address). This example uses the General MIDI Audio Object Type under Synthesis Profile @ Level 2.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP6 FF1E:03AD::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0000
   000600FF2F000

(The a=fmtp line has been wrapped to fit the page to accommodate memo formatting restrictions; it comprises a single line in SDP.)

The fmtp attribute line codes the four parameters (streamtype, mode, profile-level-id, and config) that are required in all mpeg4-generic session descriptions [4].
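The two examples above place the values a receiver needs in two attribute lines: the rtpmap clock rate, which scales RTP timestamps, and the fmtp config string, whose first 5 bits code the Audio Object Type. The Python sketch below shows one way to extract both from a session description. It is illustrative only: the function names are hypothetical, it is not a complete SDP parser, it assumes the a=fmtp line is a single unwrapped line (as it is in real SDP), and it inspects only the leading 5 bits of config rather than decoding the full AudioSpecificConfig [7].

```python
from typing import Dict


def parse_sdp_attributes(sdp: str) -> Dict[str, dict]:
    """Collect rtpmap and fmtp data per payload type (hypothetical helper)."""
    streams: Dict[str, dict] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=rtpmap:"):
            # a=rtpmap:<PT> <encoding name>/<clock rate>[/<params>]
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            name, clock = encoding.split("/")[:2]
            entry = streams.setdefault(pt, {})
            entry["encoding"] = name
            entry["clock_rate"] = int(clock)
        elif line.startswith("a=fmtp:"):
            # a=fmtp:<PT> key=value; key=value; ...
            pt, params = line[len("a=fmtp:"):].split(None, 1)
            fmtp = {}
            for item in params.split(";"):
                if "=" in item:
                    key, value = item.split("=", 1)
                    fmtp[key.strip()] = value.strip()
            streams.setdefault(pt, {})["fmtp"] = fmtp
    return streams


def audio_object_type(config_hex: str) -> int:
    """Interpret the first 5 bits of the config string as an unsigned integer."""
    return int(config_hex[:2], 16) >> 3


def media_seconds(ts_from: int, ts_to: int, clock_rate: int) -> float:
    """Media time between two RTP timestamps, using modulo 2^32 arithmetic."""
    return ((ts_to - ts_from) % 2**32) / clock_rate


if __name__ == "__main__":
    example = "\n".join([
        "m=audio 5004 RTP/AVP 96",
        "a=rtpmap:96 mpeg4-generic/44100",
        "a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12; "
        "config=7A0A0000001A4D546864000000060000000100604D54726B0000"
        "000600FF2F000",
    ])
    info = parse_sdp_attributes(example)["96"]
    print(info["clock_rate"], info["fmtp"]["mode"])      # 44100 rtp-midi
    print(audio_object_type(info["fmtp"]["config"]))     # 15 (General MIDI)
    print(media_seconds(0, 22050, info["clock_rate"]))   # 0.5
```

Run against the mpeg4-generic example, the sketch reports a 44100 Hz clock and Audio Object Type 15 (General MIDI), matching the interpretation of the leading "7A" nibbles given in the text.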
For RTP MIDI streams, the streamtype parameter MUST be set to 5, the "mode" parameter MUST be set to "rtp-midi", and the "profile-level-id" parameter MUST be set to the MPEG-4 Profile Level for the stream. For the Synthesis Profile, legal profile-level-id values are 11, 12, and 13, coding low (11), medium (12), or high (13) decoder computational complexity, as defined by MPEG conformance tests.

In a minimal RTP MIDI session description, the config value MUST be a hexadecimal encoding [4] of the AudioSpecificConfig data block [7] for the stream. AudioSpecificConfig encodes the Audio Object Type for the stream, and also encodes initialization data (SAOL programs, DLS 2 wave tables, etc.). Standard MIDI Files encoded in AudioSpecificConfig in a minimal session description MUST be ignored by the receiver.

Receivers determine the rendering algorithm for the session by interpreting the first 5 bits of AudioSpecificConfig as an unsigned integer that codes the Audio Object Type. In our example above, the leading config string nibbles "7A" yield the Audio Object Type 15 (General MIDI).

In Appendix E.4, we derive the config string value in the session description shown above; the starting point of the derivation is the MPEG bitstreams defined in [5] and [7].

We consider the stream to be "minimal" because the session description does not customize the stream through the use of parameters, other than the 4 required mpeg4-generic parameters described above.

In Section 6.1, we describe the behavior of a minimal native stream, as a numbered list of characteristics. Items 1-4 on that list also describe the minimal mpeg4-generic stream, but items 5 and 6 require restatements, as listed below:

5. For each party, only one media description ("m=") that contains a payload type which is mapped to the mpeg4-generic media type under mode rtp-midi may appear in the RTP session for mpeg4-generic mode rtp-midi in the multimedia session. As described in Section 2.1, only one such RTP session may appear in the multimedia session. In this media description, only one payload type mapped to the mpeg4-generic mode rtp-midi media type may appear.

6. A minimal mpeg4-generic stream encodes the AudioSpecificConfig as an inline hexadecimal constant. If the session description is sent over UDP, it may be impossible to transport large AudioSpecificConfig blocks within the Maximum Transmission Unit (MTU) of the underlying network (for Ethernet, the MTU is 1500 octets). In some cases, the AudioSpecificConfig block may exceed the maximum size of the UDP packet itself.

The Section 6.1 comments on SIP and RTSP stream directional defaults, sendrecv MIDI channel usage, and MIDI 1.0 DIN multicast networks also apply to mpeg4-generic RTP MIDI sessions.

In sendrecv sessions, each party's session description MUST use identical values for the mpeg4-generic MIME parameters (including the required streamtype, mode, profile-level-id, and config parameters). As a consequence, each party uses an identically-configured MPEG 4 Audio Object Type to render MIDI commands into audio. The preamble to Appendix C discusses a way to create "virtual sendrecv" sessions that do not have this restriction.

6.3 Parameters

This section introduces parameters for session configuration for RTP MIDI streams. Parameters are applied on a per-payload-type basis (signalled by the PT field in RTP headers, and by the payload type number in media lines in session descriptions).

The parameters add features to the minimal streams described in Sections 6.1-2, and support several types of services:

o Journal customization. The j_sec and j_update parameters configure the use of the journal section.
The ch_default, ch_unused, ch_never, ch_anchor, and ch_active parameters configure the semantics of the recovery journal chapters, and also define the capabilities of the stream, by declaring the subset of MIDI commands that may appear in the stream. These MIME parameters are described in Appendix C.1, and override the default stream behaviors 1 and 2 listed in Section 6.1 and referenced in Section 6.2.

o MIDI command timestamp semantics. The tsmode, octpos, mperiod, and linerate parameters customize the semantics of timestamps in the MIDI command section. These parameters let RTP MIDI accurately encode the implicit time coding of MIDI 1.0 DIN cables. These MIME parameters are described in Appendix C.2, and override default stream behavior 3 listed in Section 6.1 and referenced in Section 6.2.

o Media time. The rtp_ptime and rtp_maxptime parameters define the temporal duration ("media time") of an RTP MIDI packet. The guardtime parameter sets the minimum sending rate of stream packets. These MIME parameters are described in Appendix C.3, and override default stream behavior 4 listed in Section 6.1 and referenced in Section 6.2.

o Stream description. The musicport parameter labels the MIDI name space of multi-stream RTP sessions. Musicport is described in Appendix C.4. The musicport parameter overrides default stream behavior 5 in Sections 6.1 and 6.2.

o MIDI rendering. Several MIME parameters specify the MIDI rendering method of a stream. These parameters are described in Appendix C.5, and override default stream behavior 6 in Sections 6.1 and 6.2.

In Appendix C.6, we specify interoperability guidelines for three RTP MIDI application areas: content-streaming using RTSP (Appendix C.6.1), RTSP-based stage and studio devices (Appendix C.6.2), and network musical performance using SIP (Appendix C.6.3).

7. Extensibility

The payload format defined in this memo exclusively encodes all commands that may legally appear on a MIDI 1.0 DIN cable.

Many worthy uses of MIDI over RTP do not fall within the narrow scope of the payload format. For example, the payload format does not support the direct transport of Standard MIDI File (SMF) meta-event and metric timing data. As a second example, the payload format does not define transport tools for user-defined commands (apart from tools to support System Exclusive commands [1]).

The payload format does not provide an extension mechanism to support new features of this nature, by design. Instead, we encourage the development of new payload formats for specialized musical applications. The IETF session management tools [13] [20] support codec negotiation, to facilitate the use of new payload formats in a backward-compatible way.

However, the payload format does provide several extensibility tools, which we list below:

o Journalling. As described in Appendix C.1, new token values for the j_sec and j_update MIME parameters may be defined in IETF standards-track documents. This mechanism supports the design of new journal formats and the definition of new journal sending policies.

o Rendering. The payload format may be extended to support new MIDI renderers (Appendix C.5.2). The extension mechanism uses the standard MIME registration process [24]. Certain general aspects of the RTP MIDI rendering process may also be extended, via IETF standards-track documents that define new token values for the render (Appendix C.5) and smf_info (Appendix C.5.4.1) MIME parameters.

o Undefined commands. [1] reserves 4 MIDI System commands for future use (0xF4, 0xF5, 0xF9, 0xFD). If updates to [1] define the reserved commands, IETF standards-track documents may be defined to provide resiliency support for the commands. Opaque LEGAL fields appear in System Chapter D for this purpose (Appendix B.1.1).
A final form of extensibility involves the inclusion of the payload format in framework documents. Framework documents describe how to combine protocols to form a platform for interoperable applications. For example, a network musical performance [17] framework might define how to use SIP [19], SDP [6], and RTP [2] to support real-time performances between geographically-distributed players. We discuss frameworks from an interoperability perspective in Appendix C.6.

8. Congestion Control

The RTP congestion control requirements defined in [2] apply to RTP MIDI sessions, and implementors should carefully read the congestion control section in [2]. As noted in [2], all transport protocols used on the Internet need to address congestion control in some way, and RTP is not an exception.

In addition, as RTP MIDI runs under the Audio/Video Profile [3], the congestion control requirements defined in [3] apply to RTP MIDI sessions. The basic congestion control requirement defined in [3] is that RTP sessions that use UDP transport should monitor packet loss (via RTCP, or via other means) to ensure that the RTP stream competes fairly with TCP flows that share the network.

Finally, RTP MIDI has congestion control issues that are unique for an audio RTP payload format. In applications such as network musical performance [17], the packet rate is linked to the gestural rate of a human performer. Senders MUST monitor the MIDI command source for patterns that result in excessive packet rates, and take actions during RTP transcoding to reduce the RTP packet rate. [22] offers implementation guidance on this issue.

A. The Recovery Journal Channel Chapters

A.1 Recovery Journal Definitions

This Appendix defines the terminology and the coding idioms that are used in the recovery journal bitfield descriptions in Section 5 (journal header structure), Appendices A.2-9 (channel journal chapters), and Appendices B.1-5 (system journal chapters).

We assume that the recovery journal resides in the journal section of an RTP packet with sequence number I ("packet I") and that the Checkpoint Packet Seqnum field in the top-level recovery journal header refers to a previous packet with sequence number C (an exception is the self-referential C = I case). Unless stated otherwise, algorithms are assumed to use modulo 2^16 arithmetic for calculations on 16-bit sequence numbers and modulo 2^32 arithmetic for calculations on 32-bit extended sequence numbers.

Several bitfield coding idioms appear throughout the recovery journal system, with consistent semantics. Most recovery journal elements begin with an "S" (Single-packet loss) bit. S bits are designed to help receivers efficiently parse through the recovery journal hierarchy in the common case of the loss of a single packet.

As a rule, S bits MUST be set to 1. However, an exception applies if a recovery journal element in packet I encodes data about a command stored in the MIDI command section of packet I - 1. In this case, the S bit of the recovery journal element MUST be set to 0. If a recovery journal element has its S bit set to 0, all higher-level recovery journal elements that contain it MUST also have S bits that are set to 0, including the top-level recovery journal header.

Other consistent bitfield coding idioms are described below:

o R flag bit. R flag bits are reserved for future use. Senders MUST set R bits to 0. Receivers MUST ignore R bit values.

o LENGTH field.
All fields named LENGTH (as distinct from LEN) code the number of octets in the structure that contains the field, including the header it resides in and all hierarchical levels below it. If a structure contains a LENGTH field, a receiver MUST use the LENGTH field value to advance past the structure during parsing, rather than use knowledge about the internal format of the structure.

We now define normative terms used to describe recovery journal semantics.

o Checkpoint history. The checkpoint history of a recovery journal is the concatenation of the MIDI command sections of packets C through I - 1. The final command in the MIDI command section for packet I - 1 is considered the most recent command; the first command in the MIDI command section for packet C is the oldest command. If command X is less recent than command Y, X is considered to be "before Y". A checkpoint history with no commands is considered to be empty. The checkpoint history never contains the MIDI command section of packet I (the packet containing the recovery journal), so if C == I, the checkpoint history is empty by definition.

o Session history. The session history of a recovery journal is the concatenation of MIDI command sections from the first packet of the session up to packet I - 1. The definitions of command recency and history emptiness follow those in the checkpoint history. The session history never contains the MIDI command section of packet I, and so the session history of the first packet in the session is empty by definition.

o Finished/unfinished commands. If all octets of a MIDI command appear in the session history, the command is defined to be finished. If some but not all octets of a command appear in the session history, the command is defined to be unfinished. Unfinished commands occur if segments of a SysEx command appear in several RTP packets.
For example, if a SysEx command is coded as 3 segments, with segment 1 in packet K, segment 2 in packet K + 1, and segment 3 in packet K + 2, the session histories for packets K + 1 and K + 2 contain unfinished versions of the command. A session history contains a finished version of a cancelled SysEx command if the history contains the cancel sublist for the command.

o Reset State commands. Reset State (RS) commands reset renderers to an initialized "powerup" condition. The RS commands are: System Reset (0xFF), General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI 2 System Enable (0xF0 0x7E 0xcc 0x09 0x03 0xF7), General MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7), Turn DLS On (0xF0 0x7E 0xcc 0x0A 0x01 0xF7), and Turn DLS Off (0xF0 0x7E 0xcc 0x0A 0x02 0xF7). MIME registrations for renderers (Appendix C.5.2) and IETF standards-track documents MAY specify additional RS commands.

o Active commands. Active commands are MIDI commands that do not appear before a Reset State command in the session history.

o N-active commands. N-active commands are MIDI commands that do not appear before one of the following commands in the session history: MIDI Control Change numbers 123-127 (numbers with All Notes Off semantics) or 120 (All Sound Off), and any Reset State command.

o C-active commands. C-active commands are MIDI commands that do not appear before one of the following commands in the session history: MIDI Control Change number 121 (Reset All Controllers) and any Reset State command.

o Oldest-first ordering rule. Several recovery journal chapters contain a list of elements, where each element is associated with a MIDI command that appears in the session history. In most cases, the chapter definition requires that list elements be ordered in accordance with the "oldest-first ordering rule".
Below, we normatively define this rule:

Elements associated with the most recent command in the session history coded in the list MUST appear at the end of the list.

Elements associated with the oldest command in the session history coded in the list MUST appear at the start of the list.

All other list elements MUST be arranged with respect to these boundary elements, to produce a list ordering that strictly reflects the relative session history recency of the commands coded by the elements in the list.

o Parameter system. A MIDI feature that provides two sets of 16,384 parameters to expand the 0-127 controller number space. The Registered Parameter Names (RPN) system and the Non-Registered Parameter Names (NRPN) system each provides 16,384 parameters.

o Parameter system transaction. The values of RPNs and NRPNs are changed by a series of Control Change commands that form a parameter system transaction. A canonical transaction begins with two Control Change commands to set the parameter number (controller numbers 99 and 98 for NRPNs, controller numbers 101 and 100 for RPNs). The transaction continues with an arbitrary number of Data Entry (controller numbers 6 and 38), Data Increment (controller number 96), and Data Decrement (controller number 97) Control Change commands to set the parameter value. The transaction ends with a second pair of (99, 98) or (101, 100) Control Change commands that specify the null parameter (MSB value 0x7F, LSB value 0x7F).

Several variants of the canonical transaction sequence are possible. Most commonly, the terminal pair of (99, 98) or (101, 100) Control Change commands may specify a parameter other than the null parameter. In this case, the command pair terminates the first transaction and starts a second transaction. The command pair is considered to be a part of both transactions. This variant is legal and recommended in [1]. We refer to this variant as a "type 1 variant".
Less commonly, the MSB (99 or 101) or LSB (98 or 100) command of a (99, 98) or (101, 100) Control Change pair may be omitted.

If the MSB command is omitted, the transaction uses the MSB value of the most recent C-active Control Change command for controller number 99 or 101 that appears in the session history. We refer to this variant as a "type 2 variant".

If the LSB command is omitted, the LSB value 0x00 is assumed. We refer to this variant as a "type 3 variant". The type 2 and type 3 variants are defined as legal, but are not recommended, in [1].

System real-time commands may appear at any point during a transaction (even between octets of individual commands in the transaction). More generally, [1] does not forbid the appearance of unrelated MIDI commands during an open transaction. These commands are considered to be "outside" the transaction, and do not affect the status of the transaction in any way. In particular, these "outside" commands do not terminate the transaction (excepting a Reset State command, whose semantics act to reset the transaction system).

o Initiated parameter system transaction. A canonical parameter system transaction whose (99, 98) or (101, 100) initial Control Change command pair appears in the session history is considered to be an initiated parameter system transaction. This definition also holds for type 1 variants. For type 2 variants (dropped MSB), a transaction whose initial LSB Control Change command appears in the session history is an initiated transaction. For type 3 variants (dropped LSB), a transaction is considered to be initiated if at least one transaction command follows the initial MSB (99 or 101) Control Change command in the session history. The completion of a transaction does not nullify its "initiated" status.

o Session history reference counts. Several recovery journal chapters include a reference count field, which codes the total number of commands of a type that appear in the session history. Examples include the Reset and Tune Request command
Examples include the Reset and Tune Request command Lazzaro/Wawrzynek [Page 41] INTERNET-DRAFT 25 April 2005 logs (Chapter D, Appendix B.1) and the Active Sense command (Chapter V, Appendix B.2). Upon the detection of a loss event, reference count fields let a receiver deduce if ANY instances of the command have been lost, by comparing the journal reference count with its own reference count. Thus, a reference count field makes sense, even for command types in which knowing the NUMBER of lost commands is irrelevant (as is true with all of the example commands mentioned above). The chapter definitions in Appendices A.2-9 and B.1-5 reflect the default recovery journal behavior. The ch_default, ch_unused, ch_never, ch_anchor, and ch_active parameters modify these definitions, as described in Appendix C.1.3. The chapter definitions specify if data MUST be present in the journal. Senders MAY also include non-required data in the journal. This optional data MUST comply with the normative chapter definition. For example, if a chapter definition states that a field codes data from the most recent active command in the session history, the sender MUST NOT code inactive commands or older commands in the field. Finally, we note that a channel journal only encodes information about MIDI commands appearing on the MIDI channel the journal protects. All references to MIDI commands in Appendices A.2-9 should be read as "MIDI commands appearing on this channel." Lazzaro/Wawrzynek [Page 42] INTERNET-DRAFT 25 April 2005 A.2 Chapter P: MIDI Program Change A channel journal MUST contain Chapter P if an active Program Change (0xC) command appears in the checkpoint history. Figure A.2.1 shows the format for Chapter P. 0 1 2 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |S| PROGRAM |B| BANK-MSB |X| BANK-LSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure A.2.1 -- Chapter P format The chapter has a fixed size of 24 bits. 
The PROGRAM field indicates the data value of the most recent active Program Change command in the session history. By default, the B, BANK-MSB, X, and BANK-LSB fields MUST be set to 0. Below, we define exceptions to this default condition.

If an active Control Change (0xB) command for controller number 0 (Bank Select MSB) appears before the Program Change command in the session history, the B bit MUST be set to 1, and the BANK-MSB field MUST code the data value of the Control Change command.

If B is set to 1, the BANK-LSB field MUST code the data value of the most recent Control Change command for controller number 32 (Bank Select LSB) that preceded the Program Change command coded in the PROGRAM field and followed the Control Change command coded in the BANK-MSB field. If no such Control Change command exists, the BANK-LSB field MUST be set to 0.

If B is set to 1, and if a Control Change command for controller number 121 (Reset All Controllers) appears in the MIDI stream between the Control Change command coded by the BANK-MSB field and the Program Change command coded by the PROGRAM field, the X bit MUST be set to 1.

A.3 Chapter C: MIDI Control Change

Readers may wish to review the Appendix A.1 definition of "C-active commands" before reading this Appendix.

Figure A.3.1 shows the format for Chapter C.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|     LEN     |S|   NUMBER    |A|  VALUE/ALT  |S|   NUMBER    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |A|  VALUE/ALT  |  ....                                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure A.3.1 -- Chapter C format

The chapter consists of a 1-octet header, followed by a variable length list of 2-octet controller logs. The list MUST contain at least one controller log. The 7-bit LEN field codes the number of controller logs in the list, minus one.
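Chapter C carries no LENGTH field, so a receiver finds the end of the chapter from LEN: the chapter occupies 1 + 2 * (LEN + 1) octets. The hypothetical Python sketch below walks the log list on that basis and classifies each controller log by its A and T bits, using the controller log layouts defined in Appendix A.3.2. It performs only minimal error checking and is meant as an illustration, not a conforming parser.

```python
from typing import List, NamedTuple, Tuple


class ControllerLog(NamedTuple):
    s: int
    number: int
    tool: str   # "value", "toggle", or "count"
    data: int   # 7-bit VALUE (value tool) or 6-bit ALT (toggle/count tools)


def parse_chapter_c(data: bytes) -> Tuple[int, List[ControllerLog], int]:
    """Parse a Chapter C bitfield (hypothetical helper).

    Returns the header S bit, the controller logs, and the number of
    octets consumed, so a caller can continue with the next chapter.
    """
    if not data:
        raise ValueError("empty Chapter C")
    header_s = data[0] >> 7
    count = (data[0] & 0x7F) + 1          # LEN codes (number of logs - 1)
    end = 1 + 2 * count
    if len(data) < end:
        raise ValueError("truncated Chapter C log list")
    logs = []
    for i in range(1, end, 2):
        first, second = data[i], data[i + 1]
        if (second & 0x80) == 0:          # A = 0: value tool
            tool, value = "value", second & 0x7F
        elif second & 0x40:               # A = 1, T = 1: toggle tool
            tool, value = "toggle", second & 0x3F
        else:                             # A = 1, T = 0: count tool
            tool, value = "count", second & 0x3F
        logs.append(ControllerLog(first >> 7, first & 0x7F, tool, value))
    return header_s, logs, end
```

Returning the octet count lets a channel journal parser step past Chapter C without relying on anything beyond the header, mirroring how receivers use LENGTH fields elsewhere in the journal.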
We define the semantics of the controller log fields in Appendix A.3.2.

A channel journal MUST contain Chapter C if the rules defined in this Appendix require that one or more controller logs appear in the list.

A.3.1 Log Inclusion Rules

If a C-active Control Change command for a controller number in the range 0-119 appears in the checkpoint history, the list MUST contain a controller log for the number, with possible exceptions for numbers 0, 6, 32-63, and 96-101.

If an active Control Change command for a controller number in the range 120-127 appears in the checkpoint history, the list MUST contain a controller log for the number, with possible exceptions for numbers 124-127.

We now define the rules for the exceptions.

o MIDI streams may transmit 14-bit controller values using paired Most Significant Byte (MSB, controller numbers 0-31, 99, 101) and Least Significant Byte (LSB, controller numbers 32-63, 98, 100) Control Change commands [1].

If the most recent C-active Control Change command in the session history for a 14-bit controller pair uses the MSB number, Chapter C MAY omit the controller log for the associated LSB number, as the command ordering makes this LSB value irrelevant. However, this exception MUST NOT be applied if the sender is not certain that the MIDI source uses 14-bit semantics for the controller number pair. Note that some MIDI sources ignore 14-bit controller semantics, and use the LSB controller numbers as independent 7-bit controllers.

o If C-active Control Change commands for controller numbers 0 (Bank Select MSB) or 32 (Bank Select LSB) appear in the checkpoint history, and if the command instances are also coded in the BANK-MSB and BANK-LSB fields of Chapter P (Appendix A.2), Chapter C MAY omit the controller logs for the commands.

o Several controller number pairs are defined to be mutually exclusive.
Controller numbers 124 (Omni Off) and 125 (Omni On) form a mutually exclusive pair, as do controller numbers 126 (Mono) and 127 (Poly).

If active Control Change commands for one or both members of a mutually exclusive pair appear in the checkpoint history, a log for the controller number of the most recent command for the pair in the checkpoint history MUST appear in the controller list. However, the list MAY omit the controller log for the other number in the pair.

If active Control Change commands for one or both members of a mutually exclusive pair appear in the session history, and a log for the controller number of the most recent command for the pair does not appear in the controller list, a log for the other number of the pair MUST NOT appear in the controller list.

o Appendix A.3.4 defines exception rules for the MIDI Parameter System controller numbers 6, 38, and 96-101.

The ch_active MIME parameter (Appendix C.1.3) may be used to change Chapter C semantics to support MIDI renderers (such as [9] in certain configurations) that exclude particular controller numbers from the semantics of Control Change commands for controller 121 (Reset All Controllers). Appendix C.1.3 defines how ch_active modifies Chapter C semantics.

A.3.2 Controller Log Format

Figure A.3.2 shows the controller log structure of Chapter C.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NUMBER    |A|  VALUE/ALT  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure A.3.2 -- Chapter C controller log

The 7-bit NUMBER field identifies the controller number.

The 7-bit VALUE/ALT field codes recovery information for the controller number. The A bit defines the coding format of the VALUE/ALT field.

Chapter C provides three tools for coding recovery information in the VALUE/ALT field: the value tool, the toggle tool, and the count tool.
Implementations may choose among the tools to best code recovery information for a particular controller number.

In the value tool, the 7-bit VALUE/ALT field codes the control value of the most recent C-active (controller numbers 0-119) or active (controller numbers 120-127) Control Change command in the session history. This tool works best for controllers that code a continuous quantity, such as number 1 (Modulation Wheel). If the value tool is chosen, the A bit is set to 0.

The A bit is set to 1 to code the toggle or count tool. These tools work best for controllers that code discrete actions. Figure A.3.3 shows the controller log for these tools.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NUMBER    |1|T|    ALT    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure A.3.3 -- Controller log for ALT tools

The T flag is set to 1 to code the toggle tool; T is set to 0 to code the count tool. Both methods use the 6-bit ALT field as an unsigned integer.

The toggle tool works best for controllers that act as on/off switches, such as 64 (Damper Pedal (Sustain)). These controllers code the "off" state with control values 0-63 and the "on" state with 64-127.

The ALT field codes the total number of toggles (off->on and on->off) due to Control Change commands in the session history, including toggle events caused by Control Change commands for controller number 121 (Reset All Controllers). Toggle counting is performed modulo 64. The toggle count is reset at the start of a session, and whenever a Reset State command (Appendix A.1) appears in the session history. When these reset events occur, the toggle count for a controller is set to 0 (for controllers whose default value is 0-63) or 1 (for controllers whose default value is 64-127).

The Damper Pedal (Sustain) controller illustrates the benefits of the toggle tool over the value tool for switch controllers.
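The toggle-count rules above amount to a small per-controller state machine. The Python class below is a hypothetical sender-side sketch of that bookkeeping and of the resulting Figure A.3.3 log (A = 1, T = 1). It assumes that values 0-63 mean "off" and 64-127 mean "on", that the caller feeds it every relevant command in session-history order, and that Reset All Controllers (121) returns this controller to its default value; whether 121 affects a given controller can depend on the renderer and on ch_active (Appendix C.1.3).

```python
class ToggleCounter:
    """Sender-side toggle-tool state for one switch controller (hypothetical).

    Assumes control values 0-63 mean "off" and 64-127 mean "on", and that
    Reset All Controllers (121) returns the controller to default_value.
    """

    def __init__(self, number: int, default_value: int) -> None:
        self.number = number
        self.default_value = default_value
        self.reset_state()

    def reset_state(self) -> None:
        # Start of session or Reset State command: the controller returns to
        # its default, and the count restarts at 0 (default 0-63) or 1 (64-127).
        self.on = self.default_value >= 64
        self.count = 1 if self.on else 0

    def control_change(self, value: int) -> None:
        now_on = value >= 64
        if now_on != self.on:                   # off->on or on->off
            self.count = (self.count + 1) % 64  # toggle counting is modulo 64
            self.on = now_on

    def reset_all_controllers(self) -> None:
        # Controller 121 moves the switch back to its default value; this
        # counts as a toggle only if the on/off state actually changes.
        self.control_change(self.default_value)

    def log(self, s: bool = True) -> bytes:
        """Controller log of Figure A.3.3 with A = 1 and T = 1."""
        return bytes([(int(s) << 7) | self.number, 0xC0 | self.count])
```

With the Damper Pedal (controller 64, default 0), the sequence on, off, on leaves a count of 3. A receiver that lost the "off" command still sees the pedal as "on", but its own count of 1 disagrees with the journal's 3, so it can detect the loss; a value-tool log would read 127 in both cases.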
As often used in piano applications, the "on" state of the controller lets notes resonate, while the "off" state immediately damps notes to silence. The loss of the "off" command in an "on->off->on" sequence results in ringing notes that should have been damped silent. The toggle tool lets receivers detect this lost "off" command, but the value tool does not.

The count tool is similar to the toggle tool, but is optimized for controllers whose controller value is ignored, such as number 123 (All Notes Off). For the count tool, the ALT field codes the total number of Control Change commands in the session history. Command counting is performed modulo 64. The command count is set to 0 at the start of the session, and is reset to 0 whenever a Reset State command (Appendix A.1) appears in the session history.

A.3.3 Log List Coding Rules

In this section, we describe the organization of controller logs in the Chapter C log list.

In most situations, a controller number SHOULD be coded by a single tool (and thus, a single controller log). If a number is coded with a single tool, and this tool is the count tool, recovery Control Change commands generated by a receiver SHOULD use the default control value for the controller.

A controller number MAY be coded by several tool types (and thus, several controller logs, each using a different tool). This technique may improve recovery performance for controllers with complex semantics, such as controller number 84 (Portamento Control), or controller number 121 (Reset All Controllers) when used with a non-zero data octet (with the semantics described in [9]). However, multiple logs for the same controller number that use the SAME tool type MUST NOT appear in the controller list.

The Chapter C log list MUST obey the oldest-first ordering rule (defined in Appendix A.1).
Note that this ordering codes the information necessary for the recovery of 14-bit controller values, without precluding the use of MSB and LSB controller pairs as independent 7-bit controllers.

A.3.4 The Parameter System

Readers may wish to review the Appendix A.1 definitions of "parameter system", "parameter system transaction", and "initiated parameter system transaction" before reading this section.

Parameter system transactions update a MIDI Registered Parameter Number (RPN) or Non-Registered Parameter Number (NRPN) value. A parameter system transaction is a sequence of Control Change commands that may use the following controller numbers:

o Data Entry MSB (6)
o Data Entry LSB (38)
o Data Increment (96)
o Data Decrement (97)
o Non-Registered Parameter Number (NRPN) LSB (98)
o Non-Registered Parameter Number (NRPN) MSB (99)
o Registered Parameter Number (RPN) LSB (100)
o Registered Parameter Number (RPN) MSB (101)

Control Change commands that are a part of a parameter system transaction MUST NOT be coded in Chapter C controller logs. Instead, these commands are coded in Chapter M, the MIDI Parameter chapter defined in Appendix A.4.

However, Control Change commands that use the listed controllers as general-purpose controllers (i.e., outside of a parameter system transaction) MUST NOT be coded in Chapter M. Instead, the controllers are coded in Chapter C controller logs. The controller logs follow the coding rules stated in Appendices A.3.2 and A.3.3. The rules for coding paired LSB and MSB controllers, as defined in Appendix A.3.1, apply to the pairs (6, 38), (99, 98), and (101, 100) when coded in Chapter C.

If C-active Control Change commands for controller numbers 6, 38, or 96-101 appear in the checkpoint history, and these commands are used as general-purpose controllers, the most recent general-purpose command instance for these controller numbers MUST appear as entries in the Chapter C controller list.
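The split between Chapter C and Chapter M, combined with the role-deduction rule for controllers 6, 38, 96, and 97 given in this appendix, can be sketched as a small sender-side state machine. This is a simplified, hypothetical model (the class ChapterRouter and its method names are illustrative, not part of this specification): it treats every command as active, ignores Reset State and Reset All Controllers handling, assumes the standard assignment of controllers 98-101, and keeps RPN and NRPN parameter-number selections in one shared state.

```python
# Hypothetical sketch: route each Control Change command to Chapter C
# (general-purpose use) or Chapter M (parameter system transaction).
# Simplifications: every command is treated as active, Reset State and
# Reset All Controllers are ignored, controllers 98-101 are assumed to
# use the standard assignment, and RPN/NRPN selections share one state.

PARAM_NUMBER_CONTROLLERS = {98, 99, 100, 101}  # NRPN LSB/MSB, RPN LSB/MSB
DATA_CONTROLLERS = {6, 38, 96, 97}  # Data Entry MSB/LSB, Increment, Decrement
NULL_VALUE = 0x7F


class ChapterRouter:
    def __init__(self) -> None:
        self.msb = None  # most recent parameter-number MSB (controller 99 or 101)
        self.lsb = None  # most recent parameter-number LSB (controller 98 or 100)

    def _general_purpose(self) -> bool:
        # No parameter number has been set, or the null parameter is selected.
        if self.msb is None and self.lsb is None:
            return True
        return self.msb == NULL_VALUE and self.lsb == NULL_VALUE

    def route(self, controller: int, value: int) -> str:
        """Return "M" or "C" for one Control Change command."""
        if controller in PARAM_NUMBER_CONTROLLERS:
            if controller in (99, 101):
                self.msb = value
            else:
                self.lsb = value
            return "M"
        if controller in DATA_CONTROLLERS:
            return "C" if self._general_purpose() else "M"
        return "C"  # not a parameter-system controller


router = ChapterRouter()
for cc, val in [(6, 10), (101, 0), (100, 0), (6, 2), (101, 0x7F), (100, 0x7F), (38, 5)]:
    print(cc, val, router.route(cc, val))
```

In this trace, the first Data Entry MSB command is general-purpose because no parameter number has been set, the second belongs to the RPN 0 transaction, and the final Data Entry LSB command is general-purpose again because the null parameter has been selected.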
MIDI syntax permits a source to use controllers 6, 38, 96, and 97 as parameter-system controllers AND general-purpose controllers in the same stream. An RTP MIDI sender MUST deduce the role of each Control Change command for these controller numbers by noting the placement of the command in the stream, and MUST use this information to code the command in Chapter C or Chapter M as appropriate.

Specifically, active Control Change commands for controllers 6, 38, 96, and 97 act in a general-purpose way when the most recent active Control Change commands in the session history that set an RPN or NRPN parameter number code the null parameter (MSB value 0x7F, LSB value 0x7F), or when no active Control Change commands that set an RPN or NRPN parameter number appear in the session history.

A MIDI source that follows the recommendations of [1] exclusively uses numbers 98-101 as parameter system controllers. Alternatively, a MIDI source may exclusively use 98-101 as general-purpose controllers, and lose the ability to perform parameter system transactions in a stream.

In the language of [1], the general-purpose use of controllers 98-101 constitutes a non-standard controller assignment. As most real-world MIDI sources use the standard controller assignment for controller numbers 98-101, an RTP MIDI sender SHOULD assume these controllers act as parameter system controllers unless it knows that a MIDI source uses controller numbers 98-101 in a general-purpose way.

A.4 Chapter M: MIDI Parameter System

Readers may wish to review the Appendix A.1 definitions of "parameter system", "parameter system transaction", and "initiated parameter system transaction" before reading this Appendix.

Chapter M protects parameter system transactions for Registered Parameter Number (RPN) and Non-Registered Parameter Number (NRPN) values. Figure A.4.1 shows the format for Chapter M.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|E|U|W|Z|      LENGTH       |Q|   PENDING   |  Log list ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.4.1 -- Top-level Chapter M format

Chapter M begins with a 2-octet header. If the P header bit is set to 1, a 1-octet field follows the header, coding the 7-bit PENDING value and its associated Q bit.

The 10-bit LENGTH field codes the size of Chapter M, and conforms to semantics described in Appendix A.1.

Chapter M ends with a list of zero or more variable-length parameter logs. Appendix A.4.2 defines the bitfield format of a parameter log. Appendix A.4.1 defines the inclusion semantics of the log list.

A channel journal MUST contain Chapter M if the rules defined in Appendix A.4.1 require that one or more parameter logs appear in the list.

A channel journal also MUST contain Chapter M if the most recent C-active Control Change command involved in a parameter system transaction in the checkpoint history is:

o an RPN MSB (101) or NRPN MSB (99) controller, or

o an RPN LSB (100) or NRPN LSB (98) controller that completes the coding of the null parameter (MSB value 0x7F, LSB value 0x7F).

This rule provides loss protection for partially-transmitted parameter numbers and for the null parameter numbers.

If the most recent C-active Control Change command involved in a parameter system transaction in the session history is for the RPN MSB or NRPN MSB controller, the P header bit MUST be set to 1, and the PENDING field (and its associated Q bit) MUST follow the Chapter M header. Otherwise, the P header bit MUST be set to 0, and the PENDING field and Q bit MUST NOT appear in Chapter M. If PENDING codes an NRPN MSB, the Q bit MUST be set to 1. If PENDING codes an RPN MSB, the Q bit MUST be set to 0.
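A minimal, non-normative sketch of the header layout in Figure A.4.1 follows, assuming the caller has already computed each header bit and the LENGTH value according to the rules in this appendix and Appendix A.1. The function name pack_chapter_m_header is hypothetical, and the LENGTH value in the usage line is chosen only for illustration.

```python
def pack_chapter_m_header(s, p, e, u, w, z, length, q=0, pending=0):
    """Pack the Chapter M header of Figure A.4.1 (hypothetical helper).

    Returns 2 octets, or 3 octets when P = 1 (the Q bit and the 7-bit
    PENDING value follow the header). LENGTH is passed through as given;
    computing it is left to the caller.
    """
    bits = {"S": s, "P": p, "E": e, "U": u, "W": w, "Z": z, "Q": q}
    for name, bit in bits.items():
        if bit not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
    if not 0 <= length <= 0x3FF:
        raise ValueError("LENGTH is a 10-bit field")
    if not 0 <= pending <= 0x7F:
        raise ValueError("PENDING is a 7-bit field")
    header = ((s << 15) | (p << 14) | (e << 13) | (u << 12)
              | (w << 11) | (z << 10) | length)
    out = header.to_bytes(2, "big")
    if p:
        # Q = 1 when PENDING codes an NRPN MSB, Q = 0 for an RPN MSB.
        out += bytes([(q << 7) | pending])
    return out


# Pending NRPN MSB value 5, empty log list.
print(pack_chapter_m_header(0, 1, 0, 0, 0, 0, 3, q=1, pending=5).hex())  # 400385
```

When P = 0 the sketch silently ignores q and pending, matching the rule that the PENDING field and Q bit do not appear in that case.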
The E header bit codes the current transaction state of the MIDI stream. If E = 1, an initiated transaction is in progress. Below, we define the rules for setting the E header bit:

o If no C-active parameter system transaction Control Change commands appear in the session history, the E bit MUST be set to 0.

o If the P header bit is set to 1, the E bit MUST be set to 0.

o If the most recent C-active parameter system transaction Control Change command in the session history is for the NRPN LSB or RPN LSB controller number, and this command acts to complete the coding of the null parameter (MSB value 0x7F, LSB value 0x7F), the E bit MUST be set to 0.

o Otherwise, an initiated transaction is in progress, and the E bit MUST be set to 1.

The U, W, and Z header bits code properties that are shared by all parameter logs in the list. If these properties are set, parameter logs may be coded with improved efficiency.

By default, the U, W, and Z bits MUST be set to 0. If all parameter logs in the list code RPN parameters, the U bit MAY be set to 1. If all parameter logs in the list code NRPN parameters, the W bit MAY be set to 1. If the parameter numbers of all RPN and NRPN logs in the list lie in the range 0-127 (and thus have an MSB value of 0), the Z bit MAY be set to 1.

A.4.1 Log Inclusion Rules

Parameter logs code recovery information for a specific RPN or NRPN parameter.

A parameter log MUST appear in the list if a C-active command that forms a part of an initiated transaction for the parameter appears in the checkpoint history.

An exception to this rule applies if the checkpoint history only contains transaction Control Change commands for controller numbers 98-101 that act to terminate the transaction. In this case, a log for the parameter MAY be omitted from the list.
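The base inclusion rule and its exception can be sketched as a predicate over the commands of one parameter's transactions. This is a hypothetical, non-normative model: the TransactionCommand record and the log_required function are illustrative names, and the caller is assumed to have already classified each command (C-active or not, inside the checkpoint history or not, terminating or not). The ch_active and ch_never modifications of Appendix C.1.3 are not modeled.

```python
from dataclasses import dataclass


@dataclass
class TransactionCommand:
    """Hypothetical record for one command that forms part of an
    initiated transaction for a single RPN or NRPN parameter."""
    controller: int      # Control Change controller number
    c_active: bool       # C-active, in the sense of Appendix A.1
    in_checkpoint: bool  # lies in the checkpoint history
    terminates: bool     # 98-101 command that acts to end the transaction


def log_required(commands: list[TransactionCommand]) -> bool:
    """True if a parameter log MUST appear under the base A.4.1 rule.

    False only means this rule does not require the log; it may still
    be permitted (MAY) by the session-history rule.
    """
    recent = [c for c in commands if c.c_active and c.in_checkpoint]
    if not recent:
        return False
    # Exception: the checkpoint history holds only terminating 98-101 commands.
    only_terminators = all(98 <= c.controller <= 101 and c.terminates for c in recent)
    return not only_terminators
```

For example, a C-active Data Entry MSB command in the checkpoint history makes the log mandatory, while a checkpoint history that holds only the RPN MSB and LSB commands that switch away from the parameter leaves the log optional.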
A log MAY appear in the list if a C-active Control Change command that forms a part of an initiated transaction for the parameter appears in the session history. Otherwise, a log for the parameter MUST NOT appear in the list.

Multiple logs for the same RPN or NRPN parameter MUST NOT appear in the log list. The parameter log list MUST obey the oldest-first ordering rule (defined in Appendix A.1), with the phrase "parameter transaction" replacing the word "command" in the rule definition.

Parameter logs associated with the RPN or NRPN null parameter (LSB = 0x7F, MSB = 0x7F) MUST NOT appear in the log list. Chapter M uses the E header bit (Figure A.4.1) and the log list ordering rules to code null parameter semantics.

The ch_active MIME parameter (Appendix C.1.3) may be used to change parameter log inclusion semantics, to support renderers (such as [9]) that exclude certain RPN parameters from the semantics of Control Change commands for controller 121 (Reset All Controllers). This support is necessary because an active (but no longer C-active) Control Change command for an RPN or NRPN parameter in the checkpoint history requires loss protection if the renderer ignores Reset All Controllers commands for the parameter. Appendix C.1.3 defines how the ch_active parameter modifies Chapter M semantics.

In most cases, parameter logs for RPN and NRPN parameters that are assigned to the ch_never MIME parameter (Appendix C.1.3) MAY be omitted from the list. An exception applies if:

o The log codes the most recent initiated transaction in the session history, and

o A C-active command that forms a part of the transaction appears in the checkpoint history, and

o The E header bit for the top-level Chapter M header (Figure A.4.1) is set to 1.

In this case, a log for the parameter MUST appear in the list.
This log informs receivers recovering from a loss that a transaction is in progress, so that the receiver is able to correctly interpret RPN or NRPN Control Change commands that follow the loss event.

A.4.2 Log Coding Rules

Figure A.4.2 shows the parameter log structure of Chapter M.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   PNUM-LSB  |Q|   PNUM-MSB  |J|K|L|M|N|T|V|A|  Fields ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.4.2 -- Parameter log format

The log begins with a header, whose default size (as shown in Figure A.4.2) is 3 octets. If the Q header bit is set to 0, the log encodes an RPN parameter. If Q = 1, the log encodes an NRPN parameter.

The 7-bit PNUM-MSB and PNUM-LSB fields code the parameter number, and reflect the Control Change command data values for controllers 99 and 98 (for NRPNs) or 101 and 100 (for RPNs).

The J, K, L, M, and N header bits form a Table of Contents (TOC) for the log, and signal the presence of fixed-size fields that follow the header. A header bit that is set to 1 codes the presence of a field in the log. The ordering of fields in the log follows the ordering of the header bits in the TOC. Appendices A.4.2.1-2 define the fields associated with each TOC header bit.

The T, V, and A header bits code information about the parameter log, but are NOT part of the TOC. A set T, V, or A bit does NOT signal the presence of any parameter log field.

If the rules in Appendix A.4.1 state that a log for a given parameter MUST appear in Chapter M, the log MUST code sufficient information to protect the parameter from the loss of C-active parameter transaction Control Change commands in the checkpoint history. This rule does not apply if the parameter coded by the log is assigned to the ch_never MIME parameter (Appendix C.1.3).
In this case, senders MAY choose to set the J, K, L, M, and N TOC bits to 0, coding a parameter log with no fields.

Note that logs to protect parameters that are assigned to ch_never are REQUIRED under certain conditions (see Appendix A.4.1). The purpose of the log is to inform receivers recovering from a loss that a transaction is in progress, so that the receiver is able to correctly interpret RPN or NRPN Control Change commands that follow the loss event.

Parameter logs provide two tools for parameter protection: the value tool and the count tool. Depending on the semantics of the parameter, senders may use either tool, both tools, or neither tool to protect a given parameter.

The value tool codes information a receiver may use to determine the current value of an RPN or NRPN parameter. If a parameter log uses the value tool, the V header bit MUST be set to 1, and the semantics defined in Appendix A.4.2.1 for setting the J, K, L, and M TOC bits MUST be followed. If a parameter log does not use the value tool, the V bit MUST be set to 0, and the J, K, L, and M TOC bits MUST also be set to 0.

The count tool codes the number of transactions for an RPN or NRPN parameter. If a parameter log uses the count tool, the T header bit MUST be set to 1, and the semantics defined in Appendix A.4.2.2 for setting the N TOC bit MUST be followed. If a parameter log does not use the count tool, the T bit and the N TOC bit MUST be set to 0.

Note that V and T are set if the sender uses the value (V) or count (T) tool for the log on an ongoing basis. Thus, V may be set even if J = K = L = M = 0, and T may be set even if N = 0.

The A header bit codes the level of protection provided by the value and count tools. If the log parameter is assigned to the ch_active MIME parameter (Appendix C.1.3), the A header bit MUST be set to 1, coding the elevated protection level of the parameter.
Otherwise, the A header bit MUST be set to 0, coding the standard protection level.

In many cases, all parameters coded in the log list are of one type (RPN or NRPN), and all parameter numbers lie in the range 0-127. As described in Appendix A.4, senders MAY signal this condition by setting the top-level Chapter M header bit Z to 1 (to code the restricted range) AND by setting the U or W bit to 1 (to code the parameter type).

If the top-level Chapter M header codes Z = 1 and either U = 1 or W = 1, all logs in the parameter log list MUST use a modified header format. This modification deletes bits 8-15 of the bitfield shown in Figure A.4.2, to yield a 2-octet header. The values of the deleted PNUM-MSB and Q fields may be inferred from the U, W, and Z bit values.

A.4.2.1 The Value Tool

The value tool uses several fields to track the value of an RPN or NRPN parameter.

The J TOC bit codes the presence of the octet shown in Figure A.4.3 in the field list.

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |X|  ENTRY-MSB  |
   +-+-+-+-+-+-+-+-+

              Figure A.4.3 -- ENTRY-MSB field

The 7-bit ENTRY-MSB field codes the data value of the most recent C-active Control Change command for controller number 6 (Data Entry MSB) in the session history that appears in a transaction for the log parameter.

The X bit MUST be set to 1 if the command coded by ENTRY-MSB precedes the most recent Control Change command for controller 121 (Reset All Controllers) in the session history. Otherwise, the X bit MUST be set to 0.

Note that in the default case, the ENTRY-MSB field may only code C-active commands, and so X MUST be set to 0. The X bit plays a useful encoding role if an assignment to the ch_active MIME parameter (Appendix C.1.3) permits the ENTRY-MSB field to code commands that are active but not C-active.

The K TOC bit codes the presence of the octet shown in Figure A.4.4 in the field list.
    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |X|  ENTRY-LSB  |
   +-+-+-+-+-+-+-+-+

              Figure A.4.4 -- ENTRY-LSB field

The 7-bit ENTRY-LSB field codes the data value of the most recent C-active Control Change command for controller number 38 (Data Entry LSB) in the session history that appears in a transaction for the log parameter.

The X bit MUST be set to 1 if the command coded by ENTRY-LSB precedes the most recent Control Change command for controller 121 (Reset All Controllers) in the session history. Otherwise, the X bit MUST be set to 0.

A parameter log that uses the value tool MUST include the ENTRY-MSB field if a C-active Control Change command for controller number 6 appears in the checkpoint history.

As a rule, a parameter log that uses the value tool MUST include the ENTRY-LSB field if a C-active Control Change command for controller number 38 appears in the checkpoint history. However, the ENTRY-LSB field MUST NOT appear in a parameter log if the Control Change command associated with the ENTRY-LSB precedes a Control Change command for controller number 6 (Data Entry MSB) in the session history that appears in a transaction for the log parameter.

The L TOC bit codes the presence of the octets shown in Figure A.4.5 in the field list.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|X|         A-BUTTON          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.4.5 -- A-BUTTON field

The 14-bit A-BUTTON field codes a count of the number of active Control Change commands for controller numbers 96 and 97 (Data Increment and Data Decrement) in the session history that appear in a transaction for the log parameter.

The M TOC bit codes the presence of the octets shown in Figure A.4.6 in the field list.
    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|R|         C-BUTTON          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.4.6 -- C-BUTTON field

The 14-bit C-BUTTON field has semantics identical to A-BUTTON, except that Data Increment and Data Decrement Control Change commands that precede the most recent Control Change command for controller 121 (Reset All Controllers) in the session history are not counted.

For both A-BUTTON and C-BUTTON, Data Increment and Data Decrement Control Change commands are not counted if they precede Control Change commands for controller numbers 6 (Data Entry MSB) or 38 (Data Entry LSB) that appear in a transaction for the log parameter in the session history.

The A-BUTTON and C-BUTTON fields are interpreted as unsigned integers, and the G bit associated with the field codes the sign of the integer (G = 0 for positive or zero, G = 1 for negative).

To compute and code the count value, initialize the count value to 0, add 1 for each qualifying Data Increment command, and subtract 1 for each qualifying Data Decrement command. After each add or subtract, limit the count magnitude to 16383. The G bit codes the sign of the count, and the A-BUTTON or C-BUTTON field codes the count m