INTERNET-DRAFT Motorola Q. Xie Motorola Expires T. Bova S Hussain T Krivoruchka R. Revis Cisco expires in six months 1 April 19 1999

MULTI_NETWORK DATAGRAM TRANSMISSION PROTOCOL
<draft-ietf-sigtran-mdtp-03.txt> <draft-ietf-sigtran-mdtp-04.txt>

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This Internet Draft discusses an experimental call control signaling transport protocol, namely the Multi-network Datagram Transmission Protocol (MDTP), that is intended to provide fault-tolerant reliable data transfer between communicating entities over IP networks [1]. MDTP is proposed as an application-level protocol which is designed with a high emphasis on supporting redundant networks and transparent fault management. MDTP also gives the user a great degree of timing control and configuration flexibilities in order to meet the stringent time constraints often found in telephony signaling protocols.
The motivation of developing MDTP is to establish a framework for supporting Internet-based high reliability real-time commercial applications such as signaling and call control for Internet telephony.

Stewart & Xie [Page 1]
Internet Draft Multi-network Datagram Transmission Protocol Apr 1999

TABLE OF CONTENTS

[draft-ietf-sigtran-mdtp-03 only]

1. Introduction
   1.1 Multi-network Datagram Transmission Protocol
   1.2 Interfaces to MDTP
   1.3 Operation of MDTP
2. Design Principles
3. Header Format
   3.1 MDTP Header Format Description
   3.2 Notes on Multicast Header format
4. Transmission Initialization
   4.1 Normal Initialization
   4.2 Multiple Network Addresses
   4.3 Initialization Collision
   4.4 Re-initialization
   4.5 Link rotation
5. Reliable Transfer Mode
   5.1 Timer Control
   5.2 Gap Acknowledgments
   5.3 Congestion Control
   5.4 Sequence Number Reset
   5.5 Retransmission on Multiple Networks
       5.5.1 Randomization of the T3-Send timer at resend
   5.6 Termination of an Endpoint
   5.7 Endpoint Drain
   5.8 Advisory Acknowledgments
   5.9 RTT Measurement
   5.10 Heart Beat Ack
6. Unreliable Transfer Mode
   6.1 Ordered reception
7. Reliable flows
   7.1 Initiating a flow
   7.2 Flow acknowledgments
   7.3 Flow session closing
8. Mixed Mode Data Transmission
9. Bundled Messages
   9.1 Format of Bundled Datagram
   9.2 Bundled Transfer
10. Fragmented Messages
11. Non-protocol Datagrams
12. Broadcast and Multicast
   12.1 Multicast/Broadcast Initialization
   12.2 Transmission of Broadcast Datagrams
   12.3 Transmission of Multicast Datagrams
   12.4 Reset of the Multicast Datagram Sequence Number
13. Interface with upper level protocols
   13.1 Init.MDTP primitive
   13.2 Send.Data primitive
   13.3 Receive.Data primitive
   13.4 Data.Arrive notification
   13.5 Send.Failure notification
   13.5 Link.Status.Change notification
   13.6 Communication.Lost notification
14. Suggested Timer and Protocol Parameter Values
15. Acknowledgments
16. Author's Addresses
17. References

[draft-ietf-sigtran-mdtp-04]

1. Introduction
   1.1 Design Requirements of MDTP
   1.2 Interfaces to MDTP
2. MDTP Datagram Format
   2.1 Header Field Descriptions
   2.2 Data Field
3. Transmission Initialization
   3.1 Endpoint Association Initialization
       3.1.1 Choice of Tag Value
   3.2 Data Field Format of Initiation Datagrams
   3.3 Initialization Collision
   3.4 Association Re-initialization
4. Reliable Transfer of Datagrams
   4.1 Timer Management Rules
       4.1.1 Link Rotation
   4.2 Gap Acknowledgment for Missing Datagrams
   4.3 Congestion Control
       4.3.1 Sending with Window Control
       4.3.2 Window Length Adjustment
       4.3.3 Flow Control using In-Queue Information
       4.3.4 T3-send Timer Adjustment with RTT
   4.4 Sequence Number Reset
   4.5 Datagram Re-transmission
       4.5.1 Re-transmission on Redundant networks
   4.6 RTT Measurement
       4.6.1 RTT Datagram Header Format
       4.6.2 Measure RTT
   4.7 Link Heart Beat
   4.8 Advisory Acknowledgment
   4.9 Termination of an Association
   4.10 Draining of an Association
5. Interface with upper level protocols
6. Suggested MDTP Protocol Parameter Values
7. Acknowledgments
8. Author's Addresses
9. References
Appendix A: Stream-based Reliable and Ordered Delivery
   A.1 Stream Initiation
   A.2 Stream Termination
   A.3 Stream Datagram Transfer
       A.3.1 Header Format in Stream Datagrams with User Data
       A.3.2 Transmission of Stream Datagrams
       A.3.3 Extended Stream Ack
   A.4 Other Issues with Stream Transfer
Appendix B: Bundled Message Transfer
   B.1 Format of Bundled Datagram
   B.2 Bundled Datagram Transfer
Appendix C: Fragmented Message Transfer
Appendix D: Multicast Datagram Transfer
   D.1 Multicast Datagram Header Format
   D.2 Transmission of Multicast Datagrams
Appendix E: Unreliable Delivery
   E.1 Ordered Unreliable Delivery

1.
Introduction

This Internet Draft discusses an experimental protocol, namely the Multi-network Datagram Transmission Protocol (MDTP). The intention of developing MDTP is to provide a fault-tolerant, real-time reliable data transfer mechanism between communicating endpoints over IP networks [1]. MDTP is proposed as an application-level protocol which is designed with a high emphasis on supporting redundant networks and transparent fault management. MDTP also gives the user a great degree of timing control and configuration flexibilities in order to meet the stringent time constraints often found in telephony signaling protocols.

The motivation of developing MDTP is to establish a framework for supporting Internet-based high reliability real-time commercial applications such as signaling and call control for Internet telephony.

MDTP is also designed to be scalable in order to support different signaling transport requirements for different interfaces in a telephony network. For example, the transportation of signaling protocols such as PRI ISDN may not require redundant links, and hence only a subset of MDTP will need to be implemented. On the other hand, redundant networks may be mandated when transporting SS7 signaling messages amongst different components in a carrier-grade telephony core network. In such cases, the transparent support for redundant networks, load sharing, and fault management defined in MDTP becomes essential and will likely need to be fully supported in an implementation.

Many of the fundamental concepts that have made TCP such a useful protocol are reused in MDTP, and some of the advantages of UDP are also merged into the design. This has led to a highly effective, robust protocol for fault tolerant data communications.
This document describes the functional interface and the details necessary for implementing MDTP. The main body of this document contains the minimal set of functionalities of MDTP that must be implemented. In the Appendices, a set of additional MDTP functions, such as reliable stream, multicast, message bundling, and message fragmentation, are defined. Those additional functionalities are optional to implement.

1.1 Design Requirements of MDTP

The following are some of the design requirements of MDTP, in order to make MDTP capable of supporting real-time call control environments which potentially may employ redundant networks:

A) High communication fan-out: an endpoint may need to be in simultaneous communication with hundreds or thousands of endpoints performing various call processing functions. These endpoints may be codec converters, SS7 to IP translation applications, or, in the case of mobile networks, data selector and combiner applications.

B) Stringent timer control: an endpoint needs to have very fine control over the timing for delivering a datagram. The timing should be easily adjusted depending on the message type and the destination. For example, after a few seconds of non-delivery the call which the message is about may not exist anymore.

C) Support redundant links: an endpoint communicating with a peer should be able to take advantage of the redundant networks in a transparent way. This means that the application or upper layer protocols need not be involved in the network fault management.
Instead, when network failure occurs MDTP should be able to automatically re-route the out-bound datagram to the alternate network (if one exists) without intervention from the application.

D) Orderly delivery: datagrams may arrive out of order, or may arrive in duplicate copies. This is especially true if redundant networks are used. MDTP should be strong enough to properly handle both situations with little intervention from the upper layer protocols or applications.

F) Support stream sequencing: on the demand of the upper layer protocols or applications, MDTP should be able to support sequenced delivery with regard to each individual stream, i.e., the delay caused by the loss and retransmission of a datagram should be isolated to only the stream to which the datagram belongs. This is particularly important in some call control applications, where the loss of a message should only affect the call to which the message belongs.

[draft-ietf-sigtran-mdtp-03 only]

To accomplish the above objectives we have defined MDTP to reside in user-space, i.e., it is not intended to be implemented as a module in an operating system. This gives the application or upper level protocols that use MDTP outstanding flexibility in controlling the timing and other operational characteristics for the data transmissions.

MDTP is also made multi-network aware. This means that if more than one path exists between two endpoints (such as redundant LANs), MDTP will take advantage of the multiple networks by automatically switching to the alternate LAN if the datagram delivery becomes unavailable or inefficient (e.g., too many re-transmissions) on the current LAN. The ability to handle multiple networks by MDTP can also greatly facilitate the implementation of various traffic balancing schemes in the application or upper level protocols.

In the redundant network setting, out-of-order or duplicate datagrams are proven to be most harmful during MDTP transmission initiations and re-initiations. To cope with the problem, MDTP utilizes a very efficient tag mechanism to guard against out-of-order or duplicate datagrams.

MDTP assumes that a UDP-like [2] transport protocol is available at the operating system level for data transport. We have successfully implemented and tested MDTP over UDP and Sun Microsystem's CLTS transport layers.

Compared to traditional TCP [3], the MDTP design is more tuned towards a special set of applications, that is, the time critical fault tolerant applications using redundant LANs. It is not designed to replace TCP as a general purpose transmission protocol.

[draft-ietf-sigtran-mdtp-04]

1.2 Interfaces to MDTP

The application programs or upper layer protocols interface with MDTP through a set of primitives (see section 5. for details). Towards the networks, it is assumed that a UDP-like data transport protocol will provide the interface between MDTP and the operating system. No special interfaces or changes are assumed within the operating system; all queuing and endpoint association information are maintained inside the MDTP layer.

[draft-ietf-sigtran-mdtp-03 only]

1.2 Interfaces to MDTP

MDTP interfaces with the application programs or higher level protocols through a set of function calls. Due to the fact that MDTP is an application level protocol, these calls are not executed within the operating system, but within the user process (i.e., in the user space).

The application or higher level protocols pass data to MDTP by making calls to MDTP, which then enqueues the data for transmission. When data arrives, MDTP will distribute the data to the application or higher level protocols via mechanisms predefined by the application.

The application also has an interface to change the operational mode of an MDTP endpoint and the default operational mode of the MDTP endpoint. The default operational mode is used in the absence of any specific direction from the application. More details on the MDTP interface to the upper level protocol/application can be found in section 13.

As noted above, it is assumed that a UDP-like data transport protocol will provide the interface between MDTP and the operating system. No other special interfaces or changes are assumed within the operating system; all queuing and internal pseudo-connection information is maintained inside the MDTP endpoint.

1.3 Operation of MDTP

MDTP operates in three different modes.

A) Reliable transfer mode
B) Unreliable transfer mode
C) Raw UDP transfer mode

The two ends in a communication connection can operate in different modes with respect to each other, with the exception of the raw UDP mode. For example, suppose two endpoints A and B are communicating with each other. Endpoint A may be sending information to B in reliable transfer mode, while B, on the other hand, may be sending information to A in unreliable transfer mode. All communications from A to B will be acknowledged by B, but A will not need to acknowledge data received from B.

Raw UDP transfer is used when one of the endpoints in communication does not support MDTP. This allows compatibility with non-MDTP endpoints. Two MDTP capable endpoints are also allowed to engage in communications in raw UDP transfer mode. However, both sides will have to be in raw UDP mode once one of them indicates to use raw UDP transfer mode.

MDTP also provides a bundling option for both the reliable and unreliable transfer modes. This allows each side to hold the data before transmission for some period of time, so that small datagrams can be combined and sent in a single larger datagram to improve network utilization efficiency.

2. Design Principles

One of the major objectives which dictates the design of MDTP is to provide a data transmission protocol that transparently supports highly fault tolerant implementations. To accomplish this, provision for two endpoints engaging in communication to use multiple networks is essential. MDTP is therefore designed to yield the best fault tolerance when the application shares the load over multiple network connections. In cases of failed original transmission, MDTP provides the ability of attempting retransmissions using an alternate network connection even when the upper level protocol or the application is completely ignorant of the existence of the alternate route.

Many of the fundamental concepts that have made TCP such a useful protocol are reused, and some of the advantages of UDP are also merged into the design of MDTP. This has led to a highly effective, robust protocol for fault tolerant data communications.

[draft-ietf-sigtran-mdtp-04]

2. MDTP Datagram Format

MDTP inserts the following protocol header at the beginning of every user datagram. The integer fields shall be transmitted in network byte order.

[draft-ietf-sigtran-mdtp-03 only]

3. Header Format

MDTP inserts at the beginning of every datagram a header. This header is composed of various flags and integers. The integers are always kept in network byte order. The following table illustrates the common MDTP header overlay.
Note that one tick mark represents one bit position.

MDTP Header Format - Non Multicast

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             data                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MDTP Header Format - Multicast Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Multicast To Transmit address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Multicast From - senders base address              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             data                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MDTP Header Format - RTT Ack

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-1                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-2                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flow Initiate/Close Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Acknowledgment Number (Seen/flow num)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Ack Flow (opening)       |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flow Extended Acknowledgment

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Number of flow Acks                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               /
/              one for each 'Number of flow Acks'               \
\                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

[draft-ietf-sigtran-mdtp-04]

2.1 Header Field Descriptions

MDTP Protocol Identifier: 32 bits
This shall be a fixed long value of 0xf7873072. The receiver shall always verify this Protocol Identifier before it proceeds any further in interpreting the header fields.

[draft-ietf-sigtran-mdtp-03 only]

3.1 MDTP Header Format

MDTP Protocol Identifier 1: 32 bits
This is a fixed long value of 0xf7873072.
MDTP Protocol Identifier 2: 32 bits
This is a fixed long value of 0x17074012. MDTP Protocol Identifier 1 and 2 are jointly examined to determine whether a received datagram is an MDTP protocol datagram.

Acknowledgment Number (or Seen): 32 bits
If the flag ACK is set, this value is the next sequence number that the sender of this datagram expects to receive from the receiver of this datagram. However, during initialization negotiation, multicast and broadcast transmissions, this field will have special meanings (see 4 and 12).

Sequence Number (or Send): 32 bits
If the DAT flag is set, this value represents the sequence number of the first data octet that follows this header. Otherwise, this value will be the sequence number of the first octet of the next data unit that will be sent. However, during initialization negotiation, multicast and broadcast transmissions, this field will have special meanings (see 4 and 12).

Part: 8 bits
This value represents the Part number of a fragmented message. The first fragment of a message is always part '0'.

Of: 8 bits
This value represents the total number of fragments in a fragmented message. The valid range for this value is from '1' to '255'. For broadcast and multicast datagrams this value is set to '1' to indicate that no fragmentation should occur.

Data Size: 16 bits
This value represents, in number of octets, the size of the data field that follows this header in the current datagram.

Flags: 8 bits

NOG - No Guaranteed delivery. This bit is used in negotiation and is set to indicate that the sender does not wish to use reliable delivery. When this bit has been set in negotiation, the receiver should prevent its application from putting communication with this endpoint in reliable mode. In normal data transfer (after the initiate sequence) this bit should be set to 0, except when responding to an RTT Ack request.

NOB - No Bundling.
This bit is used in negotiation and is set to indicate that the sender does not wish to perform bundling or un-bundling of datagrams. When this bit has been set in negotiation, the receiver should prevent its application from putting communication with this endpoint in bundled mode. In normal data transfer this bit should be set to 0; if it is set to 1, the message is part of a flow.

WIN - Window Up. This bit is set by the sender of this datagram to indicate that the sender needs the receiver to acknowledge previously received datagrams before it can send more datagrams.

ISB - Is Bundled. This bit is set by the sender to indicate that this datagram is bundled. This bit should never be set if during negotiation either end set the NOB bit.

FIR - First Datagram. This flag is set to indicate that this is a negotiation datagram.

RES - Reset Sequence Number. This bit is set to indicate that the sequence number is being reset. The sequence number should be reset whenever the sending count is greater than 0x7fffffff.

DAT - Data Present. This bit is set to indicate that, following this header, application data is present in this datagram.

ACK - Acknowledge. This bit is set to indicate that the sender is acknowledging receipt of the specified Acknowledgment Number.

Mode: 8 bits

BRO - Broadcast. This bit is set to indicate a broadcast or multicast datagram. When this bit is set, bits SHU, WNR, BUN, and GAR are not used and should be set to '0'. This datagram is a multicast datagram if the UNR bit is also set. Otherwise, this datagram is a broadcast datagram.

SHU - Shutdown. This bit is set when the sender initiates its closing procedure and indicates to the receiver that the sender is no longer a valid destination. If the UNR bit is set in conjunction with the SHU bit, an incomplete shutdown is specified. After an incomplete shutdown, the receiver can still re-establish the communication with the sender by re-initiating with the sender (see 5.7).
WNR - Window Up Response. This bit is set in the acknowledgment reply to a Window Up flag.

RE1 - This bit will represent one of two things. If the GAR bit is set to one, then setting the RE1 bit indicates to the receiver that the sender is requesting an advisory ACK. This is normally sent in a datagram when 1/2 of the current window has been sent. If this bit is set to 0 (when the GAR bit is set) then the sender is NOT requesting an advisory ACK. If the UNR bit is set and the RE1 bit is set, then the receiver is requested to order the datagrams (if more than one have not been read). If the receiver has already delivered a datagram of higher sequence, then the receiver should discard lower number sequence datagrams that arrive late.

RE2 - This bit will represent one of two things. If the GAR bit is set to one, the DAT bit is set to 0 and the ACK bit is set to 1, then this is an ACK with a Round Trip Time Request format. This also identifies that the RTT Ack header format is in place. If the UNR bit is set to 1 and the DAT bit is set to 0, then this datagram is used in an implementation specific way but carries no data. The datagram can be safely ignored and discarded.

BUN - Bundled Mode. This bit is set to indicate that bundled mode is in effect for the sender. This bit should never be set if during negotiation either endpoint set the NOB flag.

GAR - Guaranteed Mode. This bit is set to indicate that the reliable mode is in effect for the sender, i.e., the sender expects an acknowledgment. This bit should never be set if either endpoint set the NOG flag during negotiation.

UNR - Unreliable Mode. This bit is set to indicate that unreliable mode is in effect for the sender and the sender does not expect an acknowledgment. This bit has special meanings if the BRO or SHU bit is set (see above).

Version: 8 bits
This field represents the version number of the MDTP protocol.
If this field is set to 1, the sender does not support Round Trip Time (RTT) calculation or Heart Beat of the reliable protocol. If it is set to 2, the sender supports RTT and Heartbeat. If the Version is set to 3, the sender/receiver supports reliable flows.

In Queue: 8 bits
This field contains the number of messages the sender has on its incoming queue, waiting to be read by the application. This gives the receiver an indication of the flow control conditions within the sender.

The message header is always followed by the data field. If there are fewer than 4 octets of application data to send with the datagram, the data field of the datagram should be padded with all '0' to make it four (4) octets. The padded all '0' octets, if any, are not counted in the Data Size.

The maximal Data Size for a single MDTP datagram is the MTU size of the underlying transport protocol (e.g., UDP) minus the MDTP header size, which is twenty four (24) octets. The combination of the maximal 'Of' value, which is 255, and the maximal Data Size will determine the maximal size of a single message that MDTP can send or receive.

3.2 MDTP Multicast Header Format

The multicast header format is identical to the standard MDTP header format, as discussed above, except for the following extensions.

Multicast To Transmit address - This is the multicast address, in network byte order, that the sender transmitted the data to. The receiver can use this information for internal tracking purposes.

Multicast From - This is the base address (address 0 in the initiate message, see below) of the sender. Since a multicast sender may not have gone through the initiate procedures this address is the base reference that the receiver is to use to look up the sender.
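As a rough illustration of the 24-octet non-multicast header and the padding rule of section 3.1, the following Python sketch packs a datagram in network byte order. It is not part of the draft: the function names `pack_datagram` and `max_data_size` are hypothetical, and the bit values given to the flag names assume that the leftmost bit in the header diagram is the most significant bit of the Flags octet.

```python
import struct

MDTP_ID1 = 0xF7873072  # MDTP Protocol Identifier 1
MDTP_ID2 = 0x17074012  # MDTP Protocol Identifier 2

# ID1, ID2, Seen, Send (32 bits each); Data Size (16 bits);
# Part, Of, Flags, Mode, Version, In Queue (8 bits each).
HEADER = struct.Struct("!IIIIHBBBBBB")
assert HEADER.size == 24

# Assumption: leftmost flag in the diagram is the most significant bit.
NOG, NOB, WIN, ISB, FIR, RES, DAT, ACK = (0x80 >> i for i in range(8))


def pack_datagram(seen, send, flags, mode, version, in_queue,
                  data=b"", part=0, of=1):
    """Build one datagram: header followed by the (possibly padded) data field."""
    header = HEADER.pack(MDTP_ID1, MDTP_ID2, seen, send,
                         len(data), part, of, flags, mode, version, in_queue)
    # Data Size counts only real data; zero padding up to 4 octets is not counted.
    return header + data.ljust(4, b"\x00")


def max_data_size(mtu):
    """Largest data field that fits in one datagram of the underlying transport."""
    return mtu - HEADER.size
```

With these assumptions, `pack_datagram(0, 1, DAT | ACK, 0, 2, 0, b"hi")` produces 28 octets: the header with Data Size 2, then `hi` followed by two zero octets of padding.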
This network byte order address should be used to reference any internal cache rather than the arriving network from address.

4. Transmission Initialization

4.1 Normal Initialization

Before the first data transmission can take place from one endpoint (A) to another endpoint (Z), the two endpoints will need to complete an initialization process. The initialization process consists of the following steps.

A) Endpoint A should first send an initiation datagram, while withholding the application data from transmission.

Endpoint A                                          Endpoint Z
[Header Flags=FIR|RES
 Mode=options
 Seen=0,Send=Tag_A]    ----------------------->
(Start T1-init timer)
(Enter Tag_A-lock mode)

The initiation datagram is identified by setting the FIR and RES bits in the Flags field. No user data should be carried in the initiation datagram.

Endpoint A should fill in the appropriate options, e.g., BUN, GAR, or UNR, in the Mode field to indicate the transmission type it has chosen. It may also use the NOB and NOG bits in the Flags field to specify whether or not its peer is allowed to use bundling or reliable transfer mode.

The Seen field will be set to '0', but an initiation tag, Tag_A, generated by Endpoint A, will be carried in the Send field, as shown in the above diagram. If re-initializations are needed between two endpoints subsequently (see 4.4), a different tag with a unique value should be used for each re-initialization.

After sending the initiation datagram, Endpoint A shall start the T1-init timer and enter a Tag_A-lock mode. During the Tag_A-lock mode, Endpoint A will wait for the initiation Ack datagram with the Seen value set to Tag_A. Any other incoming datagrams from Endpoint Z, except for new initiation datagrams, will be discarded. The arrival of new initiation datagrams during the Tag_A-lock mode indicates an initialization collision that will be discussed in 4.3.
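The filtering that Endpoint A applies while in Tag_A-lock mode can be sketched as below. This is a minimal illustration under stated assumptions, not text from the draft: the function name `classify_in_tag_lock`, the string results, and the flag bit values (leftmost flag in the header diagram taken as the most significant bit of the Flags octet) are all hypothetical.

```python
# Assumed bit values for the Flags octet (leftmost diagram bit = MSB).
FIR, RES, ACK = 0x08, 0x04, 0x01


def classify_in_tag_lock(flags, seen, tag_a):
    """Decide what Endpoint A does with a datagram received in Tag_A-lock mode."""
    init_bits = FIR | RES
    if (flags & init_bits) != init_bits:
        return "discard"        # neither an initiation nor an initiation Ack
    if flags & ACK:
        # Initiation Ack: accept only if it echoes our tag in the Seen field.
        return "init_ack" if seen == tag_a else "discard"
    return "collision"          # new initiation datagram from the peer
```

An initiation Ack carrying a stale or foreign tag is dropped like any other datagram, so only the peer that saw Tag_A can move Endpoint A forward, while a bare FIR|RES datagram is handed to the collision handling of section 4.3.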
If the T1-init timer expires, the same initiation datagram will be retransmitted and the timer restarted. This will be repeated Max.Init.Retransmit times before Endpoint A considers Endpoint Z unreachable and optionally reports the failure.

B) Upon the receipt of the above initiation datagram from Endpoint A, Endpoint Z should respond immediately with an initiation Ack as shown below:

   Endpoint A                                      Endpoint Z

                                       [Header Flags=FIR|RES|ACK
                                        Mode=Options
                            /---------- Seen=Tag_A,Send=Tag_Z]
                           /           (Enter Tag_Z-lock mode)
   (Cancel T1-init timer)<-------/

The initiation Ack datagram is specified with the FIR, RES, and ACK bits set to '1' in the Flags field. Similarly, Endpoint Z will specify its preferred transmission mode and type by setting the proper bits in the Mode and Flags fields. In addition, in the out-bound initiation Ack datagram, Endpoint Z should set the Seen field to Tag_A and supply its own initiation tag, Tag_Z, in the Send field.

Once the initiation Ack is transmitted, Endpoint Z should enter Tag_Z-lock mode. In Tag_Z-lock mode Endpoint Z will ignore any incoming initiation Ack datagrams and also discard any other incoming datagram whose Seen field is not equal to Tag_Z, except for new initiation datagrams.

If a new initiation datagram is received when Endpoint Z is in Tag_Z-lock mode, Endpoint Z will acknowledge the initiation datagram only when the tag carried in the Send field matches the Tag_A previously recorded by Endpoint Z. Otherwise, Endpoint Z will send an initiation datagram with the Send field set to Tag_Z back to Endpoint A to elicit an initiation Ack.

C) After transmitting the initiation Ack, Endpoint Z can start transmitting datagrams with user data. However, the Seen field in the first out-bound datagram with user data must be set to Tag_A.

D) Upon the receipt of the initiation Ack with Seen equal to Tag_A, Endpoint A can start transmitting datagrams with user data. However, the first datagram with application data transmitted by Endpoint A should have the Seen value set to Tag_Z, which is obtained from the initiation Ack.

   Endpoint A                                      Endpoint Z
   {first app message}
   [Header Flags=ACK|DAT
    Mode=options
    Seen=Tag_Z,Send=1]
   [data field]  -----------\
                             \
                              \------->  (Leave Tag_Z-lock mode)

E) Upon the receipt of the first datagram with user data from Endpoint A and with the Seen value set to Tag_Z, Endpoint Z should leave Tag_Z-lock mode.

F) Similarly, upon the receipt of the first datagram with user data and the Seen value set to Tag_A from Endpoint Z, Endpoint A should leave Tag_A-lock mode.

The upper level protocol or application can predefine a set of default transmission modes, which will be used by the endpoint for initialization. However, the transmission modes between two endpoints are allowed to change on a datagram by datagram basis, as illustrated in later sections.

... it proceeds any further in interpreting the header fields.

Version: 8 bits
This field represents the version number of the MDTP protocol (value TBD).

Flags: 16 bits
NOM - shall be set to 1 (reserved for fragmentation, see Appendix C)
NOB - shall be set to 1 (reserved for bundling, see Appendix B)
WIN - Window Up. This bit is set by the sender of this datagram to indicate that the sender needs the receiver to acknowledge previously received datagrams before it can send more datagrams.
ISB - shall be set to 0 (reserved for bundling, see Appendix B)
FIR - First Datagram. This flag is set to indicate that this is an Initiation datagram.
RTM - normally set to 0 (used for Link Heart Beat and RTT measurement, see sections 4.6 and 4.7)
DAT - Data Present. This bit is set to indicate that, following this header, application data is present in this datagram.
ACK - Acknowledge.
This bit is set to indicate that the sender is acknowledging the reception of the specified Acknowledgment Number.
MUL - shall be set to 0 (reserved for multicast, see Appendix D)
SHU - Shutdown. This bit is set when the sender initiates its closing procedure and indicates to the receiver that the sender is no longer a valid destination. If the UNR bit is set in conjunction with the SHU bit, an incomplete shutdown is specified. After an incomplete shutdown, the receiver can still re-establish the communication with the sender by re-initiating with the sender (see 4.7).
WNR - Window Up Response. This bit is set in the acknowledgment reply to a Window Up flag.
RE1 - normally set to 0 (used for advisory ACK, see section 4.8)
RTC - normally set to 0 (used for RTT, see section 4.6)
FLO - shall be set to 0 (reserved for reliable stream, see Appendix A)
GAR - shall be set to 1 (reserved for unreliable mode, see Appendix E)
UNR - shall be set to 0 (reserved for unreliable mode, see Appendix E)

In Queue: 8 bits
This field contains the number of messages the sender has on its incoming queue, waiting to be read by the application.
This gives the receiver an indication of the flow control conditions within the sender.

Acknowledgment Number (or Seen): 32 bits
If the ACK flag is set, this value is the last sequence number that the sender of this datagram received from the receiver of this datagram.

Sequence Number (or Send): 32 bits
If the DAT flag is set, this value represents the sequence number of the current data unit following this header. Otherwise, this value will be the sequence number of the next data unit that will be sent.

Data Size: 16 bits
This value represents, in number of octets, the size of the data field that follows this header in the current datagram.

Part: 8 bits
shall have value '0' (reserved for fragmentation, see Appendix C)

Of: 8 bits
shall have value '1' (reserved for fragmentation, see Appendix C)

2.2 Data Field

When the DAT flag is set to 1, the MDTP datagram header will be followed by a data field. An implementation may choose to pad the end of the data field with '0' octets so as to align with certain memory boundaries. However, the padded '0' octets, if there are any, shall not be counted in the Data Size.

The maximal Data Size for a single MDTP datagram is the MTU size of the underlying transport protocol (e.g., UDP) minus the MDTP header size.

3. Transmission Initialization

3.1 Endpoint Association Initialization

Before the first data transmission can take place from one endpoint ("A") to another endpoint ("Z"), the two endpoints will need to complete an initialization process in order to set up an association between them.

The initialization procedure should be made transparent to the upper layer protocol, i.e., it should take place automatically whenever the upper layer tries to send a datagram to an endpoint which has never been sent to before. The user datagram shall be withheld by MDTP from transmission until the completion of the initialization.
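The withholding behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `AssociationTable` and the two callbacks `transmit` and `start_initiation` are hypothetical and are not defined by MDTP; the sketch shows the queueing of user datagrams only, not the Initiation exchange itself.

```python
from collections import defaultdict


class AssociationTable:
    """Withholds user datagrams per peer until initialization completes.

    Hypothetical helper: `transmit(peer, payload)` and
    `start_initiation(peer)` are assumed callbacks supplied by the caller.
    """

    def __init__(self, transmit, start_initiation):
        self._transmit = transmit
        self._start_initiation = start_initiation
        self._established = set()           # peers with a completed association
        self._initiating = set()            # peers with initialization in progress
        self._withheld = defaultdict(list)  # user datagrams waiting per peer

    def send(self, peer, payload):
        if peer in self._established:
            self._transmit(peer, payload)
            return
        # First send to a peer never sent to before: start initialization
        # automatically, transparently to the upper layer.
        if peer not in self._initiating:
            self._initiating.add(peer)
            self._start_initiation(peer)
        # Withhold the user datagram until initialization completes.
        self._withheld[peer].append(payload)

    def initialization_complete(self, peer):
        self._initiating.discard(peer)
        self._established.add(peer)
        # Release withheld datagrams in the order they were submitted.
        for payload in self._withheld.pop(peer, []):
            self._transmit(peer, payload)
```

The upper layer simply calls `send()`; whether the datagram leaves immediately or is held back depends only on the association state for that peer.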
A tag-and-lock mechanism is employed during the initialization in order to guard against erroneous or stale datagrams (this is especially true if redundant networks are deployed).

The initialization process consists of the following steps (assuming the upper layer at "A" tries to send data to "Z" for the first time):

A) "A" first sends an Initiation (FIR) to "Z", with the Seen field set to 0 and the Send field set to Tag_A, and then enters the Tag-lock mode (see below).

B) "Z" responds immediately with an Initiation Ack (FIR|ACK), with Seen set to Tag_A and Send set to Tag_Z, and then enters the Tag-lock mode, too (see below).

Note that no user data should be carried in the Initiation or Initiation Ack datagram.

At this point "Z" is ready to send user data to "A". Upon the receipt of the above Initiation Ack from "Z", "A" can also start sending user data to "Z". However, the first datagram with user data transmitted by "A" to "Z" shall have the Seen value set to Tag_Z, which is obtained from the Initiation Ack. Similarly, the first datagram with user data transmitted by "Z" to "A" shall have the Seen value set to Tag_A, which comes from the Initiation datagram.

In the Tag-lock mode, each side will silently discard any datagrams with user data from the other side until it receives the first datagram with user data and with a Seen value that matches its own Tag. Once that datagram is received, that endpoint will leave the Tag-lock mode, immediately send back a data acknowledgment, and start using the sequence numbers to filter out missing and duplicate datagrams.

If another Initiation from "A" is received by "Z" after it sent out the Initiation Ack, "Z" will acknowledge this Initiation by re-sending the Initiation Ack only when the Send field of this new Initiation has the same tag as that of the original Initiation. Otherwise, "Z" will send an Initiation of its own with the Send field set to Tag_Z back to "A" to elicit an Initiation Ack from "A".

In the following example, "A" initiates the association first and then sends a datagram with user data to "Z":

   Endpoint A                                      Endpoint Z
   {first app message to Z}
   [Header Flags=FIR & other options
    Seen=0,Send=Tag_A] ----------------------->
   (Start T1-init timer)
   (Enter Tag_A-lock mode)
                                       [Header Flags=FIR|ACK & other options
                            /---------- Seen=Tag_A,Send=Tag_Z]
                           /           (Enter Tag_Z-lock mode)
   (Cancel T1-init timer)<-------/

   [Header Flags=ACK|DAT & other options
    Seen=Tag_Z,Send=1]
   [data field]  -----------\
                             \
                              \---->  (Leave Tag_Z-lock mode)

If the T1-init timer expires at "A" after the Initiation is sent, the same Initiation datagram with the same Tag_A value will be retransmitted and the timer restarted. This will be repeated Max.Init.Retransmit times before "A" considers "Z" unreachable and optionally reports the failure.

3.1.1 Choice of Tag Value

Tag values should be selected from the range of 0x80000000 to 0xffffffff.

3.2 Data Field Format of Initiation Datagrams

If redundant networks exist between two endpoints, the data field of the Initiation and Initiation Ack datagrams will carry the redundant network information.

The following shows the data field format carrying N IPv4 redundant network information:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Number of Networks = N                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |  Type of Address=AF_INET (2)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    IP Address of Network 1                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Port # 1            |          Padding = 0          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /                                                               /
   \                              ...                              \
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |  Type of Address=AF_INET (2)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    IP Address of Network N                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Port # N            |          Padding = 0          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Additional implementation-specific data is allowed after the redundant network information.
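As an illustration of the layout above, the following sketch packs and unpacks an IPv4 network list in network byte order. The function names are hypothetical, the addresses and port in the usage note are example values only, and any implementation-specific data after the list is ignored; it is a sketch of the wire format as drawn, not a reference implementation.

```python
import socket
import struct

ADDR_TYPE_AF_INET = 2   # "Type of Address=AF_INET (2)"
IPV4_ADDR_SIZE = 8      # 4 octets IP + 2 octets port + 2 octets padding


def encode_network_list(endpoints):
    """Encode [(dotted_ip, udp_port), ...] as an Initiation data field."""
    out = struct.pack("!I", len(endpoints))           # Number of Networks
    for ip, port in endpoints:
        out += struct.pack("!HH", IPV4_ADDR_SIZE, ADDR_TYPE_AF_INET)
        out += socket.inet_aton(ip)                   # IP address
        out += struct.pack("!HH", port, 0)            # port, padding = 0
    return out


def decode_network_list(data):
    """Decode the list; entries of an unknown type are skipped by size."""
    (count,) = struct.unpack_from("!I", data, 0)
    offset = 4
    endpoints = []
    for _ in range(count):
        size, addr_type = struct.unpack_from("!HH", data, offset)
        offset += 4
        if addr_type == ADDR_TYPE_AF_INET and size == IPV4_ADDR_SIZE:
            ip = socket.inet_ntoa(data[offset:offset + 4])
            (port,) = struct.unpack_from("!H", data, offset + 4)
            endpoints.append((ip, port))
        offset += size
    return endpoints
```

For example, `encode_network_list([("136.182.129.8", 52212), ("10.16.0.1", 52212)])` yields a 28-octet data field: 4 octets for the count followed by two 12-octet address entries.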
No user data, however, is allowed to be transported in Initiation or Initiation Ack datagrams.

3.3 Initialization Collision

If two endpoints attempt to initialize an association with each other at about the same instant, a collision will occur, i.e., each side will receive an Initiation datagram from the other side after it transmitted its own. In such a case, both sides shall acknowledge the Initiation datagram of the other side in the normal procedure as described above.

3.4 Association Re-initialization

An endpoint shall be allowed to re-initialize an established association with another endpoint. In such a case, the endpoint that initiates the re-initialization (i.e., the initiator) shall use a tag different from the one used in the previous initialization. The initiator shall follow the normal initialization procedure as stated in section 3.1.

Once it has left the Tag-lock mode of the current association initialization, an endpoint shall treat any new incoming Initiation from its peer as a re-initialization event. Upon the arrival of the new Initiation datagram from the peer, the receiving endpoint shall also follow the procedure stated in section 3.1 to respond.

4. Reliable Transfer of Datagrams

Reliable transfer is indicated if the datagram being transferred has the GAR bit set to 1 and the UNR bit set to 0.

The receiver of a reliable datagram shall always send an acknowledgment to the sender. Normally, delayed acknowledgment is used, and the acknowledgment can either be sent separately or piggy-backed on a datagram traveling in the opposite direction.
The following example illustrates both separate and piggy-backed acknowledgments with both ends transmitting in reliable mode:

   Endpoint A                                      Endpoint Z
   {App sends 3 messages}
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=0,Send=1,Size=100]------------->  (Start T2-receive timer)
   (Start T3-send timer)
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=0,Send=2,Size=100]----------->
   (Restart T3-send timer)
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=0,Send=3,Size=100]----------->
   (Restart T3-send timer)
   ...
                                           {Timer T2 expires}
                             /-----------  [Header Flags=ACK
                            /               Part=0,Of=0
                           /                Seen=3,Send=1]
                          /
   (cancel T3-send timer) <------
   ...                                     ...
   {App sends 1 message}
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=1,Send=4,Size=100]----------->    (Start T2-receive timer)
   (Start T3-send timer)
   ...
                                           {App sends 1 message}
                                           (cancel T2-receive timer)
                             /-----------  [Header Flags=DAT|ACK|GAR
                            /               Part=0,Of=1
                           /                Seen=4,Send=1,Size=45]
                          /                (Start T3-send timer)
   (cancel T3-send timer) <------
   (Start T2-receive timer)
   ..
   {Timer T2 Expires}
   [Header Flags=ACK
    Part=0,Of=0
    Seen=1,Send=5]------------------>      (cancel T3-send timer)

Note that if the datagrams previously received from the same sending endpoint were transmitted in Unreliable transfer mode (see Appendix E for details on Unreliable transfer), the receiving endpoint must reset its Seen counter to the value of the Send field in the current reliable datagram.
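The choice between a separate and a piggy-backed acknowledgment can be sketched as a small state holder on the receiving side. This is a simplified model under explicit assumptions: datagrams arrive in order (no gaps), the T2-receive timer is represented by a boolean rather than a real timer, and the class name `DelayedAcker` and its dictionary return values are hypothetical.

```python
class DelayedAcker:
    """Models delayed acknowledgment with a T2-receive timer flag."""

    def __init__(self):
        self.t2_running = False  # stands in for the T2-receive timer
        self.seen = 0            # last sequence number received from the peer

    def on_data_received(self, seq):
        # A reliable datagram with user data arrived: remember it and
        # start T2 if it is not already running.
        self.seen = seq
        self.t2_running = True

    def on_send_data(self):
        # Outgoing user data: cancel T2 (if running) and piggy-back the Ack.
        piggybacked = self.t2_running
        self.t2_running = False
        return {"flags": {"DAT", "ACK", "GAR"}, "seen": self.seen,
                "piggybacked": piggybacked}

    def on_t2_expired(self):
        # T2 fired with nothing to piggy-back on: send a separate Ack.
        if not self.t2_running:
            return None
        self.t2_running = False
        return {"flags": {"ACK"}, "seen": self.seen}
```

Replaying the example above for Endpoint Z, three calls to `on_data_received` followed by `on_t2_expired` produce a separate Ack with Seen=3, while `on_data_received(4)` followed by `on_send_data` produces a piggy-backed Ack with Seen=4.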
4.1 Timer Management Rules

The following rules shall be used to manage the timers during normal Reliable transfer, unless otherwise stated for some special cases:

A) When a reliable datagram with user data (i.e., with the DAT flag set) is received, the endpoint shall start a T2-receive timer if no other timer is running, and upon the expiration of the T2-receive timer, the endpoint shall ack to the sender all the un-acked datagrams it has received.

B) When a reliable datagram with user data is sent out, the sending endpoint shall start a T3-send timer. If the T3-send timer is already running, the endpoint shall first stop the old T3 timer and then start a new one. If the T2-receive timer is running, the endpoint shall first stop the T2 timer, piggyback an Ack onto the out-bound datagram, and then start a T3-send timer. Upon the expiration of the T3-send timer, the endpoint shall follow the rules described in 4.5 for possible re-transmission of the un-acked datagrams.
Whenever the T3-send timer is started, the RTT estimate last calculated for that network should be added to the base T3-send timer value (if an RTT value is measured, see section 4.6).

C) When all outstanding datagrams are acknowledged, the T3-send timer shall be stopped if one is still running.

The following example shows the use of various timers.

   Endpoint A                                      Endpoint Z
   {App sends 2 messages}
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=1,Send=6,Size=100]----------->    (Start T2-receive timer)
   (Start T3-send timer)
   [Header Flags=DAT|ACK|GAR               {App sends 1 message}
    Part=0,Of=1
    Seen=1,Send=7,Size=100]---\      /---  (cancel T2-receive timer)
   (Restart T3-send timer)     \    /      [Header Flags=DAT|ACK|GAR
                                \  /        Part=0,Of=1
                                 \/         Seen=6,Send=2,Size=100]
                                 /\        (Start T3-send timer)
                                /  \
                          <----/    ---->
   ...                                     ...
   {T3-send timer expires}
   (re-transmit 2nd datagram)
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=2,Send=7,Size=100]--------->      (Cancel T3-send timer)
   (Restart T3-send timer)                 (Start T2-receive timer)
   ..
                                           {Timer T2 expires}
   (Cancel T3-send timer) <--------------  [Header Flags=ACK
                                            Part=0,Of=0
                                            Seen=7,Send=3]

4.1.1 Link Rotation

When multiple networks exist between two communicating endpoints, every time the application transmits a datagram, the MDTP implementation MUST keep track of which network the transmission was sent on (if more than one network exists) in the MDTP protocol variable 'last.sent.intf'.
If the user does not specifically override rotation, each send should be rotated in a round robin fashion amongst all available networks and the protocol variable 'last.sent.intf' should be updated to indicate which interface was used last. The MDTP implementation MUST allow a user to override this rotation, defeating MDTP's rotation upon each send. The implementation must also provide an interface to add and remove a link from rotation eligibility.

4.2 Gap Acknowledgment for Missing Datagrams

If reliable datagrams become missing during a series of transmissions, a special type of acknowledgment known as the Gap Ack will be sent back to inform the sender to re-transmit the missing datagrams.

The following example shows the use of Gap Ack.
   Endpoint A                                      Endpoint Z
   {App sends 3 messages}
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=3,Send=8,Size=100]----------->    (Start T2-receive timer)
   (Start T3-send timer)
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=3,Send=9,Size=100]-----X (lost)
   (Restart T3-send timer)
   [Header Flags=DAT|ACK|GAR
    Part=0,Of=1
    Seen=3,Send=10,Size=100]----------->   (A gap detected in data)
   (Restart T3-send timer)
   ..
                                           {T2-receive timer expires}
                                 /-------  [Header Flags=ACK
                                /           Seen=9,Send=3,
                               /            Part=1,Of=1
                              /             data=(long integer)10]
   (Prepare retransmit) <--------/

In this example, when "Z" receives the third datagram from "A" it realizes that a gap exists in the received data. At the expiration of the T2-receive timer, "Z" sends a Gap Ack, in place of a normal Ack, to "A" to indicate the missing datagram.

In the Gap Ack, the Part and Of fields are both set to '1', as opposed to '0' as in a normal Ack. The data field of the Gap Ack is a four (4) octet long integer containing the sequence number of the next datagram after the gap (which is 10 in this example). The Seen field in the Gap Ack will contain the sequence number of the first datagram of the gap.

Using these two values, "A" should be able to calculate the missing datagram numbers (which is 9 in this example) and thus determine which datagrams will need to be retransmitted.

Note that Gap Acks cannot be piggy-backed with user data; if there is user data to be sent when a gap is detected, the Gap Ack must be sent out first, before the datagram carrying user data can be sent.

4.3 Flow and Congestion Controls

Several different mechanisms shall be used jointly to achieve flow and congestion controls in MDTP.

4.3.1 Sending with Window Control

The sending endpoint shall use a transmission window to control the number of outstanding datagrams, i.e., datagrams that have been sent but have yet to be acknowledged. The length of the window is defined as the maximal number of outstanding datagrams a sending endpoint can allow.
This length is adjusted dynamically, depending on the current number of successful transmissions as well as the number of lost datagrams.

When the number of outstanding datagrams reaches the current window length, the endpoint shall still accept send requests from its upper layer, but shall transmit no more datagrams until an Ack is received. Moreover, when the window length is reached, the next send request from the upper layer will trigger the sending endpoint to transmit a special Window Up message. Upon receiving this Window Up (WIN|ACK) the receiver must respond with a Window Up Response (WNR|ACK), as illustrated by the following example (assuming current window length is 3):

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|GAR|ACK
 Part=0,Of=1
 Seen=0,Send=11,Size=100]----------->   (Start T2-recv timer)
(Start T3-send timer)
[Header Flags=DAT|GAR|ACK
 Part=0,Of=1
 Seen=0,Send=12,Size=100]----------->
(Restart T3-send timer)
[Header Flags=DAT|GAR|ACK
 Part=0,Of=1
 Seen=0,Send=13,Size=100]----------->
(Restart T3-send timer)
{App sends a new message}
(queue new message and send Win Up)
[Header Flags=WIN|ACK
 Seen=0,Send=14]-------------------->   (cancel T2-recv timer)
                                /-----  [Header Flags=WNR|ACK
                               /         Part=0,Of=0
                              /          Seen=14,Send=0]
[Header Flags=DAT|GAR|ACK <--------/
 Part=0,Of=1
 Seen=0,Send=15,Size=100]----------->   (Start T2-recv timer)
(Restart T3-send timer)

In the above example, after the transmission of the first three
datagrams, "A" reached its window length. The next message from the user triggered a Window Up that was sent to "Z". The Window Up shall contain no user data. In response, "Z" cancelled timer T2 and immediately sent a Window Up Response. The arrival of this Window Up Response effectively resolved all the outstanding datagrams at "A", thus allowing "A" to send out the next datagram.

4.3.2 Window Length Adjustment

The window length shall be initially set to 2, and shall then be dynamically adjusted based on the datagram loss and acknowledgment conditions of the underlying network.

If the current window length is equal to or greater than 4, each time 4 consecutive outstanding datagrams are acknowledged at once by the receiver, the sender's window length will be raised by 1 until it reaches the protocol parameter 'Max.Outstanding.dg' (which should be a user configurable parameter). If the current window length is less than 4, each time the number of consecutive outstanding datagrams acknowledged in a single Ack is equal to or greater than the current window length, the sender's window length shall be raised by 1, until it reaches 'Max.Outstanding.dg'.

In the following circumstances, the sender's window length shall be decreased. However, when the window length reaches 2 it shall not be decreased any further.

If between 1 and 3 consecutive datagrams are lost, the window length will be decreased by 1. If between 4 and 7 datagrams are lost, the window length will be decreased by 2.
If 8 or more datagrams are lost, the window length will be decreased by 4. Moreover, any time a Window Up is sent to the receiving endpoint the sender's window length will be decreased by 1. Also, if a timeout forces a retransmission the sender's window length will be reduced to half of its current value.

The following table summarizes these rules:

-----------------------------------------------------------------------
Duplicate Ack received by sender     | Adjust down by 4
-----------------------------------------------------------------------
Greater than 8 datagrams lost        | Adjust down by 4
-----------------------------------------------------------------------
Greater than 4 datagrams lost        | Adjust down by 2
-----------------------------------------------------------------------
Greater than 0 datagrams lost        | Adjust down by 1
-----------------------------------------------------------------------
Timeout forces retransmission        | Adjust down by 1/2 of the current
                                     | window.
-----------------------------------------------------------------------
Window Up sent                       | Adjust down by 1
-----------------------------------------------------------------------
4 or more consecutive datagrams      | Adjust up by 1
acknowledged (window length > 4)     |
-----------------------------------------------------------------------
1/2 Window length or more acked      | Adjust up by 1
(window length<=4)                   |
-----------------------------------------------------------------------
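As a rough illustration only, the window adjustment rules of section 4.3.2 could be sketched in Python as below. The class and method names, the ceiling of 20 standing in for 'Max.Outstanding.dg', and the integer rounding used when halving are assumptions of this sketch; none of them is defined by MDTP. Where the prose and the summary table differ (the thresholds for raising the window), the sketch follows the prose, and the Duplicate Ack row of the table is not modelled.

```python
from dataclasses import dataclass

MIN_WINDOW = 2  # the text states the window is never decreased below 2


@dataclass
class SendWindow:
    """Sender-side window length, following the prose rules of 4.3.2."""

    max_outstanding: int = 20  # placeholder for 'Max.Outstanding.dg' (assumed)
    length: int = 2            # initial window length given in the text

    def _shrink(self, amount: int) -> None:
        # Decrease the window, but never below the floor of 2.
        self.length = max(MIN_WINDOW, self.length - amount)

    def on_ack(self, acked: int) -> None:
        """`acked` is the number of consecutive outstanding datagrams
        acknowledged by a single Ack."""
        threshold = 4 if self.length >= 4 else self.length
        if acked >= threshold:
            self.length = min(self.max_outstanding, self.length + 1)

    def on_loss(self, lost: int) -> None:
        """Apply the loss rules: 1-3 lost, 4-7 lost, 8 or more lost."""
        if lost >= 8:
            self._shrink(4)
        elif lost >= 4:
            self._shrink(2)
        elif lost >= 1:
            self._shrink(1)

    def on_window_up_sent(self) -> None:
        self._shrink(1)

    def on_timeout_retransmit(self) -> None:
        # "Half of its current value"; rounding down is an assumption.
        self.length = max(MIN_WINDOW, self.length // 2)
```

A sender would call `on_ack` for every Ack it processes and the other methods on the corresponding events; the current `length` then bounds the number of outstanding datagrams.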
4.3.3 Flow Control using In-Queue Information

By using the In Queue field in the MDTP header, the sender can inform the receiver of the number of pending datagrams which the sender has received but not yet delivered to its application.
The following example shows how the endpoints use the In Queue value to accomplish flow control.

Assume that Endpoint A has sent Endpoint Z 20 datagrams, and when Endpoint Z sends an Ack on the reception of these 20 datagrams, only the first one of them has been delivered to the upper layer at Endpoint Z. In the Ack sent by Endpoint Z, the In Queue field would then have a value of 19, indicating the number of datagrams pending for delivery to its upper layer.
This value would be checked by Endpoint A before it sent the next datagram to Endpoint Z. If this value was found to be greater than its current window length, Endpoint A would not send the next datagram. Instead, Endpoint A would start its T3-send timer and send a Window Up message to Endpoint Z at the expiration of the timer. This would force Endpoint Z to send another Ack with an updated In Queue value. If the new In Queue value was still greater than its window length, Endpoint A would re-start its T3-send timer, and repeat this procedure until the In Queue value of Endpoint Z dropped below the current window length of Endpoint A. Then, the transmission at Endpoint A would resume.

4.3.4 T3-send Timer Adjustment with RTT

If the RTT measurement is available on a specific network, the sender shall adjust the T3-send timer each time it sends a datagram using this network. The calculation and adjustment of the timer should follow the method described in [4]. RTT measurement shall be tracked for each network if redundant networks are in use. MDTP defines two optional methods to obtain RTT measurements; see sections 4.6 and 4.7.

4.4 Sequence Number Reset

When the datagram sequence number reaches the value 0x7fffffff, the next sequence number shall be set to 1.

4.5 Datagram Re-transmission

Whenever a T3-send timer expires, the endpoint shall re-transmit the un-acked datagram that has the lowest Send value, unless:

A) If the current window length is reached, a Window Up message will be sent out (see 4.3 Congestion Control), or
B) If the current window length is not reached and there is still user data pending for transmission, a new datagram with user data shall be sent out and the T3-send timer shall be restarted.

When a T3-send timer is started at a re-transmission, the length of the next T3-send timer for this destination should be doubled and the last estimated RTT value for that network should be added to the timer.
4.5.1 Re-transmission on Redundant networks

When redundant networks exist between two communicating endpoints, the re-transmission shall be attempted on the network specified in the MDTP protocol variable 'last.good.intf'.
The value of 'last.good.intf' is always updated to refer to the network on which the last datagram from the peer endpoint arrived. Moreover, the number of consecutive re-transmissions is also recorded in a variable 'retran.count' for each network. Every time a datagram is received on a network, the corresponding 'retran.count' shall be reset to 0. If the value in the 'retran.count' of the current network exceeds half of the value of the protocol parameter 'Max.Retransmit', the 'last.good.intf' will be changed, so as to force the next re-transmission to be directed to an alternate network and optionally report a failure condition. The total number of consecutive re-transmissions across all the networks in an association is also recorded.
If this value exceeds the limit defined by 'Max.Retransmit', the sending endpoint shall consider the peer endpoint unreachable and stop transmitting data to it, and optionally report the failure.

4.6 RTT Measurement

This defines the mechanism for round-trip-time (RTT) measurement in MDTP. On occasions either side of an association may need to perform an RTT measurement of the network (or one of the redundant networks) between them.

4.6.1 RTT Datagram Header Format

The following shows the header format an endpoint shall use for RTT measurement:

              MDTP Header Format - RTT measurement

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   MDTP Protocol Identifier                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |             Flags             |   In Queue    |
|               |N N W I F R D A M S W R R F G U|               |
|               |O O I S I T A C U H N E T L A N|               |
|               |M B N B R M T K L U R 1 C O R R|               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-1                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-2                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Two long integers are used in the data field to carry the time value.
The RTT datagram is identified by setting the RTC or RTM bit to 1.

4.6.2 Measure RTT

At the request of its upper layer, an endpoint shall initiate an RTT measurement by sending an RTT datagram with GAR, ACK, and RTC bits set to 1 (to a specific network if redundant networks exist). No user data shall be carried. The sender shall also place in Time Int-1 and Time Int-2 the value of the current time of day in seconds and microseconds.

Upon the reception of this RTT datagram, the recipient shall immediately return the datagram to the sender (over the same network on which the datagram arrives if redundant networks exist), with the RTM and ACK bits set to 1.
Upon the reception of this reply, the sender shall use the Time Int-1 and Time Int-2 in the reply datagram to calculate the RTT (of the specific network if redundant networks exist).

Endpoint A                                      Endpoint Z
RTT - Request
Now=x.y
[Header Flags=ACK|GAR|RTC
 Part=0,Of=1
 Seen=1,Send=31,Size=0
 Time-Int1=x
 Time-Int2=y] ----------------------->
                                 ------ [Header Flags=ACK|RTM
                                /        Part=0,Of=0
                               /         Seen=31,Send=1
                              /          Time-Int1=x
                             /           Time-Int2=y]
(Endpoint A uses <-----------/
 current time subtracted
 from x.y to calculate RTT)

4.7 Link Heart Beat

This defines the mechanism for activating and transmitting of link heart beats in MDTP.
At the request of its upper layer, an endpoint shall enable heart beat on a specific peer with which it has an established association in the Reliable transfer mode. The RTT datagram defined in section 4.6.1 shall be used as the Heart Beat.

After having heart beat enabled, the endpoint shall transmit a Heart Beat to that specific peer and start a T5-heartBeat timer. The peer shall immediately respond to the Heart Beat in the same manner as for an RTT datagram, as described in section 4.6. This response shall be stored by the first endpoint (it can also be used to update its RTT measurement).

When the T5-heartBeat timer expires, the endpoint shall first check if the previous Heart Beat has been responded to (on the same network it was sent on, in the case of redundant networks). If not, the network that the last Heart Beat was sent upon shall be counted as a transmission failure, and be handled following the rules described in section 4.5. Then, the endpoint shall send another Heart Beat and re-start the T5-heartBeat timer.
In the case where redundant networks exist, the sending of Heart Beats shall follow the link rotation rules outlined in section 4.1.1.

If, before the expiration of the T5-heartBeat timer, a datagram is transmitted or received by the endpoint, the T5-heartBeat timer shall be stopped and the appropriate T2-T4 timer shall be started. In other words, the T5-heartBeat timer has the lowest precedence. When there is no datagram to send and no other timers are running, the T5-heartBeat timer shall be started and the above procedure shall continue.

The suggested interval for the T5-heartBeat timer is 4000 ms.
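The Heart Beat bookkeeping described in this section, together with the failure counting of section 4.5.1, might be sketched in Python as follows. This is a minimal sketch, not part of the protocol definition: the class name, the method names, and the string network identifiers are hypothetical, and resetting the total failure count whenever any response arrives is an assumed reading of "consecutive". Timers and datagram transmission are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HeartBeatState:
    max_retransmit: int                      # protocol parameter 'Max.Retransmit'
    retran_count: Dict[str, int] = field(default_factory=dict)  # per network
    total_failures: int = 0                  # consecutive, across all networks
    pending: Optional[str] = None            # network of the unanswered Heart Beat
    rtt: Dict[str, float] = field(default_factory=dict)         # seconds, per network

    def send(self, network: str) -> None:
        """Record that a Heart Beat was just sent on `network`."""
        self.pending = network

    def on_response(self, network: str, time_int1: int, time_int2: int,
                    now: float) -> None:
        """Handle the echoed RTT datagram. Time Int-1 and Time Int-2 carry
        the send time as seconds and microseconds."""
        self.rtt[network] = now - (time_int1 + time_int2 / 1_000_000)
        self.retran_count[network] = 0
        self.total_failures = 0              # assumption: any response resets it
        if self.pending == network:
            self.pending = None

    def on_t5_expiry(self) -> bool:
        """Count an unanswered Heart Beat as a transmission failure.
        Returns False once the peer should be considered unreachable."""
        if self.pending is not None:
            count = self.retran_count.get(self.pending, 0) + 1
            self.retran_count[self.pending] = count
            self.total_failures += 1
            self.pending = None
        return self.total_failures <= self.max_retransmit
```

After `on_t5_expiry` returns True the caller would send the next Heart Beat and re-start the T5-heartBeat timer; choosing the network for that Heart Beat (link rotation) is outside this sketch.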
4.8 Advisory Acknowledgment

This defines the mechanism for sending and handling of the Advisory Acknowledgments in MDTP.

An endpoint may use Advisory Acks to increase bandwidth utilization when transmitting over a reliable association. An Advisory Ack shall be indicated by setting the RE1 flag to 1 in the datagram. The endpoint shall send an Advisory Ack to its peer when it reaches half of its current window length, and also when it detects that the next send will reach the full window length.

Upon the reception of an Advisory Ack, the peer endpoint shall immediately acknowledge all the datagrams it has received but not yet acknowledged, and then cancel the T2-recv timer if one is still running.
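The rule for when a sender sets RE1 can be expressed as a small predicate. The Python sketch below is illustrative only: the function name is hypothetical, the count is taken to include the datagram about to be sent, and rounding half of an odd window length upward is an assumption, since the text does not say how odd lengths are handled.

```python
def wants_advisory_ack(outstanding_after_send: int, window_length: int) -> bool:
    """Return True if the datagram being sent should carry the RE1 flag.

    `outstanding_after_send` counts un-acked datagrams including the one
    about to be sent.
    """
    half = (window_length + 1) // 2  # assumed rounding for odd window lengths
    return outstanding_after_send in (half, window_length)
```

With a window length of 6, for example, this flags the third and the sixth outstanding datagram.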
The following shows an example of using Advisory Ack:

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|GAR|ACK
 Part=0,Of=1
 Seen=0,Send=1,Size=100]------------->  (Start T2-recv timer)
(Start T3-send timer)
[Header Flags=DAT|GAR|ACK
 Part=0,Of=1
 Seen=0,Send=2,Size=100]----------->
(Restart T3-send timer)
{detects window half full, use Advisory Ack}
[Header Flags=DAT|GAR|ACK|RE1
 Part=0,Of=1
 Seen=0,Send=3,Size=100]------\
(Stop and restart T3-send timer) \
                                  \---->  (cancel T2-receive timer)
              <---------------------- [Header Flags=ACK
                                       Part=0,Of=0
                                       Seen=3,Send=1]

4.9 Termination of an Association

When an endpoint terminates, it shall send a Shutdown datagram (FIR|SHU) to each of the peer endpoints in all its existing associations. The Shutdown datagram itself is sent in unreliable transfer mode and thus need not be acknowledged. When a peer endpoint receives the Shutdown, it will remove the sender from its record, and optionally report the termination of the sender to the upper layer.
The following shows an example of the termination of Endpoint A:

Endpoint A
{App indicates termination}
[Header Flags=FIR|SHU
 Seen=3,Send=14,    ------------------------> to Endpoint X
[Header Flags=FIR|SHU
 Seen=1496,Send=101,------------------------> to Endpoint Y
[Header Flags=FIR|SHU
 Seen=14,Send=2     ------------------------> to Endpoint Z

4.10 Draining of an Association

An endpoint in an association may decide to "drain" the association without completely shutting it down. By draining an association, both endpoints will remove any record and pending datagrams associated with the association. Further communications between the two endpoints can be resumed by going through a re-initialization procedure (see section 3).

In such a case, a Drain datagram (FIR|SHU|UNR) is sent to the peer endpoint of the association, and no Ack is required. The following sequence shows an example of Draining:

Endpoint A
{App indicates draining}
[Header Flags=FIR|SHU|UNR
 Seen=146,Send=1301]------------------------> to Endpoint X

5. Interface with upper level protocols

The upper layer protocols (ULP) shall request services by passing primitives to MDTP and shall receive notifications from MDTP for various events. The primitives and notifications described in this section should be used as a guideline for implementing MDTP.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flow Number | Datagram number in flow | (Seen) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flow Number | Datagram numbernotifications described inflow | (Send) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The Send field will contain the flow number ofthisdatagram, flow 0 is always reservedsection should be used as a guideline for implementing MDTP. A) Init.MDTP primitive This primitive allows MDTP to initialize its internal data structures and allocate necessary resources for setting up its operation environment. Note that once MDTP isNOT used. The datagram number is the sequential numberinitialized, ULP can communicate directly with any other endpoints without re-invoking this primitive. Mandatory attributes: None. Optional attributes: The following types of attributes may be passed along with thedatagram. The Seen field is usedprimitive: o Timer selection and its operation syntax -- toacknowledge receipt ofindicate to MDTP an alternative timer theindicated datagramMDTP should use forthe specified flow. The flow number in the acknowledgment does NOT needits operation. o Initial MDTP operation mode; o IP port number, if ULP wants it to bethe same as the flow number in the Send field.specified; B) Send.Data primitive Thisformatisonly used for flow datagrams. A flow can have bundledthe main method to send datagrams via MDTP. Mandatory attributes: o data(see section 9) but cannot have fragmented messages. The reason fragmented messages are not supported- This istwo fold, to attemptthe payload ULP wants tosimplifytransmit; o size - The size of theflows a little bit. And flows are thoughtpayload in number ofhas call control related limiting there size to be no larger than one datagram per message. 
If a flow packetoctets; o to-address - The IP address and port numberreaches 0xffff, thenof thenext packet number should wrap to 1. Before a flowintended receiver. In case of redundant networks, to-address can beused it must be initiated, afterany one of theflow is complete it shouldmultiple IP addresses of the receiver. The network which the datagram will actually beclosed. Note it is assumed that before any flows cansent through will beopeneddetermined by MDTP due to the link rotation, unless the current mode prohibits MDTPinitiate sequence has taken placelink rotation; in such case the datagram will be sent through the network specified by to-address (see section4). When4.5). Optional attributes: o mode-flags - This indicates a new MDTPinitiate sequence occurs, any endpoint being re-initialized will cause a closing of all outstanding flows during that re-initialization. Before opening a flowoperation mode, taking effect immediately including theopening end should verifycurrent datagram send; o context - optional information that will be carried in theversion numberSend.Failure notification to the ULP if the transportation of this datagram fails. C) Receive.Data primitive This primitive shall return the first datagram in thereceivingMDTPendpointin-queue to ULP, if there isat least 3. Ifone available. It may, depending on theversion number is less than 3 thenspecific implementation, also return other informations such as theMDTP endpoint must NOT attempt to open a flow. Stewart & Xie [Page 35] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 7.1 Initiating a flow. A flowsender's address, whether there are more datagrams available for retrieval, etc. The behavior isinitiated by sending a Flow Initiate/Close Message. In all flowundefined if no datagramthe NOB bitisset. Foravailable when this primitive is invoked. Mandatory attributes: o buffer - theFlow Initiate Messagememory location indicated by theUNR mode bit set as well. 
The Acknowledgment number (Seen)ULP to store the received datagram and other information. Optional attributes: None. D) Data.Arrive notification MDTP shall invoke this notification on theSequence Number (Send)ULP when a datagram isset to 0 unlesssuccessfully received and ready for retrieval. E) Send.Failure notification If a datagram can not be delivered MDTP shall invoke thisisnotification on thefirst message in which caseULP. The following may be optionally passed with theTAG unlock value is set innotification: o data - theSendlocation ULP can find the un-delivered datagram. o context - optional information associated with this datagram (seesection 4.1). Until13.2). F) Link.Status.Change notification When aflowlink isopen successfully a receiver of a non-opened flow datagram will silently discard the datagram. Upon sendingmarked down (e.g., when MDTP detects aflow initiationlink failure), or marked up (e.g., when MDTP detects aT3-Send timer will be startedlink recovery), MDTP shall invoke this notification onflow 0. The timer will followthesame rules for retransmission and timing as outlined in section 5.ULP. The followingillustration demonstratesshall be passed with theopeningnotification: o link-address - This indicates the IP address offlow 5: Endpoint A Endpoint Z {App Initiates flow 5} [Header Flags=NOB Mode=UNR Part=0,Of=1 Seen=00000000,Send=0x0000 0000,Size=0, flow=0x0005 dg=0000 ]------> (Start T3-send timer f=5) (Cancel T3-send timer f=5) <----------------- [Header Flags=NOB|ACK Mode=UNR Part=0,Of=1 Seen=0x00000005,Send=0x00000000, Size=0, flow=0000 dg=0000] Intheabove example note that for flow 0, unlike all others, no T2-Recv timer is ever started. Each flow open/close must be independently acknowledged. Note also that inaffected link; o new-status - This indicates thereply acknowledgmentnew status of theACK bitlink; G) Communication.Up notification This notification isset. 
If unlikely event that Endpoint-Z wishedused when MDTP becomes ready topiggy back the open of flow 5 withsend or receive datagrams, or when aflow open of its ownlost communication to an endpoint is restored. The following shall be passed with thesequence would look as follows: Stewart & Xie [Page 36] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Endpoint A Endpoint Z {App Initiates flow 5} [Header Flags=NOB Mode=UNR Part=0,Of=1 Seen=0,Send=0, Size=0, flow=5, dg=0 ]------> (Start T3-send timer f-5) {App Initiates flow 8} (Cancel T3-send timer f-5) <----------------- [Header Flags=NOB|ACK Mode=UNR Part=0,Of=1 Seen=5, Send=0, Size=0,flow=0008 dg=0000] (Start T3-send timernotification: o status -f8) [Header Flags=NOB|ACK Mode=0 Part=0,Of=1 Seen=8,Send=0,Size=0, flow=0, dg=0]-------------------------------->(Cancel T3-send timerThis indicates what type of event that has occurred; o endpoint-id -f8) NoteThe IP address and port number to identify the endpoint; H) Communication.Lost notification When MDTP loses communication to an endpoint completely or detects thatattheinitiate ofendpoint has performed aflow, the timer started is considered the first timer for the flow, butshut-down operation, itis sent over flow 0. Note also that a piggyback open is not allowed ifshall invoke this notification on theTAG sequences have not been exchanged. 7.2 Flow acknowledgments Normal dataflow's followULP. The following shall be passed with thenormal MDTP transmission formats (see section 5) Acknowledgments when possible are piggy-backed on datagrams. Each flow maintains its own send timer. 
When no piggybacknotification: o status - This indicates what type ofdataevent that has occurred; o endpoint-id - The IP address andacknowledgments is possible, more than one flow can be be acknowledged at the same time by usingport number to identify theFlow Extend Acknowledgment format.endpoint; o packets-enqueue - TheSend field (now considered thenumber and location ofextended acknowledgments) will containun-sent datagrams still holding by MDTP; o last-acked - the sequence numberof acknowledgments in the array. During data transfer if the whenlast acked by that peer endpoint; o last-sent - thedatagramsequence numberreaches 0xfffflast sent to that peer endpoint; I) Change.Link.Rotation primitive When thenext packet should be labeled 1. Pkt 0 is never usedupper layer wants to inform MDTP to make a specific network eligible or ineligible fordatagram transfer. One T2-Recv timerin link rotation, the upper layer will send this primitive to MDTP. Mandatory attributes: o action - This indicates if the network ismaintainedto be made eligible or ineligible forall flows. If more than one flowlink rotation. o network-id - This isbeing timedthe IP address anda datagram is to be transmitted then oneport of theflows willnetwork to beacknowledgedadded or removed from link rotation consideration. J) Open.Stream primitive This shall be used by the upper layer to open a new stream. Mandatory attributes: o endpoint-id - The IP address and port number to identify theT2-Recv timer will be left running until expiration,peer endpoint to whichwill then causetheFlow Extended Acknowledgmentstream is to besent, acknowledging all remaining flows. The following examples illustrate examplesopened. An association must have existed at the time offlow acknowledgments. For this example we assumestream open. Returned attributes: o The stream number thatEndpoint A has 3 flows open 5,7 and 9. Endpoint Z has 4 flows open 0x11, 8 4 and 1. 
Stewart & Xie [Page 37] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Example 1: Endpoint A sends to Endpoint Z T2-Recv timer expires Endpoint A Endpoint Z { App sends first datagram on flow 5} [Header Flags=NOB|DAT Mode=REL Part=0,Of=1 Seen=0x0000 0000,Send=0x0005 0001,Size=20]------>(Start T2-Recv) (Start T3-send timer-f5) { T2-Recv Timer Expires } (Cancel T3-send timer) <--------------- [Header Flags=NOB|ACK Mode=REL Part=0,Of=1 Seen=0x00050001,Send=0x00000000, Size=0] (Start T3-send timer) Example 1: Endpoint A sends to Endpoint Z T2-Recv timer expires Endpoint A Endpoint Z { App sends first datagram on flow 5} [Header Flags=NOB|DAT Mode=REL Part=0,Of=1 Seen=0x0000 0000,Send=0x0005 0001,Size=20]------>(Start T2-Recv) (Start T3-send timer-f5) { T2-Recv Timer Expires } (Cancel T3-send timer) <--------------- [Header Flags=NOB|ACK Mode=REL Part=0,Of=1 Seen=0x00050001,Send=0x00000000, Size=0] (Start T3-send timer) Stewart & Xie [Page 38] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Example 2: Endpoint A sends multiple messagesis opened. 
   K) Close.Stream primitive

   This shall be used by the upper layer to request to close a
   stream.

   Mandatory attributes:

   o  endpoint-id - The IP address and port number to identify the
      peer endpoint to which the stream is to be closed.

   o  stream number - The stream number to identify the stream to be
      closed (this should be the number returned by the Open.Stream
      primitive on this stream).

6. Suggested MDTP Protocol Parameter Values

   The following are suggested timer values for MDTP:

      T1-init Timer    - 160 ms
      T2-receive Timer - 20 ms
      T3-send Timer    - 160 ms + Last calculated RTT for that
                         network.

   The following protocol parameters are recommended:

      Max.Outstanding.dg      - 20 messages
      Max.Retransmit          - 10 attempts
      Max.Init.Retransmit     - 8 attempts
      Min.Mcast.Time.To.Reset - 5 seconds
      Num.Of.Mcast.Reset.Msg  - 5 messages

7. Acknowledgments

   The authors wish to thank Brian Wyld, A. Sankar, Henry Houh, Gary
   Lehecka, Ken Morneault, Lyndon Ong, and others for their very
   valuable comments.

8. Author's Addresses

   Randall R. Stewart                 Tel: +1-847-632-7438
   Cellular Infrastructure Group      EMail: stewrtrs@cig.mot.com
   Motorola, Inc.
   1475 W. Shure Drive, #2C-6
   Arlington Heights, IL 60004
   USA

   Qiaobing Xie                       Tel: +1-847-632-3028
   Cellular Infrastructure Group      EMail: xieqb@cig.mot.com
   Motorola, Inc.
   1501 W. Shure Drive, #2309
   Arlington Heights, IL 60004
   USA

   Tom Bova                           Tel: +1-703-484-3331
   Cisco Systems Inc.                 EMail: tbova@cisco.com
   13615 Dulles Technology Drive
   Herndon, VA 20171

   Suheel Hussain                     Tel: +1-919-472-2312
   Cisco Systems Inc.                 EMail: ssh@cisco.com
   7025 Kit Creek Road
   Research Triangle Park, NC 27709

   Ted Krivoruchka                    Tel: +1-703-484-3331
   Cisco Systems Inc.                 EMail: tedk@cisco.com
   13615 Dulles Technology Drive
   Herndon, VA 20171

   Renee Revis                        Tel: +1-703-472-5681
   Cisco Systems Inc.                 EMail: drrevis@cisco.com
   7025 Kit Creek Road
   Research Triangle Park, NC 27709

9. References

   [1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program
       Protocol Specification", RFC 791, USC/Information Sciences
       Institute, September 1981.
   [2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information
       Sciences Institute, August 1980.

   [3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793,
       USC/Information Sciences Institute, September 1981.

   [4] Jacobson V., "Congestion Avoidance and Control", Proceedings of
       SIGCOMM '88, pp 314-329, August, 1988.

   [5] Seth, T., et al., "Performance Requirements for Signaling in
       Internet Telephony", Internet-Draft
       <draft-seth-sigtran-req-00.txt>, May, 1999.

Appendix A: Stream-based Reliable and Ordered Delivery

   This defines a reliable and ordered stream mechanism for MDTP. It
   is optional for implementation.

   A stream in MDTP is defined as a sequence of user datagrams that
   needs to be reliably delivered with sequence preservation of its
   own. In other words, the delivery of a stream shall not be delayed
   because of losses or re-transmissions that occur in other streams
   within the same MDTP association. This capability is a critical
   requirement of some telephony call signaling protocols [5].

   Stream datagrams are identified by setting the FLO bit to 1.

A.1 Stream Initiation

   First, an MDTP association between the two endpoints must be
   initiated before any stream operation.

   A stream shall be initiated (opened) by the sender before datagrams
   can be sent in the stream, and after the stream is complete it
   shall be terminated (closed) by the user.
   Also, both sides of the association shall be able to initiate or
   terminate streams independently.

   The sender initiates a stream by sending a Stream Initiation
   (NOB|UNR), using the following header format:

   Stream Initiation

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MDTP Protocol Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |             Flags             |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Seen = 0x0 (or Tag)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Send = 0x0                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Data Size           |     Part      |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       New Stream Number       |              0x0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that in the Stream Initiation, the Seen and Send shall be set
   to 0, and the number of the new stream being initiated shall be
   indicated in the first two octets of the data field. However, if
   this is the first datagram sent out after receiving the Initiation
   Ack from the peer (see section 3.1), the Seen field of the above
   Stream Initiation shall be set to the Tag value carried in the
   Initiation Ack.

   Upon the reception of the Stream Initiation, the peer shall respond
   immediately with a Stream Initiation Ack (NOB|UNR|ACK), using the
   following header format:

   Stream Initiation Ack

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MDTP Protocol Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |             Flags             |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Seen = Stream Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Send = 0x0                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Data Size           |     Part      |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   The following example shows the opening of stream 5 by "A":

   Endpoint A                                     Endpoint Z
   {App Initiates stream 5}
   [Header Flags=FLO|UNR
    Part=0,Of=1
    Seen=0,Send=0,Size=0,
    Stream=5 ]--------------------------->
   (Start T3-send timer)
   (Cancel T3-send timer) <--------------------- [Header Flags=FLO|UNR|ACK
                                                  Mode=UNR
                                                  Part=0,Of=1
                                                  Seen=5,Send=0]

A.2 Stream Termination

   For an existing stream, either side shall be allowed to terminate
   the stream by sending a Stream Termination (FLO|UNR|SHU) to the
   other side. Apart from the flag RES, the Stream Termination shall
   use the same header format as that used in the Stream Initiation
   datagram (see A.1).

   A Stream Termination Ack (FLO|UNR|SHU|ACK) shall be sent by the
   peer endpoint in response.

   The following example shows the termination of stream 5 by "A":

   Endpoint A                                     Endpoint Z
   {App terminates stream 5}
   [Header Flags=FLO|UNR|SHU
    Part=0,Of=1
    Seen=0,Send=0,Size=0,
    Stream=5 ]--------------------->
   (Start T3-send timer s-5)
   (Cancel T3-send timer s-5) <------------ [Header Flags=FLO|UNR|SHU|ACK
                                             Part=0,Of=1
                                             Seen=5,Send=0]

   Datagrams associated with a terminated stream that are received by
   either side should be silently discarded. It is up to the side
   which terminates the stream to assure that all outstanding user
   datagrams in the stream are acknowledged before the termination.

A.3 Stream Datagram Transfer

A.3.1 Header Format in Stream Datagrams with User Data

   The MDTP header in a stream datagram with user data shall have the
   following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MDTP Protocol Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |             Flags             |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Seen                              |
   |         Stream Number         |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Send                              |
   |         Stream Number         |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Data Size           |     Part      |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The stream number and sequence number in the Send field shall be
   used by the sender to identify the current stream datagram. The
   stream number and sequence number in the Seen field shall be used
   by the sender to acknowledge the stream datagrams it has received.

   Stream number 0 and sequence number 0 are reserved for special
   purposes and are not valid as a stream number or sequence number.

A.3.2 Transmission of Stream Datagrams

   The rules for using the Seen Sequence Number and Send Sequence
   Number are similar to those defined for normal MDTP non-stream
   datagram transmissions (see section 4), except that for stream
   transfer the sequence numbers shall roll over to 1 after 0xFFFF.

   Moreover, each stream maintains its individual T3-send timer, but
   only one global T2-receive timer is maintained for all existing
   streams.

   Acknowledgment of a stream datagram shall either be sent separately
   or be piggy-backed on a stream datagram (not necessarily belonging
   to the same stream) traveling in the opposite direction. For a
   separate Stream Ack, the Send field will be set to 0000:0000.
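   The stream/sequence split of the Seen and Send fields and the
   roll-over rule can be illustrated with a short Python sketch. It
   assumes each field is one 32-bit word with a 16-bit stream number
   in the high half and a 16-bit sequence number in the low half, sent
   in network byte order; that layout is consistent with the 0xFFFF
   roll-over and the two-octet stream number of A.1, but the function
   names are hypothetical and not part of MDTP.

```python
import struct
from typing import Tuple

MAX_SEQ = 0xFFFF  # per A.3.2, sequence numbers roll over to 1 after this


def pack_field(stream: int, seq: int) -> bytes:
    """Pack a (stream, sequence) pair into one 32-bit Seen/Send word.

    Assumption: 16 bits each, network byte order.
    """
    if not (0 <= stream <= 0xFFFF and 0 <= seq <= MAX_SEQ):
        raise ValueError("stream and sequence numbers must fit in 16 bits")
    return struct.pack("!HH", stream, seq)


def unpack_field(raw: bytes) -> Tuple[int, int]:
    """Split a 32-bit Seen/Send word back into (stream, sequence)."""
    return struct.unpack("!HH", raw)


def next_seq(seq: int) -> int:
    """Advance a stream sequence number; 0 is reserved, so wrap to 1."""
    return 1 if seq >= MAX_SEQ else seq + 1


# A separate Stream Ack carries Send = 0000:0000.
SEPARATE_ACK_SEND = pack_field(0, 0)
```

   With these helpers, Send=5-1 is encoded as pack_field(5, 1), and a
   sender whose last sequence number on a stream was 0xFFFF uses
   next_seq(0xFFFF), i.e. 1, for the following datagram.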
   The following shows an example of transmitting a stream datagram
   (FLO|REL|DAT) and a separate Stream Ack (FLO|REL|ACK):

   Endpoint A                                     Endpoint Z
   {App sends first data on stream 5}
   [Header Flags=FLO|REL|DAT
    Part=0,Of=1
    Seen=0-0,Send=5-1,Size=20]----\
   (Start T3-send timer-s5)        \--->(Start T2-recv)
   ...
                                        {T2-recv Timer Expires}
   (Cancel T3-send timer-s5) <--------------- [Header Flags=FLO|REL|ACK
                                               Part=0,Of=1
                                               Seen=5-1,Send=0-0,Size=0]

   The following example shows the use of a piggy-backed Stream Ack.

   Endpoint A                                     Endpoint Z
   {App sends new data on stream 5}
   [Header Flags=FLO|REL|DAT
    Part=0,Of=1
    Seen=0-0,Send=5-4,Size=20]--------->(Start T2-recv)
   (Start T3-send timer-s5)
   ...
                                        {App sends data on stream 11}
                                        (cancel T2-recv Timer)
                               /----- [Header Flags=FLO|REL|DAT|ACK
                              /        Part=0,Of=1
                             /         Seen=5-4,Send=11-8,Size=10]
                            /          (Start T3-send timer-s11)
   (Cancel T3-send timer-s5) <-----/
   (Start T2-recv timer)
   ...
   {T2-recv Timer Expires}
   [Header Flags=FLO|REL|ACK
    Part=0,Of=1
    Seen=11-8,Send=0-0,Size=0]--------->(Cancel T3-send-s11)

   Note that when piggy-backing a Stream Ack on an out-bound stream
   datagram while more than one stream has un-acked datagrams, the
   endpoint shall choose one stream and piggy-back a Stream Ack on one
   of the datagrams, and shall leave the T2-recv timer running.

A.3.3 Extended Stream Ack

   Upon the expiration of the T2-recv timer, if more than one stream
   datagram has been received but not yet acked by the endpoint, an
   Extended Stream Ack shall be used.

   The following defines the header format of the Extended Stream Ack
   that acknowledges N stream datagrams received:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MDTP Protocol Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |             Flags             |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Seen                              |
   |       Stream Number #0        |      Sequence Number #0       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Number of Extra Acks = N-1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Data Size           |     Part      |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Stream Number #1        |      Sequence Number #1       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               /
   /                                                               \
   \                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Stream Number #N-1       |     Sequence Number #N-1      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that an Extended Stream Ack is identified by setting the Send
   field to the number of extra acks carried in its data field, as
   shown above. Also, Extended Stream Acks shall not be piggy-backed.
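   As an illustration of the Extended Stream Ack layout, the following
   Python sketch builds the Seen word, the extra-ack count and the
   data field from a list of (stream, sequence) pairs, and parses them
   back. It is only a sketch under assumptions: 16-bit stream and
   sequence numbers, a 32-bit extra-ack count, network byte order, and
   hypothetical function and key names. The remaining header words
   (protocol identifier, flags, Data Size, Part, Of) are left out.

```python
import struct
from typing import Dict, List, Tuple

Ack = Tuple[int, int]  # (stream number, sequence number)


def build_extended_ack(acks: List[Ack]) -> Dict[str, bytes]:
    """Encode N >= 2 acks: ack #0 goes in Seen, the other N-1 in the data field."""
    if len(acks) < 2:
        raise ValueError("an Extended Stream Ack carries at least two acks")
    first, extras = acks[0], acks[1:]
    return {
        "seen": struct.pack("!HH", *first),
        # Word in the Send position: number of extra acks = N-1.
        "num_extra_acks": struct.pack("!I", len(extras)),
        "data": b"".join(struct.pack("!HH", s, q) for s, q in extras),
    }


def parse_extended_ack(seen: bytes, num_extra_acks: bytes, data: bytes) -> List[Ack]:
    """Decode the three pieces back into the full list of acks."""
    (count,) = struct.unpack("!I", num_extra_acks)
    if len(data) != 4 * count:
        raise ValueError("data field does not match the extra-ack count")
    acks = [struct.unpack("!HH", seen)]
    acks += [struct.unpack("!HH", data[i:i + 4]) for i in range(0, 4 * count, 4)]
    return acks
```

   For instance, build_extended_ack([(5, 3), (9, 4), (7, 11)]) yields
   a Seen word for 5-3, an extra-ack count of 2, and a data field
   holding 9-4 followed by 7-11.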
The following example shows the use of an Extended Stream Ack (FLO|REL|ACK) by "Z":

Endpoint A                                    Endpoint Z
{App sends data on stream 5}
[Header Flags=FLO|REL|DAT
 Part=0,Of=1
 Seen=0-0,Send=5-2,Size=20]---------->        (Start T2-recv)
(Start T3-send timer-s5)
{App sends data on stream 9}
[Header Flags=FLO|REL|DAT
 Part=0,Of=1
 Seen=0-0,Send=9-4,Size=20]---------->
(Start T3-send timer-s9)
{App sends more data on stream 5}
[Header Flags=FLO|REL|DAT
 Part=0,Of=1
 Seen=0-0,Send=5-3,Size=20]---------->
(Restart T3-send timer-s5)
{App sends data on stream 7}
[Header Flags=FLO|REL|DAT
 Part=0,Of=1
 Seen=0-0,Send=7-11,Size=20]--------->
(Start T3-send timer-s7)
...
                                              {T2-recv Timer Expires}
(Cancel T3-send timer-s5) <--------------     [Header Flags=FLO|REL|ACK
(Cancel T3-send timer-s7)                      Part=0,Of=1
(Cancel T3-send timer-s9)                      Seen=5-3,NumExtAck=2, Size=0,
                                               ext[0]=9-4, ext[1]=7-11]

A.4 Other Issues with Stream Transfer

-- Congestion control, including the rules for timer management and window management, shall apply to Stream Transfer the same way as it does to non-Stream based transfer, as defined in section 4.3.

-- When an association is re-initialized (see section 3.4), all existing streams within that association will be automatically terminated.

-- The receiver shall silently discard any datagrams associated with a stream which has not been initiated or has already been terminated.

-- The same re-transmission and link rotation rules as defined in section 4 shall apply to Stream Transfer.

-- Bundled Message (see Appendix B) may be allowed in Stream Transfer, but fragmentation (see Appendix C) shall not be allowed.

Appendix B: Bundled Message Transfer

This defines the mechanism for bundled datagram transport in MDTP. It is optional for implementation.

Bundling is sometimes desired by the user when transferring small datagrams, as a way of improving network utilization.
In bundled transfer, MDTP allows an endpoint to bundle small application messages into one single datagram for transmission. This bundled mode can be applied to both reliable and unreliable datagrams (see Appendix E for Unreliable Delivery). Note that an endpoint shall never send bundled messages to a peer if that peer endpoint set NOB bit to 1 during their association initialization (see section 3).

B.1 Format of Bundled Datagram

The ISB bit in the flag field is set to indicate the current datagram is bundled, i.e., it contains multiple messages.
The format of a bundled datagram is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   MDTP Protocol Identifier                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |             Flags             |   In Queue    |
|               |N N W I F R D A M S W R R F G U|               |
|               |O O I S I T A C U H N E T L A N|               |
|               |M B N B R M T K L U R 1 C O R R|               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Total Number Of Messages=N   |     Message #1 Size = B1      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                       B1 octets of data                       |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Message #2 Size = B2      |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                       B2 octets of data                       |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                                                               /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Message #N Size = BN      |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                       BN octets of data                       |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Data_Size in a bundled datagram indicates the actual size of the data field of the datagram, including both the bundling overhead and the actual user data. Since no fragmentation is allowed in a bundled datagram, the Part field will always be '0' and the Of field will always be '1'.

The first two octets of the data field are a 16 bit integer indicating the number of messages bundled in the current datagram. This is followed immediately by a list of bundled messages. Each bundled message starts with an integer of two octets indicating the size of the data in the message, followed by the data itself. All integers in the datagram should be transmitted in the network byte order.

B.2 Bundled Datagram Transfer

The T4-bundle timer and two protocol parameters, namely Min.Bundle and Max.Bundle, are used to control the bundling of user datagrams.

The endpoint will withhold the datagram from transmission and start the T4-bundle timer if the combined size of all user datagrams currently pending for transmission in the out-bound buffer is smaller than 'Min.Bundle'.

Each time new out-bound user data becomes available for transmission, the endpoint will attempt to bundle the new data with the current withheld datagram by using the following rules:

A) If the size of the new data is greater than or equal to 'Min.Bundle', the current withheld datagram will be transmitted and the T4-bundle timer will be canceled. Then, the new data will be transmitted in a separate datagram.

B) If the size of the new data is less than 'Min.Bundle', but the combined size of the current datagram and the new data is greater than or equal to 'Max.Bundle', the current datagram will be sent and the new data will be withheld as the new current datagram.

C) If the size of the new data is less than 'Min.Bundle', and the combined size of the current datagram and the new data is greater than 'Min.Bundle' but less than 'Max.Bundle', the new data will be bundled into the current datagram, the bundled datagram will be immediately transmitted, and the T4-bundle timer will be canceled.

D) If the size of the new data is less than 'Min.Bundle', and the combined size of the current datagram and the new data is less than 'Min.Bundle', the new data will be bundled into the current datagram and the T4-bundle timer will be restarted.

E) If the T4-bundle timer expires, the current datagram will be sent immediately.

F) When a T2-receive timer expires, any bundled data waiting to be transmitted should be sent immediately with a piggy-backed Ack to acknowledge all un-acked data previously received.

G) If a T4-bundle timer is running and data arrives, the T2-receive timer should not be started.

H) A T4-bundle timer should never be canceled unless it is being supplanted by a T3-send timer.
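The bundled data field layout and rules A) through D) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Min.Bundle and Max.Bundle constants use the values suggested in this appendix, and a combined size exactly equal to Min.Bundle (a case rules C and D leave open) is treated here as rule C. Timer handling (rules E through H) is not modelled.

```python
import struct

MIN_BUNDLE = 1000  # suggested Min.Bundle, in octets
MAX_BUNDLE = 1432  # suggested Max.Bundle, in octets

def pack_bundle(messages):
    """Build the data field: 16-bit message count, then for each
    message a 16-bit size followed by the data (network byte order)."""
    out = struct.pack("!H", len(messages))
    for msg in messages:
        out += struct.pack("!H", len(msg)) + msg
    return out

def unpack_bundle(data):
    """Split a bundled data field back into its individual messages."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset, messages = 2, []
    for _ in range(count):
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        messages.append(data[offset:offset + size])
        offset += size
    return messages

def bundling_rule(current_size, new_size):
    """Return the letter of the rule (A to D) that applies when new_size
    octets of user data arrive while current_size octets are withheld."""
    if new_size >= MIN_BUNDLE:
        return "A"  # send withheld datagram, then new data separately
    combined = current_size + new_size
    if combined >= MAX_BUNDLE:
        return "B"  # send withheld datagram, withhold the new data
    if combined >= MIN_BUNDLE:  # assumption: equality handled as rule C
        return "C"  # bundle and transmit immediately
    return "D"      # bundle, keep withholding, restart T4-bundle
```

With this layout, three 100-octet messages give a data field of 308 octets: 300 octets of user data plus 2 octets for the count and 2 octets per message size.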
When a bundled datagram arrives at the receiving endpoint, each message is unbundled and delivered separately to the upper layer.

The following are the suggested protocol parameter values for bundled datagram transfer:

T4-bundle Timer - 40 ms
Min.Bundle      - 1000 octets
Max.Bundle      - 1432 octets

Appendix C: Fragmented Message Transfer

This defines the mechanism for fragmented datagram transport in MDTP. It is optional for implementation.

When the size of an out-bound user message exceeds the value defined in the protocol parameter Max.Bundle, the endpoint shall fragment the message into smaller pieces of size equal to or smaller than 'Max.Bundle' and send each piece out in a separate datagram. The "Part" and "Of" fields are used to disassemble and reassemble the fragmented message.

The combination of the maximal 'Of' value, which is 255, and the maximal Data Size (see section 2.2) determines the maximal size of a single user message that MDTP can send or receive in fragmented message transfer mode.

However, an endpoint shall never send fragmented datagrams to a peer if that peer set the NOM bit to 1 during their association initialization.

The following example shows the transmission of a fragmented message (assuming Max.Bundle=1432, Min.Bundle=1000):

Endpoint A                                    Endpoint Z
{App sends message size=3300 octets}
[Header Flags=DAT|ACK|GAR
 Part=0,Of=3
 Seen=3,Send=16,Size=1432]------->            (Start T2-receive timer)
[Header Flags=DAT|ACK|GAR
 Part=1,Of=3
 Seen=3,Send=17,Size=1432]------->
[Header Flags=DAT|ACK|GAR
 Part=2,Of=3
 Seen=3,Send=18,Size=436]-------->
(Start T3-send timer)
..
                                              {Timer T2 Expires}
                           /-----------       [Header Flags=ACK
                          /                    Mode=0
                         /                     Part=0,Of=0
(cancel timer T3) <-----/                      Seen=18,Send=4]

Notice that "A" is using the reliable transfer mode to send the fragmented message, therefore "Z" will hold the fragments and request retransmission if a fragment is found missing, i.e., if a gap is found in the received data (see ). When all the parts of the fragmented message are received, the receiving endpoint will re-assemble the message and dispatch it to the upper layer.

It is also allowed in MDTP to send a fragmented message using Unreliable Transfer mode (see section 4.5). However, in unreliable mode, each fragment will be dispatched to the application upon its arrival, and no retransmission will be requested even if a fragment is found missing.

Bundling is prohibited if the current datagram contains a fragment of a fragmented message.

Appendix D: Multicast Datagram Transfer

This defines the mechanism for unreliable transportation of multicast datagrams in MDTP. It is optional for implementation.

D.1 Multicast Datagram Header Format

Multicast datagrams are identified by setting MUL, UNR, and DAT bits to 1. Two new fields are added to the standard MDTP datagram header to support multicast:

Multicast To Transmit address - This is the multicast address, in network byte order, that the sender transmitted the data to. The receiver can use this information for internal tracking purposes.

Multicast From - This is the network address (or the IP Address of Network 1 as described in 3.2, if redundant networks exist) of the sender, in network byte order.

MDTP Header Format - Multicast Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   MDTP Protocol Identifier                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |             Flags             |   In Queue    |
|               |N N W I F R D A M S W R R F G U|               |
|               |O O I S I T A C U H N E T L A N|               |
|               |M B N B R M T K L U R 1 C O R R|               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Multicast To Transmit address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Multicast From - senders base address             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             data                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For multicast datagrams, the value in the Send field shall indicate the sequence number of multicast datagrams transmitted by the sender. This information helps the receiver of the multicast to detect duplicated multicast datagrams and also to detect lost multicast datagrams from the same sender. The Seen field shall normally be set to 0, except in the special cases stated below.

Bundling and fragmentation are not allowed in either multicast or broadcast datagrams.

No initiation shall be needed for an endpoint to transmit to a multicast address.

D.2 Transmission of Multicast Datagrams

The following example illustrates multicast transmissions between two endpoints.

Endpoint A                                    Endpoint Z
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
 Part=0,Of=1
 Seen=0,Send=5,Size=250]-------------->       (no Ack necessary)
...
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
 Part=0,Of=1
 Seen=0,Send=6,Size=500]-------------->       (no Ack necessary)

Notice the values of the Send field in the multicast datagrams (5 and 6, respectively).
They represent the sequence numbers of the multicast datagrams "A" has sent out. Endpoint Z should use this value to detect missing or duplicate datagrams. Duplicate datagrams will be discarded and no effort will be made to retransmit lost multicast datagrams.

D.3 Reset of the Multicast Datagram Sequence Number

If the Seen field of a received multicast datagram equals '1', this indicates that the sender has reset its multicast datagram sequence number. The receiving endpoint, upon detecting this reset indicator in the incoming multicast datagram, should start a procedure to adopt the new sequence number for error detection. However, caution should be taken to prevent false resets due to duplicated datagrams with the reset indicator propagating through multiple networks.

To guarantee that all receivers of the multicast group adopt the new sequence number, the reset indicator should be repeated within the first N multicast datagrams sent out after the reset. N is predefined by the protocol parameter 'Num.Of.Mcast.Reset.Msg'.

At the receiving endpoint, when the reset indicator is detected the new sequence number will be adopted. However, if two reset events are detected within a predefined time interval (Min.Mcast.Time.To.Reset), the second reset indicator will be ignored.
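The receiver-side handling described above (duplicate detection on the Send value, plus the reset indicator guarded by Min.Mcast.Time.To.Reset) might be sketched as follows. The class and method names are hypothetical, the unbounded set of seen sequence numbers is a simplification that a real implementation would bound, and the caller is assumed to supply the current time in seconds.

```python
MIN_MCAST_TIME_TO_RESET = 5.0  # seconds, suggested value in this appendix

class McastReceiver:
    """Tracks multicast sequence numbers from one sender (illustrative)."""

    def __init__(self):
        self.seen = set()       # Send values accepted since the last reset
        self.last_reset = None  # time at which the last reset was adopted

    def accept(self, seen_field, send_seq, now):
        """Return True to deliver the datagram, False to discard it
        as a duplicate. seen_field == 1 is the reset indicator."""
        if seen_field == 1:
            recent = (self.last_reset is not None and
                      now - self.last_reset < MIN_MCAST_TIME_TO_RESET)
            if not recent:
                # Adopt the new numbering: forget the old sequence space.
                self.seen.clear()
                self.last_reset = now
            # Otherwise the repeated indicator is ignored.
        if send_seq in self.seen:
            return False
        self.seen.add(send_seq)
        return True
```

A copy of the reset datagram arriving over a second network shortly after the first is then dropped as an ordinary duplicate instead of triggering a second reset.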
The suggested values for the two multicast reset protocol parameters are:

Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg  - 5 messages

Appendix E: Unreliable Delivery

This defines the support for sending Unreliable datagrams in MDTP. It is optional for implementation.

The unreliable transfer mode allows two endpoints to send to each other without acknowledging receipt. This can usually achieve higher data throughput than the reliable transfer mode.

To indicate the unreliable transfer mode, the sender of a datagram with user data simply sets the UNR flag to 1.

The following sequence illustrates unreliable data transfer.

Endpoint A                                    Endpoint Z
{App sends 2 messages}
[Header Flags=UNR|DAT|ACK
 Part=0,Of=1
 Seen=0,Send=4,Size=100]-------->
[Header Flags=UNR|DAT|ACK
 Part=0,Of=1
 Seen=0,Send=5,Size=100]-------->
                                              {App sends 1 message}
                                 <-------     [Header Flags=UNR|DAT|ACK
                                               Part=0,Of=1
                                               Seen=5,Send=1,Size=450]
...
{App sends 2 more messages}
[Header Flags=UNR|DAT|ACK
 Part=0,Of=1
 Seen=1,Send=6,Size=100]------>
[Header Flags=UNR|DAT|ACK
 Part=0,Of=1
 Seen=451,Send=7,Size=100]------>

Note that no timers shall be started by either end, and that even though both ends are in Unreliable transfer mode, the ACK flag is still set by the sender of the datagram.
This means that the Seen field in the datagram header is still valid and indicates the sequence number of the last datagram received by the sender. The upper layer can use this information to help detect missing or duplicated datagrams. However, MDTP shall make no effort to detect or retransmit missing data or to screen out duplicated datagrams.

E.1 Ordered Unreliable Delivery

In unreliable transfer, the sender should be allowed to request ordered delivery by setting the RE1 flag to 1.

When Ordered Unreliable Delivery is indicated, the receiver shall order the newly arrived datagram with any datagrams it has received but not yet passed to its upper layer. If it receives a datagram which is older than the last datagram it has passed to the upper layer, that datagram shall be silently discarded.

14. Suggested timer and MTU values.

The following are suggested timer values for MDTP:

T1-init Timer    - 160 ms
T2-receive Timer - 20 ms
T3-send Timer    - 160 ms
T4-bundle Timer  - 40 ms
T5-Heart Beat    - 4000 ms

The following protocol parameters are recommended:

Min.Bundle              - 1000 octets
Max.Bundle              - 1432 octets
Max.Retransmit          - 10 attempts
Max.Init.Retransmit     - 8 attempts
Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg  - 5 messages

15. Acknowledgments

The authors wish to thank Brian Wyld, Sankar A, Henry Houh, Gary Lehecka, Ken Morneault, Lyndon Ong, and others for their very valuable comments.

16. Author's Addresses

Randall R. Stewart                 Tel: +1-847-632-7438
Cellular Infrastructure Group      EMail: stewrtrs@cig.mot.com
Motorola, Inc.
1475 W. Shure Drive, #2C-6
Arlington Heights, IL 60004
USA

Qiaobing Xie                       Tel: +1-847-632-3028
Cellular Infrastructure Group      EMail: xieqb@cig.mot.com
Motorola, Inc.
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA

17. References

[1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program Protocol Specification", RFC 791, USC/Information Sciences Institute, September 1981.

[2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences Institute, August 1980.

[3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/Information Sciences Institute, September 1981.

This Internet Draft expires in 6 months from April 1999.