INTERNET-DRAFT                                  Eddie Kohler
draft-ietf-dccp-spec-05.txt                        UCLA/ICIR
Expires: April 2004                             Mark Handley
                                                 Sally Floyd
                                                        ICIR
                                             Jitendra Padhye
                                          Microsoft Research
                                             27 October 2003

Datagram Congestion Control Protocol (DCCP)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of [RFC 2026]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This document specifies the Datagram Congestion Control Protocol (DCCP), which implements a congestion-controlled, unreliable flow of datagrams suitable for use by applications such as streaming media, Internet telephony, and on-line games.

Kohler/Handley/Floyd/Padhye [Page 1]
INTERNET-DRAFT Expires: April 2004 October 2003

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

Changes since draft-ietf-dccp-spec-04.txt:

* Rearchitected feature negotiation (Junwen Lai).
* Added figures, and modified text, to the Overview section. Figures and text partly from Eric Rescorla.
* New synchronization mechanism: DCCP-Sync.
* DCCP-Move: Add Mobility ID and remove Old Address and Old Port, because they wouldn't work through a NAT.
* The MD5 ID Regime is now number 1. (It is still the default.)
  ID Regime 0 is the Null Regime. Also switch the meaning of the ID Regime feature.
* Rename Drop States to Drop Codes, and renumber them.
* Ignored cannot contain more option data bytes than the offending option.
* Rename Service Name to Service Code (Gorry Fairhurst).
* Rename Cslen/Checksum Length to CsCov/Checksum Coverage and change its values by analogy with UDP-Lite.
* Be more specific about what Slow Receiver means.
* Allow a textual error message in DCCP-Reset.
* Mention new PMTUD, but this mention needs work.
* CCID 1: Specify when acks may be sent.
* Specify Request retransmission strategy.
* Other changes throughout.

Changes since draft-ietf-dccp-spec-03.txt:

* Specify how the Loss Window is arranged.
* Ignored can contain multiple bytes of option data.
* Refine the tables in Section 8.5.1, on Ack Vector Consistency.
* CC mechanisms must treat Data Dropped like ECN Marked unless otherwise specified.
* An MTU is mandatory (although PMTU discovery is not), and CCIDs can affect the MTU.
* Clarifications in response to reviewer comments.

Changes since draft-ietf-dccp-spec-02.txt:

* Identification options include the Acknowledgement Number in their hash.
* Added an additional condition to accepting a packet with an invalid Sequence Number: the Acknowledgement Number must be valid, as well as the Identification options.
* Explicitly allow Connection Nonces to be negotiated in other ways than the Connection Nonce feature.
* Bad Moves are ignored, not reset, to avoid leaking information to attackers.

Changes since draft-ietf-dccp-spec-01.txt:

* Revise definition of when packets are reported as received, due to ECN Nonce verification problems with the previous definition and options.
* Replace Receive Buffer Drops with Data Dropped.
* Remove Data Discarded in favor of Data Dropped with Drop State 0.
* Remove Buffer Closed in favor of Data Dropped with Drop State 4 [NB: now Drop Code 1].
* Add Initial Sequence Number setting guidelines.
* Add sections on retransmission of Requests, and a table to the state diagram.
* Made the 4-bit Reserved field in the DCCP generic header available for use by CCIDs.
* Refine description of CCID 1.
* Add Middlebox Considerations.
* Change Identification option to allow middleboxes to change port numbers, DCCP options, and/or packet data without disrupting the connection.
* Specify that Ignored should be sent only on packets with Acknowledgement Numbers.
* Add Aggression Penalty Reset Reason.
* Add Payload Checksum option.
* Add Elapsed Time option (formerly specific to CCID 3).
* Timestamp Echo option can omit Elapsed Time, or provide a two-byte Elapsed Time value. Elapsed Time is measured in tenths of milliseconds, not microseconds.
* Clean up DCCP-Move and feature-negotiation options discussions.
* Confirm(Connection Nonce) sends no data.
* Ack Vector implementation supports ECN Nonce Echo.
* Add CSlen and Partial Checksumming Design Motivation.
* Clarify that Ack Vectors may be sent even if Use Ack Vector is false.

Table of Contents

1. Introduction
2. Design Rationale
3. Conventions and Terminology
   3.1. Robustness Principle
   3.2. Packet Types
   3.3. States
   3.4. Parts of a Connection
4. Overview
   4.1. Connection Initiation and Termination
   4.2. Congestion Control
      4.2.1. CCID 2
      4.2.2. CCID 3
   4.3. Features
   4.4. Example Connection
   4.5. Examples of DCCP Congestion Control
      4.5.1. DCCP with TCP-like Congestion Control
      4.5.2. DCCP with TFRC Congestion Control
5. Packet Formats
   5.1. Generic Packet Header
   5.2. Sequence Number Synchronization
      5.2.1. Variables
      5.2.2. Appropriate Sequence Numbers
      5.2.3. Appropriate Acknowledgement Numbers
      5.2.4. Sequence-Validity By State
      5.2.5. Handling Sequence-Invalid Packets
      5.2.6. Examples
   5.3. Extended Sequence Numbers
      5.3.1. Transitioning to Extended Sequence Numbers
   5.4. DCCP State Diagram
   5.5. DCCP-Request Packet Format
   5.6. DCCP-Response Packet Format
   5.7. DCCP-Data, DCCP-Ack, and DCCP-DataAck Packet Formats
   5.8. DCCP-CloseReq and DCCP-Close Packet Format
   5.9. DCCP-Reset Packet Format
   5.10. DCCP-Move Packet Format
   5.11. DCCP-Sync Packet Format
6. Options and Features
   6.1. Padding Option
   6.2. Ignored Option
   6.3. Mandatory Option
   6.4. Feature Negotiation
      6.4.1. Value Types
      6.4.2. Feature Numbers
      6.4.3. Change L Option
      6.4.4. Confirm L Option
      6.4.5. Change R Option
      6.4.6. Confirm R Option
      6.4.7. Unknown Features
      6.4.8. State Diagram
      6.4.9. Streamlined Negotiation
   6.5. Identification Options
      6.5.1. Identification Regime Feature
      6.5.2. Connection Nonce Feature
      6.5.3. Identification Option
      6.5.4. Challenge Option
   6.6. Init Cookie Option
   6.7. Timestamp Option
   6.8. Elapsed Time Option
   6.9. Timestamp Echo Option
   6.10. Loss Window Feature
7. Congestion Control IDs
   7.1. Unspecified Sender-Based Congestion Control
   7.2. TCP-like Congestion Control
   7.3. TFRC Congestion Control
   7.4. CCID-Specific Options, Features, and Reset Reasons
8. Acknowledgements
   8.1. Acks of Acks and Unidirectional Connections
   8.2. Ack Piggybacking
   8.3. Ack Ratio Feature
   8.4. Use Ack Vector Feature
   8.5. Ack Vector Options
      8.5.1. Ack Vector Consistency
      8.5.2. Ack Vector Coverage
   8.6. Slow Receiver Option
   8.7. Data Dropped Option
      8.7.1. Data Dropped and Normal Congestion Response
      8.7.2. Particular Drop Codes
   8.8. Payload Checksum Option
9. Explicit Congestion Notification
   9.1. ECN Capable Feature
   9.2. ECN Nonces
   9.3. Other Aggression Penalties
10. Multihoming and Mobility
   10.1. Mobility Capable Feature
   10.2. Mobility ID
   10.3. Security
   10.4. Congestion Control State
   10.5. Loss During Transition
11. Maximum Packet Size
12. Middlebox Considerations
13. Abstract API
14. Multiplexing Issues
15. DCCP and RTP
16. Security Considerations
   16.1. Security Considerations for Mobility
   16.2. Security Considerations for Partial Checksums
17. IANA Considerations
18. Thanks
A. Appendix: Ack Vector Implementation Notes
   A.1. Packet Arrival
      A.1.1. New Packets
      A.1.2. Old Packets
   A.2. Sending Acknowledgements
   A.3. Clearing State
   A.4. Processing Acknowledgements
B. Appendix: Design Motivation
   B.1. CsCov and Partial Checksumming
Normative References
Informative References
Authors' Addresses

1. Introduction

This document specifies the Datagram Congestion Control Protocol (DCCP). DCCP provides the following features:

o An unreliable flow of datagrams, with acknowledgements.

o A reliable handshake for connection setup and teardown.

o Reliable negotiation of options, including negotiation of a suitable congestion control mechanism.

o Mechanisms allowing a server to avoid holding any state for unacknowledged connection attempts or already-finished connections.

o Optional mechanisms that tell the sender, with high reliability, which packets reached the receiver, and whether those packets were ECN marked, corrupted, or dropped in the receive buffer.

o Congestion control incorporating Explicit Congestion Notification (ECN) and the ECN Nonce, as per [RFC 3168] and [ECN NONCE].

o Path MTU discovery, as per [RFC 1191].

DCCP is intended for applications that require the flow-based semantics of TCP, but which do not want TCP's in-order delivery and reliability semantics, or which would like different congestion control dynamics than TCP. Similarly, DCCP is intended for applications that do not require features of SCTP [RFC 2960] such as sequenced delivery within multiple streams.

Applications that could make use of DCCP include those with timing constraints on the delivery of data such that reliable in-order delivery, when combined with congestion control, is likely to result in some information arriving at the receiver after it is no longer of use. Such applications might include streaming media and Internet telephony.
To date most such applications have either used TCP, with the problems described above, or used UDP and implemented their own congestion control mechanisms (or no congestion control at all). The purpose of DCCP is to provide a standard way to implement congestion control and congestion control negotiation for such applications. One of the motivations for DCCP is to enable the use of ECN, along with conformant end-to-end congestion control, for applications that would otherwise be using UDP. In addition, DCCP implements reliable connection setup, teardown, and feature negotiation.

A DCCP connection contains acknowledgement traffic as well as data traffic. Acknowledgements inform a sender whether its packets arrived, and whether they were ECN marked. Acks are transmitted as reliably as the congestion control mechanism in use requires, possibly completely reliably.

2. Design Rationale

DCCP is intended to be used by applications that currently use UDP without end-to-end congestion control. The desire is for many applications to have little reason not to use DCCP instead of UDP, once DCCP is deployed. Thus, DCCP was designed to have as little overhead as possible, both in terms of the size of the packet header and in terms of the state and CPU overhead required at the end hosts.

This desire for minimal overhead results in the design decision to include only the minimal necessary functionality in DCCP, leaving other functionality, such as FEC or semi-reliability, to be layered on top of DCCP as desired. The desire for minimal overhead is also one of the reasons to propose DCCP instead of just proposing an unreliable version of SCTP for applications currently using UDP.
Different forms of conformant congestion control are appropriate for different applications, and a second motivation behind the design of DCCP is to allow applications to choose between several forms of congestion control. One choice, TCP-like Congestion Control, halves the congestion window in response to a packet drop or mark, as in TCP. Applications using this congestion control mechanism will respond quickly to changes in available bandwidth, but must be able to tolerate the abrupt changes in congestion window typical of TCP. A second alternative, TCP-Friendly Rate Control (TFRC), a form of equation-based congestion control, minimizes abrupt changes in the sending rate while maintaining longer-term fairness with TCP.

TCP-like Congestion Control is appropriate for applications such as on-line games that want to make use of all the available bandwidth quickly, but can tolerate rapid reductions in rate without serious consequences. TFRC is more appropriate for applications such as streaming media, where rapid rate changes cause unacceptable UI glitches (audible pauses or clicks in the playout stream, for example). These applications would prefer to give up on rapid consumption of available bandwidth in favor of a steadier rate.

DCCP also allows unreliable traffic to use ECN safely. A UDP kernel API might not allow applications to set UDP packets as ECN-capable, since the API could not guarantee the application would properly detect or respond to congestion. DCCP kernel APIs will have no such issues, since DCCP itself implements congestion control.

In proposing a new transport protocol, it is necessary to justify the design decision not to require the use of the Congestion Manager, as well as the design decision to add a new transport protocol to the current family of UDP, TCP, and SCTP.
The Congestion Manager [RFC 3124] allows multiple concurrent streams between the same sender and receiver to share congestion control. However, the current Congestion Manager can only be used by applications that have their own end-to-end feedback about packet losses, and this is not the case for many of the applications currently using UDP. In addition, the current Congestion Manager does not lend itself to the use of forms of TFRC where the state about past packet drops or marks is maintained at the receiver rather than at the sender. While DCCP should be able to make use of CM where desired by the application, we do not see any benefit in making the deployment of DCCP contingent on the deployment of CM itself.

3. Conventions and Terminology

Each DCCP connection runs between two endpoints, which we often name DCCP A and DCCP B. Data may pass over the connection in either or both directions.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

All multi-byte numerical quantities in DCCP, such as Sequence Numbers and arguments to options, are transmitted in network byte order (most significant byte first).

We occasionally refer to the "left" and "right" sides of a bit field. "Left" means towards the most significant bit, and "right" means towards the least significant bit.

Reserved bitfields in DCCP packet headers MUST be ignored by receivers, and MUST be set to zero by senders, unless otherwise specified.

3.1. Robustness Principle

DCCP implementations should follow TCP's "general principle of robustness": be conservative in what you do, be liberal in what you accept from others.

3.2. Packet Types

DCCP has ten different packet types.
The DCCP-Request and DCCP-Response packets are used in connection initiation, and the DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets are used in connection termination, as described in Section 4.1. The other five packet types are as follows:

DCCP-Data
   Used to transmit data. It carries no acknowledgement information.

DCCP-Ack
   Used for pure acknowledgements.

DCCP-DataAck
   Used for piggybacked data-plus-acknowledgements.

DCCP-Move
   Supports multihoming and mobility.

DCCP-Sync
   Used to resynchronize sequence numbers after a large burst of loss.

All of these packets except for DCCP-DataAck, DCCP-Move, and DCCP-Sync are shown in the example diagram below.

3.3. States

DCCP endpoints progress through different states during the course of a connection. The figure below shows the typical progress through these states for a client and server.

Client State:                               Server State:
-------------                               -------------
CLOSED                                      LISTEN
REQUEST      DCCP-Request ->
                          <- DCCP-Response  RESPOND
OPEN         DCCP-Ack ->
                          <- DCCP-Data      OPEN
             DCCP-Ack ->
                          <- DCCP-CloseReq  CLOSEREQ
CLOSING      DCCP-Close ->
                          <- DCCP-Reset     CLOSED
TIME-WAIT
CLOSED

The client and server's typical progress through states.

CLOSED
   Represents a nonexistent connection.

LISTEN
   Represents a server socket in the passive listening state. LISTEN and CLOSED are not associated with any particular DCCP connection.

REQUEST
   The client socket enters this state, from CLOSED, after sending a DCCP-Request packet to try to initiate a connection.

RESPOND
   A server socket enters this state, from LISTEN, after receiving a DCCP-Request from a client.

OPEN
   The central, data transfer portion of a DCCP connection. Client and server enter into this state from REQUEST and RESPOND, respectively. Sometimes we speak of SERVER-OPEN and CLIENT-OPEN states, corresponding to the server's OPEN state and the client's OPEN state.
CLOSEREQ
   A server socket enters this state, from SERVER-OPEN, to signal that the connection is over, but the client must hold Time-Wait state.

CLOSING
   Either server or client can enter this state to close the connection.

TIME-WAIT
   A socket remains in this state for 2MSL after the connection has been torn down, to prevent mistakes due to the delivery of old packets.

3.4. Parts of a Connection

The DCCP connection between DCCP A and DCCP B consists of four sets of packets, as follows:

(1) Data packets from DCCP A to DCCP B.

(2) Acknowledgements from DCCP B to DCCP A.

(3) Data packets from DCCP B to DCCP A.

(4) Acknowledgements from DCCP A to DCCP B.

These four subflows are grouped into two half-connections, illustrated as follows:

+--------+  A-to-B half-connection:                    +--------+
|        |  + - - - - - - - - - - - - - - - - - - - +  |        |
|        |  | (1)                                   |  |        |
|        |  |           data packets -->            |  |        |
|        |  | (2)                                   |  |        |
|        |  |         <-- acknowledgements          |  |        |
|        |  + - - - - - - - - - - - - - - - - - - - +  |        |
| DCCP A |                                             | DCCP B |
|        |  B-to-A half-connection:                    |        |
|        |  + - - - - - - - - - - - - - - - - - - - +  |        |
|        |  | (3)                                   |  |        |
|        |  |           <-- data packets            |  |        |
|        |  | (4)                                   |  |        |
|        |  |         acknowledgements -->          |  |        |
+--------+  + - - - - - - - - - - - - - - - - - - - +  +--------+

We use the following terms to refer to subsets and endpoints of a DCCP connection.

Subflows
   A subflow consists of either data or acknowledgement packets, sent in one direction. Each of the four sets of packets above is a subflow. (Subflows may overlap to some extent, since acknowledgements may be piggybacked on data packets.)

Sequences
   A sequence consists of all packets sent in one direction, regardless of whether they are data or acknowledgements. The sets 1+4 and 2+3, above, are sequences. Each packet on a sequence has a different sequence number.
Half-connections
   A half-connection consists of the data packets sent in one direction, plus the corresponding acknowledgements. The sets 1+2 and 3+4, above, are half-connections. Half-connections are named after the direction of data flow, so the A-to-B half-connection contains the data packets from A to B and the acknowledgements from B to A.

HC-Sender and HC-Receiver
   In the context of a single half-connection, the HC-Sender is the endpoint sending data, while the HC-Receiver is the endpoint sending acknowledgements. For example, in the A-to-B half-connection, DCCP A is the HC-Sender and DCCP B is the HC-Receiver.

4. Overview

4.1. Connection Initiation and Termination

Every DCCP connection is actively initiated by one DCCP, which connects to a DCCP socket in the passive listening state. We refer to the active endpoint as "the client" and the passive endpoint as "the server".

Client                                      Server
------                                      ------
DCCP-Request ->
[Ports, service, features]
                          <- DCCP-Response
                          [Features, cookie]
DCCP-Ack ->
[Features, cookie]

DCCP connection initiation.

In the DCCP-Request message, the client tells the server the ports it wants to communicate on and possibly the Service Code of the service it wants to talk to. The DCCP-Request message also starts feature negotiation, which, for pedagogical reasons, we will present separately in the next section.

In the DCCP-Response message, the server tells the client that it is willing to accept the connection and continues feature negotiation. In order to prevent SYN-flood style DOS attacks, DCCP incorporates a cookie exchange: The server can provide the client with a cookie that contains all the negotiation state. This cookie must be echoed by the client in the DCCP-Ack, thus removing the need for the server to keep state.
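The cookie exchange described above can be illustrated with a minimal sketch. The text here only says that the cookie contains all the negotiation state and is echoed by the client; the JSON encoding, the HMAC-SHA256 integrity tag, and the names SERVER_SECRET, make_cookie, and open_cookie are assumptions made for illustration and are not taken from this specification.

```python
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real server would generate and rotate it.
SERVER_SECRET = b"hypothetical-server-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def make_cookie(negotiation_state: dict) -> bytes:
    """Pack the negotiation state into a cookie the server need not store."""
    payload = json.dumps(negotiation_state, sort_keys=True).encode()
    tag = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return tag + payload


def open_cookie(cookie: bytes):
    """Return the state from an echoed cookie, or None if it was altered."""
    tag, payload = cookie[:TAG_LEN], cookie[TAG_LEN:]
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)
```

In this sketch the server would send make_cookie(state) in the DCCP-Response and discard the state; when the DCCP-Ack arrives, open_cookie recovers the state only if the cookie is unmodified.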
In the DCCP-Ack message, the client acknowledges the DCCP-Response and returns the cookie to permit the server to complete its side of the connection. This message may also include feature negotiation messages.

DCCP does not support TCP-style simultaneous open. In particular, a host MUST NOT respond to a DCCP-Request packet with a DCCP-Response packet unless the destination port specified in the DCCP-Request corresponds to a local socket opened for listening. This preserves the invariant that every connection has one client and one server.

The server sends a DCCP-CloseReq packet to the client to ask it to close the connection with a DCCP-Close. The server sends DCCP-CloseReq, rather than DCCP-Close, when it wants the client to hold Time-Wait state for the connection. Only the server may generate a DCCP-CloseReq packet. This means that the client cannot force the server to maintain connection state after the connection is closed.

An endpoint sends a DCCP-Close packet to request that the other endpoint tear down the connection via DCCP-Reset. Every explicitly-terminated connection ends with a DCCP-Reset packet. The receiver of DCCP-Reset holds Time-Wait state for the connection. DCCP-Reset is sent in response to DCCP-Close during normal connection termination, or due to some inappropriate protocol event.

Client                                      Server
------                                      ------
                          <- DCCP-CloseReq
DCCP-Close ->
                          <- DCCP-Reset

DCCP connection termination.

DCCP shuts down both half-connections as a unit; it has no states analogous to TCP's FINWAIT and CLOSEWAIT states, where one TCP "half-connection" is closed and the other remains open. However, DCCP implementations SHOULD allow applications to declare that they are no longer interested in receiving data. This would allow DCCP implementations to streamline state for certain half-connections. See Section 8.7, on the Data Dropped option---and particularly its Drop Code 1---for more information.

4.2. Congestion Control

Each half-connection is managed by a congestion control mechanism named by a single-byte congestion control identifier, or CCID. The CCID for a half-connection describes how the HC-Sender limits data packet rates; how it maintains necessary parameters, such as congestion windows; how the HC-Receiver sends congestion feedback via acknowledgements; and how it manages the acknowledgement rate. The endpoints negotiate their CCIDs at connection setup; the CCIDs for the two half-connections need not be the same. Section 7 introduces the currently allocated CCIDs, which are defined in separate profile documents.

4.2.1. CCID 2

CCID 2's congestion control is extremely similar to that of TCP. The sender maintains a congestion window and sends packets until that window is full. Packets are acknowledged by the receiver. Dropped packets and ECN [RFC 3168] are used to indicate congestion. The response to congestion is to halve the congestion window. One subtle difference between DCCP and TCP is that the acknowledgements in DCCP contain the sequence numbers of all received packets within a given window, not just the highest sequence number as in TCP's cumulative acknowledgement.

4.2.2. CCID 3

CCID 3 is an equation-based form of congestion control which is intended to provide a smoother response to congestion than CCID 2. The sender maintains a "transmit rate". The receiver sends acknowledgement packets which also contain information about the receiver's estimate of packet loss. The sender uses this information to update its transmit rate. Although CCID 3 behaves somewhat differently from TCP in its short term congestion response, it is designed to operate fairly with TCP over the long term.

4.3. Features

In DCCP, feature negotiation is performed by attaching options to other DCCP packets.
Thus feature negotiation can be piggybacked on any other DCCP message. This allows feature negotiation during connection initiation as well as feature renegotiation during data flow.

DCCP features are one-sided. Thus, it's possible to have a different congestion control regime for data sent from client to server than from server to client. The endpoint in charge of a particular feature is called its feature location; the other endpoint is called the feature remote.

Feature negotiation is done with the Change L, Confirm L, Change R, and Confirm R options, with the "L" options sent by the feature location, and "R" options sent by the feature remote. A Change R message says to the peer "change this option setting on your side". The peer responds with a Confirm L, meaning "I've changed it". Some sample exchanges follow:

   Client                                      Server
   ------                                      ------
   Change R(CCID, 2)     ->
                                   <-  Confirm L(CCID, 2)

          * agreement that (CCID,Server) = 2 *

In this exchange, the peers agree to set the server's CCID to 2.

   Client                                      Server
   ------                                      ------
   Change R(CCID, 3 4)   ->
                                   <-  Confirm L(CCID, 4, 4 2)

          * agreement that (CCID,Server) = 4 *

In this exchange, the client requests CCID value 3 or 4 for the server's CCID, with 3 preferred. Note that the client can offer multiple values. The server chooses 4, giving its preference list of "4 2".

If a party wants to change one of its own options, it issues a "Change L", as shown below.

   Client                                      Server
   ------                                      ------
                                   <-  Change L(CCID, 3 2)
   Confirm R(CCID, 3, 3 2)  ->

          * agreement that (CCID,Server) = 3 *

In this example, the server requests CCID value 3 or 2 for the server's CCID, with 3 preferred, and the client agrees.
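The value selection in the sample exchanges can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the responder confirms the first value in its own preference list that the requester also offered. That rule is consistent with the three exchanges shown here, but it is not quoted from the specification. The names choose_value and confirm are hypothetical.

```python
def choose_value(offered, own_preferences):
    """Pick the first value in own_preferences that also appears in offered.

    ASSUMPTION: this "responder's preference wins" rule is inferred from
    the sample exchanges; it is not a normative statement of the protocol.
    Returns None when the two lists share no value.
    """
    for value in own_preferences:
        if value in offered:
            return value
    return None


def confirm(feature, offered, own_preferences):
    """Build a hypothetical Confirm tuple: (feature, chosen, preference list)."""
    chosen = choose_value(offered, own_preferences)
    if chosen is None:
        return None
    return (feature, chosen, list(own_preferences))


# Second sample exchange: client sends Change R(CCID, 3 4),
# server prefers "4 2" and answers Confirm L(CCID, 4, 4 2).
print(confirm("CCID", [3, 4], [4, 2]))
```

When the two lists share no value, this sketch simply returns None; how a real endpoint reacts in that case is left to the option definitions.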
Retransmissions make feature negotiation reliable. Section 6.4 describes these options further.

4.4. Example Connection

The progress of a typical DCCP connection is as follows. (This description is informative, not normative.)

   Client                                      Server
   ------                                      ------
   (1) DCCP-Request      ->
                                   <-  (2) DCCP-Response
   (3) DCCP-Ack          ->
   (5) DCCP-Data         ->
                                   <-  (5) DCCP-Ack
                                   <-  (5) DCCP-Data
   (5) DCCP-Ack          ->
                                   <-  (6) DCCP-CloseReq
   (7) DCCP-Close        ->
                                   <-  (8) DCCP-Reset

                  Typical DCCP Connection.

(1) The client sends the server a DCCP-Request packet specifying the client and server ports, the service being requested, and any features being negotiated, including the CCID that the client would like the server to use. The client may optionally piggyback some data on the DCCP-Request packet---an application-level request, say---which the server may ignore.

(2) The server sends the client a DCCP-Response packet indicating that it is willing to communicate with the client. The response indicates any features and options that the server agrees to, begins or continues other feature negotiations if desired, and optionally includes an Init Cookie that wraps up all this information and which must be returned by the client for the connection to complete.

(3) The client sends the server a DCCP-Ack packet that acknowledges the DCCP-Response packet. This acknowledges the server's initial sequence number and returns the Init Cookie if there was one in the DCCP-Response. It may also continue feature negotiation.

(4) Next comes zero or more DCCP-Ack exchanges as
required to finalize feature negotiation. The client may piggyback an application-level request on its final ack, producing a DCCP-DataAck packet.

(5) The server and client then exchange DCCP-Data packets, DCCP-Ack packets acknowledging that data, and, optionally, DCCP-DataAck packets containing piggybacked data and acknowledgements. If the client has no data to send, then the server will send DCCP-Data and DCCP-DataAck packets, while the client will send DCCP-Acks exclusively.

(6) The server sends a DCCP-CloseReq packet requesting a close.

(7) The client sends a DCCP-Close packet acknowledging the close.

(8) The server sends a DCCP-Reset packet whose Reason field is set to "Closed", and clears its connection state. In DCCP, unlike TCP, Resets are part of normal connection termination; see Section 5.9.

(9) The client receives the DCCP-Reset packet and holds state for a reasonable interval of time to allow any remaining packets to clear the network.

An alternative connection closedown sequence is initiated by the client:

(6) The client sends a DCCP-Close packet closing the connection.

(7) The server sends a DCCP-Reset packet with Reason field set to "Closed" and clears its connection state.
(8) The client receives the DCCP-Reset packet and holds state for a reasonable interval of time to allow any remaining packets to clear the network.

This arrangement of setup and teardown handshakes permits the server to decline to hold any state until the handshake with the client has completed, and ensures that the client must hold the Time-Wait state at connection closedown.

4.5. Examples of DCCP Congestion Control

Before giving the detailed specifications of DCCP, we present two more detailed examples showing DCCP congestion control in operation. Again, these examples are informative, not normative.

4.5.1. DCCP with TCP-like Congestion Control

The first example is of a connection where both half-connections use TCP-like Congestion Control, specified by CCID 2 [CCID 2 PROFILE]. In this example, the client sends an application-level request to the server, and the server responds with a stream of data packets. This example is of a connection using ECN.

(1) The client sends the DCCP-Request, which includes a Change R option asking the server to use
CCID 2 for the server's data packets, and a Change L option informing the server that the client would like to use CCID 2 for its data packets.

(2) The server sends a DCCP-Response, including a Confirm L option indicating that the server agrees to use CCID 2 for its data packets, and a Confirm R option indicating that the server agrees to the client's suggestion of CCID 2 for the client's data packets.

(3) The client responds with a DCCP-DataAck acknowledging the server's initial sequence number, and including an application-level request for data. We will not discuss the client-to-server half-connection further in this example.

(4) The server sends DCCP-Data packets, where the number of packets sent is governed by a congestion window, as in TCP. The details of the congestion window are defined in the profile for CCID 2, which is a separate document [CCID 2 PROFILE]. The server also sends Change R(Ack Ratio) feature options specifying the number of server data packets to be covered by an Ack packet from the client.
Each DCCP-Data packet is sent as ECN-Capable, with either the ECT(0) or the ECT(1) codepoint set, as described in [ECN NONCE].

(5) The client sends a DCCP-Ack packet acknowledging the data packets for every Ack Ratio data packets transmitted by the server. Each DCCP-Ack packet uses a sequence number and contains an Ack Vector, as defined in Section 8 on Acknowledgements. These packets also include Confirm L options answering any Ack Ratio requests from the server. The DCCP-Acks are also sent as ECN-Capable, with either ECT(0) or ECT(1). The client's Ack Vector echoes the accumulated ECN Nonce for the server's packets.

(6) The server must occasionally acknowledge the client's acknowledgements, so the client can clean its acknowledgement state. It can do so by sending separate DCCP-Acks as allowed by CCID 2, or by piggybacking acknowledgement information on its data packets with the DCCP-DataAck packet type. The acknowledgement information may contain detailed Ack Vectors, like the client's acknowledgements; but if the client is sending nothing but acknowledgements, the server's acks-of-acks can be more lightweight. See Section 8.1 for more information. Like the server's DCCP-Data packets, the server's DCCP-DataAck and DCCP-Ack packets are sent as ECN-Capable.
(7) The server continues sending DCCP-Data packets as controlled by the congestion window. Upon receiving DCCP-Ack packets, the server examines the Ack Vector to learn about marked or dropped data packets, and adjusts its congestion window accordingly, as described in [CCID 2 PROFILE]. Because this is unreliable transfer, the server does not retransmit dropped packets.

(8) Because DCCP-Ack packets use sequence numbers, the server has direct information about the fraction of lost or marked DCCP-Ack packets. [CCID 2 PROFILE] defines how the server modifies the client's Ack Ratio in response to congestion on the acknowledgement stream.

(9) The server estimates round-trip times and calculates a TimeOut (TO) value much as the RTO (Retransmit Timeout) is calculated in TCP. Again, the specification for this is in [CCID 2 PROFILE]. The TO is used to determine when a new DCCP-Data packet can be transmitted when the server has been limited by the congestion window and no feedback has been received from the client.

(10) The DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets to
close the connection are as in the example above.

4.5.2. DCCP with TFRC Congestion Control

This example is of a connection where both half-connections use TFRC Congestion Control, specified by CCID 3 [CCID 3 PROFILE].

(1) The DCCP-Request and DCCP-Response packets specifying the use of CCID 3 and the initial DCCP-DataAck packet are similar to those in the CCID 2 example above.

(2) The server sends DCCP-Data packets, where the number of packets sent is governed by an allowed transmit rate, as in TFRC. The details of the allowed transmit rate are defined in the profile for CCID 3, which is a separate document [CCID 3 PROFILE]. Each DCCP-Data packet has a sequence number and a window counter value.
Some of these data packets are DCCP-DataAck packets acknowledging packets from the client, but for simplicity we will not discuss the half-connection of data from the client to the server in this example. The use of ECN follows TCP-like Congestion Control, above, and is described further in [CCID 3 PROFILE].

(3) The receiver sends DCCP-Ack packets at least once per round-trip time acknowledging the data packets, unless the server is sending at a rate of less than one packet per RTT, as specified by [CCID 3 PROFILE]. These acknowledgements may be piggybacked on data packets, producing DCCP-DataAck packets. Each DCCP-Ack packet uses a sequence number and identifies the most recent packet received from the server. Each DCCP-Ack packet includes feedback about the
loss event rate calculated by the client, as specified by [CCID 3 PROFILE].

(4) The server continues sending DCCP-Data packets as controlled by the allowed transmit rate. Upon receiving DCCP-Ack packets, the server updates its allowed transmit rate as specified by [CCID 3 PROFILE].

(5) The server estimates round-trip times and calculates a TimeOut (TO) value much as the RTO (Retransmit Timeout) is calculated in TCP. Again, the specification for this is in [CCID 3 PROFILE].

(6) The DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets to close the connection are as in the examples above.

5. Packet Formats

5.1. Generic Packet Header

All DCCP packets begin with a generic DCCP packet header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  |X|# NDP|                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source and Destination Ports: 16 bits each

These fields identify the connection, similar to the corresponding fields in TCP and UDP. The Source Port represents the relevant port on the endpoint that sent this packet, the Destination Port the relevant port on the other endpoint.

Data Offset: 8 bits

The offset from the start of the DCCP header to the beginning of the packet's payload, measured in 32-bit words.

CCVal: 4 bits

This field is reserved for use by the sending CCID. In particular, the A-to-B CCID's sender, which is active at DCCP A, MAY send information to the receiver at DCCP B by encoding that information in CCVal. If the relevant CCID does not specify its value, it MUST be set to zero.

Checksum Coverage (CsCov): 4 bits

The Checksum Coverage field specifies what parts of the packet are covered by the Checksum field, as follows:

CsCov = 0
   Checksum covers the DCCP header, DCCP options, network-layer pseudoheader (described below), and the entire DCCP payload, possibly padded on the right with zeros to an even number of bytes.

CsCov = 1-15
   Checksum covers the DCCP
header, DCCP options, network-layer pseudoheader, and the initial (CsCov-1)*4 bytes of the DCCP payload.

Thus, if CsCov is 1, none of the DCCP payload is protected by the header checksum. The value (CsCov-1)*4 MUST be less than or equal to the length of the DCCP payload. Packets with invalid CsCov values MUST be ignored; in particular, their options MUST NOT be processed. The meanings of values other than 0 and 1 should be considered experimental.

Values other than 0 specify that corruption is acceptable in some or all of the DCCP packet's payload. In fact, DCCP cannot even detect corruption in areas not covered by the header checksum, unless the Payload Checksum option is used (Section 8.8). Applications should not make any assumptions about the correctness of received data not covered by the checksum, and should if necessary introduce their own appropriate validity checks.

A DCCP application interface should let sending applications suggest a value for CsCov for sent packets, defaulting to 0 (full coverage).
It should also let receiving applications refuse delivery of packets with checksum coverage less than a value provided by the application; by default, only packets with fully-covered payloads should be accepted. Lower layers that support partial error detection MAY use the Checksum Coverage field as a hint of where errors do not need to be detected. Lower layers MUST use a strong error detection mechanism to detect at least errors that occur in the sensitive part of the packet, and discard damaged packets. The sensitive part consists of the bytes between the first byte of the IP header and the last byte identified by Checksum Coverage.

For more details on application and lower-layer interface issues relating to partial checksumming, see [UDP-LITE], from which this text was summarized. See Appendix B.1 for further motivation of partial checksums and discussion of partial checksumming issues. Partial checksums introduce some security considerations, which are described in Section 16.2. DCCP partial checksumming was inspired by UDP-Lite [UDP-LITE].

Checksum: 16 bits

DCCP uses the TCP/IP checksum algorithm. The Checksum field equals the 16 bit one's complement of the one's complement sum of all 16 bit words in the DCCP header, DCCP options, a pseudoheader taken from the network-layer header, and, depending on the
value of the Checksum Coverage field, some or all of the payload. When calculating the checksum, the Checksum field itself is treated as 0. If a packet contains an odd number of header and text bytes to be checksummed, 8 zero bits are added on the right to form a 16 bit word for checksum purposes. The pad byte is not transmitted as part of the packet.

The pseudoheader is calculated as for TCP. For IPv4, it is 96 bits long, and consists of the IPv4 source and destination addresses, the IP protocol number for DCCP (padded on the left with 8 zero bits), and the DCCP length as a 16-bit quantity (the length of the DCCP header with options, plus the length of any data); see Section 3.1 of [RFC 793]. For IPv6, it is 320 bits long, and
consists of the IPv6 source and destination addresses, the DCCP length as a 32-bit quantity, and the IP protocol number for DCCP (padded on the left with 24 zero bits); see Section 8.1 of [RFC 2460].

Packets with invalid header checksums MUST be ignored. In particular, their options MUST NOT be processed.

Type: 4 bits

The type field specifies the type of the DCCP message. The following values are defined:

   0      DCCP-Request packet.
   1      DCCP-Response packet.
   2      DCCP-Data packet.
   3      DCCP-Ack packet.
   4      DCCP-DataAck packet.
   5      DCCP-CloseReq packet.
   6      DCCP-Close packet.
   7      DCCP-Reset packet.
   8      DCCP-Move packet.
   9      DCCP-Sync packet.
   10-15  Reserved.

Extended Sequence Numbers (X): 1 bit

This bit is set to one to indicate the use of an extended generic header with 48-bit Sequence and Acknowledgement Numbers. The format described in this section has X set to zero. Section 5.3 describes the extended generic header.

Number of Non-Data Packets (# NDP): 3 bits

DCCP sets this field to the number of non-data packets it has sent so far on its sequence, modulo 8. A non-data packet is simply any packet not containing user data; DCCP-Ack, DCCP-Close, DCCP-CloseReq, and DCCP-Reset are always non-data packets, while DCCP-Request, DCCP-Response, and DCCP-Move might or might not be. When sending a non-data packet, DCCP increments the # NDP counter before storing its value in the packet header.

This field can help the receiving DCCP decide whether a lost packet contained any user data. (An application may want to
know when it has lost data. DCCP could report every packet loss as a potential data loss, but that would cause false loss reports when non-data packets were lost.) For example, say that packet 10 had # NDP set to 5; packet 11 was lost; and packet 12 had # NDP set to 5. Then the receiving DCCP could deduce that packet 11 contained data, since # NDP did not change. Likewise, if # NDP had gone up to 6 (and packet 12 contained user data), then packet 11 must not have contained any data.

# NDP can overflow, causing ambiguities. For example, if 8 packets are dropped in a row but # NDP does not change, the receiver will not be able to tell whether or not any of the lost packets contained data. Thus, applications SHOULD NOT depend on the availability of unambiguous # NDP information. DCCP itself uses # NDP only as a hint of when a connection has left unidirectional mode; potential ambiguities are not harmful there.
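The # NDP reasoning above can be written as a small Python sketch. It assumes the receiver knows the # NDP value on the last packet before a gap and on the first packet after it, the number of packets lost in between, and whether the packet after the gap carries user data. The function name lost_data_packets and its return convention are hypothetical.

```python
def lost_data_packets(ndp_before, ndp_after, lost, after_is_data):
    """Estimate how many of the `lost` packets carried user data.

    ndp_before:    # NDP on the last packet received before the gap
    ndp_after:     # NDP on the first packet received after the gap
    lost:          number of consecutive packets missing in the gap
    after_is_data: True if the packet after the gap carries user data

    Returns the count of lost data packets, or None when the 3-bit
    counter leaves the answer ambiguous (or the inputs are inconsistent).
    """
    # A non-data packet increments # NDP before storing it, so the packet
    # after the gap accounts for one increment itself if it is non-data.
    own = 0 if after_is_data else 1
    delta = (ndp_after - ndp_before - own) % 8

    # Possible numbers of lost non-data packets: delta, delta + 8, ...
    candidates = list(range(delta, lost + 1, 8))
    if len(candidates) != 1:
        return None
    return lost - candidates[0]


# Packet 10 has # NDP 5, packet 11 is lost, packet 12 (data) has # NDP 5:
print(lost_data_packets(5, 5, 1, True))   # the lost packet carried data
# Eight packets lost in a row with no change in # NDP: ambiguous.
print(lost_data_packets(5, 5, 8, True))
```

The second call illustrates the overflow case: eight lost data packets and eight lost non-data packets produce the same counter value, so the sketch declines to answer.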
Sequence Number: 24 bits

The sequence number field is initialized by a DCCP-Request or DCCP-Response packet, and increases by one (modulo 16777216) with every packet sent. The receiver uses this information to determine whether packet losses have occurred. Even packets containing no data update the sequence number. Sequence numbers also provide some protection against old and malicious packets and half-open connections; see Section 5.2 on sequence number validity.

The two subflows' initial sequence numbers are set by the first DCCP-Request and DCCP-Response packets sent, and SHOULD be chosen as for TCP. In particular, initial sequence number choice MUST include a random or pseudorandom component to make it harder for attackers to complete sequence number attacks [RFC 1948]. The initial sequence number chosen for a given connection identifier (source address and port plus destination address and port) SHOULD increase over time, as TCP suggests [RFC 793], to prevent inappropriate delivery of old packets.
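As a rough illustration of the modulo-16777216 arithmetic, the following Python sketch advances a 24-bit sequence number and counts the sequence numbers skipped between two packets. It assumes the two packets arrived in order and that fewer than 2^24 packets were lost between them. The names next_seq and missing_between are hypothetical.

```python
SEQ_MOD = 1 << 24  # 16777216, the size of the 24-bit sequence space


def next_seq(seq):
    """Sequence number of the next packet sent, wrapping at 2^24."""
    return (seq + 1) % SEQ_MOD


def missing_between(prev_seq, cur_seq):
    """Count sequence numbers skipped between two in-order packets.

    ASSUMPTION: cur_seq was sent after prev_seq and fewer than 2^24
    packets separate them; reordering is not handled here.
    """
    return (cur_seq - prev_seq - 1) % SEQ_MOD


print(next_seq(SEQ_MOD - 1))              # wraps around to 0
print(missing_between(SEQ_MOD - 1, 1))    # sequence number 0 is missing
```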
[Old text, draft-ietf-dccp-spec-04]

the congestion window are defined in the profile for CCID 2, which is a separate document [CCID 2 PROFILE]. The server also sends Ack Ratio feature options specifying the number of server data packets to be covered by an Ack packet from the client.

Each DCCP-Data packet is sent as ECN-Capable, with either the ECT(0) or the ECT(1) codepoint set, as described in [ECN NONCE].

(5) The client sends a DCCP-Ack packet acknowledging the data packets for every Ack Ratio data packets transmitted by the server. Each DCCP-Ack packet uses a sequence number and contains an Ack Vector, as defined in Section 8 on Acknowledgements. These packets also include Confirm options answering any Ack Ratio requests from the server.

[New text, draft-ietf-dccp-spec-05]

If the header's X bit equals one, the Sequence Number field extends for another 24 bits for a total of 48. Very-high-rate connections SHOULD use these extended 48-bit sequence numbers to protect against wrapped sequence numbers; see Section 5.3.

Many packet types also carry an Acknowledgement Number in the four bytes following the generic header. Its format is as follows:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Reserved   |           Acknowledgement Number              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Acknowledgement Number: 24 bits

The Acknowledgement Number field acknowledges the greatest valid sequence number received so far on this connection. ("Greatest" is, of course, measured in circular sequence space.) Acknowledgement numbers make no attempt to provide precise information about which packets have arrived; options such as the Ack Vector do this.

The Acknowledgement Number MUST correspond to a "received" packet, where a packet is classified as "received" if and only if its options were processed by the receiving DCCP. (This means, for example, that received packets must be both header-checksum-valid and sequence-valid.) Even "received" packets
[Old text, draft-ietf-dccp-spec-04]

The DCCP-Acks are also sent as ECN-Capable, with either ECT(0) or ECT(1). The client's Ack Vector echoes the accumulated ECN Nonce for the server's packets.

(6) The server must occasionally acknowledge the client's acknowledgements, so the client can clean its acknowledgement state. It can do so by sending separate DCCP-Acks as allowed by CCID 2, or by piggybacking acknowledgement information on its data packets with the DCCP-DataAck packet type. The acknowledgement information may contain detailed Ack Vectors, like the client's acknowledgements; but if the client is sending nothing but acknowledgements, the server's acks-of-acks can be more lightweight. See Section 8.1 for more information.

Like the server's DCCP-Data packets, the server's DCCP-DataAck and DCCP-Ack packets are sent as ECN-Capable.

(7) The server continues sending DCCP-Data

[New text, draft-ietf-dccp-spec-05]

may have their payloads dropped, due to receive buffer overflow or payload corruption, for instance. The HC-Receiver will send Data Dropped options when this happens (see Section 8.7); the HC-Sender will reduce its sending rate or congestion window as appropriate. This issue is discussed further in Sections 8.5 and 8.7.

If the header's X bit equals one, the Acknowledgement Number field extends for another 24 bits for a total of 48. Again, see Section 5.3.

Reserved: 8 bits

The version of DCCP specified here MUST ignore this field on received packets, and MUST set it to all zeroes on generated packets.

5.2. Sequence Number Synchronization

DCCP implementations must react to packets that are not intended for the current connection. This can happen if the network delivers an old packet, if an attacker attempts to hijack a connection, during the cleanup of a half-open connection, or for other reasons. DCCP, like TCP, uses sequence number checks and Reset packets to defend against these packets. Every DCCP packet sent uses a new sequence number, however; thus, given large enough bursts of loss, a connection's endpoints might get out of sync relative to any window, requiring a mechanism to restore synchronization.
[Old text, draft-ietf-dccp-spec-04]

packets as controlled by the congestion window. Upon receiving DCCP-Ack packets, the server examines the Ack Vector to learn about marked or dropped data packets, and adjusts its congestion window accordingly, as described in [CCID 2 PROFILE]. Because this is unreliable transfer, the server does not retransmit dropped packets.

(8) Because DCCP-Ack packets use sequence numbers, the server has direct information about the fraction of loss or marked DCCP-Ack packets. [CCID 2 PROFILE] defines how the server modifies the client's Ack Ratio in response to any congestion on the acknowledgement stream.

[New text, draft-ietf-dccp-spec-05]

This section describes the algorithms that determine when DCCP packets are intended for the current connection, and the actions taken on unintended packets.

5.2.1. Variables

DCCP sequence number synchronization depends on the following variables, which are maintained by each endpoint.

GSS
   The Greatest Sequence Number Sent by this endpoint so far. ("Greatest" is of course measured in circular sequence space.)

GSR
   The Greatest Sequence Number Received from the other endpoint so far.

GAR
   (Optional) The Greatest Acknowledgement Number Received from the other endpoint so far.

Some other variables are derived from these primitives.

SWL and SWR
   (Sequence Number Window Left and Right) The two endpoints of the window within which Sequence Numbers are appropriate.

AWL and AWR
   (Acknowledgement Number Window Left and Right) The two endpoints of the window within which Acknowledgement Numbers are appropriate.

5.2.2. Appropriate Sequence Numbers

A sequence number S is appropriate iff SWL <= S <= SWR in circular sequence space. This resembles TCP's receive window. However, in DCCP, sequence numbers change with each packet sent, even pure acknowledgements. Thus, a loss event that dropped many consecutive packets could cause two DCCPs to get out of sync relative to any
[Old text, draft-ietf-dccp-spec-04]

(9) The server estimates round-trip times and calculates a TimeOut (TO) value much as the RTO (Retransmit Timeout) is calculated in TCP. Again, the specification for this is in [CCID 2 PROFILE]. The TO is used to determine when a new DCCP-Data packet can be transmitted when the server has been limited by the congestion window and no feedback has been received from the client.

(10) The DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets to close the connection are as in the example above.

4.1.2. DCCP with TFRC Congestion Control

This example is of a connection where both half-connections use TFRC Congestion Control, specified by CCID 3 [CCID 3 PROFILE].

(1) The DCCP-Request and DCCP-Response packets specifying the use of CCID 3 and the initial DCCP-DataAck packet are similar to those in the CCID 2 example above.

(2) The server sends DCCP-Data packets, where the number of packets sent is governed by an allowed transmit rate, as in TFRC.

[New text, draft-ietf-dccp-spec-05]

window, and a packet beyond the window is not necessarily a hard error. DCCP-Sync packets help in this situation.

DCCP A sets SWL and SWR to a loss window of W consecutive sequence numbers containing GSR. ("Consecutive", like "greatest", is measured in circular sequence space.) One-third of the loss window, rounded down, is placed at and before GSR, with two-thirds after GSR. Sequence numbers outside this loss window are inappropriate.

    inapprop. |    appropriate Sequence Numbers    | inapprop.
    <---------*|*===========*======================*|*--------->
          GSR -|GSR + 1 -  GSR                 GSR +|GSR + 1 +
     floor(W/3)|floor(W/3)                ceil(2W/3)|ceil(2W/3)
                = SWL                          = SWR

During connection startup, DCCP A MUST adjust SWL so that it is not less than DCCP B's initial sequence number.

DCCP B informs DCCP A of W, the loss window width DCCP A should use, via the Loss Window feature (Section 6.10). W defaults to 1000, but a proper value should reflect how many packets the sender expects to be in flight. Only the sender can anticipate this number. Too-small values increase the
[Old text, draft-ietf-dccp-spec-04]

The details of the allowed transmit rate are defined in the profile for CCID 3, which is a separate document [CCID 3 PROFILE]. Each DCCP-Data packet has a sequence number and a window counter value.

Some of these data packets are DCCP-DataAck packets acknowledging packets from the client, but for simplicity we will not discuss the half-connection of data from the client to the server in this example.

The use of ECN follows TCP-like Congestion Control, above, and is described further in [CCID 3 PROFILE].

(3) The receiver sends DCCP-Ack packets at least once per round-trip time acknowledging the data packets, unless the server is sending at a rate of less than one packet per RTT, as specified by [CCID 3 PROFILE]. These acknowledgements may be piggybacked on data packets, producing DCCP-DataAck packets. Each DCCP-Ack packet uses a sequence number and identifies the most recent packet received from the server. Each DCCP-Ack packet includes feedback about

[New text, draft-ietf-dccp-spec-05]

risk of the endpoints getting out of sync after bursts of loss; too-large values increase the risk of connection hijacking. One good guideline is to set it to about 3 or 4 times the maximum number of packets the sender expects to send in a round-trip time. This value may not be available at connection initiation, when the round-trip time is unknown, but the sender can always send updates as the connection progresses.

5.2.3. Appropriate Acknowledgement Numbers

The Acknowledgement Number on a packet from DCCP B is appropriate iff it lies within the window [AWL, AWR], where AWR = GSS, and the window is W' packets wide. W' is the value of DCCP A's Loss Window feature, which it defined in its role as HC-Sender for the other half-connection.

    inapprop. | appropriate Acknowledgement Numbers | inapprop.
   <---------*|*===================================*|*---------->
      GSS - W'|GSS - W' + 1                      GSS|GSS + 1
               = AWL                           = AWR

During connection startup, DCCP A MUST adjust AWL so that it is not less than its initial sequence number.

5.2.4. Sequence-Validity By State

A packet is called sequence-valid when its sequence numbers indicate that it is intended for the current connection.
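The two windows defined in Sections 5.2.2 and 5.2.3 reduce to a little modular arithmetic. The following is a minimal sketch assuming 24-bit sequence numbers; the function names are illustrative, and the connection-startup adjustments of SWL and AWL are deliberately left out.

```python
MOD24 = 1 << 24  # size of the 24-bit circular sequence space


def in_circular_window(x, left, right, mod=MOD24):
    """True if x lies in [left, right] when walking forward from left."""
    return (x - left) % mod <= (right - left) % mod


def sequence_window(gsr, w, mod=MOD24):
    """Return (SWL, SWR) for loss window width w around GSR."""
    swl = (gsr + 1 - w // 3) % mod        # GSR + 1 - floor(W/3)
    swr = (gsr + -(-2 * w // 3)) % mod    # GSR + ceil(2W/3)
    return swl, swr


def ack_window(gss, w_prime, mod=MOD24):
    """Return (AWL, AWR): the W' numbers ending at GSS."""
    awl = (gss - w_prime + 1) % mod
    awr = gss % mod
    return awl, awr
```

For example, with GSR = 10 and the default W = 1000, `sequence_window(10, 1000)` gives SWR = 677 and an SWL that wraps below zero to 2^24 - 322; the window still covers exactly 1000 sequence numbers.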
[Old text, draft-ietf-dccp-spec-04]

the loss event rate calculated by the client, as specified by [CCID 3 PROFILE].

(4) The server continues sending DCCP-Data packets as controlled by the allowed transmit rate. Upon receiving DCCP-Ack packets, the server updates its allowed transmit rate as specified by [CCID 3 PROFILE].

(5) The server estimates round-trip times and calculates a TimeOut (TO) value much as the RTO (Retransmit Timeout) is calculated in TCP. Again, the specification for this is in [CCID 3 PROFILE].

(6) The DCCP-CloseReq, DCCP-Close, and DCCP-Reset packets to close the connection are as in the examples above.

5. Packet Formats

5.1. Generic Packet Header

All

[New text, draft-ietf-dccp-spec-05]

The rules for sequence-validity depend on the state of the connection. The baseline rules for sequence-validity are as follows:

CLOSED and LISTEN states
   All packets are sequence-valid (but most packet types will cause a Reset to be generated by later validity checks).

REQUEST state
   A packet is sequence-valid if and only if it has an appropriate Acknowledgement Number.

All other states
   (1) DCCP-Data packets are sequence-valid if and only if their Sequence Numbers are appropriate.

   (2) DCCP-Sync and DCCP-Reset packets are sequence-valid if and only if their Acknowledgement Numbers are appropriate.

   (3) The sequence-validity of DCCP-Move packets is discussed in Section 5.10.

   (4) All other packets are sequence-valid if and only if both their Sequence and Acknowledgement Numbers are appropriate.

DCCP implementations MAY implement additional checks to protect against packets that have valid sequence numbers, but are not part of this connection. The additional checks provide an incremental security advantage at a moderate complexity cost.

o  DCCP-Reset packets may not have valid Sequence Numbers because they might be generated by a closed connection in response to DCCP-Data packets, which have no Acknowledgement Number.
[Old text, draft-ietf-dccp-spec-04]

DCCP packets begin with a generic DCCP packet header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  | CCval |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | # NDP | Cslen |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source and Destination Ports: 16 bits each

These fields identify the connection, similar to the corresponding fields in TCP and UDP. The Source Port represents the relevant port on the endpoint that sent this packet, the Destination Port the relevant port on the other endpoint.

Type: 4 bits

The type field specifies the type of the DCCP message. The following values

[New text, draft-ietf-dccp-spec-05]

However, DCCP implementations MUST supply a valid Sequence Number when one is available (either from connection information or the Acknowledgement Number), and use Sequence Number 0 otherwise. Thus, valid DCCP-Reset packets fall into two categories: Either they contain an appropriate Sequence Number, or they have Sequence Number 0 and their Acknowledgement Number corresponds to a DCCP-Request or DCCP-Data packet. Implementations that check this invariant MUST ignore DCCP-Resets that don't fit. (Do not, for example, send a DCCP-Sync in response to such a Reset.)

o  DCCP implementations transition to CLOSED state after sending a DCCP-Reset packet, and will not send further non-Reset packets on that connection. Therefore, valid DCCP-Reset packets have Sequence Numbers greater than GSR (except for those with Sequence Number 0, as mentioned above), and Acknowledgement Numbers greater than or equal to GAR. Again, implementations that check this invariant MUST ignore DCCP-Resets that don't fit.

o  Implementations that can detect duplicate sequence numbers within the current Loss Window should ignore duplicate packets.
[Old text, draft-ietf-dccp-spec-04]

are defined:

   0   DCCP-Request packet.
   1   DCCP-Response packet.
   2   DCCP-Data packet.
   3   DCCP-Ack packet.
   4   DCCP-DataAck packet.
   5   DCCP-CloseReq packet.
   6   DCCP-Close packet.
   7   DCCP-Reset packet.
   8   DCCP-Move packet.

CCval: 4 bits

This field is reserved for use by the sending CCID. In particular, the A-to-B CCID's sender, which is active at DCCP A, MAY send information to

[New text, draft-ietf-dccp-spec-05]

(Of course, sequence number space can wrap; this refers to packets whose sequence numbers have recently been seen.)

o  DCCP-Sync packets with Sequence Number less than GSR, or with Acknowledgement Number less than GAR, are stale and MUST be ignored when detected.

Implementing these checks should not cause interoperability problems, but augmenting the list with additional ad-hoc checks is NOT RECOMMENDED.

5.2.5. Handling Sequence-Invalid Packets

Sequence-invalid DCCP-Move, DCCP-Reset, and DCCP-Sync packets MUST be ignored. Otherwise, on receiving a sequence-invalid packet, a DCCP endpoint (say DCCP A) MUST reply with a DCCP-Sync packet, as allowed by the congestion control mechanism in use. This packet MUST acknowledge the packet's Sequence Number (not GSR!). Any DCCP-Sync MUST use a new Sequence Number, and thus will increase GSS; GSR will not change, however, since the packet was sequence-invalid. DCCP A MUST NOT otherwise process sequence-invalid packets.

On receiving the DCCP-Sync, DCCP B will update its GSR variable and reply with a DCCP-Sync of its own. When DCCP A receives this DCCP-Sync, which acknowledges its DCCP-Sync (and is therefore sequence-valid), it will update its GSR variable, thus getting the endpoints back into sync. Alternatively, if the connection was half-open, DCCP B will send a Reset.
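The baseline sequence-validity rules of Section 5.2.4 and the reaction rule of Section 5.2.5 can be written as a small decision function. This is a sketch under stated assumptions: states and packet types are represented as plain strings, the caller has already decided whether the Sequence and Acknowledgement Numbers are appropriate, and the optional extra checks and rate limiting are omitted. DCCP-Move is left unimplemented because its rules are given in Section 5.10.

```python
def is_sequence_valid(state, ptype, seq_appropriate, ack_appropriate):
    """Baseline sequence-validity check; ptype is e.g. "Data", "Ack", "Sync"."""
    if state in ("CLOSED", "LISTEN"):
        return True                      # later checks may still cause a Reset
    if state == "REQUEST":
        return ack_appropriate
    if ptype == "Data":
        return seq_appropriate           # DCCP-Data carries no ack number
    if ptype in ("Sync", "Reset"):
        return ack_appropriate
    if ptype == "Move":
        raise NotImplementedError("DCCP-Move validity: see Section 5.10")
    return seq_appropriate and ack_appropriate


def reaction_to_invalid(ptype):
    """What to do with a packet already found to be sequence-invalid."""
    if ptype in ("Move", "Reset", "Sync"):
        return "ignore"
    return "send-sync"                   # Sync acknowledges the packet's own seqno
```

The split mirrors the text: Sync and Reset packets are judged by acknowledgement number alone, and an invalid Sync is never answered with another Sync, which keeps two endpoints from trading Syncs indefinitely.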
[Old text, draft-ietf-dccp-spec-04]

the receiver at DCCP B by encoding that information in CCval. DCCP proper MUST ignore the field. If the relevant CCID does not specify its value, it SHOULD be set to zero.

Sequence Number: 24 bits

The sequence number field is initialized by a DCCP-Request or DCCP-Response packet, and increases by one (modulo 16777216) with every packet sent. The receiver uses this information to determine whether packet losses have occurred. Even packets containing no data update the sequence number. Sequence numbers also provide some protection against old and malicious packets; see Section 5.2 on sequence number validity.

Very-high-rate DCCPs may need protection against wrapped sequence numbers. For example,

[New text, draft-ietf-dccp-spec-05]

To protect itself against denial-of-service attacks (where an attacker sends purposefully invalid packets, thereby forcing the receiver to send DCCP-Syncs), a DCCP implementation MAY ignore packets with inappropriate Sequence Numbers if the connection is still active. By "ignore", we mean that the packet is discarded without sending a DCCP-Sync. A connection is "active" when appropriate Sequence Numbers have been recently received; "recently" might mean within the last second or the last RTT, whichever is shorter. Similarly, a DCCP MAY rate-limit the DCCP-Syncs sent in response to sequence-invalid packets.

5.2.6. Examples

In this first example, DCCP A and DCCP B recover from a large burst of loss that runs DCCP A's sequence numbers out of DCCP B's appropriate sequence number window.

                     Recovery from Burst of Loss

   DCCP A                                       DCCP B
   (GSS=1,GSR=10)                               (GSS=10,GSR=1)
            --->   DCCP-Data(seq 2)     XXX
                        ...
            --->   DCCP-Data(seq 100)   XXX
            --->   DCCP-Data(seq 101)           --->   ???
                                    seqno out of range; send Sync
      OK    <---   DCCP-Sync(seq 11, ack 101)   <---
                                                (GSS=11,GSR=1)
            --->   DCCP-Sync(seq 102, ack 11)   --->   OK
   (GSS=102,GSR=11)                             (GSS=11,GSR=102)

In this example, a DCCP connection recovers from a simple attack. The attacker cannot guess sequence numbers. (DCCP is not robust to attackers who can guess sequence numbers.)
[Old text, draft-ietf-dccp-spec-04]

a 10 Gb/s flow of 1500-byte DCCP packets will send 2^24 packets in about 20 seconds. This is a long time, in terms of likely round-trip times that could possibly achieve such a sustained rate, but it is not without risk. DCCP's current congestion control mechanisms are designed for congestion windows (or equivalents) of at most a few hundred thousand packets, leaving at least 32 RTTs before sequence numbers wrap. We leave the design of protection against wrapped sequence numbers for the future, when DCCP's congestion control mechanisms can handle larger congestion windows.

The two subflows' initial sequence numbers are set by the first DCCP-Request and DCCP-Response packets sent, and SHOULD be chosen as for TCP. In particular, initial sequence number choice MUST include a random or pseudorandom component to make it harder for attackers to complete sequence number attacks [RFC 1948]. The initial sequence number chosen for a given connection identifier (source address and port plus destination address and port) SHOULD increase over time, as TCP suggests [RFC 793], to prevent inappropriate delivery of old packets.

Data Offset: 8 bits

The offset from the start of the DCCP header to the beginning of the packet's payload, measured in 32-bit words.

Number of Non-Data Packets (# NDP): 4 bits

DCCP sets this field to the number of non-data packets it

[New text, draft-ietf-dccp-spec-05]

                        Recovery from Attack

   DCCP A                                       DCCP B
   (GSS=1,GSR=10)                               (GSS=10,GSR=1)
             *ATTACKER*  --->  DCCP-Data(seq 10^6)  --->  ???
                                    seqno out of range; send Sync
     ???    <---   DCCP-Sync(seq 11, ack 10^6)  <---
   ackno out of range; ignore
   (GSS=1,GSR=10)                               (GSS=11,GSR=1)

The final example demonstrates recovery from a half-open connection.

                Recovery from a Half-Open Connection

   DCCP A                                          DCCP B
   (GSS=1,GSR=10)                                  (GSS=10,GSR=1)
   (Crash)
   CLOSED                                          OPEN
   REQUEST  --->   DCCP-Request(seq 400)        --->   ???
   !!       <---   DCCP-Sync(seq 11, ack 400)   <---   OPEN
   REQUEST  --->   DCCP-Reset(seq 401, ack 11)  --->   (Abort)
   REQUEST                                         CLOSED
   REQUEST  --->   DCCP-Request(seq 402)        --->   ...

5.3. Extended Sequence Numbers

A 10 Gb/s flow of 1500-byte DCCP packets will send 2^24 packets in about 20 seconds. This is a long time, in terms of likely round-trip times that could possibly achieve such a sustained rate, but it is not without risk. DCCP's current congestion control mechanisms are designed for congestion windows (or equivalents) of at most a few hundred thousand packets, leaving at least 32 RTTs before 24-bit sequence numbers wrap. However, very-high rate connections SHOULD use extended sequence numbers to gain more protection.

DCCP extended sequence numbers are activated when the header's X bit is set to one. This extends the Sequence Number and Acknowledgement Number fields by an additional 24 bits, for a total of 48 bits. A flow of 1500-byte DCCP packets would have to send more than 28 petabits per second to overflow 48-bit sequence numbers within the 2-minute maximum segment lifetime. The 48-bit numbers are stored in network order, with most significant bit first.
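The two wrap-time figures quoted in Section 5.3 (about 20 seconds for 24-bit numbers at 10 Gb/s, and roughly 28 petabits per second to wrap 48-bit numbers in 2 minutes) can be checked with simple arithmetic. The helper names below are illustrative; each packet is assumed to consume exactly one sequence number and to be 1500 bytes long, as in the text.

```python
def seconds_to_wrap(seq_bits, rate_bps, packet_bytes=1500):
    """Time for a flow at rate_bps to send 2**seq_bits packets."""
    total_bits = (1 << seq_bits) * packet_bytes * 8
    return total_bits / rate_bps


def rate_to_wrap_within(seq_bits, seconds, packet_bytes=1500):
    """Bit rate needed to send 2**seq_bits packets within `seconds`."""
    total_bits = (1 << seq_bits) * packet_bytes * 8
    return total_bits / seconds


t24 = seconds_to_wrap(24, 10e9)        # 24-bit space at 10 Gb/s
r48 = rate_to_wrap_within(48, 120)     # 48-bit space, 2-minute lifetime
print(f"{t24:.1f} s, {r48 / 1e15:.1f} Pb/s")
```

Both results agree with the figures in the text to within rounding.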
[Old text, draft-ietf-dccp-spec-04]

has sent so far on its sequence, modulo 16. A non-data packet is simply any packet not containing user data; DCCP-Ack, DCCP-Close, DCCP-CloseReq, and DCCP-Reset are always non-data packets, while DCCP-Request, DCCP-Response, and DCCP-Move might or might not be. When sending a non-data packet, DCCP increments the # NDP counter before storing its value in the packet header.

This field can help the receiving DCCP decide whether a lost packet contained any user data. (An application may want to know when it has lost data. DCCP could report every packet loss as a potential data loss, but that would cause false loss reports when non-data packets were lost.)

[New text, draft-ietf-dccp-spec-05]

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |           Dest Port           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data Offset  | CCVal | CsCov |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  |1|# NDP|          Sequence Number (high bits)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Sequence Number (low bits)           |  Reserved   |T|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All packet types except for DCCP-Data and DCCP-Request will follow this generic header with an extended Acknowledgement Number:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Reserved   |      Acknowledgement Number (high bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Acknowledgement Number (low bits)       |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Once an endpoint has sent any packet with 48-bit sequence numbers (X=1), it MUST send all succeeding packets with 48-bit sequence numbers. Furthermore, once an endpoint has received any packet with 48-bit sequence numbers, it MUST either send all succeeding packets
[Old text, draft-ietf-dccp-spec-04]

For example, say that packet 10 had # NDP set to 5; packet 11 was lost; and packet 12 had # NDP set to 5. Then the receiving DCCP could deduce that packet 11 contained data, since # NDP did not change. Likewise, if # NDP had gone up to 6 (and packet 12 contained user data), then packet 11 must not have contained any data.

# NDP can overflow, causing ambiguities. For example, if 16 packets are dropped in a row but # NDP does not change, the receiver will not be able to tell whether or not any of the lost packets contained data. DCCP proper does not depend on # NDP's value in any significant way.

Checksum Length (Cslen): 4 bits

The checksum length field specifies what parts of the packet are covered by the checksum field. The checksum always covers at least the DCCP header, DCCP options, and a pseudoheader taken from the network-layer header (described under Checksum below). If

[New text, draft-ietf-dccp-spec-05]

with 48-bit sequence numbers, or reset the connection with Reason set to "Extended Sequence Numbers" (15).

Clients SHOULD decide whether to use extended sequence numbers before sending their DCCP-Requests. That is, connections SHOULD NOT transition from 24-bit to 48-bit sequence numbers; they SHOULD contain only 24-bit sequence numbers, or only 48-bit sequence numbers. The Transition bit (T) supports transitioning to extended sequence numbers during an active connection, however, in case this proves necessary; see below.

Extended sequence numbers are treated simply as longer sequence numbers. For instance, the sequence-validity mechanisms work the same way whether or not sequence numbers are extended. Care is required when comparing a 24-bit sequence number with a 48-bit sequence number; see below.

Extended sequence numbers improve security against attackers by making it harder to guess a valid sequence number, as well as protecting against benign wrapping.

5.3.1.
[Old text, draft-ietf-dccp-spec-04]

the checksum length field is zero, that is all the checksum covers. If the field is 15, the checksum covers the packet's payload as well, possibly with 8 bits of zero padding on the right to pad the payload to an even number of bytes. Values between 1 and 14, inclusive, indicate that the checksum covers the DCCP header, DCCP options, the pseudoheader, and that number of initial 32-bit words of the packet's payload, padded on the right with zeros as necessary.

Values other than 15 specify that corruption is acceptable in some or all of the DCCP packet's payload. In fact, DCCP cannot even detect corruption there, unless the Payload Checksum option is used (Section 8.8). The meaning of values other than 0 and 15 should be considered experimental.

[New text, draft-ietf-dccp-spec-05]

Transitioning to Extended Sequence Numbers

The Transition bit (T) following the extended Sequence Number field makes it possible to transition to 48-bit sequence numbers in the middle of a connection. T is set to one only during such a transition.

When DCCP A switches to 48-bit sequence numbers, it MUST set the T bit to one on all of its packets for some period. This period SHOULD last on the order of a few round trip times, or until DCCP A receives an acknowledgement from DCCP B proving that one of its 48-bit-sequence-number packets has been received, whichever comes later.

Each DCCP MUST choose its first 48-bit sequence number to have its lower 24 bits equal the 24-bit sequence number it expected to send (GSS+1).

If DCCP A sends an extended packet containing an Acknowledgement Number before DCCP B sends it a 48-bit Sequence Number, DCCP A may send any value for the upper 24 bits of that Acknowledgement Number, but the lower 24 bits MUST equal the expected 24-bit Acknowledgement Number (GSR). Furthermore, DCCP A MUST leave GSR as a 24-bit number until receiving an extended packet from DCCP B.

If DCCP B transitions to extended sequence numbers because it receives a valid packet with extended sequence numbers, it MAY set the upper 24 bits of
[Old text, draft-ietf-dccp-spec-04]

Section 18.1 further discusses the motivation of, and issues related to, partial checksums. The checksum length field was inspired by UDP-Lite [UDP-LITE].

Checksum: 16 bits

DCCP uses the TCP/IP checksum algorithm. The checksum field equals the 16 bit one's complement of the one's complement sum of all 16 bit words in the DCCP header, DCCP options, a pseudoheader taken from the network-layer header, and, depending on the value of the checksum length field, some or all of the payload. When calculating the checksum, the checksum field itself is treated as 0. If a packet contains an odd number of header and text bytes to be checksummed, 8 zero bits are added on the right to form a 16 bit word for checksum purposes.

[New text, draft-ietf-dccp-spec-05]

its extended sequence number based on the upper 24 bits of the received Acknowledgement Number, but it can also choose a different upper 24 bits.

Switching to 48-bit sequence numbers in the middle of a connection raises the issue of comparing a 24-bit sequence number with a 48-bit sequence number. (This may also occur if the network delivers a packet from an old connection, or given a malicious attacker.) Let P be the packet sequence number received from DCCP B, and E be the sequence number DCCP A expects. During sequence-validity computations, for example, P might be the packet's Acknowledgement Number and E might be AWL, the left edge of the appropriate acknowledgement number window. Then DCCP A should perform the comparison as follows.

o  If P and E are both 24 bits, compare them modulo 2^24.

o  If P and E are both 48 bits, the packet's Transition bit is set, and the last packet sent by DCCP A had its Transition bit set, then compare P and E modulo 2^24. This covers the case where both endpoints transitioned simultaneously, so P and E's upper 24 bits might disagree.

o  Otherwise, if P and E are both 48 bits, compare them modulo 2^48.

o  If P is 48 bits but E is 24, the remote DCCP may want to transition to extended sequence numbers. If the packet's Transition bit
[Old text, draft-ietf-dccp-spec-04]

The pad byte is not transmitted as part of the packet.

The pseudoheader is calculated as for TCP. For IPv4, it is 96 bits long, and consists of the IPv4 source and destination addresses, the IP protocol number for DCCP (padded on the left with 8 zero bits), and the DCCP length as a 16-bit quantity (the length of the DCCP header with options, plus the length of any data); see Section 3.1 of [RFC 793]. For IPv6, it is 320 bits long, and consists of the IPv6 source and destination addresses, the DCCP length as a 32-bit quantity, and the IP protocol number for DCCP (padded on the left with 24 zero bits); see Section 8.1 of [RFC 2460].

Packets with invalid checksums MUST be ignored. In particular, their options MUST NOT be processed.

5.2. Sequence Number Validity

DCCP endpoints SHOULD ignore packets with invalid sequence numbers, which may arise if the network delivers a very old packet or an attacker attempts to hijack a connection. TCP solves this problem with its window. In DCCP, however, sequence numbers change with each packet sent, even pure acknowledgements.

[New text, draft-ietf-dccp-spec-05]

is not set, the packet is definitely sequence-invalid; otherwise, compare P with E modulo 2^24. If the packet proves sequence-valid, then it is OK; transition to extended sequence numbers, and set E according to the full 48 bits of P. If the packet does not prove sequence-valid, send an (extended) DCCP-Sync as required (with T set to one), but do not yet transition to extended sequence numbers.

o  If P is 24 bits but E is 48, there may have been benign packet reordering. The correct action depends on whether the last packet seen from the remote DCCP had the Transition bit set.

   o  If Transition was not set, then the packet is sequence-invalid; send an (extended) DCCP-Sync as required.

   o  If Transition was set, extend P to a 48-bit value P'. First, let EH equal the upper 24 bits of E, and EL equal the lower 24 bits of E. Then:

         If EL > P, set P' = (EH << 24) | P.
         Otherwise, set P' = (((EH - 1) mod 2^24) << 24) | P.

      If the packet proves sequence-valid when comparing with P' modulo 2^48, then it is OK; the packet
[Old text, draft-ietf-dccp-spec-04]

Thus, a loss event that dropped many consecutive packets could cause two DCCPs to get out of sync relative to any window.

DCCP uses Loss Window and Identification mechanisms to determine whether a given packet's sequence number is valid.

Each HC-Sender gives the corresponding HC-Receiver a loss window width W; see Section 6.9. W defaults to 1000, but a proper value should reflect how many packets the sender expects to be in flight. One good guideline is to set it to about 3 or 4 times the maximum number of packets the sender expects to send in a round-trip time. Only the sender can anticipate this number. (Its value may not be available at connection initiation, when the round-trip time is unknown, but the sender can always send updates as the connection progresses.) Too-small values increase the risk of the endpoints getting out of sync after bursts of loss; too-large values increase

[New text, draft-ietf-dccp-spec-05]

was reordered from before the transition. If it does not, send an (extended) DCCP-Sync (with T set to one) as required.

DCCP implementations can, of course, avoid most of this complexity by disallowing transitions to extended sequence numbers (and by resetting the connection when the other endpoint attempts such a transition).

Connections that use 48-bit sequence numbers throughout, starting with the DCCP-Request, MUST have T set to zero on all their packets.

5.4. DCCP State Diagram

In this section we present a DCCP state diagram showing how a DCCP connection should progress, and the proper responses for packets or timeout events in various connection states. The state diagram is illustrative; the text should be considered definitive.

   +----------------------------------+
   | Figure omitted from text version |
   +----------------------------------+

All receive events on the diagram represent receipt of sequence-valid packets with correct header checksums.
[Old text, draft-ietf-dccp-spec-04]

the risk of connection hijacking.

The Identification mechanism is used to get back into sync when more than W consecutive packets are lost.

The HC-Receiver sets up a loss window of W consecutive sequence numbers containing GSN, the Greatest Sequence Number it has received on any valid packet from the sender. ("Consecutive" and "greatest" are measured in circular sequence space.) One-third of the loss window, rounded down, is placed at and before the GSN, with two-thirds after the GSN. Sequence numbers outside this loss window are invalid.

Packets with invalid sequence numbers are themselves invalid, unless both of the following conditions are true:

(1) No valid packet has been received recently (for instance, within at least one round-trip time), AND

(2) The packet includes a correct Identification or Challenge option (see Section 6.4.3), and a valid Acknowledgement Number (meaning the Acknowledgement Number is within

[New text, draft-ietf-dccp-spec-05]

For example, receiving a Reset with a bad Acknowledgement Number MUST NOT cause DCCP to transition to the TIME-WAIT state. DCCP implementations SHOULD send Acks as described above in response to sequence-invalid packets.

Otherwise-valid packets without explicit transitions in the state diagram SHOULD be treated according to the table below. Particular actions are "OK", meaning the packet MUST be processed according to this document; "Rst", meaning the receiver SHOULD respond with a (possibly rate-limited) Reset; and "-", meaning the packet SHOULD be ignored. Entries may take the form "Old/New", where "Old" applies to old packets and "New" to new packets (whose sequence numbers are greater than GSR, the greatest valid sequence number seen so far).
                                      Data/Ack/
                                      DataAck/                       Reset/
   State          Request   Response  Move       CloseReq  Close     Sync
   -------------  --------  --------  --------   --------  --------  --------
   CLOSED         Rst       Rst       Rst        Rst       Rst       OK
   LISTEN         OK        Rst       Rst(1)     Rst       Rst       OK
   REQUEST        Rst       OK        Rst        Rst       Rst       OK
   RESPOND        -/OK      Rst       Rst/OK     Rst       OK        OK
   SERVER-OPEN    -/Rst     Rst       OK         Rst       OK        OK
   CLIENT-OPEN    Rst       -/Rst     OK         OK        OK        OK
   CLOSEREQ       -/Rst     Rst       OK         Rst       OK        OK
   CLOSING        Rst       -/Rst     OK         OK        OK        OK
   TIME-WAIT      Rst       Rst       Rst        Rst       Rst       OK

   Again, we note that the table only applies to valid packets.
   Sequence-invalid packets SHOULD be treated as described above.

   A DCCP endpoint that implements the Init Cookie option (Section 6.6)
   may change the Reset action marked (1). Init Cookie lets the server
   package all state for a requested connection into an option that the
   client will echo. A server with Init Cookie need not implement the
   RESPOND state. Instead, it may reply to each DCCP-Request packet with
   a DCCP-Response containing an Init Cookie. When a DCCP-Data, Ack, or
   DataAck packet carrying a valid Init Cookie arrives from the client,
   the server will move directly from LISTEN to OPEN. Like TCP SYN
   cookies [SYNCOOKIES], Init Cookies let servers avoid keeping any
   state for clients whose addresses have not been verified.

   A DCCP endpoint in the CLOSED or LISTEN state may not have a proper
   sequence number available to send a Reset.
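   The default-action table in this section can be read as a simple
   lookup. The following is a minimal sketch only, not part of the
   specification: the names COLUMNS, COLUMN_OF, TABLE, and
   default_action are hypothetical, and the "(1)" marker on the LISTEN
   row (the Init Cookie exception) is deliberately not modelled.

```python
# Illustrative encoding of the default-action table (hypothetical names).
COLUMNS = ("Request", "Response", "Data", "CloseReq", "Close", "Reset")

# Packet types that share a table column.
COLUMN_OF = {
    "Request": "Request", "Response": "Response",
    "Data": "Data", "Ack": "Data", "DataAck": "Data", "Move": "Data",
    "CloseReq": "CloseReq", "Close": "Close",
    "Reset": "Reset", "Sync": "Reset",
}

TABLE = {
    "CLOSED":      ("Rst", "Rst", "Rst", "Rst", "Rst", "OK"),
    "LISTEN":      ("OK", "Rst", "Rst", "Rst", "Rst", "OK"),  # "(1)" not modelled
    "REQUEST":     ("Rst", "OK", "Rst", "Rst", "Rst", "OK"),
    "RESPOND":     ("-/OK", "Rst", "Rst/OK", "Rst", "OK", "OK"),
    "SERVER-OPEN": ("-/Rst", "Rst", "OK", "Rst", "OK", "OK"),
    "CLIENT-OPEN": ("Rst", "-/Rst", "OK", "OK", "OK", "OK"),
    "CLOSEREQ":    ("-/Rst", "Rst", "OK", "Rst", "OK", "OK"),
    "CLOSING":     ("Rst", "-/Rst", "OK", "OK", "OK", "OK"),
    "TIME-WAIT":   ("Rst", "Rst", "Rst", "Rst", "Rst", "OK"),
}


def default_action(state, packet_type, is_new):
    """Return "OK", "Rst", or "-" for a sequence-valid packet.

    is_new is True when the packet's sequence number is greater than GSR.
    """
    entry = TABLE[state][COLUMNS.index(COLUMN_OF[packet_type])]
    if "/" in entry:
        old, new = entry.split("/")  # "Old/New" entries
        return new if is_new else old
    return entry
```

   The lookup applies only to packets that have no explicit transition
   in the state diagram; sequence-invalid packets are handled before it
   is consulted.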
   In these cases, it MUST set the Reset's Sequence Number to zero.
   Resets sent in the CLOSED, LISTEN, and TIME-WAIT states SHOULD use
   Reset Reason "No Connection"; other Resets SHOULD use Reason "Invalid
   Packet". A DCCP MAY send Resets not listed in the diagram if it
   detects an inconsistency, for example if it receives two DCCP packets
   with the same sequence number but different packet types.

   The Open state does not signify that a DCCP connection is ready for
   data transfer. In particular, incomplete feature negotiations might
   prevent data transfer. Feature negotiation takes place in parallel
   with the state transitions on this diagram.

   Only the server may take the transition from the OPEN state to the
   CLOSEREQ state. (The server is the DCCP endpoint that began in the
   LISTEN state.) Similarly, only the client must transition to CLOSE
   after receiving a CloseReq packet.

5.5. DCCP-Request Packet Format

   A DCCP connection is initiated by sending a DCCP-Request packet.
   The format of a DCCP request packet is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                  with Type=0 (DCCP-Request)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Service Code                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Service Code: 32 bits
      The Service Code field describes the service to which the sender
      is trying to connect. Service Codes are 32-bit numbers allocated
      by IANA; they are meant to correspond to application services and
      protocols, such as FTP and HTTP, and are not intended to be
      DCCP-specific. With Service Codes, stateful middleboxes, such as
      firewalls, can identify the application running on a nonstandard
      port (assuming the DCCP header has not been encrypted).

      A Service Code of zero is a wildcard, matching any service. The
      host operating system MAY force every DCCP socket, both actively
      and passively opened, to specify a nonzero Service Code.
      Connection requests MUST fail if the Destination Port on the
      receiver has a different Service Code from that given in the
      packet, and both Service Codes are nonzero. In this case, the
      receiver will respond with a DCCP-Reset packet (with Reason set to
      "Bad Service Code"). A server or stateful middlebox MAY also send
      a "Bad Service Code" DCCP-Reset in response to packets whose
      Service Code is considered unsuitable.

   Options
      DCCP-Request packets will usually include a "Change R(Connection
      Nonce)" option, to inform the server of the client's connection
      nonce; see Section 6.5.

   The client MAY send new DCCP-Request packets if no response is
   received after some timeout. The retransmission strategy SHOULD be
   similar to that for retransmitting TCP SYNs; for instance, a first
   timeout on the order of a second, with an exponential backoff timer.
   Each retransmission MUST increment the Sequence Number, and possibly
   # NDP, by one.

   A client MAY decide to give up after some number of DCCP-Requests. If
   so, it SHOULD send a DCCP-Reset packet to the server, to clean up
   state in case one or more of the Requests actually arrived. The
   DCCP-Reset SHOULD have Reason set to "Aborted".

5.6. DCCP-Response Packet Format

   In the second phase of the three-way handshake, the server sends a
   DCCP-Response message to the client. In this phase, a server will
   often specify the options it would like to use, either from among
   those the client requested, or in addition to those. Among these
   options is the congestion control mechanism the server expects to
   use.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                  with Type=1 (DCCP-Response)                  /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Acknowledgement Number: 24 bits
      In the case of a DCCP-Response packet, the Acknowledgement Number
      field will equal the sequence number from the corresponding
      DCCP-Request.

   Options
      The Data Dropped and Init Cookie options are particularly useful
      for DCCP-Response packets (Sections 8.7 and 6.6). In addition,
      DCCP-Response, or early DCCP-Data or DCCP-Ack packets, may include
      "Confirm L(Connection Nonce)" and "Change R(Connection Nonce)"
      options, to negotiate connection nonces (Section 6.5), as well as
      options to negotiate CCIDs and other relevant features.

   The receiver MAY respond to a DCCP-Request packet with a DCCP-Reset
   packet to refuse the connection.
   Relevant Reset Reasons for refusing a connection include "Connection
   Refused", when the DCCP-Request's Destination Port did not correspond
   to a DCCP port open for listening; "Bad Service Code", when the
   DCCP-Request's Service Code did not correspond to the service code
   registered with the Destination Port; and "Too Busy", when the server
   is currently too busy to respond to requests. The server SHOULD limit
   the rate at which it generates these resets.

   The receiver SHOULD NOT retransmit DCCP-Response packets; the sender
   will retransmit the DCCP-Request if necessary. (Note that the
   "retransmitted" DCCP-Request will have, at least, a different
   sequence number from the "original" DCCP-Request; the receiver can
   thus distinguish true retransmissions from network duplicates.) The
   responder will detect that the retransmitted DCCP-Request applies to
   an existing connection because of its Source and Destination Ports.
   Every valid DCCP-Request received while the server is in the RESPOND
   state MUST elicit a new DCCP-Response. Each new DCCP-Response MUST
   increment the responder's Sequence Number, and possibly # NDP, by
   one.
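   The refusal reasons above, together with the Service Code matching
   rule of Section 5.5 (zero is a wildcard; only two different nonzero
   codes conflict), can be sketched as follows. This is a minimal
   illustration under stated assumptions: the function name, its
   parameters, and the order in which the checks are made are
   hypothetical and are not prescribed by this document. Rate limiting
   of the resulting Resets is not shown.

```python
def refusal_reason(listening, request_code, port_code, too_busy=False):
    """Return a Reset Reason name, or None if the request is acceptable.

    listening:    True if the Destination Port is open for listening.
    request_code: Service Code carried in the DCCP-Request.
    port_code:    Service Code registered with the Destination Port.
    too_busy:     True if the server cannot take on more work now.
    """
    if not listening:
        return "Connection Refused"
    # A Service Code of zero is a wildcard, so a mismatch only counts
    # when both codes are nonzero and different.
    if request_code != 0 and port_code != 0 and request_code != port_code:
        return "Bad Service Code"
    if too_busy:
        return "Too Busy"
    return None
```

   A server MAY additionally reject Service Codes it considers
   unsuitable; that policy decision is left out of the sketch.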
   The responder SHOULD NOT accept any data accompanying a retransmitted
   DCCP-Request. In particular, the DCCP-Response sent in reply to a
   retransmitted DCCP-Request with data SHOULD contain a Data Dropped
   option, in which the retransmitted DCCP-Request is reported as "data
   dropped due to protocol constraints" (Drop Code 0). The original
   DCCP-Request SHOULD also be reported in the Data Dropped option,
   either in a Normal Block (if the responder accepted the data, or
   there was no data), or in a Drop Code 0 Drop Block (if the responder
   refused the data the first time as well).

5.7. DCCP-Data, DCCP-Ack, and DCCP-DataAck Packet Formats

   The payload of a DCCP connection is sent in DCCP-Data and
   DCCP-DataAck packets, and DCCP-Ack packets are used for
   acknowledgements when there is no payload to be sent. DCCP-Data
   packets look like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=2 (DCCP-Data)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DCCP-Ack packets dispense with the data, but contain an
   acknowledgement number:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=3 (DCCP-Ack)                     /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DCCP-DataAck packets contain both data and an acknowledgement number:
   acknowledgement information is piggybacked on a data packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                  with Type=4 (DCCP-DataAck)                   /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A DCCP-Data or DCCP-DataAck packet may contain no data bytes if the
   application sends a zero-length datagram.

   DCCP A sends DCCP-Data and DCCP-DataAck packets to DCCP B due to
   application events on host A. These packets are congestion-controlled
   by the CCID for the A-to-B half-connection. In contrast, DCCP-Ack
   packets sent by DCCP A are controlled by the CCID for the B-to-A
   half-connection. Generally, DCCP A will piggyback acknowledgement
   information on data packets when acceptable, creating DCCP-DataAck
   packets. DCCP-Ack packets are used when there is no data to send from
   DCCP A to DCCP B, or when the congestion state of the A-to-B CCID
   will not allow data to be sent.
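   The choice among the three packet types described above can be
   sketched as a small decision function. This is an illustration only:
   the function and parameter names are hypothetical, and the sketch
   assumes that the B-to-A CCID currently permits an acknowledgement to
   be sent whenever ack_pending is true.

```python
def choose_packet_type(have_data, data_allowed, ack_pending):
    """Pick the packet type DCCP A sends next, or None to send nothing.

    have_data:    the application on host A has a datagram queued.
    data_allowed: the A-to-B CCID's congestion state permits data now.
    ack_pending:  DCCP A has acknowledgement information for DCCP B.
    """
    if have_data and data_allowed:
        # Piggyback acknowledgement information on the data packet.
        return "DCCP-DataAck" if ack_pending else "DCCP-Data"
    if ack_pending:
        # No data to send, or data is held back by congestion control.
        return "DCCP-Ack"
    return None
```

   For example, with a datagram queued but the A-to-B CCID blocking
   data, a pending acknowledgement still goes out as a bare DCCP-Ack.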
   DCCP-Ack and DCCP-DataAck packets often include additional
   acknowledgement options, such as Ack Vector, as required by the
   congestion control mechanism in use. Section 8, below, describes
   acknowledgements in DCCP.

5.8. DCCP-CloseReq and DCCP-Close Packet Format

   The DCCP-CloseReq and DCCP-Close packets have the same format except
   for Type. However, only the server can send a DCCP-CloseReq packet.
   Either client or server may send a DCCP-Close packet.

   The receiver of a valid DCCP-Close packet SHOULD respond with a
   DCCP-Reset packet, with Reason set to "Closed"; the endpoint that
   originally sent the DCCP-Close will hold Time-Wait state. The
   receiver of a valid DCCP-CloseReq packet SHOULD respond with a
   DCCP-Close packet; that receiving endpoint will expect to hold
   Time-Wait state after later receiving a DCCP-Reset.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /           with Type=5 or 6 (DCCP-CloseReq or Close)           /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.9. DCCP-Reset Packet Format

   DCCP-Reset packets unconditionally shut down a connection. Every
   normal connection ends with a DCCP-Reset, but resets may be sent for
   other reasons, including bad port numbers, bad option behavior,
   incorrect ECN Nonce Echoes, and so forth. The reason for a reset is
   represented by an eight-bit number, the Reason field, and 24 bits of
   additional data. The endpoint that receives a valid DCCP-Reset packet
   will hold Time-Wait state for the connection.

   The optional DCCP-Reset payload, if present, is a human-readable text
   string, preferably in English and encoded in Unicode UTF-8, that
   describes the error in more detail.

   DCCP-Reset packets MUST NOT be generated in response to received
   DCCP-Reset packets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                   with Type=7 (DCCP-Reset)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Reason     |    Data 1     |    Data 2     |    Data 3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          error text                           |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Reason: 8 bits
      The Reason field represents the reason that the sender reset the
      DCCP connection.

   Data 1, Data 2, and Data 3: 8 bits each
      The Data fields provide additional information about why the
      sender reset the DCCP connection. The meanings of these fields
      depend on the value of Reason.

   The following Reasons are currently defined. The "Data" columns
   describe what the Data fields should contain for a given Reason. In
   those columns, N/A means the Data field SHOULD be set to 0 by the
   sender of the DCCP-Reset, and ignored by its receiver.
                                                            Section
   Reason  Name                   Data 1   Data 2   Data 3  Reference
   ------  ----                   ------   ------   ------  ---------
      0    Unspecified            N/A      N/A      N/A
      1    Closed                 N/A      N/A      N/A     3.2
      2    Invalid Packet         packet   N/A      N/A     5.4
                                  type
      3    Option Error           option   option data
                                  number   (if any)
      4    Feature Error          feature  feature data
                                  number   (if any)
      5    Connection Refused     N/A      N/A      N/A     5.6
      6    Bad Service Code       N/A      N/A      N/A     5.5
      7    Too Busy               N/A      N/A      N/A     5.6
      8    Bad Init Cookie        N/A      N/A      N/A     6.6
     10    Unanswered Challenge   N/A      N/A      N/A     6.5.4
     11    Fruitless Negotiation  feature  feature data     6.4.8
                                  number   (optional)
     12    Aggression Penalty     N/A      N/A      N/A     9.2
     13    No Connection          N/A      N/A      N/A     5.4
     14    Aborted                N/A      N/A      N/A     5.4
     15    Extended Seqnos        N/A      N/A      N/A     5.3
     16    Mandatory Failure      option   option data      6.3
                                  number   (if any)
   17-127  Reserved
   128-255 CCID-specific reasons  variable ...              7.4

   A DCCP-Reset packet completes every DCCP connection, whether the
   termination is clean (due to application close; Reset Reason
   "Closed") or unclean. Unlike TCP, which has two distinct termination
   mechanisms (FIN and RST), DCCP ends all connections in a uniform
   manner.
   This is justified because some responses to connection termination
   are the same no matter whether termination was clean. For instance,
   the endpoint that receives a valid DCCP-Reset should hold Time-Wait
   state for the connection. Processors that must distinguish between
   clean and unclean termination can examine the Reset Reason. DCCP
   implementations MUST transition to the CLOSED state after sending a
   DCCP-Reset packet.

5.10. DCCP-Move Packet Format

   The DCCP-Move packet type is part of DCCP's support for multihoming
   and mobility, which is described further in Section 10. DCCP A sends
   a DCCP-Move packet to DCCP B after changing its address and/or port
   number. The DCCP-Move packet requests that DCCP B start sending
   packets to the new address and port number. The new address and port
   come from the packet's network header and generic DCCP header; the
   old address and port are defined through a Mobility ID, which must
   have been set earlier via a Mobility ID feature. The Mobility ID and
   a mandatory Identification option provide some protection against
   hijacked connections. See Section 10 for more on security and DCCP's
   mobility support.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=8 (DCCP-Move)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+)Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |)X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mobility ID (high bits)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mobility ID (low bits)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Options, including Identification / [padding]         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Mobility ID: 64 bits
      The value of the sender's Mobility ID feature. This value uniquely
      identifies the current connection among the set of connections
      terminating at the receiver; it MUST have been set by the receiver
      in an earlier exchange.

   Options
      Every DCCP-Move packet MUST include a valid Identification option
      (see Section 6.5).

   DCCP B MUST ignore the DCCP-Move if it has no record for the packet's
   Mobility ID; if the Identification option is not present or invalid;
   if the Sequence Number is not greater than GSR; or if the
   Acknowledgement Number is greater than GSS.
   DCCP B SHOULD NOT respond to invalid Moves with DCCP-Reset or DCCP-Ack
   packets, since any such response would leak information about the
   connection, such as the current sequence number, to a possibly
   malicious host.  After receiving an invalid DCCP-Move, DCCP B MAY
   ignore subsequent DCCP-Move packets, valid or not, for a short period
   of time, such as one second or one round-trip time.  This protects
   DCCP B against denial-of-service attacks from floods of invalid
   DCCP-Moves.

   DCCP-Move packets do not follow the usual sequence-validity rules.
   This is to support endpoints that react to long bursts of loss by
   moving.  Such moves will often happen after the endpoints get out of
   sync, causing DCCP-Move packets to frequently have inappropriate
   Sequence Numbers.  But the usual DCCP-Sync mechanism is inappropriate
   in response to Moves, since it could leak sequence numbers to
   possibly malicious hosts.  DCCP B MUST set its GSR variable to the
   Sequence Number on a valid DCCP-Move.

   DCCP B SHOULD acknowledge valid DCCP-Move packets with DCCP-Ack or
   DCCP-DataAck packets.
   If DCCP B accepts the move, it MUST send this acknowledgement to the
   packet's network source address and DCCP Source Port; if it rejects
   the move, which it MAY do for any reason, it MUST send this
   acknowledgement to the old address and old port.  The moving
   endpoint, DCCP A, can determine whether or not its move was accepted
   by checking the acknowledgement's destination address and port.

   If the acknowledgement is lost, DCCP A might resend the DCCP-Move
   packet (using a new sequence number).  DCCP B will detect this case
   because the network source address and Source Port correspond to a
   valid connection, for which the Sequence Number and Acknowledgement
   Number fields are appropriate; the Identification option is valid for
   that connection; and the Mobility ID refers to that connection.  It
   SHOULD respond by sending another acknowledgement, as allowed by the
   congestion control mechanism in use.

   Once DCCP B receives a non-Move packet from DCCP A, it MUST choose a
   new Mobility ID for the connection and send a new Change R(Mobility
   ID) option to DCCP A.  This reduces the risk of replay.

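   The acceptance checks for a DCCP-Move described above can be sketched
   as follows.  This is a minimal illustration, not part of the
   protocol: the names Connection, MovePacket, handle_move, and the
   ident_ok callback are hypothetical, the Identification check is left
   to the caller, and sequence numbers are compared as plain integers
   (wraparound is ignored for brevity).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Connection:
    gsr: int  # Greatest Sequence Number Received
    gss: int  # Greatest Sequence Number Sent


@dataclass
class MovePacket:
    mobility_id: int
    seqno: int
    ackno: int
    identification: Optional[bytes]  # None when the option is absent


def handle_move(
    conns: Dict[int, Connection],
    pkt: MovePacket,
    ident_ok: Callable[[Connection, bytes], bool],
) -> Optional[Connection]:
    """Return the connection if the Move is valid; None means 'ignore silently'."""
    conn = conns.get(pkt.mobility_id)
    if conn is None:
        return None  # no record for this Mobility ID
    if pkt.identification is None or not ident_ok(conn, pkt.identification):
        return None  # Identification option missing or invalid
    if pkt.seqno <= conn.gsr:
        return None  # Sequence Number is not greater than GSR
    if pkt.ackno > conn.gss:
        return None  # Acknowledgement Number is greater than GSS
    conn.gsr = pkt.seqno  # a valid Move resets GSR to its Sequence Number
    return conn
```

   Returning None for every failure mirrors the rule that invalid Moves
   draw no Reset or Ack, so nothing about the connection is revealed to
   the sender.
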
   We note that DCCP mobility, as provided by DCCP-Move, may not be
   useful in the context of IPv6, with its mandatory support for Mobile
   IP.

5.11.  DCCP-Sync Packet Format

   DCCP-Sync packets are sent when the sequence numbers of the endpoints
   of a connection appear to have gotten out of sync.  On receiving a
   valid DCCP-Sync packet, DCCP will update its GSR variable, thus
   restoring synchronization, and possibly send another DCCP-Sync packet
   to acknowledge the synchronization.  DCCP-Sync packets look like
   this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /             Generic DCCP Header (12 or 16 bytes)              /
   /                    with Type=9 (DCCP-Sync)                    /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |            Acknowledgement Number             |
  (+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+) Iff
  (|       Acknowledgement Number (low bits)       |   Reserved    |) X=1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options / [padding]                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

6.  Options and Features

   All DCCP packets may contain options, which occupy space at the end
   of the DCCP header.  Each option is a multiple of 8 bits in length.
   The combination of all options MUST add up to a multiple of 32 bits.
   Individual options are not padded to multiples of 32 bits, however;
   any option may begin on any byte boundary.  All options are always
   included in the checksum.

   The first byte of an option is the option type.  Options with types 0
   through 31 are single-byte options.  Other options are followed by a
   byte indicating the option's length.  This length value includes the
   two bytes of option-type and option-length as well as any option-data
   bytes, and MUST therefore be greater than or equal to two.

   Options are processed sequentially, starting at the earliest option
   in the packet header.

   The following options are currently defined:

                Option                               Section
        Type    Length     Meaning                   Reference
        ----    ------     -------                   ---------
          0        1       Padding                     6.1
          1        1       Mandatory                   6.3
          2        1       Slow Receiver               8.6
         32     variable   Ignored                     6.2
         33     variable   Change L                    6.4
         34     variable   Confirm L                   6.4
         35     variable   Change R                    6.4
         36     variable   Confirm R                   6.4
         37     variable   Init Cookie                 6.6
         38     variable   Ack Vector [Nonce 0]        8.5
         39     variable   Ack Vector [Nonce 1]        8.5
         40     variable   Data Dropped                8.7
         41        6       Timestamp                   6.7
         42      6-10      Timestamp Echo              6.9
         43     variable   Identification              6.5.3
         44     variable   Challenge                   6.5.4
         45        4       Payload Checksum            8.8
         46      4-6       Elapsed Time                6.8
       128-255  variable   CCID-specific options       7.4

6.1.  Padding Option

   The Padding option, with type 0, is a single-byte option used to pad
   between or after options.  It either ensures the payload begins on a
   32-bit boundary (as required), or ensures alignment of following
   options (not mandatory).

   +--------+
   |00000000|
   +--------+
    Type=0

6.2.  Ignored Option

   The Ignored option, with type 32, signals that a DCCP did not
   understand some option.  This can happen, for example, when one DCCP
   converses with another, extended DCCP.

   Each Ignored option has one or more bytes of data.  The first byte
   contains the offending option type; the second and subsequent bytes,
   if present, contain the first bytes of the offending option's data.
   If the offending option had data, the Ignored option MUST include at
   least one byte of that data, but the Ignored option MUST NOT carry
   more Opt Data than the offending option had data.

   Ignored options should preferably concern options sent on the packet
   acknowledged by the Acknowledgement Number.  Packets without
   Acknowledgement Numbers (that is, DCCP-Request and DCCP-Data) SHOULD
   NOT carry Ignored options.

   +--------+--------+--------+
   |00100000|00000011|Opt Type|
   +--------+--------+--------+
    Type=32  Length=3

   +--------+--------+--------+--------+--------
   |00100000| Length |Opt Type| Opt Data ...
   +--------+--------+--------+--------+--------
    Type=32

6.3.  Mandatory Option

   The Mandatory option, with type 1, is a single-byte option that
   indicates that the immediately following option is mandatory.
   If the receiving DCCP does not understand that following option, it
   MUST reset the connection with Reset Reason set to "Mandatory
   Failure".  For instance, say DCCP A receives a packet with two
   options: a Mandatory option, and immediately following, another
   option O.  Then DCCP A would reset the connection (rather than, for
   example, sending an Ignored(O) option) if it did not understand O's
   type; if it understood O's type, but not O's data; if O's data was
   invalid for O's type; if O was a feature negotiation option, and DCCP
   A did not understand the enclosed feature number; if DCCP A
   understood O, but chose not to perform the action O implies; and so
   forth.

   +--------+
   |00000001|
   +--------+
    Type=1

6.4.  Feature Negotiation

   DCCP contains a mechanism for reliably negotiating features, notably
   the congestion control mechanism in use on each half-connection.  The
   motivation is to implement reliable feature negotiation once, so that
   different options need not reinvent that wheel.

   Features are identified by feature number and owning endpoint.  The
   notation (F,E) represents the feature with feature number F that is
   owned by DCCP E.  A connection generally has two features for each
   feature number, one per endpoint (or, equivalently, one per
   half-connection).
   Given a feature owned by DCCP A, we call DCCP A the feature location
   and DCCP B the feature remote.  Both endpoints keep track of the
   values of all features, since the point of feature negotiation is to
   ensure agreement.

   Four options, Change L, Confirm L, Change R, and Confirm R, implement
   feature negotiation.  The "L" options are sent by the feature
   location; the "R" options are sent by the feature remote.  Change
   options initiate a negotiation; Confirm options complete the
   negotiation.  Change options are retransmitted to ensure reliability.

   Feature values MUST NOT change apart from feature negotiation.  This
   property, retransmissions, and value priority rules ensure that both
   endpoints eventually agree on every feature's value.

   Negotiations for multiple features may take place simultaneously.
   For instance, a packet may contain multiple Change options that refer
   to different features.
   The endpoints may also simultaneously open negotiations for the same
   feature; they will still agree on a single value.

   Feature negotiation generally takes place using packet types that
   carry no user data, such as DCCP-Ack, particularly when the relevant
   feature may affect how data will be treated.

   Here are three example feature negotiations for features located at
   DCCP B, the first two for the Congestion Control ID feature, the last
   for the Ack Ratio:

              DCCP A                              DCCP B

   1. Change R(CCID, 2 3 1)     --->
      ("2 3 1" is DCCP A's value preference list)
   2.                           <---  Confirm L(CCID, 3, 3 2 1)
                                      (3 is the negotiated value;
                                      "3 2 1" is B's pref list)
                 * agreement that (CCID,B) = 3 *

   1.            XXX            <---  Change L(CCID, 3 2 1)
   2.       Retransmission:     <---  Change L(CCID, 3 2 1)
   3. Confirm R(CCID, 3, 2 3 1) --->
                 * agreement that (CCID,B) = 3 *

   1. Change R(Ack Ratio, 3)    --->
   2.                           <---  Confirm L(Ack Ratio, 3)
                 * agreement that (Ack Ratio,B) = 3 *

6.4.1.  Value Types

   The feature negotiation options are the same for every feature
   number, but the format for feature values, and the value priority
   rules that determine the result of a negotiation, differ from feature
   to feature.  All current DCCP features fit one of two value types,
   non-negotiable ("NN") or server-priority ("SP"), although other value
   types are possible.

   o  Non-negotiable features: The feature value is a byte string.  Each
      option contains exactly one feature value.  The feature remote
      changes the value by sending Change R options.  The feature
      location has no preferred value for the feature, and MUST accept
      the proposed value (as long as it is valid), responding with a
      Confirm L option containing the new value.  Change L and Confirm R
      options MUST NOT be sent for non-negotiable features.

   o  Server-priority features: The feature value is a fixed-length byte
      string (length determined by the feature number).  Each Change
      option contains a prioritized list of values, with the most
      preferred value coming first.  Each Confirm option contains the
      confirmed value, followed by the confirmer's value preference
      list.  The value priority rule is server priority: Given both
      preference lists, select the first entry in the server's list that
      also occurs in the client's list.  If there is no shared entry,
      the connection MUST be reset with Reason set to "Fruitless
      Negotiation".  All four option types are meaningful for
      server-priority features.

   DCCP endpoints need not calculate their value preference lists before
   feature negotiation begins.  Thus, a server might adjust its
   preference list based on the client's preference list, assuming the
   client opened the negotiation.  Once a negotiation for a feature has
   begun, however, that feature's preference lists MUST remain stable
   until the negotiation has closed.

6.4.2.  Feature Numbers

   The first data byte of every Change or Confirm option is a feature
   number, defining the type of feature being negotiated.  The remainder
   of the data gives one or more values for the feature, and is
   interpreted according to the feature.

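   As an illustration of the server-priority rule in Section 6.4.1, the
   selection step might be sketched as below.  The function name
   server_priority is hypothetical, and values are modelled as small
   integers rather than fixed-length byte strings.

```python
from typing import Optional, Sequence


def server_priority(server_prefs: Sequence[int],
                    client_prefs: Sequence[int]) -> Optional[int]:
    """Pick the first entry of the server's list that the client also lists.

    Returns None when the lists share no entry; per the text, the caller
    would then reset the connection with Reason "Fruitless Negotiation".
    """
    client_set = set(client_prefs)
    for value in server_prefs:
        if value in client_set:
            return value
    return None
```

   With the lists from the first example negotiation, the server's list
   "3 2 1" and the client's list "2 3 1" yield the value 3, because the
   server's ordering takes precedence.
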
   The current set of feature numbers is as follows:

                                            Value  Initial  Section
      Number   Meaning                      Type   Value    Reference
      ------   -------                      -----  -------  ---------
         1     Congestion Control ID (CCID)  SP      2        7
         2     ECN Capable                   SP      1        9.1
         3     Ack Ratio                     NN      2        8.3
         4     Use Ack Vector                SP      0        8.4
         5     Mobility Capable              SP      0        10.1
         6     Loss Window                   NN     1000      6.10
         7     Connection Nonce              NN    random     6.5.2
         8     Identification Regime         SP      1        6.5.1
         9     Mobility ID                   NN      0        10.2
      128-255  CCID-specific features         ?      ?        7.4

6.4.3.  Change L Option

   DCCP A sends a Change L option to DCCP B to initiate a negotiation
   for a feature located at DCCP A.

   DCCP B SHOULD respond to a Change option for a known feature with a
   Confirm R option.  In special circumstances, such as a Change option
   whose value is inappropriate for the listed feature number, DCCP B
   MAY respond instead by ignoring the Change (with or without sending
   an Ignored option), or by resetting the connection with Reason set to
   "Fruitless Negotiation" or "Feature Error".

   DCCP A SHOULD retransmit the Change L option until it receives one of
   those responses.  It could send at least one option per round-trip
   time, for instance, or it could add the Change L option to every Kth
   packet.  DCCP A MAY reset the connection with Reason set to
   "Fruitless Negotiation" or "Feature Error" if retransmission fails
   (no meaningful response is received after 10 attempts or more).

   The format of the option's data ("Value or Values") depends on the
   feature's value type.  Change L options are invalid for
   non-negotiable features.

   +--------+--------+--------+--------+--------+--------
   |00100001| Length |Feature#| Value or Values ...
   +--------+--------+--------+--------+--------+--------
    Type=33

   An example Change L option follows.

   33,5,1,2,3
      I want to change my CC feature (feature number 1, a
      server-priority feature); my preferred values are 2 and 3, in that
      preference order.

6.4.4.  Confirm L Option

   DCCP A sends a Confirm L option to DCCP B in response to a valid
   Change R option sent by DCCP B.  The Confirm L option will complete
   the negotiation for a feature located at DCCP A.  Confirm L need not
   be retransmitted, since Change R will be retransmitted as necessary.

   Again, the format of "Value or Values" depends on the feature's value
   type.

   +--------+--------+--------+--------+--------+--------
   |00100010| Length |Feature#| Value or Values ...
   +--------+--------+--------+--------+--------+--------
    Type=34

   Example Confirm L options follow.

   34,6,1,2,2,3
      I have changed my CC feature (feature number 1, a server-priority
      feature) to value 2; my preferred values are 2 and 3, in that
      preference order.

   34,7,7,239,48,2,188
      I have changed my Connection Nonce feature (feature number 7, a
      non-negotiable feature) to the 4-byte string 239,48,2,188.

6.4.5.  Change R Option

   DCCP A sends a Change R option to DCCP B to initiate a negotiation
   for a feature located at DCCP B.  The possible responses to Change R
   are analogous to those for Change L (Confirm L, Ignored, or Reset).

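   The option layout from Section 6 and the byte examples above can be
   reproduced with a short sketch.  The helper names encode_option,
   encode_feature_option, and parse_options are hypothetical; the code
   assumes one-byte feature values and checks only the length rules
   stated in the text (single-byte options for types 0 through 31, and a
   length of at least two otherwise).

```python
from typing import List, Sequence, Tuple


def encode_option(opt_type: int, data: bytes = b"") -> bytes:
    """Encode one option; types 0-31 are single-byte and carry no data."""
    if opt_type < 32:
        if data:
            raise ValueError("single-byte options carry no data")
        return bytes([opt_type])
    length = 2 + len(data)  # length counts the type and length bytes too
    if length > 255:
        raise ValueError("option too long")
    return bytes([opt_type, length]) + data


def encode_feature_option(opt_type: int, feature: int,
                          values: Sequence[int]) -> bytes:
    """Encode Change L/R (33, 35) or Confirm L/R (34, 36) with one-byte values."""
    return encode_option(opt_type, bytes([feature]) + bytes(values))


def parse_options(buf: bytes) -> List[Tuple[int, bytes]]:
    """Walk the options sequentially and return (type, data) pairs."""
    options = []
    i = 0
    while i < len(buf):
        opt_type = buf[i]
        if opt_type < 32:
            options.append((opt_type, b""))
            i += 1
            continue
        if i + 1 >= len(buf):
            raise ValueError("truncated option")
        length = buf[i + 1]
        if length < 2 or i + length > len(buf):
            raise ValueError("bad option length")
        options.append((opt_type, bytes(buf[i + 2:i + length])))
        i += length
    return options
```

   For example, encode_feature_option(33, 1, [2, 3]) produces the bytes
   33,5,1,2,3 from the Change L example, and three Padding bytes bring
   that option area up to a multiple of 32 bits.
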
   As with Change L, DCCP A SHOULD retransmit the Change R option until
   it receives a response, or the retransmission times out.

   Again, the format of "Value or Values" depends on the feature's value
   type.

   +--------+--------+--------+--------+--------+--------
   |00100011| Length |Feature#| Value or Values ...
   +--------+--------+--------+--------+--------+--------
    Type=35

   Example Change R options follow.

   35,5,1,3,2
      Please change your CC feature (feature number 1, a server-priority
      feature); my preferred values are 3 and 2, in that preference
      order.

   35,7,7,239,48,2,188
      Change your Connection Nonce feature (feature number 7, a
      non-negotiable feature) to the 4-byte string 239,48,2,188.

6.4.6.  Confirm R Option

   DCCP A sends a Confirm R option to DCCP B in response to a valid
   Change L option sent by DCCP B.  The Confirm R option will complete
   the negotiation for a feature located at DCCP B.  Confirm R need not
   be retransmitted, since Change L will be retransmitted as necessary.

   Again, the format of "Value or Values" depends on the feature's value
   type.

   +--------+--------+--------+--------+--------+--------
   |00100100| Length |Feature#| Value or Values ...
   +--------+--------+--------+--------+--------+--------
    Type=36

   An example Confirm R option follows.

   36,6,1,2,3,2
      Change your CC feature (feature number 1, a server-priority
      feature) to 2; my preferred values are 3 and 2, in that preference
      order.

6.4.7.  Unknown Features

   If a DCCP receives a Change option referring to a feature number it
   does not understand, it SHOULD respond with an Ignored option.  This
   informs the remote DCCP that the local DCCP does not implement the
   feature.  No other action need be taken.  (Ignored may also indicate
   that the DCCP endpoint could not respond to a CCID-specific feature
   request because the CCID was in flux; see Section 7.4.)

6.4.8.  State Diagram

   These state diagrams present the legal transitions in a DCCP feature
   negotiation.
They define a DCCP's states and transitions with respect to the negotiation of a single feature it understands. There are two diagrams, corresponding to the two endpoints: the featurelocationlocation, DCCP A, andwhat we call the "feature requester", DCCP B. Transitions between states are triggered by receiving a packet ("RECV") or by an application event ("APP"). Received packets are further distinguished by any options relevant to the feature being negotiated. "RECV -" means the packet contained no relevant option. "RECV Chg" denotes a Change option, "RECV Pr" a Prefer option, and "RECV Cfm" a Confirm option. The data contained in an option is given in parentheses when necessary. The "SEND" action indicates which option the DCCP will send next. Finally, the "SET-VALUE" action causestheDCCP to change its value for the relevant feature. "SEND" does not force DCCP to immediately generate a packet; rather, it says whichfeatureoption SHOULD be sent on the next packet generated. Aremote, DCCPMAY choose to generate a packet, such as a DCCP- Ack, in response to some "SEND" action, rather than piggyback on another packet. (In some cases, this mayB. Each endpoint can berequired---if adding an option would bump a packet over the PMTU, for instance.) However, it MUST NOT generate a packet if doing so would violate the congestion control mechanism in use. The requester, DCCP B, has four states: Known, Unknown, Failed, and Changing. Similarly, the feature location, DCCP A, has four states: Known, Unknown, Failed, and Confirming. In both cases, Known denotes a state where the DCCP knows the feature's current value, and believes that the other DCCP agrees. Changing and Confirming denote states where the DCCPs areinthe processone ofnegotiating a new value for the feature.three states, STABLE, CHANGING, and FAILED. TheUnknown state can occur only at connection setup time. 
The STABLE state means that a value is known for the feature and no negotiation is in progress. Every feature starts out in the STABLE state. The CHANGING state means that a negotiation started by this endpoint is in progress for the feature. This is the only state in which retransmissions happen. Finally, the FAILED state means that the other endpoint does not understand the feature in question.

Transitions between states are triggered by receiving a valid packet containing some valid negotiation option, or by an application or protocol event. Receiving a Change option causes the new feature value to be calculated, and a Confirm option sent. The details of this calculation, and the contents of Confirm, depend on the value type of the feature in question. Endpoints that receive valid Confirm options can simply trust the values they contain, or they could redo the feature value calculation; again, this is feature-specific.

             FEATURE LOCATION STATE DIAGRAM (DCCP A)

   rcv Confirm R   app/protocol evt : snd Change L
   : ignore    +---------------------------+
      +----+   |                           |
      |    v   |       rcv Confirm R       v
   +------------+   : accept value   +------------+
   |            |<-------------------|            |
   |   STABLE   |                    |  CHANGING  |------+
   |            |<-------------------|            |      |
   +------------+    rcv Change R    +------------+      |
      |    ^       : calc new value,    |    ^           |
      +----+         snd Confirm L      +----+           |
   rcv Change R                   timeout/rcv non-ack    |
   : calc new value,              : snd Change L         |
     snd Confirm L                                       |
                              rcv Ignored/timeout fails  |
                              : snd Reset/ignore/other   v
                                                    +----------+
                                                    |  FAILED  |
                                                    +----------+

              FEATURE REMOTE STATE DIAGRAM (DCCP B)

   rcv Confirm L   app/protocol evt : snd Change R
   : ignore    +---------------------------+
      +----+   |                           |
      |    v   |       rcv Confirm L       v
   +------------+  : calc new value  +------------+
   |            |<-------------------|            |
   |   STABLE   |                    |  CHANGING  |------+
   |            |<-------------------|            |      |
   +------------+    rcv Change L    +------------+      |
      |    ^       : calc new value,    |    ^           |
      +----+         snd Confirm R      +----+           |
   rcv Change L                   timeout/rcv non-ack    |
   : calc new value,              : snd Change R         |
     snd Confirm R                                       |
                              rcv Ignored/timeout fails  |
                              : snd Reset/ignore/other   v
                                                    +----------+
                                                    |  FAILED  |
                                                    +----------+

DCCP implementations MUST sanity-check options' data as appropriate for the feature before acting according to the diagram. For example, Ack Ratio takes two-byte, non-zero integer values, so a "Confirm(Ack Ratio, 0)" option is never valid. Server-priority features can tolerate some unknown values in the priority list, as long as the selected value is understood. Invalid options SHOULD cause a transition to the FAILED state, with an appropriate accompanying action, such as sending a reset with Reason set to "Feature Error".

The "snd" actions request the sending of a negotiation option. They do not force DCCP to immediately generate a packet; rather, they say which feature option SHOULD be sent on the next packet generated. A DCCP MAY choose to generate a packet, such as a DCCP-Ack, in response to some "snd" action, rather than piggyback on another packet. In some cases, this may be required---if adding an option would bump a packet over the PMTU, for instance. However, it MUST NOT generate a packet if doing so would violate the congestion control mechanism in use.
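The transitions in the two state diagrams can be sketched as a single step function, since the diagrams differ only in which option flavour (L or R) is sent and received. This is a minimal illustration, not part of the specification: the type names, event names, and the function feat_step are all hypothetical, and the choice to ignore events the diagrams do not show (for example, an Ignored option arriving in STABLE) is an assumption.

```c
#include <assert.h>

/* Hypothetical names: the draft defines the states and arcs, not this API. */
enum feat_state { FEAT_STABLE, FEAT_CHANGING, FEAT_FAILED };

enum feat_event {
    EV_APP_CHANGE,     /* application or protocol event requests a new value */
    EV_RCV_CHANGE,     /* valid Change option received */
    EV_RCV_CONFIRM,    /* valid Confirm option received */
    EV_RCV_IGNORED,    /* explicit Ignored option received */
    EV_TIMEOUT,        /* retransmission timer fired, or Change known lost */
    EV_TIMEOUT_FAILS,  /* too many timer backoffs */
    EV_RCV_INVALID     /* option failed the feature's sanity check */
};

enum feat_action {
    ACT_NONE,          /* ignore the event */
    ACT_SND_CHANGE,    /* queue a Change option for the next packet */
    ACT_SND_CONFIRM,   /* calculate the new value and queue a Confirm */
    ACT_ACCEPT_VALUE,  /* adopt (or recalculate) the confirmed value */
    ACT_FAIL           /* feature-specific: send a reset, ignore, or other */
};

/* Returns the next state and stores the action to take in *act. */
enum feat_state feat_step(enum feat_state s, enum feat_event e,
                          enum feat_action *act)
{
    assert(act != 0);
    *act = ACT_NONE;

    if (s == FEAT_FAILED)
        return FEAT_FAILED;            /* the diagrams show no arcs out of FAILED */
    if (e == EV_RCV_INVALID) {         /* invalid options SHOULD lead to FAILED */
        *act = ACT_FAIL;
        return FEAT_FAILED;
    }
    if (e == EV_RCV_CHANGE) {          /* same arc from STABLE and from CHANGING */
        *act = ACT_SND_CONFIRM;
        return FEAT_STABLE;
    }
    if (s == FEAT_STABLE) {
        if (e == EV_APP_CHANGE) {
            *act = ACT_SND_CHANGE;
            return FEAT_CHANGING;
        }
        return FEAT_STABLE;            /* a Confirm in STABLE is ignored */
    }

    /* s == FEAT_CHANGING */
    switch (e) {
    case EV_RCV_CONFIRM:
        *act = ACT_ACCEPT_VALUE;
        return FEAT_STABLE;
    case EV_TIMEOUT:
        *act = ACT_SND_CHANGE;         /* retransmit the Change */
        return FEAT_CHANGING;
    case EV_RCV_IGNORED:
    case EV_TIMEOUT_FAILS:
        *act = ACT_FAIL;
        return FEAT_FAILED;
    default:
        return FEAT_CHANGING;          /* assumption: other events are deferred */
    }
}
```

A caller would run each sanity-checked negotiation option through feat_step and then attach whatever option the returned action names to the next outgoing packet.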
Retransmissions of Change options happen according to an exponential-backoff timer, and/or when the CHANGING DCCP realizes that the packet containing a Change option was not received. A Change option MAY additionally be piggybacked on other packets sent during the negotiation. After too many timer backoff events, or when an explicit Ignored option is received, the CHANGING DCCP MUST transition to the FAILED state, as shown. The CHANGING DCCP MUST NOT transition to the FAILED state simply because the other DCCP seems to be ignoring its Change options (for example, by acknowledging the packet containing the options, but not including a Confirm); reordering can cause this behavior even if the endpoint understands the options. The timeout value might initially be set to a small multiple of round-trip times (or 0.2 seconds, if no RTT is available). Backoff should be pinned at roughly 32 RTTs; timer failure should occur after at least 12 retransmissions.

Feature negotiation options for a given feature MUST be processed in increasing order by Sequence Number. Say that the last processed negotiation option for a feature (F,X) came on a packet with sequence number S. Then any negotiation options on received packets with Sequence Number less than or equal to S MUST be ignored.
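The retransmission timing described above can be sketched as follows. This is an illustration under stated assumptions, not normative text: the "small multiple" of the RTT is taken to be 2, the cap used when no RTT estimate exists is derived by treating the 0.2-second default as that same multiple, and both function names are hypothetical.

```c
#include <assert.h>

#define INITIAL_RTT_MULTIPLE 2.0   /* assumption: "small multiple" taken as 2 */
#define NO_RTT_TIMEOUT_SEC   0.2   /* initial timeout when no RTT is available */
#define BACKOFF_CAP_RTTS     32.0  /* backoff pinned at roughly 32 RTTs */
#define MIN_RETRANSMISSIONS_BEFORE_FAIL 12

/* Timeout in seconds to wait after `retransmissions` Change options have
   already been retransmitted. Pass rtt_sec <= 0 when no estimate exists. */
double change_retransmit_timeout(double rtt_sec, int retransmissions)
{
    assert(retransmissions >= 0);
    double timeout = rtt_sec > 0.0 ? INITIAL_RTT_MULTIPLE * rtt_sec
                                   : NO_RTT_TIMEOUT_SEC;
    /* 32 RTTs when an RTT is known; otherwise 16 times the default
       (an assumption, since the text gives no cap for that case). */
    double cap = timeout * (BACKOFF_CAP_RTTS / INITIAL_RTT_MULTIPLE);

    for (int i = 0; i < retransmissions && timeout < cap; i++)
        timeout *= 2.0;             /* exponential backoff */
    return timeout < cap ? timeout : cap;
}

/* Nonzero when the CHANGING endpoint may give up: an explicit Ignored option
   arrived, or at least 12 retransmissions have timed out. A missing Confirm
   on its own is not a reason to fail. */
int change_negotiation_failed(int retransmissions, int ignored_received)
{
    return ignored_received
        || retransmissions >= MIN_RETRANSMISSIONS_BEFORE_FAIL;
}
```

With a 50 ms RTT this yields timeouts of 0.1, 0.2, 0.4, 0.8, and then 1.6 seconds for every later retransmission.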
This sequence-number ordering requirement MAY be implemented per-feature, or implementations MAY compare against a single Sequence Number---the most recent negotiation option processed for any feature. Feature negotiation options on safely reordered packets (with last-negotiation-seqno < S < GSR) SHOULD be accepted, to provide some robustness against reordering.

Simultaneous negotiation problems can arise if value preferences change too frequently, particularly for server-priority features. A DCCP endpoint MUST NOT change its value preferences while in the CHANGING state: it MUST instead complete any extant negotiation, then open a new one.

If the result of some feature negotiation is that a feature has an unacceptable value---for example, for a server-priority feature, none of the client's choices were acceptable to the server, and the prior value is unacceptable to the client---a DCCP endpoint MAY reset the connection, with DCCP-Reset Reason set to "Fruitless Negotiation".

The CHANGING state signals that the relevant feature's value is in flux. DCCP MAY change its behavior when certain features are CHANGING---for example, by refusing to send data until reentering STABLE.

6.4.9. Streamlined Negotiation

This section provides guidance for implementations that do not wish to implement full feature negotiation, although general-purpose DCCP implementations SHOULD implement negotiation fully.

Minimal DCCP implementations, such as those for embedded devices, might force all negotiation to take place on the first packet exchange.
The DCCP-Request would contain Change R options for all server-located features, and Change L options for all client-located features; the DCCP-Response would Confirm each of these requests, or reset the connection if any Change was unexpected or unacceptable. Changes for CCID-specific features MUST follow Changes for the Congestion Control ID feature in the option list, since options are processed in order.

Once the connection is set up, minimal implementations might respond to all feature negotiation options with Ignored, except that even minimal implementations SHOULD support "Change R(Ack Ratio)" and "Confirm L(Ack Ratio)". Even general-purpose implementations might refuse to renegotiate the Congestion Control ID feature in the middle of the connection, by responding to "Change(CCID)" options with Ignored.

6.5. Identification Options

The Identification options provide a way for DCCP endpoints to confirm each other's identities, even after changes of address (Section 10) or long bursts of loss that get the endpoints out of sync (Section 5.2). Again, DCCP as specified here does not provide cryptographic security guarantees, and attackers that can see every packet are still capable of manipulating DCCP connections inappropriately, but the Identification options make it more difficult for some kinds of attacks to succeed.

The Identification option is used to prove an endpoint's identity, while a Challenge option elicits an Identification from the other endpoint. An Identification Regime determines how the Identifications are calculated.
In the default MD5 Regime, the calculation involves an MD5 hash over packet data and two Connection Nonces, either exchanged at the beginning of the connection or implicitly agreed upon.

6.5.1. Identification Regime Feature

Identification Regime has feature number 8. The ID Regime feature located at DCCP B specifies the algorithm that DCCP B will use for its Identification options, and that DCCP A will use for its Challenge options. Each endpoint must keep track of both its ID regime and, via the ID Regime feature, the regime used by the other endpoint. ID Regime is a server-priority feature.

The value of ID Regime is a two-byte number, so valid Confirm and Change(ID Regime) options take at least five bytes. Change options MAY list multiple ID Regimes in descending order of preference.

This document defines two ID Regimes:

   ID Regime  Meaning
   ---------  -------
       0      Null Regime
       1      MD5 Regime (default)

In the Null Regime, every Identification or Challenge option is invalid. The Null Regime makes it impossible for endpoints to get back into sync after bursts of loss larger than two-thirds of the Loss Window (Section ).

In the MD5 Regime, which is the default, valid Identification and Challenge options contain an MD5 hash of the Connection Nonce feature values with some packet data.

Applications preferring different security guarantees, particularly around mobility issues, may prefer to implement another identification algorithm and allocate a new ID Regime value for it. If the endpoints cannot agree on mutually acceptable ID Regimes, the connection SHOULD be reset due to "Fruitless Negotiation".

6.5.2. Connection Nonce Feature

Connection Nonce has feature number 7. The Connection Nonce feature located at DCCP B is the value of DCCP A's connection nonce, a value used by Identification Regime 1. Each endpoint SHOULD keep track of both its own nonce and, via the Connection Nonce feature, the other endpoint's nonce. Connection Nonce is a non-negotiable feature.

The Connection Nonce feature takes arbitrary values at least 4 bytes long. A Change or Confirm(Connection Nonce) option therefore takes at least 7 bytes.

Connection Nonce defaults to a random 8-byte string. To prevent spoofing, this string MUST NOT have any trivially predictable value. For example, it MUST NOT be set deterministically to zero, and it SHOULD change on every connection.

DCCP endpoints MAY, however, exchange Connection Nonces via some mechanism other than the plaintext, snoopable Connection Nonce option. For example, two DCCPs might exchange nonces over a secure channel; or, assuming neither endpoint is behind a network address translator, they might encrypt the source and destination ports with a shared secret key.

6.5.3. Identification Option

The Identification option serves as confirmation that a packet was sent by an endpoint involved in the initiation of the DCCP connection. It is permitted in any DCCP packet, but it might not be useful until the endpoints have exchanged security information such as connection nonces. The option takes the following form:

   +--------+--------+--------+--------+--------+--------
   |00101011| Length | Identification Data ...
   +--------+--------+--------+--------+--------+--------
    Type=43

The particular data included in an Identification option sent by DCCP A depends on the ID Regime in force for DCCP A's Identification options, which, per Section 6.5.1, is the value of the ID Regime feature located at DCCP A. The remainder of this section describes ID Regime 1, the default MD5 Regime.

The Identification data provided for the MD5 Regime consists of a 16-byte MD5 digest of: the 32-bit words in the DCCP header that include the Sequence and Acknowledgement Numbers (this will be words 3-4 or 3-6, depending on whether sequence numbers are extended); the value of the sender's Connection Nonce; and the value of the other endpoint's Connection Nonce, in that order. The total length of the option is therefore 18 bytes, and the option may only be provided on packets that contain Acknowledgement Numbers, such as DCCP-Ack.

Inclusion of the two Connection Nonces ensures that attackers cannot fake an Identification Option, unless they snooped on the beginning of the connection when nonces are exchanged. (No mechanism protects against snoopers who know Connection Nonces, since DCCP as specified here does not provide strong cryptographic security guarantees; see Section 16.) Inclusion of the Sequence and Acknowledgement Numbers protects against replay attacks within the connection.

To check an Identification option's value, the receiver simply calculates the MD5 digest itself and compares that against the option data. The MD5 calculation can be expensive, so an attacker could conceivably disable a DCCP endpoint by sending it a flood of invalid packets with bad Identification options. Rate limits described in Sections 5.2 and 10 mitigate this issue.
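The receiver-side check described above can be sketched as follows, assuming the MD5 Regime and assuming the receiver has already computed its own 16-byte digest over the same header words and nonces (that computation is not shown here). The function name id_option_matches and the constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DCCP_OPT_IDENTIFICATION 43                /* Type=43 */
#define MD5_DIGEST_LEN 16
#define ID_OPTION_LEN  (2 + MD5_DIGEST_LEN)       /* type + length + digest = 18 */

/* Returns 1 if `opt` is a well-formed MD5-Regime Identification option whose
   data equals the digest the receiver computed itself, and 0 otherwise. */
int id_option_matches(const unsigned char *opt, size_t opt_len,
                      const unsigned char *expected_digest)
{
    if (opt == NULL || expected_digest == NULL)
        return 0;
    if (opt_len < ID_OPTION_LEN)
        return 0;                                 /* truncated option */
    if (opt[0] != DCCP_OPT_IDENTIFICATION || opt[1] != ID_OPTION_LEN)
        return 0;                                 /* wrong type or length */
    return memcmp(opt + 2, expected_digest, MD5_DIGEST_LEN) == 0;
}
```

Checking the type and length bytes before comparing digests lets malformed options be rejected cheaply, although the digest itself still has to be computed for well-formed ones.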
The receiver MAY ignore an Identification option if it occurs on a packet that would otherwise be considered valid.

Example C code for constructing the option's value before transmitting a packet follows.

    #include <openssl/md5.h>  /* assumption: an OpenSSL-style MD5 API */

    unsigned char *packet_data;
    int packet_length;
    int id_option_offset;     /* offset of option in packet_data */
    const unsigned char *my_nonce, *other_nonce;
    int my_nonce_length, other_nonce_length;

    MD5_CTX md5_context;
    MD5_Init(&md5_context);
    /* header words 3-4: Sequence and Acknowledgement Numbers,
       assuming 24-bit sequence numbers */
    MD5_Update(&md5_context, packet_data + 8, 8);
    MD5_Update(&md5_context, my_nonce, my_nonce_length);
    MD5_Update(&md5_context, other_nonce, other_nonce_length);
    packet_data[id_option_offset] = 43;    /* option type (Identification) */
    packet_data[id_option_offset+1] = 18;  /* option length */
    MD5_Final(packet_data + id_option_offset + 2, &md5_context);

6.5.4. Challenge Option

This option informs the receiving DCCP that one of its packets was ignored, and that succeeding packets will be ignored until the endpoint sends a correct Identification option. The receiving DCCP SHOULD include an Identification option on the next packet it sends. The option takes the following form:

   +--------+--------+--------+--------+--------+--------
   |00101100| Length | Identification Data ...
   +--------+--------+--------+--------+--------+--------
    Type=44

The Identification Data sent with a Challenge option depends on the active Identification Regime. For the default MD5 Regime (Regime 1), the Identification Data on a packet sent by DCCP B is the same as that for an Identification option sent by DCCP B.

The receiver SHOULD ignore a Challenge option, and the packet containing it, if the Identification Data is incorrect.
The purpose of this mechanism is to prevent denial-of-service attacks where an attacker could cause the receiver to send many packets with expensive-to-compute Identification options, since the receiver MAY ignore Challenge options for some time after receiving an invalid Challenge.

If, after several Challenge options, a DCCP is unable to elicit a valid Identification from its partner, it MAY reset the connection with Reason "Unanswered Challenge".

6.6. Init Cookie Option

This option is permitted in DCCP-Response, DCCP-Data, DCCP-Ack, and DCCP-DataAck messages. The server MAY include an Init Cookie option in its DCCP-Response. If so, then the client MUST echo the same Init Cookie option in each succeeding DCCP packet until one of those packets is acknowledged or the connection is reset. The server SHOULD design its Init Cookie format so that Init Cookies can be checked for tampering; it SHOULD respond to a tampered Init Cookie option by resetting the connection with Reason set to "Bad Init Cookie".

The purpose of this option is to allow a DCCP server to avoid having to hold any state until the three-way connection setup handshake has completed. The server wraps up the service code, server port, and any options it cares about from both the DCCP-Request and DCCP-Response in an opaque cookie. Typically the cookie will be encrypted using a secret known only to the server and include a cryptographic checksum or magic value so that correct decryption can be verified. When the server receives the cookie back in the response, it can decrypt the cookie and instantiate all the state it avoided keeping.

The precise implementation of the Init Cookie does not need to be specified here; since Init Cookies are opaque to the client, there are no interoperability concerns.
Init Cookies are limited to at most 253 bytes in length.

   +--------+--------+--------+--------+--------+--------
   |00100101| Length | Init Cookie Value ...
   +--------+--------+--------+--------+--------+--------
    Type=37

6.7. Timestamp Option

This option is permitted in any DCCP packet. The length of the option is 6 bytes.

   +--------+--------+--------+--------+--------+--------+
   |00101001|00000110|          Timestamp Value          |
   +--------+--------+--------+--------+--------+--------+
    Type=41  Length=6

The four bytes of option data carry the timestamp of this packet in some undetermined form. A DCCP receiving a Timestamp option SHOULD respond with a Timestamp Echo option on the next packet it sends.

6.8. Elapsed Time Option

This option is permitted in any DCCP packet that contains an Acknowledgement Number. It indicates how much time, in tenths of milliseconds, has elapsed since the packet being acknowledged---the packet with the given Acknowledgement Number---was received. The option may take 4 or 6 bytes, depending on the size of the Elapsed Time value. Elapsed Time helps correct round-trip time estimates when the gap between receiving a packet and acknowledging that packet may be long---in CCID 3, for example, where acknowledgements are sent infrequently.

   +--------+--------+--------+--------+
   |00101110|00000100|   Elapsed Time  |
   +--------+--------+--------+--------+
    Type=46    Len=4

   +--------+--------+--------+--------+--------+--------+
   |00101110|00000110|            Elapsed Time           |
   +--------+--------+--------+--------+--------+--------+
    Type=46    Len=6

The option data, Elapsed Time, represents an estimated upper bound on the amount of time elapsed since the packet being acknowledged was received, with units of tenths of milliseconds.
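The two Elapsed Time encodings can be produced by a small helper like the one below. This is a sketch under stated assumptions: the value is written in network (big-endian) byte order, the 4-byte form is chosen whenever the value fits in two bytes, and the function name encode_elapsed_time is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCCP_OPT_ELAPSED_TIME 46   /* Type=46 */

/* Writes an Elapsed Time option for `tenths_ms` (tenths of milliseconds)
   into buf. Returns the option length (4 or 6), or 0 if buf is too small. */
size_t encode_elapsed_time(uint32_t tenths_ms, unsigned char *buf,
                           size_t buf_len)
{
    assert(buf != NULL);
    if (tenths_ms <= 0xFFFFu) {            /* up to 6.5535 seconds */
        if (buf_len < 4)
            return 0;
        buf[0] = DCCP_OPT_ELAPSED_TIME;
        buf[1] = 4;
        buf[2] = (unsigned char)(tenths_ms >> 8);
        buf[3] = (unsigned char)(tenths_ms & 0xFF);
        return 4;
    }
    if (buf_len < 6)
        return 0;
    buf[0] = DCCP_OPT_ELAPSED_TIME;
    buf[1] = 6;
    buf[2] = (unsigned char)(tenths_ms >> 24);
    buf[3] = (unsigned char)((tenths_ms >> 16) & 0xFF);
    buf[4] = (unsigned char)((tenths_ms >> 8) & 0xFF);
    buf[5] = (unsigned char)(tenths_ms & 0xFF);
    return 6;
}
```

For example, an elapsed time of one second is 10000 tenths of a millisecond and fits the 4-byte form, while seven seconds (70000) needs the 6-byte form.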
If Elapsed Time is less than a second, the first, smaller form of the option SHOULD be used. Elapsed Times of more than 6.5535 seconds MUST be sent using the second form of the option.

DCCP endpoints MUST NOT report Elapsed Times that are significantly larger than the true elapsed times. A connection MAY be reset, with Reason set to "Aggression Penalty", if one endpoint determines that the other is reporting a much-too-large Elapsed Time.

Elapsed Time is measured in tenths of milliseconds as a compromise between two conflicting goals. First, it provides enough granularity to reduce rounding error when measuring elapsed time over fast LANs. Second, Elapsed Time allows most reasonable elapsed times to fit into two bytes of data.

6.9. Timestamp Echo Option

This option is permitted in any DCCP packet, as long as at least one packet carrying the Timestamp option has been received. Generally, a DCCP endpoint should send one Timestamp Echo option for each Timestamp option it receives; and it should send that option as soon as is convenient. The length of the option is between 6 and 10 bytes, depending on whether Elapsed Time is included and how large it is.

   +--------+--------+--------+--------+--------+--------+
   |00101010|00000110|           Timestamp Echo          |
   +--------+--------+--------+--------+--------+--------+
    Type=42    Len=6

   +--------+--------+------- ... -------+--------+--------+
   |00101010|00001000|  Timestamp Echo   |   Elapsed Time  |
   +--------+--------+------- ... -------+--------+--------+
    Type=42    Len=8       (4 bytes)

   +--------+--------+------- ... -------+------- ... -------+
   |00101010|00001010|  Timestamp Echo   |    Elapsed Time   |
   +--------+--------+------- ... -------+------- ... -------+
    Type=42   Len=10       (4 bytes)           (4 bytes)

The first four bytes of option data, Timestamp Echo, carry a Timestamp Value taken from a preceding received Timestamp option. Usually, this will be the last packet that was received---the packet indicated by the Acknowledgement Number, if any---but it might be a preceding packet.

The Elapsed Time field is similar to the value stored in the Elapsed Time option. If present, it indicates the amount of time elapsed since receiving the packet whose timestamp is being echoed. This time MUST be in tenths of milliseconds. Elapsed Time is meant to help the Timestamp sender separate the network round-trip time from the Timestamp receiver's processing time. This may be particularly important for CCIDs where acknowledgements are sent infrequently, so that there might be considerable delay between receiving a Timestamp option and sending the corresponding Timestamp Echo. A missing Elapsed Time field is equivalent to an Elapsed Time of zero. The smallest version of the option SHOULD be used that can hold the relevant Elapsed Time value.

6.10. Loss Window Feature

Loss Window has feature number 6. The Loss Window feature located at DCCP B is the width of the window DCCP B uses to determine whether packets from DCCP A are valid. Packets outside this window will be dropped by DCCP B as old duplicates or spoofing attempts; see Section 5.2 for more information. DCCP A sends a "Change R(Loss Window, W)" option to DCCP B to set DCCP B's Loss Window to W.
Loss Window is non-negotiable.

The Loss Window feature takes 3- or 6-byte integer values, like DCCP sequence numbers. Change and Confirm options for Loss Window are therefore either 6 or 9 bytes long.

Loss Window defaults to 1000 for new connections.

The Loss Window value is the total width of the loss window. The receiver positions the loss window asymmetrically around GSR, the greatest sequence number received, with one-third of the loss window width (rounded down) reserved for GSR and older sequence numbers and two-thirds reserved for newer sequence numbers. See Section 5.2.

7. Congestion Control IDs

Each congestion control mechanism supported by DCCP is assigned a congestion control identifier, or CCID: a number from 0 to 255. During connection setup, and optionally thereafter, the endpoints negotiate their congestion control mechanisms by negotiating the values for their Congestion Control ID features. Congestion Control ID has feature number 1. The feature located at DCCP A is the CCID in use for the A-to-B half-connection. DCCP B sends a "Change R(CCID, K)" option to DCCP A to ask A to use CCID K for its data packets. CCID is a server-priority feature.

The data bytes of Congestion Control ID feature negotiation options form a list of acceptable CCIDs, sorted in descending order of priority. For example, the option "Change R(CCID, 1 2 3)" asks the receiver to use CCID 1 for its packets, although CCIDs 2 and 3 are also acceptable. (This corresponds to the bytes "35, 6, 1, 1, 2, 3": Change R option (35), option length (6), feature ID (1), CCIDs (1, 2, 3).) Similarly, "Confirm L(CCID, 1 2 3)" tells the receiver that the sender is using CCID 1 for its packets, but that CCIDs 2 or 3 might also be acceptable.

The CCIDs defined by this document are:

   CCID   Meaning
   ----   -------
    0     Reserved
    1     Unspecified Sender-Based Congestion Control
    2     TCP-like Congestion Control
    3     TFRC Congestion Control

A new connection starts with CCID 2 for both DCCPs. If this is unacceptable for a DCCP endpoint, that endpoint MUST send "Change(CCID)" options on its first packets, and MUST reset the connection if the results of those negotiations are unacceptable.

All CCIDs standardized for use with DCCP will correspond to congestion control mechanisms previously standardized by the IETF. We expect that for quite some time, all such mechanisms will be TCP-friendly, but TCP-friendliness is not an explicit DCCP requirement.

A DCCP implementation intended for general use---in a general-purpose operating system kernel, for example---SHOULD implement at least CCIDs 1 and 2. The intent is to make these CCIDs broadly available for interoperability, although any given application might disallow their use via the feature negotiation process.

7.1. Unspecified Sender-Based Congestion Control

CCID 1 denotes an unspecified sender-based congestion control mechanism. Separate features negotiate the corresponding congestion acknowledgement options---for example, Ack Vector. This provides a limited, controlled form of interoperability for new IETF-approved CCIDs.

Implementors MUST NOT use CCID 1 in production environments as a proxy for congestion control mechanisms that have not entered the IETF standards process. We intend that any production use of CCID 1 would have to be explicitly approved first by the IETF. Middleboxes MAY choose to treat the use of CCID 1 as experimental or unacceptable.

For example, say that CCID 98, a new sender-based congestion control mechanism using Ack Vector for acknowledgements, has entered the IETF standards process, and the IETF has approved the use of CCID 1 as a backup for CCID 98. Now, DCCP A, which understands and would like to use CCID 98, is trying to communicate with DCCP B, which doesn't yet know about CCID 98. DCCP A can simply negotiate use of CCID 1 and, separately, negotiate Use Ack Vector. DCCP B will provide the feedback DCCP A requires for CCID 98, namely Ack Vector, without needing to understand the congestion control mechanism in use.

CCID 1 has no sender implementation; it is exclusively meaningful at the receiver to support forward compatibility. The sender always uses a specific congestion control mechanism whose CCID is not 1. However, the code implementing a CCID that requires only generic feedback, such as Ack Vector, MAY add CCID 1 to the list of acceptable CCIDs sent to the receiver (following the actual CCID), facilitating communication with receivers that do not understand the actual CCID.
Any CCID feature negotiation in which the sender proposes the use of CCID 1 without any other CCID is considered erroneous, and SHOULD result in connection reset, with Reason set to "Fruitless Negotiation".

Many DCCP APIs will allow applications to suggest preferred CCIDs for sending and receiving data. Applications might be able to allow or prevent the use of CCID 1 for sending and receiving. For sending, however, it makes sense to let the code implementing a particular CCID silently suggest CCID 1 when appropriate.

CCID 1 places no restrictions on how often the HC-Receiver may send DCCP-Ack packets. This applies wherever we say "send a DCCP-Ack as allowed by the congestion control mechanism in use". A careful implementation SHOULD implement a liberal rate limit on DCCP-Acks to prevent ack storms, however.

7.2. TCP-like Congestion Control

CCID 2, TCP-like Congestion Control, denotes Additive Increase, Multiplicative Decrease (AIMD) congestion control with behavior modelled directly on TCP, including congestion window, slow start, timeouts, and so forth. CCID 2 achieves maximum bandwidth over the long term, consistent with the use of end-to-end congestion control, but halves its congestion window in response to each congestion event. This leads to the abrupt rate changes typical of TCP. Applications should use CCID 2 if they prefer maximum bandwidth utilization to steadiness of rate. This is often the case for applications that are not playing their data directly to the user. For example, a hypothetical application that transferred files over DCCP, using application-level retransmissions for lost packets, would prefer CCID 2 to CCID 3.
On-line games may also prefer CCID 2. CCID 2 is further described in [CCID 2 PROFILE]. 7.3. TFRC Congestion Control CCID 3 denotes TCP-Friendly RateControl,Control (TFRC), an equation-basedrate- controlledrate-controlled congestion control mechanism. TFRC is designed to be reasonably fair when competing for bandwidth with TCP-like flows, where a flow is "reasonably fair" if its sending rate is generally within a factor of two of the sending rate of a TCP flow under the same conditions. However, TFRC has a much lower variation of throughput over time compared with TCP, which makes CCID 3 more suitable than CCID 2 for applications such as telephony or streaming media where a relatively smooth sending rate is of importance. CCID 3 is further described in [CCID 3 PROFILE]. The TFRC congestion control algorithms were initially described in [RFC 3448]. 7.4. CCID-SpecificOptionsOptions, Features, andFeaturesReset Reasons Optionandtypes, featurenumbersnumbers, and Reset Reasons 128 through 255 are available forCCID- specificCCID-specific use. CCIDs may often need new option types---for communicating acknowledgement or rate information, for example. CCID-specific option types let them create options at will without polluting the global option space. Option 128 might have different meanings on a half-connection using CCID 4 and ahalf-connectionhalf- connection using CCID 8. CCID-specific options and features will never conflict with global options and features introduced by later versions of this specification. Any packet may contain information meant for either half-connection, so CCID-specific optionandtypes, featurenumbersnumbers, and Reset Reasons explicitly signal the half-connection to which they apply. o Option numbers 128 through 191 are for options sent from theHC-SenderHC- Sender to the HC-Receiver; optionKohler/Handley/Floyd/Padhye Section 7.4. 
  numbers 192 through 255 are for options sent from the HC-Receiver to the HC-Sender.

o Reset Reasons 128 through 191 indicate that the HC-Sender reset the connection (most likely because of some problem with acknowledgements sent by the HC-Receiver); Reset Reasons 192 through 255 indicate that the HC-Receiver reset the connection (most likely because of some problem with data packets sent by the HC-Sender).

o Finally, feature numbers 128 through 191 are used for features located at the HC-Sender; feature numbers 192 through 255 are for features located at the HC-Receiver. Since Change L and Confirm L options for a feature are sent by the feature location, we know that any Change L(128) option was sent by the HC-Sender, while any Change L(192) option was sent by the HC-Receiver. Similarly, Change R(128) options are sent by the HC-Receiver, while Change R(192) options are sent by the HC-Sender.

For example, consider a DCCP connection where the A-to-B half-connection uses CCID 4 and the B-to-A half-connection uses CCID 5. Here is how a sampling of CCID-specific options and features are assigned to half-connections:

                                  Relevant    Relevant
   Packet  Option                 Half-conn.  CCID
   ------  ------                 ----------  --------
   A > B   128                    A-to-B      4
   A > B   192                    B-to-A      5
   A > B   Change L(128, ...)     A-to-B      4
   A > B   Change R(192, ...)     A-to-B      4
   A > B   Confirm L(128, ...)    A-to-B      4
   A > B   Confirm R(192, ...)    A-to-B      4
   A > B   Change R(128, ...)     B-to-A      5
   A > B   Change L(192, ...)     B-to-A      5
   A > B   Confirm R(128, ...)    B-to-A      5
   A > B   Confirm L(192, ...)    B-to-A      5
   B > A   128                    B-to-A      5
   B > A   192                    A-to-B      4
   B > A   Change L(128, ...)     B-to-A      5
   B > A   Change R(192, ...)     B-to-A      5
   B > A   Confirm L(128, ...)    B-to-A      5
   B > A   Confirm R(192, ...)    B-to-A      5
   B > A   Change R(128, ...)     A-to-B      4
   B > A   Change L(192, ...)     A-to-B      4
   B > A   Confirm R(128, ...)    A-to-B      4
   B > A   Confirm L(192, ...)    A-to-B      4

CCID-specific options and features have no clear meaning when a nontrivial negotiation for the relevant CCID is in progress. This can happen when a CCID-specific option follows a Change(CCID) option. Say the Change option prefers CCID X. Then the negotiation is nontrivial if and only if its result is not X. CCID-specific options and features MUST be ignored during a nontrivial CCID negotiation---for instance, by responding with Ignored options---except that Mandatory CCID-specific options and features MUST induce a DCCP-Reset with Reason "Mandatory Error".

8. Acknowledgements

Congestion control requires receivers to transmit information about packet losses and ECN marks to senders. DCCP receivers MUST report all congestion they see, as defined by the relevant CCID profile. Each CCID says when acknowledgements should be sent, what options they must use, how they should be congestion controlled, and so on.

Most acknowledgements use DCCP options. For example, on a half-connection with CCID 2 (TCP-like), the receiver reports acknowledgement information using the Ack Vector option. This section describes common acknowledgement options and shows how acks using those options will commonly work. Full descriptions of the acknowledgement mechanisms used for each CCID are laid out in the CCID profile specifications.
Acknowledgement options, such as Ack Vector, generally depend on the DCCP Acknowledgement Number, and are thus only allowed on packet types that carry that number (all packets except DCCP-Request and DCCP-Data). Detailed acknowledgement options are not necessarily required on every packet that carries an Acknowledgement Number, however.

8.1. Acks of Acks and Unidirectional Connections

DCCP was designed to work well for both bidirectional and unidirectional flows of data, and for connections that transition between these states. However, acknowledgements required for a unidirectional connection are very different from those required for a bidirectional connection. In particular, unidirectional connections need to worry about acks of acks.

The ack-of-acks problem arises because some acknowledgement mechanisms are reliable. For example, an HC-Receiver using CCID 2, TCP-like Congestion Control, sends Ack Vectors containing completely reliable acknowledgement information. The HC-Sender should occasionally inform the HC-Receiver that it has received an ack. If it did not, the HC-Receiver might resend complete Ack Vector information, going back to the start of the connection, with every DCCP-Ack packet! However, note that acks-of-acks need not be reliable themselves: when an ack-of-acks is lost, the HC-Receiver will simply maintain, and periodically retransmit, old acknowledgement-related state for a little longer. Therefore, there is no need for acks-of-acks-of-acks.

When communication is bidirectional, any required acks-of-acks are automatically contained in normal acknowledgements for data packets. On a unidirectional connection, however, the receiver DCCP sends no data, so the sender would not normally send acknowledgements.
Therefore, the CCID in force on that half-connection must explicitly say whether, when, and how the HC-Sender should generate acks-of-acks.

For example, consider a bidirectional connection where both half-connections use the same CCID (either 2 or 3), and where DCCP B goes "quiescent". This means that the connection becomes unidirectional: DCCP B stops sending data, and sends only DCCP-Ack packets to DCCP A. For CCID 2, TCP-like Congestion Control, DCCP B uses Ack Vector to reliably communicate which packets it has received. As described above, DCCP A must occasionally acknowledge a pure acknowledgement from DCCP B, so that DCCP B can free old Ack Vector state. For instance, DCCP A might send a DCCP-DataAck packet every now and then, instead of DCCP-Data. In contrast, for CCID 3, TFRC Congestion Control, DCCP B's acknowledgements generally need not be reliable, since they contain cumulative loss rates; TFRC works even if every DCCP-Ack is lost. Therefore, DCCP A need never acknowledge an acknowledgement.

When communication is unidirectional, a single CCID---in the example, the A-to-B CCID---controls both DCCPs' acknowledgements, in terms of their content, their frequency, and so forth. For bidirectional connections, the A-to-B CCID governs DCCP B's acknowledgements (including its acks of DCCP A's acks), while the B-to-A CCID governs DCCP A's acknowledgements.

DCCP A switches its ack pattern from bidirectional to unidirectional when it notices that DCCP B has gone quiescent. It switches from unidirectional to bidirectional when it must acknowledge even a single DCCP-Data or DCCP-DataAck packet from DCCP B. (This includes the case where a single DCCP-Data or DCCP-DataAck packet was lost in transit, which is detectable using the # NDP field in the DCCP packet header.)
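
The switching rule in the preceding paragraphs can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the protocol: the names AckPattern, needs_acks_of_acks, and next_ack_pattern are hypothetical, the quiescence test is passed in as a plain boolean because each CCID defines it, and the # NDP check is reduced to a single flag.

```python
from enum import Enum


class AckPattern(Enum):
    BIDIRECTIONAL = "bidirectional"
    UNIDIRECTIONAL = "unidirectional"


# Packet types from DCCP B that carry application data.
DATA_PACKET_TYPES = frozenset({"DCCP-Data", "DCCP-DataAck"})


def needs_acks_of_acks(ccid):
    """Say whether DCCP A must occasionally acknowledge B's pure acks.

    Only the two CCIDs discussed in the text are covered: CCID 2 uses
    reliable Ack Vectors, CCID 3 reports cumulative loss rates.
    """
    if ccid == 2:
        return True
    if ccid == 3:
        return False
    raise ValueError("consult the profile for CCID %d" % ccid)


def next_ack_pattern(current, b_quiescent, unacked_types, ndp_gap=False):
    """Return DCCP A's ack pattern after one update step.

    current       -- DCCP A's present AckPattern
    b_quiescent   -- assumed result of the CCID-defined quiescence test
    unacked_types -- packet types received from B and not yet acknowledged
    ndp_gap       -- assumed flag: # NDP shows a data packet from B was lost
    """
    must_ack_data = ndp_gap or any(t in DATA_PACKET_TYPES for t in unacked_types)
    if must_ack_data:
        # Even a single data packet from B forces the bidirectional pattern.
        return AckPattern.BIDIRECTIONAL
    if b_quiescent:
        return AckPattern.UNIDIRECTIONAL
    return current
```

For instance, a DCCP A in the unidirectional pattern that receives one DCCP-DataAck from DCCP B returns to the bidirectional pattern, while further pure DCCP-Acks leave it unchanged.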
Each CCID defines how to detect quiescence on that CCID, and how that CCID handles acks-of-acks on unidirectional connections. The B-to-A CCID defines when DCCP B has gone quiescent. Usually, this happens when a period has passed without B sending any data packets. For CCID 2, this period is the maximum of 0.2 seconds and two round-trip times. The A-to-B CCID defines how DCCP A handles acks-of-acks once DCCP B has gone quiescent.

8.2. Ack Piggybacking

Acknowledgements of A-to-B data MAY be piggybacked on data sent by DCCP B, as long as that does not delay the acknowledgement longer than the A-to-B CCID would find acceptable. However, data acknowledgements often require more than 4 bytes to express. A large set of acknowledgements prepended to a large data packet might exceed the path's MTU. In this case, DCCP B SHOULD send separate DCCP-Data and DCCP-Ack packets, or wait, but not too long, for a smaller datagram.

Piggybacking is particularly common at DCCP A when the B-to-A half-connection is quiescent---that is, when DCCP A is just acknowledging DCCP B's acknowledgements, as described above. There are three reasons to acknowledge DCCP B's acknowledgements: to allow DCCP B to free up information about previously acknowledged data packets from A; to shrink the size of future acknowledgements; and to manipulate the rate at which future acknowledgements are sent. Since these are secondary concerns, DCCP A can generally afford to wait indefinitely for a data packet to piggyback its acknowledgement onto.

Any restrictions on ack piggybacking are described in the relevant CCID's profile.

8.3. Ack Ratio Feature

Ack Ratio provides a common mechanism by which CCIDs that clock acknowledgements off data packets can perform rudimentary congestion control on the acknowledgement stream. CCID 2, TCP-like Congestion Control, uses Ack Ratio to limit the rate of its acknowledgement stream, for example. Some CCIDs ignore Ack Ratio, performing congestion control on acknowledgements in some other way.

Ack Ratio has feature number 3. The Ack Ratio feature located at DCCP B equals the rough ratio of data packets sent by DCCP A to acknowledgement packets sent back by DCCP B. For example, if it is set to four, then DCCP B will send at least one acknowledgement packet for every four data packets DCCP A sends. DCCP A sends a "Change R(Ack Ratio)" option to DCCP B to change DCCP B's ack ratio.

Ack Ratio is a non-negotiable feature. An Ack Ratio option contains two bytes of data: a sixteen-bit integer representing the ratio. A new connection starts with Ack Ratio 2 for both DCCPs.

Implementations should treat Ack Ratio as a loose guideline. For instance, a DCCP endpoint might implement a delayed acknowledgement timer like TCP's, whereby each packet is acknowledged within at most T seconds of its receipt. (In TCP, T is commonly set to 200 milliseconds.) This is explicitly allowed even though it might lead to sending more acknowledgement packets than Ack Ratio would suggest. Particular algorithms for setting and using Ack Ratio are discussed in the relevant CCID drafts.

8.4. Use Ack Vector Feature

The Use Ack Vector feature lets DCCPs negotiate whether they should use Ack Vector options to report congestion.
Ack Vector provides detailed loss information, and lets senders report back to their applications whether particular packets were dropped. Use Ack Vector is mandatory for some CCIDs, and optional for others.

Use Ack Vector has feature number 4. The Use Ack Vector feature located at DCCP B specifies whether DCCP B MUST use Ack Vector options on its acknowledgements to DCCP A, although DCCP B may send Ack Vector options even when Use Ack Vector is false. DCCP A sends a "Change R(Use Ack Vector, 1)" option to DCCP B to ask B to send Ack Vector options as part of its acknowledgement traffic.

Use Ack Vector is a server-priority feature. Use Ack Vector feature values are a single byte long. The receiver MUST send Ack Vector options if this byte is nonzero. A new connection starts with Use Ack Vector 0 for both DCCPs.

8.5. Ack Vector Options

The Ack Vector gives a run-length encoded history of data packets received at the client. Each byte of the vector gives the state of that data packet in the loss history, and the number of preceding packets with the same state. The option's data looks like this:

   +--------+--------+--------+--------+--------+--------
   |0010011?| Length |SSLLLLLL|SSLLLLLL|SSLLLLLL|  ...
   +--------+--------+--------+--------+--------+--------
   Type=38/39         \___________ Vector ___________...

The two Ack Vector options (option types 38 and 39) differ only in the values they imply for ECN Nonce Echo. Section 9.2 describes this further.

The vector itself consists of a series of bytes, each of whose encoding is:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |St | Run Length|
   +-+-+-+-+-+-+-+-+

   St[ate]: 2 bits
   Run Length: 6 bits

State occupies the most significant two bits of each byte, and can have one of four values:

   0  Packet received (and not ECN marked).
   1  Packet received ECN marked.
   2  Reserved.
   3  Packet not yet received.

The first byte in the first Ack Vector option refers to the packet indicated in the Acknowledgement Number; subsequent bytes refer to older packets. (Ack Vector MUST NOT be sent on DCCP-Data and DCCP-Request packets, which lack an Acknowledgement Number.) If an Ack Vector contains the decimal values 0,192,3,64,5 and the Acknowledgement Number is decimal 100, then:

   Packet 100 was received (Acknowledgement Number 100, State 0, Run Length 0).
   Packet 99 was lost (State 3, Run Length 0).
   Packets 98, 97, 96 and 95 were received (State 0, Run Length 3).
   Packet 94 was ECN marked (State 1, Run Length 0).
   Packets 93, 92, 91, 90, 89, and 88 were received (State 0, Run Length 5).

Run lengths of more than 64 must be encoded in multiple bytes. A single Ack Vector option can acknowledge up to 16192 data packets. Should more packets need to be acknowledged than can fit in 253 bytes of Ack Vector, then multiple Ack Vector options can be sent. The second Ack Vector option will begin where the first Ack Vector option left off, and so forth.

Ack Vector states are subject to two general constraints. (These principles SHOULD also be followed for other acknowledgement mechanisms; referring to Ack Vector states simplifies their explanation.)

(1) Packets reported as State 0 or State 1 MUST have been processed by the receiving DCCP stack. In particular, their options must have been processed.
    Any data on the packet need not have been delivered to the receiving application; in fact, the data may have been dropped.

(2) Packets reported as State 3 MUST NOT have been received by DCCP. Feature negotiations and options on such packets MUST NOT have been processed, and the Acknowledgement Number MUST NOT correspond to such a packet.

Packets dropped in the application's receive buffer SHOULD be reported as Received or Received ECN Marked (States 0 and 1), depending on their ECN state; such packets' ECN Nonces MUST be included in the Nonce Echo. The Data Dropped option informs the sender that some packets reported as received actually had their payloads dropped.

One or more Ack Vector options that, together, report the status of more packets than have actually been sent SHOULD be considered invalid. The receiving DCCP SHOULD either ignore the options or reset the connection with Reason set to "Option Error". Packets whose status has not been reported by any Ack Vector option SHOULD be treated as "not yet received" (State 3) by the sender.

Appendix A provides a non-normative description of the details of DCCP acknowledgement handling, in the context of an abstract Ack Vector implementation.

8.5.1. Ack Vector Consistency

A DCCP sender will commonly receive multiple acknowledgements for some of its data packets. For instance, an HC-Sender might receive two DCCP-Acks with Ack Vectors, both of which contained information about sequence number 24. (Because of cumulative acking, information about a sequence number is repeated in every ack until the HC-Sender acknowledges an ack. Perhaps the HC-Receiver is sending acks faster than the HC-Sender is acknowledging them.) In a perfect world, the two Ack Vectors would always be consistent.
However, there are many reasons why they might not be:

o The HC-Receiver received packet 24 between sending its acks, so the first ack said 24 was not received (State 3) and the second said it was received or ECN marked (State 0 or 1).

o The HC-Receiver received packet 24 between sending its acks, and the network reordered the acks. In this case, the packet will appear to transition from State 0 or 1 to State 3.

o The network duplicated packet 24, and one of the duplicates was ECN marked. This might show up as a transition between States 0 and 1.

To cope with these situations, HC-Sender DCCP implementations SHOULD combine multiple received Ack Vector states according to this table:

                   Received State
                     0   1   3
                   +---+---+---+
                 0 | 0 |0/1| 0 |
         Old       +---+---+---+
                 1 | 1 | 1 | 1 |
        State      +---+---+---+
                 3 | 0 | 1 | 3 |
                   +---+---+---+

To read the table, choose the row corresponding to the packet's old state and the column corresponding to the packet's state in the newly received Ack Vector, then read the packet's new state off the table. For an old state of 0 (received non-marked) and received state of 1 (received ECN marked), the packet's new state may be set to either 0 or 1. The HC-Sender implementation will be indifferent to ack reordering if it chooses new state 1 for that cell.

The HC-Receiver should collect information about received packets, which it will eventually report to the HC-Sender on one or more acknowledgements, according to the following table:

                   Received Packet
                     0   1   3
                   +---+---+---+
                 0 | 0 |0/1| 0 |
        Stored     +---+---+---+
                 1 |0/1| 1 | 1 |
        State      +---+---+---+
                 3 | 0 | 1 | 3 |
                   +---+---+---+

This table equals the sender's table, except that when the stored state is 1 and the received state is 0, the receiver is allowed to switch its stored state to 0.

An HC-Sender MAY choose to throw away old information gleaned from the HC-Receiver's Ack Vectors, in which case it MUST ignore newly received acknowledgements from the HC-Receiver for those old packets. It is often kinder to save recent Ack Vector information for a while, so that the HC-Sender can undo its reaction to presumed congestion when a "lost" packet unexpectedly shows up (the transition from State 3 to State 0).

8.5.2. Ack Vector Coverage

We can divide the packets that have been sent from an HC-Sender to an HC-Receiver into four roughly contiguous groups. From oldest to youngest, these are:

(1) Packets already acknowledged by the HC-Receiver, where the HC-Receiver knows that the HC-Sender has definitely received the acknowledgements.

(2) Packets already acknowledged by the HC-Receiver, where the HC-Receiver cannot be sure that the HC-Sender has received the acknowledgements.

(3) Packets not yet acknowledged by the HC-Receiver.

(4) Packets not yet received by the HC-Receiver.

The union of groups 2 and 3 is called the Acknowledgement Window. Generally, every Ack Vector generated by the HC-Receiver will cover the whole Acknowledgement Window: Ack Vector acknowledgements are cumulative. (This simplifies Ack Vector maintenance at the HC-Receiver; see Appendix A, below.) As packets are received, this window both grows on the right and shrinks on the left. It grows because there are more packets, and shrinks because the data packets' Acknowledgement Numbers will acknowledge previous acknowledgements, moving packets from group 2 into group 1.

8.6. Slow Receiver Option

An HC-Receiver sends the Slow Receiver option to its sender to indicate that it is having trouble keeping up with the sender's data. The HC-Sender SHOULD NOT increase its sending rate for approximately one round-trip time after seeing a packet with a Slow Receiver option. However, the Slow Receiver option does not indicate congestion, and the HC-Sender need not reduce its sending rate. (If necessary, the receiver can force the sender to slow down by dropping packets, with or without Data Dropped, or reporting false ECN marks.) APIs should let receiver applications set Slow Receiver, and sending applications determine whether or not their receivers are Slow.

The Slow Receiver option takes just one byte:

   +--------+
   |00000010|
   +--------+
     Type=2

Slow Receiver does not specify why the receiver is having trouble keeping up with the sender. Possible reasons include lack of buffer space, CPU overload, and application quotas. A sending application might react to Slow Receiver by reducing its sending rate or by switching to a lossier compression algorithm. The sending application should not react to Slow Receiver by sending more data, however.
For example, the optimal response to a CPU-bound receiver might be to increase the sending rate, by switching to a less-compressed sending format, since a highly-compressed data format might overwhelm a slow CPU more seriously than the higher memory requirements of a less-compressed data format. The Slow Receiver option is not appropriate for this case; a CPU-bound receiver should not ask for Slow Receiver options to be sent.

Slow Receiver implements a portion of TCP's receive window functionality. We believe receiver operating systems and applications will find it easier to send Slow Receiver when appropriate than they currently find it to correctly set a TCP receive window.

8.7. Data Dropped Option

The Data Dropped option indicates that some packets reported as received actually had their data dropped before it reached the application. The sender's congestion control mechanism may respond to data-dropped packets less severely than to lost or marked packets. For instance, a windowed mechanism might subtract a constant value from its congestion window, rather than cut it in half.

Data Dropped lets a sender differentiate between different kinds of loss (network and endpoint), but it does not allow total freedom in how to react.
The congestion control response to a Data Dropped packet must be approved by the IETF. Each congestion control mechanism MUST react to a Data Dropped packet as if the packet were ECN marked, unless explicitly specified otherwise.

If a received packet's payload is dropped for one of the reasons listed below, this SHOULD be reported using a Data Dropped option. Alternatively, the receiver MAY choose to report as "received" only those packets whose payloads were not dropped, subject to the constraint that packets not reported as received MUST NOT have had their options processed.

The option's data looks like this:

   +--------+--------+--------+--------+--------+--------
   |00101000| Length | Block  | Block  | Block  |  ...
   +--------+--------+--------+--------+--------+--------
     Type=40          \___________ Vector ___________ ...

The vector itself consists of a series of bytes, called Blocks, each of whose encoding corresponds to one of these choices:

    0 1 2 3 4 5 6 7                 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
   |0| Run Length  |       or      |1|DrpCd|Run Len|
   +-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
     Normal Block                     Drop Block

The first byte in the first Data Dropped option refers to the packet indicated in the Acknowledgement
Number; subsequent bytes refer to older packets. (Data Dropped MUST NOT be sent on DCCP-Data or DCCP-Request packets, which lack an Acknowledgement Number.) Normal Blocks, which have high bit 0, indicate that any received packets in the Run Length had their data delivered to the application. Drop Blocks, which have high bit 1, indicate that received packets in the Run Len[gth] were not delivered as usual. The 3-bit Drop Code [DrpCd] field says what happened; generally, no data from that packet reached the application. Packets reported as "not yet received" MUST be included in Normal Blocks; packets not covered by any Data Dropped option are treated as if they were in a Normal Block. Defined Drop Codes for Drop Blocks are:

   0  Packet data dropped due to protocol constraints. For example, the data was included on a DCCP-Request packet, and the receiving application does not allow that piggybacking; or the data was sent during an important feature negotiation.

   1  Packet data dropped because the application is no longer listening.

   2  Packet data dropped in the receive buffer.

   3  Packet data dropped due to
      corruption.

   4-6  Reserved.

   7  Packet data corrupted, but delivered to the application anyway.

For example, if a Data Dropped option contains the decimal values 0,160,3,162, the Acknowledgement Number is 100, and an Ack Vector reported all packets as received, then:

   Packet 100 was received (Acknowledgement Number 100, Normal Block, Run Length 0).
   Packet 99 was dropped in the receive buffer (Drop Block, Drop Code 2, Run Length 0).
   Packets 98, 97, 96, and 95 were received (Normal Block, Run Length 3).
   Packets 94, 93, and 92 were dropped in the receive buffer (Drop Block, Drop Code 2, Run Length 2).

Run lengths of more than 128 (for Normal Blocks) or 16 (for Drop Blocks) must be encoded in multiple Blocks. A single Data Dropped option can acknowledge up to 32384 Normal Block data packets, although the receiver SHOULD NOT send a Data Dropped option when all relevant packets fit into Normal Blocks. Should more packets need to be acknowledged than can fit in 253 bytes of
We believe receiver operating systems and applicationsData Dropped, then multiple Data Dropped options can be sent. The second option willfind it much easier to send Slow Receiver when appropriatebegin where the first option left off, and so forth. One or more Data Dropped options that, together, report the status of more packets thanthey currently find ithave been sent, or that change the status of a packet, or that disagree with Ack Vector or equivalent options (by reporting a "not yet received" packet as "dropped in the receive buffer", for example), SHOULD be considered invalid. The receiving DCCP SHOULD respond tocorrectlyinvalid Data Dropped options by ignoring them or by resetting the connection with Reason seta TCPto "Option Error". A DCCP application interface should let receiving applications specify the Drop Codes corresponding to received packets. For example, this would let applications calculate their own checksums, Kohler/Handley/Floyd/Padhye Section8.6.8.7. [Page57]80] INTERNET-DRAFT Expires:December 2003 JuneApril 2004 October 2003receive window. 8.7.but still report "dropped due to corruption" packets via the Data DroppedOptionoption. TheData Dropped option indicates that some packets reported as received actually had their data dropped before it reachedinterface should not let applications reduce theapplication. The sender's congestion control mechanism may respond to data-dropped packets less severely than to lost or marked packets. For instance,"seriousness" of awindowed mechanism might subtractpacket's Drop Code; for example, the application should not be able to upgrade aconstant valuepacket fromits congestion window, rather than cut it in half.delivered corrupt (Drop Code 7) to delivered normally (no Drop Code). 8.7.1. Data Droppedlets a sender differentiate between different kinds of loss (networkandendpoint), but it does not allow total freedom in how to react. 
Normal Congestion Response

When deciding on a response to a particular acknowledgement or set of acknowledgements containing Data Dropped packets, a congestion control mechanism MUST consider dropped packets and ECN marks (including ECN-marked packets that are included in Data Dropped), as well as the Data Dropped packets. For window-based mechanisms, the valid response space is defined as follows.

Assume an old window of W. Independently calculate a new window W_new1 that assumes no packets were Data Dropped (so W_new1 contains only the normal congestion response), and a new window W_new2 that assumes no packets were lost or marked (so W_new2 contains only the Data Dropped response). We are assuming that Data Dropped recommended a reduction in congestion window, so W_new2 < W.

Then the actual new window W_new MUST NOT be larger than the minimum of W_new1 and W_new2; and the sender MAY combine the two responses, by setting

   W_new = W + min(W_new1 - W, 0) + min(W_new2 - W, 0).

Non-window-based congestion control mechanisms MUST behave analogously.

8.7.2. Particular Drop Codes

Drop Code 0 ("protocol constraints") does not indicate any kind of congestion, so the sender's CCID SHOULD react to non-marked packets with Drop Code 0 as if they were received. However, the sending DCCP SHOULD NOT
send more data until it believes the relevant protocol constraint has passed.

Drop Code 1 ("application no longer listening") means the application running at the endpoint that sent the option is no longer listening for data. For example, a server might close its receiving half-connection to new data after receiving a complete request from the client. This would limit the amount of state the server would expend on incoming data, and thus reduce the potential damage from certain denial-of-service attacks. A Data Dropped option containing Drop Code 1 SHOULD be sent whenever received data is ignored due to a non-listening application. Once a DCCP reports Drop Code 1 for a packet, it SHOULD report Drop Code 1 for every succeeding data packet on that half-connection; once a DCCP receives a Drop Code 1 report, it SHOULD expect that no more data will ever be delivered to the other endpoint's application, so it SHOULD NOT send more data. A DCCP receiving Drop Code 1 MAY report this event to the application. (Previous versions of this specification used a "Buffer Closed" option instead of Drop Code 1.)

Drop Code 2 ("receive buffer drop") indicates congestion inside the receiving host. Every packet newly acknowledged as Drop Code 2 SHOULD reduce the
sender's instantaneous rate by one packet per round trip time, using whatever mechanism is appropriate for the relevant CCID. Further details may be available in CCID documents.

8.8. Payload Checksum Option

The Payload Checksum option holds the Internet checksum (the 16 bit one's complement of the one's complement sum) of all 16 bit words in the DCCP payload (the data contained in a DCCP-Request, DCCP-Response, DCCP-Data, DCCP-DataAck, or DCCP-Move packet). When combined with a nonzero Checksum Coverage, this lets DCCP distinguish between corruption in a packet's payload and corruption in its header. Corrupted-header packets MUST be treated as dropped by the network, while corrupted-payload packets MAY be treated differently; for example, the sender's response to corruption might be less stringent than its response to congestion.

A low Checksum Coverage lets DCCP process packets with valid headers, even if the payload is corrupt, avoiding the congestion response to corruption. The Payload Checksum option then lets DCCP detect payload corruption, and therefore avoid delivering bad data to the application.
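As an illustration only, the Internet checksum named above can be sketched in Python as follows. The function names internet_checksum and payload_matches are hypothetical, and padding an odd-length payload with a zero byte is an assumption borrowed from common Internet checksum practice; this section does not specify it.

```python
def internet_checksum(payload: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum
    of all 16-bit words in the payload."""
    if len(payload) % 2:
        # Assumption: an odd-length payload is padded with one zero byte.
        payload += b"\x00"
    total = 0
    for i in range(0, len(payload), 2):
        total += (payload[i] << 8) | payload[i + 1]
        # Fold any carry out of the low 16 bits back in (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def payload_matches(payload: bytes, option_value: int) -> bool:
    """Compare a received Payload Checksum value with the computed one."""
    return internet_checksum(payload) == option_value
```

A receiver following this sketch would treat a False result from payload_matches as payload corruption.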
The option looks like this:

   +--------+--------+--------+--------+
   |00101101|00000100|    Checksum     |
   +--------+--------+--------+--------+
    Type=45  Length=4

The receiving DCCP MUST check the Payload Checksum's value against the actual payload checksum. If the values differ, the packet's data SHOULD be dropped, and reported as dropped due to corruption (Drop Code 3) using a Data Dropped option (Section 8.7). Optionally, DCCP MAY provide an API through which the receiving application could request delivery of known-corrupt data. When that API is active, the packet's data SHOULD be delivered, but reported as delivered corrupt (Drop Code 7) using a Data Dropped option. In either case, the packet will be reported as Received or Received ECN Marked by Ack Vector or equivalent options.

A packet processor with access to link-layer error detection mechanisms might explicitly set Payload Checksum to zero when the link layer reported that a portion of the payload was corrupted.
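The Drop Code 3 and Drop Code 7 reports described above travel in Data Dropped Blocks (Section 8.7). The following Python sketch, offered as an illustration and not as a normative algorithm, decodes such Blocks. It assumes the Section 8.7 layout (a Normal Block has high bit 0 and a 7-bit Run Length; a Drop Block has high bit 1, a 3-bit Drop Code, and a 4-bit Run Length), assumes that a Run Length of n covers n + 1 packets, and ignores sequence number wraparound. The name decode_data_dropped is hypothetical.

```python
from typing import List, Optional, Tuple


def decode_data_dropped(ack_number: int,
                        blocks: bytes) -> List[Tuple[int, Optional[int]]]:
    """Return (sequence number, Drop Code) pairs, newest packet first.

    A Drop Code of None marks a packet covered by a Normal Block.
    """
    report: List[Tuple[int, Optional[int]]] = []
    seq = ack_number  # the first Block refers to the Acknowledgement Number
    for block in blocks:
        if block & 0x80:
            # Drop Block: 1 | 3-bit Drop Code | 4-bit Run Length
            drop_code: Optional[int] = (block >> 4) & 0x07
            run_length = block & 0x0F
        else:
            # Normal Block: 0 | 7-bit Run Length
            drop_code = None
            run_length = block & 0x7F
        # Assumption: Run Length n describes n + 1 consecutive packets.
        for _ in range(run_length + 1):
            report.append((seq, drop_code))
            seq -= 1  # later Blocks refer to older packets
    return report
```

Decoding the Section 8.7 example values 0,160,3,162 with Acknowledgement Number 100 under these assumptions reports packet 100 as delivered, 99 with Drop Code 2, 98 through 95 as delivered, and 94 through 92 with Drop Code 2.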
No actual Internet checksum has value zero, so this reliably informs the receiver that the payload is corrupt. Note that Payload Checksum's value is included in the header checksum.

The Internet checksum used by the Payload Checksum option is generally considered weak, but it has the advantage that all IP processors can already calculate it. Applications desiring a stronger Payload Checksum should either send a checksum with the payload (reporting any checksum violations via the Data Dropped API), or propose a new checksum option.

See Section B.1 for a discussion of the issues related to the use of this option.

9. Explicit Congestion Notification

The DCCP protocol is fully ECN-aware. Each CCID specifies how its endpoints respond to ECN marks. Furthermore, DCCP, unlike TCP, allows senders to control the rate at which acknowledgements are generated (with options like Ack Ratio); this means that acknowledgements are generally congestion-controlled, and may have ECN-Capable Transport set. A CCID profile describes how that CCID interacts with ECN, both for data traffic and pure-acknowledgement traffic.

A sender SHOULD set ECN-Capable Transport on its packets whenever the receiver has its ECN Capable feature turned on and the relevant CCID allows it, unless the sending application indicates that ECN should not be used.

The rest of this section describes the ECN Capable feature and the interaction of the ECN Nonce with acknowledgement options such as Ack Vector.

9.1.
ECN Capable Feature

The ECN Capable feature lets a DCCP inform its partner that it cannot read ECN bits from received IP headers, so the partner must not set ECN-Capable Transport on its packets.

ECN Capable has feature number 2. The ECN Capable feature located at DCCP A indicates whether or not A can successfully read ECN bits from received frames' IP headers. (This is independent of whether it can set ECN bits on sent frames.) DCCP A sends a "Change L(ECN Capable, 0)" option to DCCP B to inform B that A cannot read ECN bits. The ECN Capable feature is a server-priority feature.

An ECN Capable feature contains a single byte of data. ECN capability is on if and only if this byte is nonzero. A new connection starts with ECN Capable 1 (that is, ECN capable) for both DCCPs. If a DCCP is not ECN capable, it MUST send "Change L(ECN Capable, 0)" options to the other endpoint until acknowledged (by "Confirm R(ECN Capable, 0)") or the connection closes. Furthermore, it MUST NOT accept any data until the other endpoint sends "Confirm R(ECN Capable, 0)".
It SHOULD send Data Dropped options on its acknowledgements, with Drop Code 0 ("protocol constraints"), if the other endpoint does send data inappropriately.

9.2. ECN Nonces

Congestion avoidance will not occur, and the receiver will sometimes get its data faster, when the sender is not told about any congestion events. Thus, the receiver has some incentive to falsify acknowledgement information, reporting that marked or dropped packets were actually received unmarked. This problem is more serious with DCCP than with TCP, since TCP provides reliable transport: it is more difficult with TCP to lie about lost packets without breaking the application.

ECN Nonces are a general mechanism to prevent ECN cheating (or loss cheating). Two values for the two-bit ECN header field indicate ECN-Capable Transport, 01 and 10. The second code point, 10, is the ECN Nonce. In general, a protocol sender chooses between these code points randomly on its output packets, remembering the sequence it chose. The protocol receiver reports, on every acknowledgement, the number of ECN Nonces it has received thus far. This is called the ECN Nonce Echo. Since ECN marking and packet dropping both destroy the ECN Nonce, a receiver that lies about an ECN mark or
packet drop has a 50% chance of guessing right and avoiding discipline. The sender may react punitively to an ECN Nonce mismatch, possibly up to dropping the connection. The ECN Nonce Echo field need not be an integer; one bit is enough to catch 50% of infractions.

In DCCP, the ECN Nonce Echo field is encoded in acknowledgement options. For example, the Ack Vector option comes in two forms, Ack Vector [Nonce 0] (option 38) and Ack Vector [Nonce 1] (option 39), corresponding to the two values for a one-bit ECN Nonce Echo. The Nonce Echo for a given Ack Vector equals the one-bit sum (exclusive-or, or parity) of
ECN nonces for packets reported by that Ack Vector as received and not ECN marked. Thus, only packets marked as State 0 matter for this calculation (that is, valid received packets that were not ECN marked). Every Ack Vector option is detailed enough for the sender to determine what the Nonce Echo should have been. It can check this calculation against the actual Nonce Echo, and complain if there is a mismatch. (The Ack Vector could conceivably report every packet's ECN Nonce state, but this would severely limit Ack Vector's compressibility without providing much extra protection.)

Consider a half-connection from DCCP A to DCCP B. DCCP A SHOULD set ECN Nonces on its packets, and remember which packets had nonces, whenever DCCP B reports that it is ECN Capable. An ECN-capable endpoint MUST calculate and use the correct value for ECN Nonce Echo when sending acknowledgement options. An ECN-incapable endpoint, however, SHOULD treat the ECN Nonce Echo as always zero.
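The parity calculation described above might be sketched as follows. This is an illustration under stated assumptions: sent_nonces is a hypothetical table in which the sender recorded 1 for each packet it sent with the ECN Nonce code point (10) and 0 for each packet sent with code point 01, and state0_seqnos is the list of sequence numbers that an Ack Vector reports as State 0. Both function names are likewise hypothetical.

```python
from typing import Dict, Iterable


def expected_nonce_echo(sent_nonces: Dict[int, int],
                        state0_seqnos: Iterable[int]) -> int:
    """One-bit sum (exclusive-or) of the nonces the sender placed on the
    packets that the Ack Vector reports as received and not ECN marked."""
    echo = 0
    for seq in state0_seqnos:
        echo ^= sent_nonces[seq] & 1
    return echo


def nonce_echo_matches(sent_nonces: Dict[int, int],
                       state0_seqnos: Iterable[int],
                       received_echo: int) -> bool:
    """True when the receiver's Nonce Echo agrees with the sender's record."""
    return expected_nonce_echo(sent_nonces, state0_seqnos) == received_echo
```

A receiver that reports a marked or dropped packet as State 0 has to guess that packet's nonce, so a check of this kind fails about half the time for each such false report.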
When a sender detects an ECN Nonce Echo mismatch, it SHOULD behave as if the receiver had reported one or more packets as ECN-marked (instead of unmarked). It MAY take more punitive action, such as resetting the connection. The Reason for such DCCP-Reset packets SHOULD be set to "Aggression Penalty".

An ECN-incapable DCCP SHOULD ignore received ECN nonces and generate ECN nonces of zero. For instance, out of the two Ack Vector options, an ECN-incapable DCCP SHOULD generate Ack Vector [Nonce 0] (option 38) exclusively. (Again, the ECN Capable feature MUST be set to zero in this case.)

9.3. Other Aggression Penalties

The ECN Nonce provides one way for a DCCP sender to discover that a receiver is misbehaving. There may be other mechanisms, and a receiver or middlebox may also discover that a sender is misbehaving (sending more data than it should). In any of these cases, the entity that discovers the misbehavior MAY react by resetting the connection, with Reason set to "Aggression Penalty". A receiver that detects marginal (meaning possibly spurious) sender misbehavior MAY instead react with a Slow Receiver option, or by reporting some packets as ECN marked that were not, in fact, marked.

10.
Multihoming and Mobility

DCCP provides primitive support for multihoming and mobility via a mechanism for transferring a connection endpoint from one address to another. The moving endpoint must negotiate mobility support beforehand, and both endpoints must share their Connection Nonces. When the moving endpoint gets a new address, it sends a DCCP-Move packet from that address to the stationary endpoint. The stationary endpoint then changes its connection state to use the new address.

DCCP's support for mobility is intended to solve only the simplest multihoming and mobility problems. For instance, DCCP has no support for simultaneous moves. Applications requiring more complex mobility semantics, or more stringent security guarantees, should use an existing solution like Mobile IP or [SB00].

10.1. Mobility Capable Feature

A DCCP uses the Mobility Capable
feature to inform its partner that it would like to be able to change its address and/or port during the course of the connection.

Mobility Capable has feature number 5. The Mobility Capable feature located at DCCP A indicates whether or not A will accept a DCCP-Move packet sent by B. DCCP B sends a "Change R(Mobility Capable, 1)" option to DCCP A to inform it that B might like to move later. Mobility Capable is a server-priority feature.

A Mobility Capable feature contains a single byte of data. Mobility is allowed if and only if this byte is nonzero. A DCCP MUST reject a DCCP-Move packet referring to a connection when Mobility Capable is 0; however, it MAY reject a valid DCCP-Move packet even when Mobility Capable is 1.
A new connection starts with Mobility Capable 0 (that is, mobility is not allowed) for both DCCPs.

10.2. Mobility ID

A DCCP uses the Mobility ID feature to inform its partner of a 64-bit number that will act as identification, should the partner need to change its address and/or port during the course of the connection.

Mobility ID has feature number 9. The Mobility ID feature located at DCCP A is the identifier that A will use on DCCP-Move packets it sends to B. DCCP B sends a "Change R(Mobility ID, N)" option to DCCP A to inform it of the ID A has chosen for B's use. Mobility ID is a non-negotiable feature.

A Mobility ID feature contains eight bytes of data. The feature remote, say DCCP A, chooses the value of Mobility ID to uniquely identify a connection; its value must not equal the value of any other Mobility ID currently maintained by DCCP A. For security, DCCP A MUST choose Mobility ID randomly.
Furthermore, it MUST reassign Mobility ID after each successful move by DCCP B, and it MAY reassign Mobility ID more frequently.

A new connection starts with Mobility ID 0 for both DCCPs. Zero is not a valid Mobility ID.

10.3. Security

The DCCP mobility mechanism, like DCCP in general, does not provide cryptographic security guarantees. Nevertheless, mobile hosts must use valid Mobility IDs and include valid Identifications in their DCCP-Move packets, providing protection against some classes of attackers. Specifically, an attacker cannot move a DCCP connection to a new address unless they know valid Mobility IDs and how to generate valid Identifications. Even with the default MD5 Identification Regime, this means that an attacker must have snooped on every packet in the connection to get a reasonable probability of success, assuming that initial sequence numbers and Connection Nonces are chosen well (that is, randomly). Section 16 further describes DCCP security considerations.

10.4. Congestion Control State

Once an endpoint has transitioned to a new address, the connection is effectively a new connection in terms of its congestion control state: the accumulated information about congestion between the old endpoints no longer applies. Both DCCPs MUST initialize their congestion control state (windows, rates, and so forth) to that of a new connection; that is, they must "slow start". Similarly, the endpoints' configured MTUs (see Section 11) SHOULD be reinitialized, and PMTU discovery performed again, following an address change.
10.5. Loss During Transition

Several loss and delay events may affect the transition of a DCCP connection from one address to another. The DCCP-Move packet itself might be lost; the acknowledgement to that packet might be lost, leaving the mobile endpoint unsure of whether the transition has completed; and data from the old endpoint might continue to arrive at the receiver even after the transition.

To protect against lost DCCP-Move packets, the mobile host SHOULD retransmit a DCCP-Move packet if it does not receive an acknowledgement within a reasonable time period. Section 5.10 describes the mechanism used to protect against duplicate DCCP-Move packets.

A receiver MAY drop all data received from the old address/port pair once a DCCP-Move has successfully completed. Alternately, it MAY accept one Loss Window's worth of this data. Congestion and loss events on this data SHOULD NOT affect the new connection's congestion control state. The receiver MUST NOT accept data with the old address/port pair past one Loss Window, and SHOULD send DCCP-Resets in response to those packets.

During some transition period, acknowledgements from the receiver to the mobile host will contain information about packets sent both
from the old address/port pair, and from the new address/port pair. The mobile DCCP should not let loss events on packets from the old address/port pair affect the new congestion control state.

11. Maximum Packet Size

A DCCP implementation MUST maintain the maximum packet size (MPS) allowed for each active DCCP session. The MPS is influenced by the maximum packet size allowed by the current congestion control mechanism (CCMPS), the maximum packet size supported by the path's links (PMTU, the Path Maximum Transfer Unit) [RFC 1191], and the lengths of the IP and DCCP headers.

A DCCP application interface should let the application discover DCCP's current MPS. DCCP applications should use the API to discover the MPS. Generally, the DCCP implementation will refuse to send any packet bigger than the MPS, returning an appropriate error to the application.

A DCCP interface may allow applications to request that packets larger than PMTU be fragmented. This only matters when CCMPS > PMTU; packets larger than CCMPS MUST be rejected regardless. Fragmentation should not be the default. The rest of this section assumes the application has not requested fragmentation.

The MPS reported to the application SHOULD be influenced by
the size expected to be required for DCCP headers and options. If the application provides data that, when combined with the options the DCCP implementation would like to include, would exceed the MPS, the implementation should either send the options on a separate packet (such as a DCCP-Ack) or lower the MPS, drop the data, and return an appropriate error to the application.

The PMTU SHOULD be initialized from the interface MTU that will be used to send packets. The MPS will be initialized with the minimum of the PMTU and the CCMPS, if any.

To perform PMTU discovery, the DCCP
sender sets the IP Don't Fragment (DF) bit. However, it is undesirable for MTU discovery to occur on the initial connection setup handshake, as the connection setup process may not be representative of packet sizes used during the connection, and performing MTU discovery on the initial handshake might unnecessarily delay connection establishment. Thus, DF SHOULD NOT be set on DCCP-Request and DCCP-Response packets. In addition, DF SHOULD NOT be set on DCCP-Reset packets, although typically these would be small enough to not be a problem. On all other DCCP packets, DF SHOULD be set.

As specified in [RFC 1191], when a router receives a packet with DF set that is larger than the next link's MTU, it sends an ICMP Destination Unreachable message to the source of the datagram with the Code indicating "fragmentation needed and DF set" (also known as a "Datagram Too Big" message). When a DCCP implementation receives a Datagram Too Big message, it decreases its PMTU to the Next-Hop MTU value given in the ICMP message. If the MTU given in the message is zero, the sender chooses a value for PMTU using the algorithm described in Section 7 of [RFC 1191].
If the MTU given in the message is greater than the current PMTU, the Datagram Too Big message is ignored, as described in [RFC 1191]. (We are aware that this may cause problems for DCCP endpoints behind certain firewalls.)

If the DCCP implementation has decreased the PMTU, and the sending application attempts to send a packet larger than the new MPS, the API MUST cause the send to fail returning an appropriate error to the application, and the application SHOULD then use the API to query the new value of MPS. When this occurs, it is possible that the kernel has some packets buffered for transmission that are smaller than the old MPS, but larger than the new MPS. The kernel MAY send these packets with the DF bit cleared, or it MAY discard these packets; it MUST NOT transmit these datagrams with the DF bit set.

A DCCP implementation may allow the application to occasionally request that PMTU discovery be performed again. This will reset the PMTU to the outgoing interface's MTU. Such requests SHOULD be rate limited, to one per two seconds, for example.
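The PMTU and MPS bookkeeping described in this section can be sketched roughly as follows. This is an illustrative Python sketch, not part of the specification: the names PathState, df_bit, can_send, and rediscover are hypothetical, header and option lengths are ignored for simplicity, and the zero Next-Hop MTU case (Section 7 of [RFC 1191]) is deliberately left unimplemented.

```python
from dataclasses import dataclass, field
from typing import Optional

# Packet types on which this section says DF SHOULD NOT be set.
NO_DF_TYPES = frozenset({"DCCP-Request", "DCCP-Response", "DCCP-Reset"})


def df_bit(packet_type: str) -> bool:
    """Return True if the IP Don't Fragment bit should be set."""
    return packet_type not in NO_DF_TYPES


@dataclass
class PathState:
    """Hypothetical per-connection state; names are assumptions."""
    interface_mtu: int               # MTU of the outgoing interface
    ccmps: Optional[int] = None      # congestion control limit, if any
    pmtu: int = field(init=False)    # current path MTU estimate

    def __post_init__(self) -> None:
        # The PMTU is initialized from the interface MTU.
        self.pmtu = self.interface_mtu

    def mps(self) -> int:
        """Minimum of the PMTU and the CCMPS, if any (headers ignored)."""
        return self.pmtu if self.ccmps is None else min(self.pmtu, self.ccmps)

    def on_datagram_too_big(self, next_hop_mtu: int) -> None:
        """Handle an ICMP Datagram Too Big message."""
        if next_hop_mtu == 0:
            # The sender would pick a value per RFC 1191 Section 7.
            raise NotImplementedError("RFC 1191 Section 7 estimate not sketched")
        if next_hop_mtu > self.pmtu:
            return  # a larger value is ignored
        self.pmtu = next_hop_mtu

    def can_send(self, size: int) -> bool:
        """A send larger than the MPS fails with an error to the application."""
        return size <= self.mps()

    def rediscover(self) -> None:
        """Application-requested rediscovery resets PMTU to the interface MTU."""
        self.pmtu = self.interface_mtu
```

In this sketch an application would call can_send before queuing data and query mps() again after a failed send; rate limiting of rediscover is left to the caller.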
A successful DCCP-Move will also reset the PMTU.

A DCCP sender MAY optionally treat the reception of an ICMP Datagram Too Big message as an indication that the packet being reported was not lost due to congestion, and so for the purposes of congestion control it MAY ignore the DCCP receiver's indication that this packet did not arrive. However, if this is done, then the DCCP sender MUST check the ECN bits of the IP header echoed in the ICMP message, and only perform this optimization if these ECN bits indicate that the packet did not experience congestion prior to reaching the router whose link MTU it exceeded.

With application support, DCCP also allows for upward probing of PMTU [PMTUD]: the application would start by sending small packets, then gradually increase their sizes. A DCCP implementation supporting this upward probing MAY treat the loss of packets after a packet-size increase as an indication of MTU limitation, rather than congestion. XXX

12.
Middlebox Considerations

This section describes properties of DCCP that firewalls, network address translators, and other middleboxes should consider, including parts of the packet that middleboxes should not change. The intent is to draw attention to aspects of DCCP that may be useful, or dangerous, for middleboxes, or that differ significantly from TCP.

The Service Code field in DCCP-Request packets provides information that may be useful for stateful middleboxes. With Service Code, a middlebox can tell what protocol a connection will use without relying on port numbers. Middleboxes can disallow attempted connections accessing unexpected services by sending a DCCP-Reset with Reason set to "Bad Service Code". Middleboxes probably shouldn't modify the Service Code, unless they are really changing the service a connection is accessing.

The Source and Destination Port fields are in the same packet locations as the corresponding fields in TCP and UDP, which may simplify some middlebox implementations.

Modifying DCCP Sequence Numbers and Acknowledgement Numbers is more tedious and dangerous than modifying TCP sequence numbers. A middlebox that added packets to, or removed packets from, a DCCP connection would have to modify, at least: (1) acknowledgement options, such as Ack Vector; (2) CCID-specific options, such as TFRC's Loss Intervals; and
(3) Identification options---for example, the default MD5 Identification Regime includes sequence numbers in its cryptographic hash. On ECN-capable connections, the middlebox would have to keep track of ECN Nonce information for packets it introduced or removed, so that the relevant acknowledgement options continued to have correct ECN Nonce Echoes, or risk the connection being reset for "Aggression Penalty". We therefore recommend that middleboxes not modify packet streams by adding or removing packets.

Note that there is less need to modify DCCP's per-packet sequence numbers than TCP's per-byte sequence numbers; for example, a middlebox can change the contents of a packet without changing its sequence number. (In TCP, sequence number modification is required to support protocols like FTP that carry variable-length addresses in the data stream. If such an application were deployed over DCCP, middleboxes would simply grow or shrink the relevant packets as necessary, without changing their sequence numbers. This might involve fragmenting the packet.)

Middleboxes may, of course, reset connections in progress.
Clearly this requires inserting a packet into one or both packet streams, but the difficult issues do not arise.

DCCP is somewhat unfriendly to "connection splicing" [SHHP00], in which clients' connection attempts are intercepted, but possibly later "spliced in" to external server connections via sequence number manipulations. A connection splicer at minimum would have to ensure that the spliced connections agreed on all relevant feature values, which might take some renegotiation.

Middleboxes that want to trivially support the MD5 Identification Regime (Section 6.5.3) should not alter packets' Sequence Number, Type, # NDP, Acknowledgement Number, and Reserved fields, or the Connection Nonce feature values, which are included in the MD5 hash sent with Identification and Challenge options.

The contents of this section should not be interpreted as a wholesale endorsement of stateful middleboxes.

13. Abstract API

API issues for DCCP are discussed in another Internet-Draft, in progress.

14. Multiplexing Issues

In contrast to TCP, DCCP does not offer reliable ordered delivery. As a consequence, with DCCP there are no inherent performance penalties in layering functionality above DCCP to multiplex several sub-flows into a single DCCP connection.

If it is desired to share congestion control state among multiple DCCP flows that share the same source and destination addresses, the possibilities are to add DCCP-specific mechanisms to enable this, or
to use a generic multiplexing facility like the Congestion Manager [RFC 3124] residing below the transport layer.

For some DCCP flows, the ability to specify the congestion control mechanism might be critical, and for these flows the Congestion Manager will only be a viable tool if it allows DCCP to specify the congestion control mechanism used by the Congestion Manager for that flow. Thus, to allow the sharing of congestion control state among multiple DCCP flows, the alternatives seem to be to add DCCP-specific functionality to the Congestion Manager, or to add a similar layer below DCCP that is specific to DCCP. We defer issues of DCCP operating over a revised version of the Congestion Manager, or over a DCCP-specific module for the sharing of congestion control state, to later work.

15.
DCCP and RTP

The Real-Time Transport Protocol, RTP [RFC 3550], is currently used over UDP by many of DCCP's target applications (for instance, streaming media). This section therefore discusses the relationship between DCCP and RTP, and in particular, the question of whether any changes in RTP are necessary or desirable when it is layered over DCCP instead of UDP.

There are two potential sources of overhead in the RTP-over-DCCP combination, duplicated acknowledgement information and duplicated sequence numbers. We argue that together, these sources of overhead add slightly more than 4 bytes per packet relative to RTP-over-UDP, and that eliminating the redundancy would not reduce the overhead.

First, consider acknowledgements. Both RTP and DCCP report feedback about loss rates to data senders, via Real-Time Control Protocol Sender and Receiver Reports (RTCP SR/RR packets) and via DCCP acknowledgement options. These feedback mechanisms are potentially redundant. However, RTCP SR/RR packets
contain information not present in DCCP acknowledgements, such as "interarrival jitter", and DCCP's acknowledgements contain information not transmitted by RTCP, such as the ECN Nonce Echo. Neither feedback mechanism encompasses the other.

Sending both types of feedback isn't particularly costly either. RTCP reports are sent relatively infrequently: once every 5 seconds, for low-bandwidth flows. In DCCP, some feedback mechanisms are expensive---Ack Vector, for example, is frequent and verbose---but others are relatively cheap: CCID 3 (TFRC) acknowledgements take between 16 and 32 bytes of options sent once per round trip time. (Reporting less frequently than once per RTT would make congestion control less responsive to loss.) We therefore conclude that acknowledgement overhead in RTP-over-DCCP is not significantly higher than for RTP-over-UDP, at least for CCID 3.

One clear redundancy can be addressed at the application level. The verbose packet-by-packet loss reports sent in RTCP Extended Reports (RTCP XR) Loss RLE Blocks can be derived from DCCP's Ack Vector options. (The converse is not true, since Loss RLE Blocks contain no ECN information.)
Since DCCP implementations should provide an API for application access to Ack Vector information, RTP-over-DCCP applications might request either DCCP Ack Vectors or RTCP Extended Report Loss RLE Blocks, but not both.

Now consider sequence number redundancy on data packets. The embedded RTP header contains a 16-bit RTP sequence number. Most data packets will use the DCCP-Data type; DCCP-DataAck and DCCP-Ack packets need not usually be sent. The DCCP-Data header is 12 bytes long without options, including a 24-bit sequence number. This is 4 bytes more than a UDP header. Any options required on data packets would add further overhead, although many CCIDs (for instance, CCID 3, TFRC) don't require options on most data packets.

The DCCP sequence number cannot be inferred from the RTP sequence number since it increments on non-data packets as well as data packets. The RTP sequence number cannot be inferred from the DCCP sequence number either; for instance, RTP sequence numbers might be sent out of order.
Furthermore, removing RTP's sequence number would not save any header space because of alignment issues.

We therefore recommend that RTP transmitted over DCCP use the same headers currently defined. The 4 byte header cost is a reasonable tradeoff for DCCP's congestion control features and access to ECN. Truly bandwidth-starved endpoints should use header compression.

16. Security Considerations

DCCP does not provide cryptographic security guarantees. Applications desiring hard security should use IPsec or end-to-end security of some kind.

Nevertheless, DCCP is intended to protect against some classes of attackers. Attackers cannot hijack a mobility-incapable DCCP connection (close the connection unexpectedly, or cause attacker data to be accepted by an endpoint as if it came from the sender) unless they can guess valid sequence numbers. Thus, as long as endpoints choose initial sequence numbers well, a DCCP attacker must snoop on data packets to get any reasonable probability of success. The sequence number validity (Section 5.2) mechanism provides this guarantee.

We also avoid leaking sequence numbers to possibly malicious endpoints.
This is why invalid DCCP-Moves are ignored rather than reset, for example.

16.1. Security Considerations for Mobility

Mobility slightly changes this security guarantee by introducing a new mechanism by which an attacker can hijack a connection. This mechanism, DCCP-Move, has the unfortunate property that, given a successful attack, the victim could not realize that the connection has been stolen---its connection would simply be reset unexpectedly. Nevertheless, a DCCP attacker still must snoop on data packets to get any reasonable probability of success.

Specifically, an attacker can send a valid DCCP-Move packet if it can guess a valid Mobility ID AND it can generate valid Identification options. DCCP-Move packets need not contain valid Sequence or Acknowledgement Numbers, since a move might often follow a long burst of loss, so endpoints must choose these values well to prevent attack. Randomly choosing Connection Nonces and Mobility IDs should suffice, although we are concerned about the fact that Mobility IDs do not expire like sequence numbers do [[XXX]].

16.2. Security Considerations for Partial Checksums

The partial checksum facility has separate security impact, particularly in its interaction with authentication and encryption mechanisms.
The impact is the same in DCCP as in the UDP-Lite protocol, and what follows was adapted from the corresponding text in the UDP-Lite specification [UDP-LITE].

When a DCCP packet's Checksum Coverage field is not zero, the uncovered portion of a packet may change in transit. This is contrary to the idea behind most authentication mechanisms: authentication succeeds if the packet has not changed in transit. Unless authentication mechanisms that operate only on the sensitive part of packets are developed and used, authentication will always fail for partially-checksummed DCCP packets whose uncovered part has been damaged.

The IPsec integrity check (Encapsulation Security Protocol, ESP, or Authentication Header, AH) is applied (at least) to the entire IP packet payload. Corruption of any bit within that area will then result in the IP receiver discarding a DCCP packet, even if the corruption happened in an uncovered part of the DCCP payload.

When IPsec is used with ESP payload encryption, a link can not determine the
specific transport protocol of a packet being forwarded by inspecting the IP packet payload. In this case, the link MUST provide a standard integrity check covering the entire IP packet and payload. DCCP partial checksums provide no benefit in this case.

Encryption (e.g., at the transport or application levels) may be used. Note that omitting an integrity check can, under certain circumstances, compromise confidentiality [BEL98].

If a few bits of an encrypted packet are damaged, the decryption transform will typically spread errors so that the packet becomes too damaged to be of use. Many encryption transforms today exhibit this behavior. There exist encryption transforms, stream ciphers, which do not cause error propagation. Proper use of stream ciphers can be quite difficult, especially when authentication-checking is omitted [BB01]. In particular, an attacker can cause predictable changes to the ultimate plaintext, even without being able to decrypt the ciphertext.

17. IANA Considerations

DCCP introduces several sets of numbers whose values should be allocated by IANA. The following sets of numbers should require
an IETF standards-track specification as a prerequisite for new registrations.

o DCCP Packet Types 9 through 15 (Section 5.1).

o 8-bit DCCP-Reset Reasons (Section 5.9).

o 8-bit DCCP Option Types (Section 6). The CCID-specific options 128 through 255 need not be allocated by IANA, although particular CCIDs may request that IANA allocate their CCID-specific options.

o 8-bit DCCP Feature Numbers (Section 6.4). The CCID-specific features 128 through 255 need not be allocated by IANA, although particular CCIDs may request that IANA allocate their CCID-specific features.

o 8-bit DCCP Congestion Control Identifiers (CCIDs) (Section 7).

o 16-bit Identification Regimes, for use with DCCP Identification and Challenge options (Section 6.5).

o Ack Vector States (Section 8.5). Only State 2 remains unallocated.

o Data Dropped Drop Codes 4 through 6 (Section 8.7).

32-bit Service Codes (Section 5.5), which are not specific to DCCP, will require more liberal registration rules. Service Codes are meant to correspond to application-level services.
For example, there might be a Service Code for HTTP connections, one for FTP control connections, and one for FTP data connections. However, a special-purpose Web server might use a Service Code different from HTTP's to indicate its function.

We suggest that IANA allocate Service Codes to anyone who asks, subject to the following guidelines.

o No specification, standards-track or otherwise, is required to request a Service Code.

o Service Codes should be allocated one at a time, or in small blocks. A particular intended service should be described, in a short English phrase, before a Service Code can be allocated.

o IANA should maintain an association of Service Codes to the corresponding short English phrases.

o Users may request specific Service Code values. The requested values should be assigned first-come first-serve. We suggest that users request Service Codes that can be interpreted as meaningful four-byte ASCII strings.
Thus, the "Frobodyne Plotz Protocol" might correspond to "fdpz", or the number 1717858426. The canonical interpretation of a Service Code field is numeric.

o The subset of Service Codes in which the high-order byte has a value between 65 and 90, inclusive---the capital letters in ASCII---should be reserved for international standard or standards-track specifications, IETF or otherwise.

o Furthermore, the subset of Service Codes in which the high-order byte has the value 63---ASCII '?'---should never be allocated. These Service Codes are reserved for private use.

o Service Code 0 should never be allocated either. It represents the absence of a meaningful Service Code.

This design for Service Code allocation is based on the allocation of 4-byte identifiers for Macintosh resources, PNG chunks, and TrueType and OpenType tables.

Finally, DCCP requires a Protocol Number to be added to the registry of Assigned Internet Protocol Numbers. Experimental implementors should use Protocol Number 33 for DCCP, but this number may change in future.

18. Thanks

There is a wealth of work in this area, including the Congestion Manager.
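As an aside on the Service Code guidelines of Section 17, the mapping between a four-byte ASCII string and its numeric Service Code, and the reserved ranges keyed on the high-order byte, can be sketched as below. This is an illustrative Python sketch only; the function names and the category labels are hypothetical, and the big-endian byte order is inferred from the "fdpz" example rather than stated normatively here.

```python
def service_code_from_ascii(tag: str) -> int:
    """Interpret a four-byte ASCII string as a 32-bit Service Code.

    Big-endian order is assumed, which matches "fdpz" <-> 1717858426.
    """
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a Service Code string must be exactly four bytes")
    return int.from_bytes(raw, "big")


def classify_service_code(code: int) -> str:
    """Classify a Service Code by the suggested allocation guidelines.

    The returned labels are hypothetical names for this sketch.
    """
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("Service Codes are 32-bit values")
    if code == 0:
        return "absent"            # no meaningful Service Code
    high = (code >> 24) & 0xFF     # high-order byte
    if high == 63:                 # ASCII '?'
        return "private-use"       # never allocated
    if 65 <= high <= 90:           # ASCII capital letters
        return "standards-reserved"
    return "first-come"            # allocated to anyone who asks


print(service_code_from_ascii("fdpz"))
```

The printed value is the number quoted for the "Frobodyne Plotz Protocol" example in Section 17.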
draft-ietf-dccp-spec-04.txt:

Nevertheless, DCCP is intended to protect against some classes of attackers. Attackers cannot hijack a DCCP connection (close the connection unexpectedly, or cause attacker data to be accepted by an endpoint as if it came from the sender) unless they can guess valid sequence numbers. Thus, as long as endpoints choose initial sequence numbers well, a DCCP attacker must snoop on data packets to get any reasonable probability of success. The sequence number validity (Section 5.2), Identification (Section 6.4.3), and mobility (Section 10) mechanisms provide this guarantee.

We also avoid leaking sequence numbers to possibly malicious endpoints. For instance, this is why invalid DCCP-Moves are ignored, rather than reset.

17. IANA Considerations

DCCP introduces six sets

draft-ietf-dccp-spec-05.txt:

We thank the staff and interns of ICIR and, formerly, ACIRI, the members of the End-to-End Research Group, and the members of the Transport Area Working Group for their feedback on DCCP. We especially thank the DCCP expert reviewers: Greg Minshall, Eric Rescorla, and Magnus Westerlund for detailed written comments and problem spotting, and Rob Austein and Steve Bellovin for verbal comments and written notes.

We also thank those who provided comments and suggestions via the DCCP BOF, Working Group, and mailing lists, including Damon Lanphear, Patrick McManus, Sara Karlberg, Kevin Lai, Youngsoo Choi, Dan Duchamp, Gorry Fairhurst, Derek Fawcus, David Timothy Fleeman, John Loughney, Ghyslain Pelletier, Tom Phelan, Stanislav Shalunov, Yufei Wang, and Michael Welzl. In particular, Michael Welzl suggested the Payload Checksum option.

A. Appendix: Ack Vector Implementation Notes

This appendix discusses particulars of DCCP acknowledgement handling, in the context of an abstract implementation for Ack Vector. It is informative rather than normative.
draft-ietf-dccp-spec-04.txt:

of numbers whose values should be allocated by IANA.

o 32-bit Service Names (Section 5.4; not exclusive to DCCP).

o 8-bit DCCP-Reset Reasons (Section 5.8).

o 8-bit DCCP Option Types (Section 6). The CCID-specific options 128 through 255 need not be allocated by IANA.

o 8-bit DCCP Feature Numbers (Section 6.3). The CCID-specific features 128 through 255 need not be allocated by IANA.

o 8-bit DCCP Congestion Control Identifiers (CCIDs) (Section 7).

o 16-bit Identification Regimes, for use with DCCP Identification and Challenge options (Section 6.4).

In addition, DCCP requires a Protocol Number to be added to the registry of Assigned Internet Protocol Numbers. Experimental implementors should use Protocol Number 33 for DCCP, but this number may change in future.

18. Design Motivation

In the section we attempt to capture some of the rationale behind specific details of DCCP design.

18.1. CSlen

draft-ietf-dccp-spec-05.txt:

The first part of our implementation runs at the HC-Receiver, and therefore acknowledges data packets. It generates Ack Vector options. The implementation has the following characteristics:

o At most one byte of state per acknowledged packet.

o O(1) time to update that state when a new packet arrives (normal case).

o Cumulative acknowledgements.

o Quick removal of old state.

The basic data structure is a circular buffer containing information about acknowledged packets. Each byte in this buffer contains a state and run length; the state can be 0 (packet received), 1 (packet ECN marked), or 3 (packet not yet received). The buffer grows from right to left. The implementation maintains five variables, aside from the buffer contents:

o "buf_head" and "buf_tail", which mark the live portion of the buffer.

o "buf_ackno", the Acknowledgement Number of the most recent packet acknowledged in the buffer. This corresponds to the "head" pointer.

o "buf_nonce", the one-bit sum (exclusive-or, or parity) of the ECN Nonces received on all packets acknowledged by the buffer with State 0.
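The circular buffer and its variables can be illustrated with a small Python sketch. It is not taken from the draft and is only one possible reading of the description above. It assumes each byte packs the state into the top two bits and the run length into the low six bits, with a stored run length of n covering n + 1 packets. The "count" field is a hypothetical helper that is not one of the draft's variables. Sequence number wraparound, reordered or duplicate packets, and removal of old state are deliberately left out.

```python
STATE_RECEIVED = 0      # packet received
STATE_ECN_MARKED = 1    # packet ECN marked
STATE_NOT_RECEIVED = 3  # packet not yet received
MAX_RUN = 63            # assumption: 6-bit run length; value n covers n + 1 packets


class AckBuffer:
    """Toy circular acknowledgement buffer that grows from right to left."""

    def __init__(self, size=16):
        self.buf = bytearray(size)
        self.size = size
        self.buf_head = 0      # index of the newest byte (valid when count > 0)
        self.buf_tail = 0      # index of the oldest byte
        self.count = 0         # hypothetical helper: number of live bytes
        self.buf_ackno = None  # sequence number of the newest packet recorded
        self.buf_nonce = 0     # parity of ECN Nonces over State 0 packets

    def _push(self, state):
        """Record one more packet at the head, extending a run if possible."""
        if self.count:
            head = self.buf[self.buf_head]
            if (head >> 6) == state and (head & MAX_RUN) < MAX_RUN:
                self.buf[self.buf_head] = head + 1  # same state: lengthen the run
                return
            if self.count == self.size:
                raise OverflowError("buffer full; old state must be removed first")
            self.buf_head = (self.buf_head - 1) % self.size  # grow leftwards
        else:
            self.buf_head = self.buf_tail = self.size - 1
        self.buf[self.buf_head] = state << 6  # stored run length 0 = one packet
        self.count += 1

    def add_packet(self, seqno, state=STATE_RECEIVED, nonce=0):
        """Handle a newly arrived packet that is newer than buf_ackno."""
        if self.buf_ackno is not None:
            if seqno <= self.buf_ackno:
                raise ValueError("old or duplicate packets are not handled here")
            for _ in range(seqno - self.buf_ackno - 1):
                self._push(STATE_NOT_RECEIVED)  # mark the gap as not yet received
        self._push(state)
        self.buf_ackno = seqno
        if state == STATE_RECEIVED:
            self.buf_nonce ^= nonce & 1

    def runs(self):
        """Return (state, packet_count) pairs from newest to oldest."""
        out = []
        i = self.buf_head
        for _ in range(self.count):
            out.append((self.buf[i] >> 6, (self.buf[i] & MAX_RUN) + 1))
            i = (i + 1) % self.size
        return out
```

In the common case, where the new packet directly follows buf_ackno, add_packet touches at most one byte, which is the O(1) update mentioned above. A gap costs one step per missing packet in this sketch.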
We draw acknowledgement buffers like this:

+-------------------------------------------------------------------+
|S,L|S,L|S,L|S,L|    |    |    |    |S,L|S,L|S,L|S,L|S,L|S,L|S,L|S,L|
+-------------------------------------------------------------------+
              ^                       ^
           buf_tail        buf_head, buf_ackno = A     bu