view Side-By-Side changes
Network Working Group Richard Price, Siemens/Roke Manor INTERNET-DRAFT Hans Hannu, Ericsson Expires:AugustSeptember 2002 Carsten Bormann, TZI/Uni Bremen Jan Christoffersson, Ericsson Zhigang Liu, Nokia Jonathan Rosenberg, dynamicsoftFebruary 14,March 1, 2002 Signaling Compression (SigComp)<draft-ietf-rohc-sigcomp-04.txt><draft-ietf-rohc-sigcomp-05.txt> Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This document is a submission of the IETF ROHC WG. Comments should be directed to its mailing list,rohc@cdt.luth.se.rohc@ietf.org. Abstract This document defines SigComp, a solution for compressing messages generated bytext-basedapplication protocols such asSIP[SIP] andRTSP[RTSP]. The architecture and pre-requisites of SigComp are outlined, along with the format of the SigComp message. Decompression functionality for the SigComp solution is provided by a "Universal Decompressor Virtual Machine" optimized for the task of running decompression algorithms. The UDVM can be configured to understand the output of many well-known compressors such as [DEFLATE]. Price, Hannu, et al. [Page 1] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002 Table of contents 1. Introduction..................................................2 2. Terminology...................................................3 3. SigCompArchitecture..........................................5Architecture..........................................6 4. SigComp message flow..........................................11 5. SigCompcompressor............................................15compressor............................................14 6. State handling andcapability announcement....................18announcement...............................16 7. Overview of theUDVM..........................................22UDVM..........................................20 8. Decompressing a SigCompmessage...............................26message...............................23 9. UDVM instructionset..........................................30set..........................................26 10. Securityconsiderations.......................................41considerations.......................................38 11. IANAconsiderations...........................................43considerations...........................................40 12.Acknowledgements..............................................43Acknowledgements..............................................41 13. AuthorsĘaddresses............................................44addresses............................................41 14.References....................................................45References....................................................42 Appendix A.Mnemonic language.....................................46 Appendix B. Example application-defined parameters................48 Appendix C. Example decompression algorithms......................49 Appendix D.Documenthistory......................................51history......................................43 1. Introduction The Session Initiation Protocol(SIP)[SIP], along with many other application protocols used for multimedia communications such asRTSP[RTSP], is a textual protocol engineered for bandwidth rich links. As a result,theSIP messages have not been optimized in terms of size. Typical SIP messagesarerange from a few hundred bytes to as high as2000.two thousand. To date, this has not been a significant problem. With the planned usage of these protocols in wireless handsets as part of 2.5G and 3G cellular networks, the large size of these messages is problematic. With low-rate IP connectivity, store-and- forward delays are significant. Taking into account retransmits, and the multiplicity of messages that are required in some flows, call setup and feature invocation are adversely affected. Therefore, we believe there is merit in reducing these message sizes. This document outlines the architecture and pre-requisites of the SigCompsolution including the capability announcement and UDVM algorithm upload, along withsolution, the format of the SigCompmessage.message, algorithm upload, and the Universal Decompressor Virtual Machine (UDVM) that provides decompression functionality. SigComp istypicallyoffered to applications as a "shim" layer between the application and the transport. The service provided is that of the underlying transport plus compression. Both connection-oriented and connectionless transports are supported by SigComp. This document focuses on the signaling scenario where anendpoint sends and receives data to/from an outbound/inboundend-terminal communicates with a proxy.However,However SigCompis designed to run over both connectionless and connection- oriented transports and hencemay be applicable to other scenarios with multiple endpoints compressing and decompressing data. Price, Hannu, et al. [Page 2] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119]. SigComp The overall solution for signaling compression, comprising the compressor, decompressor, dispatchers and state handler. Application For the purpose of this document, an application is a text-based protocol software that: a) sends application data to the compressor dispatcher b) receives data from the decompressor dispatcher c)decides whetherauthenticates the sender of a decompressed message and gives permission for stateinformation mayto be saved in the sender's name Application message An uncompressed message provided to or from the application. Endpoint One instance of an application plus a SigComp layer. Each endpoint is capable of sending and/or receiving SigComp messages. Endpoint identity A unique indicator assigned to each endpoint by the application (for example an URI). The application authenticates the sender of a decompressed message, and provides their endpoint identity to the SigComp state handler. Transport Mechanism for passing data between twoinstances of an application.endpoints. SigComp is capable of sending messages over a wide range of transports including TCP, UDP and [SCTP]. Message-based transport A transport that carries data as a set ofdistinct,bounded messages. Stream-based transport A transport that carries data as a continuous stream with no message boundaries.In this case,Price, Hannu, et al. [Page 3] INTERNET-DRAFT SigCompreserves the character 0xFFFF to delimit messages in the compressed stream.March 1, 2002 Application-defined parameters Parameters that must be agreed upon by theapplicationslocal and remote endpoints invoking SigComp.Depending onValues for thesituation theseapplication-defined parametersmight beare typically fixeda-priori or negotiated. Application message An uncompressed message, as provided from or to the application, which istobe compressed by the compressor. When delivered from the decompressor the data has passed throughmeet thedecompression process and is referred to as decompressed data orrequirements of adecompressed message. Price, Hannu, et al. [Page 3] INTERNET-DRAFT SigComp February 14 , 2002particular signaling application. SigComp message May contain a compressed application message in the form of UDVM bytecode. In case of amessage- based transport,message-based transport such as UDP, a SigComp message corresponds to exactly one (UDP) datagram. For a stream-basedtransport,transport such as TCP, each SigComp message is separated by a0xFFFFreserved delimiter. Standalone SigComp message A SigComp message that does not include any compressed application data. Certain signaling applications may not allow standalone SigComp messages due to security requirements. CompressorThe compressorEntity that invokesthean encoder, and keeps track of states that can be used for compression. It is responsible for supplying UDVM bytecode to the remote decompressor in order for compressed data to be decompressed. Encoder Encodes data according to a(compression) algorithm into UDVM bytecode. The encoded data can be decoded by a UDVM.particular compression algorithm. Compressor dispatcherA layerEntity that receives uncompressed application messages, invokes a compressor, and forwards the resulting SigComp messages to a remoteSigComp layer.endpoint. DecompressorThe decompressorEntity that is responsible for converting a SigComp message into uncompressed data. Decompression functionality is provided by the UDVM. Decompressor dispatcherA layerEntity that receives SigComp messages, invokes a decompressor, and forwards the decompressed application messages to an application. Price, Hannu, et al. [Page 4] INTERNET-DRAFT SigComp March 1, 2002 Virtual machine A machine architecture designed to be implemented in software (although silicon implementations are of course possible). Universal Decompressor Virtual Machine (UDVM) The virtual machine described in this document. The UDVM is used for decompression of SigComp messages. Bytecode Machine code that can be executed by a virtual machine. UDVM bytecode is a combination of UDVM instructions and compressed data.Price, Hannu, et al. [Page 4] INTERNET-DRAFT SigComp February 14 , 2002Per-message compression Compression that does not reference data from previous messages. SigComp can decompress a message of this type using only the application-defined parameters and the data in the message itself. Dynamic compression Compression relative to messages sent prior to the current compressed message. SigComp stores and retrieves this data using the state handler. State Data saved for retrieval by later SigComp messages.The dataAn item of state typically reflects the contents of the UDVM memory after decompressing a message, but state can also besavedcreated by the compressor or by the application. State handler Entity responsible for storing and accessing state information once permission is granted by the application. State identifier Reference used to access an item of state previouslysavedcreated by the compressor, the decompressor or the application. CPU cycles A measure of the amount of "CPU power" required to execute a UDVM instruction (the simplest UDVM instructions require a single CPU cycle). An upper limit is placed on the number of cycles that can be used to decompress each bit in a compressed message. Price, Hannu, et al. [Page 5] INTERNET-DRAFT SigComp March 1, 2002 3. SigComp Architecture In the SigComp architecture compression and decompression is performed at two communicatingendpoints.entities. SigComp is offered to applications as a "shim" layer between the application and the underlying transport, and so these entities are endpoints when viewed from a transport layer perspective. Note however that from the application perspective SigComp is applied on a per-hop basis. Figure 1 shows the layout of a communicating endpoint that implements a SigComp layer. The figure does not mandate any particular implementation, but is shown to the reader for the sake of clarity. The SigCompis typically offered to applications as a "shim"layerbetween the application andis further decomposed in thetransport. Note however that for certain applicationsfollowing components: - A compressor dispatcher: this is thecompressed SigComp message may be passed back tointerface from the application. The compressor dispatcher receives an applicationitselfmessage and an identifier foradditional processing before transmission. For example,theapplication may wish to apply encryption toreceiving endpoint. Based on thecompressedendpoint identity the compressor dispatcher invokes a particular compressor, which returns a SigComp messagebefore handing itthat is forwarded to thetransport. Theremote SigComplayer is common for several text based protocol applications (identified as Application 1 and Application 2 in Figure 1). These applications are not part of the SigComp layer. Price, Hannu, et al. [Page 5] INTERNET-DRAFT SigComp February 14 , 2002 The SigComp layer is further decomposed in the following components: - A compressor dispatcher: this is the interface from the applications. Application messages are received by the compressor dispatcher, and based on the application requirements, the compressor dispatcher invokes a particular compressor to achieve the desired compression ratio using the allocated processing and memory resources. The compressor returns a SigComp message that is forwarded to the remote SigComp peer.endpoint. - A decompressor dispatcher: this is the interface towards theapplications.application. A SigComp message is received by the decompressor dispatcher and an instanceof the UDVMa decompressor is invoked. Once the dispatcher has received the (decompressed) application data itdetermines the target application and forwardforwards the message toit.the application. - One or more compressors:the compressorsa distinct compressor is invoked for eachcontain an algorithm to performremote endpoint with which thecompression.local application wishes to communicate. A compressor receives an (uncompressed) application message from the compressor dispatcher, compresses the message, and returns a SigComp message to the compressor dispatcher. During the compressionprocess,process the compressor may invoke the state handler to restoreaprevious state or saveanewone. Thestate. Each compressoris responsible for providing the remote decompressor with suitable UDVM bytecode to reconstruct the original application message. Within the compressor, the entity which runs the actual compressionchooses a certain algorithm(minus state management issues) is known asto encode the"encoder".data, (e.g. [DEFLATE]). - One or more decompressors:the decompressors contain the needed UDVM to perform the decompression. Thesince SigComp can run over an unsecure transport layer, a distinct decompressor must be invoked on a per-message basis. A decompressor receives a SigComp message from the decompressor dispatcher,decompressdecompresses the message, and returns the (decompressed) application message to the decompressor dispatcher. During the decompression process, the decompressor may invoke the state handler to restoreaprevious state or saveanewone.state. - State handler: this entity contains enough logic to store and retrieve states.A stateState isdatainformation that is stored between SigComp messages: this data can be saved either by a compressor, a decompressor or an application.The saved state may be used for (de)compression between a compressor and its peer decompressor. TheFor security purposes the state Price, Hannu, et al. [Page 6] INTERNET-DRAFT SigComp March 1, 2002 handleris also responsible for askingmust always ask the application to grant permission for new states to besaved by the state handler.saved. Stateparameterscreation and retrievalof statesare further described in Chapter 6.Price, Hannu, et al. [Page 6] INTERNET-DRAFT SigComp February 14 , 2002 +-----------------+ +-----------------+ |+---------------------------------------------+ | | || Application 1 |<-+ +->|Application2 | | | || | | | | +---------------------------------------------+ | | ^ Message & | Endpoint |+-----------------+ | | +-----------------+|^Decompressed endpoint | identity | |^message identity | |Application msgs.| | |+-------------------------+| +-- -- -- ----| | | ---|-- -- -- -- -- -- --|-- -- -- |- --|---- -- ----++ | | |+-----------------------+| | v v | || | |+--------------+| |+--------------+ +--------------+ | SigComp | | | | | | SigComp message | Compressor | | State | | Decompressor | message <-------| dispatcher | | handler | | dispatcher |<------- | | | | | | | | +--------------+| |+--------------+ +--------------+ | ^ ^| |^ ^ ^ ^ ^ ^ | | | | | | | | | | | | | | | | | | | | | | | | | | | | v | | | | | v | | +--------------+v| | | | +--------------+ | | Compressor 1 |+---------+| | | | |Decompressor 1| | ||<-->| State |<-->||<----+ | | +---->| | | | (Encoder) | |Handler| | (UDVM) | | | |+---------+| | | | | +--------------+ | | +--------------+ | | | | | | v | | v | +--------------+v| | +--------------+ | | Compressor 2 |+---------+| | |Decompressor 2| | ||<-->| State |<-->||<-------+ +------->| | | | (Encoder) | |Handler | |(UDVM) | | | |+---------+| | | +--------------+ +--------------+ | | SigComp layer | +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- ----+-- + Figure 1: High-level architectural overview of one SigComppeer.endpoint Price, Hannu, et al. [Page 7] INTERNET-DRAFT SigComp March 1, 2002 Note that it is possible for SigComp to decompress messages from multiplecompressorsendpoints at different physical locations in anetwork. Thenetwork, as the architecture is designed to prevent data from onecompressorendpoint interfering with data from a differentcompressor.endpoint. A consequence of this design choice is that it is difficult for a malicious user to disruptdecompressorSigComp operation by inserting false compressed messages on thetransport. Price, Hannu, et al. [Page 7] INTERNET-DRAFT SigComp February 14 , 2002 The decompressorstransport layer. Each decompressor in the architecture of Figure 1should be viewed as containers for UDVMs; the actual decompressor functionalityishandled by invokingan instance of theUDVM.Universal Decompressor Virtual Machine (UDVM). Figure 2 gives a more detailed view of a UDVM, including all of the interfaces between the UDVM and its environment. +----------------+ +----------------+ | | Request compressed data | | | |-------------------------------->| | | |<--------------------------------| | | | Provide compressed data | | | | | Dispatcher | | | | | | | Outputuncompresseddecompressed data | | | |-------------------------------->| | | | | | | | +----------------+ | UDVM | | | +----------------+ | | Request state information | | | |-------------------------------->| | | |<--------------------------------| | | | Provide state information | | | | | State | | | | Handler | | | Make state creation request | | | |-------------------------------->| | | | Forwardcapabilityannouncement | | | | | | +----------------+ +----------------+ Figure 2: Interfaces between the UDVM and its environment Note that for simplicity, the UDVM indicates when it requires additional compressed data or state information using an explicit instruction. It then pauses and waits for the information to be supplied before continuing with the next instruction. This prevents the arrival of more data from interfering with the operation of the UDVM (e.g. by accidentally overwriting UDVM memory that is currently in use). Price, Hannu, et al. [Page 8] INTERNET-DRAFT SigComp March 1, 2002 3.1. Requirements on application From an application perspective the SigComp layertypicallyappears as a new transport, with similar behavior to the original transport used to carry uncompressed data (for example SigComp/UDP behaves similarly to native UDP).Price, Hannu, et al. [Page 8] INTERNET-DRAFT SigComp February 14 , 2002If the application wishes to mix SigComp messages with other types of data (e.g. uncompresseddata)data, or SigComp data for a different application) on the same transport then the transport must distinguish between thetwodifferent types of data.For UDP and TCP thisThis means that a new port will need to be reserved or discovered forcompressed data.the SigComp messages destined for a particular application. For example[SIP]SIP uses port 5060 for TCP and port 5061 for TLS/TCP, so it could similarly reserve another port for SigComp/TCP. In the interests of security, a new interface is required to the signaling application in order to leverage the authentication functions built into the application itself.For eachWhen the application receives a decompressed messagethat is accompanied by a state creation request,it determines thestate handler needsidentity of the sending endpoint and supplies this information tofind out whetherthe state handler. 3.2. Application-defined parameters When an applicationconsidersinvokes SigComp, a number of parameters are provided by themessageapplication tobe legitimate. If the decompressed message is considered to be invalid then the state handler cannot create the requested state information. This interface is marked on the architecture of Figure 1. 3.2. Application-defined parameters When an application invokes SigComp, a number of parameters are provided by the application to controlcontrol the maximum size of compressed messages, the UDVM memory size etc. Thetwo instances of the applicationlocal and remote applications that wish to communicate MUST initially agree on a common set of values for these parameters. Note thatif a reverse channel is available then SigComp can perform an internal "capability announcement" to indicate that additional memory or CPU cyclesthe majority of application-defined parameters areavailable. This means that it is generally sufficient toset to fixed values for a particular signaling application. However, endpoints implementing SigComp will typically have a wide range of capabilities; each offering a different amount of working memory, processing power and so on. In order to support this wide variation in endpoint capabilities, SigComp includes a mechanism for modifying the following application-definedparameter (thereparameters on the fly: UDVM_version UDVM_memory_size cycles_per_bit cycles_per_message Initial state The SigComp announcement mechanism isnodescribed further in Section 6.3. The advantage of building the announcement mechanism into SigComp is that it avoids the needto provide an external, application-specificfor any form of negotiationmechanism).to be performed by the application itself. Instead, it is sufficient to initialize Price, Hannu, et al. [Page 9] INTERNET-DRAFT SigComp March 1, 2002 all of the application-defined parameters to fixed values and modify them later using SigComp itself. Each application-defined parameter is described below.Appendix B discusses how eachNote that unless otherwise indicated, all of the parametersaffects SigComp operation in greater detail, and recommends default values for the parameters.can be stored as 2-byte integers. UDVM_version The UDVM_version parameter specifies the level of functionality available at the UDVM. The basic version of the UDVM (Version 0) is defined in this document.minimum_compression_ratiomaximum_expansion_size Theminimum_compression_ratiomaximum_expansion_size parameter prevents the generation of excessively large SigComp messages.ForIf set to 0 then the parameter is ignored by SigComp; for any other value then if ann byteuncompressedmessage,message is k bytes long, the corresponding SigComp message must be no larger than(n / minimum_compression_ratio) rounded down to the nearest byte.(k + maximum_expansion_size). Note thatthis parameter can be less than 1, (in which case a certain amount of message expansion is allowed) or 0 (in which case Price, Hannu, et al. [Page 9] INTERNET-DRAFT SigComp February 14 , 2002 no minimum_compression_ratio needs to be met). Anyany value other than 0 bans the creation of standalone SigComp messages (i.e. messages that do not contain a compressed application message). maximum_compressed_size The maximum_compressed_size parameter limits the size of one compressed message. SigComp rejects any message larger than the specified value. maximum_uncompressed_size The maximum_uncompressed_size parameter limits the size of one uncompressed message. SigComp rejects any message larger than the specified value. minimum_hash_size The minimum_hash_size parameter specifies the minimum size of the state identifier when creating new state information. This value needs to be sufficiently large to prevent malicious users from guessing a state identifier by brute force.overall_memory_sizeUDVM_memory_size Theoverall_memory_sizeUDVM_memory_size parameter specifies the total number of bytes in the UDVM memory.working_memory_start The working_memory_start parameter specifies the start of the UDVM memory area that can be modified. Memory addresses below this value are considered read-only by the UDVM. working_memory_end The working_memory_end parameter specifies the end of the UDVM memory area that can be modified. Memory addresses above this value are considered read-only by the UDVM.cycles_per_bit The cycles_per_bit parameter specifies the number of "CPU cycles" Price, Hannu, et al. [Page 10] INTERNET-DRAFT SigComp March 1, 2002 that can be used to decompress a single bit of data. One CPU cycle typically corresponds to a single UDVM instruction, although some of the high-level instructions may require additional cycles. cycles_per_message The cycles_per_message parameter specifies the number of additional CPU cycles made available at the start of a compressed message.Price, Hannu, et al. [Page 10] INTERNET-DRAFT SigComp February 14 , 2002These cycles can be useful when decompressing algorithms that upload additional data on a per-message basis, for example a new set of Huffman codes as with [DEFLATE]. The total maximum number of "CPU cycles" available for each compressed message is specified by the following formula:total_cyclesmaximum_cycles = message_size * cycles_per_bit + cycles_per_messagefirst_instructionmaximum_state_size Thefirst_instructionmaximum_state_size parameter specifies thememory addressmaximum amount ofthe first instruction tostate information that can beexecuted when the UDVM is initialized. Initial memory contents When the UDVM is invoked its memory is reset to contents definedsaved bythe application. This code is executeda local endpoint, forevery SigComp message (so typicallyeach remote endpoint with which itperforms a simple task such as extracting the first n bytes fromcommunicates. Note that themessage and interpreting themamount of state information is expressed as a multiple of the parameter UDVM_memory_size, because an item of stateidentifier). Initial state As well as decidinggenerally reflects theinitialcontents of the UDVMmemory, thememory. Initial state The application canalsostore useful information in the form of state. This predefined state is used to offer a range of well-known decompression algorithms to the compressor, which can choose to avoid uploading bytecode for a new algorithm if it supports one of the well-known algorithms. Each item of initial state can be made mandatory for every instance of the application, or it can be made optional (in which case support for the relevant state will need to be advertised before the state can be used). 4. SigComp message flow This chapter describes the SigComp messageflow, including the initialization, capability announcementflow andexchange of compressed messages. In the architecture of Figure 1, this chapter describesthe operation of the compressor and decompressor dispatcher. 4.1. Message exchange The local SigComp layer may send compressed data to a remote SigComp layer, and the local SigComp layer may also receive compressed data.However,Note however that compression in one direction does not necessarily imply compression in the reverse direction. Furthermore, even in the case that there are two unidirectional compressed flows between two Price, Hannu, et al. [Page 11] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002 SigComplayer,layers, there is no need to use the same compression algorithm at both compressors.4.1.1. Operation for each compressor-decompressor pair An endpoint that wants to send compressed data to a remote party must initialize a4.2. SigComplayer at the local party prior to its use, so that the decompressor dispatcher inmessage format In every SigComp message theremote endpoint assignsfirst few bytes are interpreted as adecompressorstate identifier that accesses some previously stored state information. This state information includes all of the data needed to decompress the SigComp message: including the decompression algorithm that will beused andapplied to theUDVM is loaded withremainder of the message, as well as any additional information that is required (e.g. one or more previously received messages if dynamic compressionalgorithm.is in use). Theprocessformat of the basic SigComp message isdescribedgiven in Figure3. +--------------+ +--------------+ | | | | | Endpoint A | | Endpoint B | | | | | +--------------+ +--------------+4: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 1 1 1 1 1 | length | +---+---+---+---+---+---+---+---+ | |SigComp Discovery: state_identifier (n-bytes) : ||<------------------------------->|| +---+---+---+---+---+---+---+---+ | | : Remaining SigCompRequest | |-------------------------------->| | | | | | Capabilities Announcement | |<--------------------------------| | | | | | UDVM Upload | |-------------------------------->| | |message : |Compressed Messages||- - - - - - - - - - - - - - - >|+---+---+---+---+---+---+---+---+ Figure3: Compressor-decompressor pair operation The4: Basic SigCompdiscovery mechanism itselfmessage The length field isoutsidea 3-bit value (MSBs before LSBs) that indicates thescopelength ofthis specification.the state identifier. Thefollowing three paragraphs specify aactual size n of the state identifier is calculated as follows: n = minimum_hash_size + length - 1 The state identifier is then extracted from the SigComp messageflow for discoveringand then executed as defined by thecapabilitiesSTATE-EXECUTE instruction ofEndpoint B and (if necessary) uploading a new decompression algorithmChapter 9. If the length value is set tothis endpoint. Note that if an application-defined default algorithm is available at all endpoints0 thenEndpoint A can immediately begin to compress messages andno state is accessed; instead thefollowing stages may be skipped. Endpoint A may send aentire SigCompRequestmessage is copied into the UDVM memory beginning at Address 6, and then executed starting from Address 6. All other addresses in the UDVM memory are initialized toEndpoint B. The0. Decompression failure occurs if the SigCompRequestmessage isa requesttoo short toinitialize a SigComp layer forcontain theapplication at Endpoint B and to know B's capabilities, i.e.expected state identifier, or if the requested state does not exist. See Section 8.2 for further details. Price, Hannu, et al. [Page 12] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002 4.3. Interfaces to and from theparameters in Section 3.2. If Endpoint A uses a UDVM decompression algorithm which only requirescompressor dispatcher When thedefault application- defined parameters, then this step mayapplication provides a message to beomitted. Endpoint B SHOULD answercompressed, it MUST also provide an "endpoint identity" that distinguishes the endpoint from other endpoints. The exact format of the endpoint identity is unimportant, provided that distinct endpoints have distinct endpoint identities. The SigCompRequest messagelayer contains one compressor for each remote endpoint witha Capabilities Announcement message,whichincludestheSigComp parameters that constrainlocal application is communicating; theoperation ofdispatcher forwards each new application message to theUDVM. Once Endpoint A has receivedappropriate compressor (invoking a new compressor if a new endpoint identity is encountered). Note that theCapabilities Announcement message,application MUST indicate to the compressor dispatcher when itchoosesno longer wishes to communicate with asuitable compression algorithmparticular endpoint, so thatB is ablethe resources taken by the corresponding compressor can be reclaimed. 4.4. Interfaces todecompressandsends a message containingfrom theUDVM decompression algorithm (unless Endpoint B already has the algorithm available). At this point, Endpoint B contains enough information to start decompressing messages received fromdecompressor dispatcher To ensure that SigComp can run over an unsecure transport layer, theapplication at Endpoint A. 4.1.2. Bi-directional initialization In scenarios where both endpoints decide to compress data indecompressor dispatcher invokes a new decompressor for eachofnew SigComp message. Resources for thedirections, a double initialization process must be done prior to start withdecompressor are released as soon as thenormal operation. The double initialization processmessage iscompriseddecompressed. Upon the arrival oftwoa SigComp message the decompressor dispatcher invokes an instance of theabove initialization processes, one in each direction,UDVM and loads it with the indicated state asdescribed inper Section4.1.1.4.2.SigComp message formatThebasic SigCompmessageconsists of a block of UDVM bytecode, the first n bytes of which are interpreted as a state identifier that accesses some previously stored state information. This state information comprisesis then decompressed by thedecompression algorithm that will be usedUDVM, returned todecompress the remainder oftheSigComp message, as well as any needed additional information (e.g. one or more previously received messages if dynamic compression is in use). Adecompressordispatcher MUST be abledispatcher, and passed on toseparate two SigComp messages; inthecase of UDP a SigComp message corresponds exactly to one UDP datagram. For TCP each 0xFFFF delimiterreceiving application. Note that when the UDVM isfollowedinvoked it does not receive any compressed data byadefault, but instead requests newSigComp message. The format ofdata explicitly using a specific instruction. Therefore, thebasic SigComp messagedispatcher isgiven in Figure 4: [Editors' Note: Specific SigComp messages such as theresponsible for buffering each SigCompRequest, the Capabilities Announcementmessage and passing theUDVM Upload may needdata tobe defined. A state identifier could be reserved for each specific type of message, just as state identifiers are reserved for each well-known algorithm.] Price, Hannu, et al. [Page 13] INTERNET-DRAFT SigComp February 14 , 2002 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | | : state_identifier (n-bytes) : | | +---+---+---+---+---+---+---+---+ | | : Remainingthe UDVMbytecode : | | +---+---+---+---+---+---+---+---+ Figure 4: Basic SigComp message Note that nwhen it is requested. If theapplication-defined parameter minimum_hash_size, an example value for which is given in Appendix B. Note alsoUDVM requests additional compressed data thatthe state informationisloaded into the UDVM memorynot yet available then it pauses andexecuted as definedwaits until enough data has been received by the dispatcher. Uncompressed data is also outputted by thefollowing piece ofUDVMbytecode: reserve state_identifier (n) INPUT-BYTECODE (n, state_identifier, fail) STATE-EXECUTE (state_identifier, n) :fail DECOMPRESSION-FAILURE Ifusing a specific instruction. Note that the UDVMmemoryhas no awareness of whether the underlying transport is message-based or stream-based, and so it always outputs uncompressed data as a stream. It isinitialized containingtheabove bytecode thenresponsibility of thestate identifier will automatically be extracted fromdispatcher to provide theSigCompuncompressed messageand the corresponding state information will be accessed and executed. 4.3. Interfacestoand from the dispatcher Oncetheremote party has initializedapplication in the expected form (i.e. as a stream or as a set of distinct, bounded messages). Price, Hannu, et al. [Page 13] INTERNET-DRAFT SigComplayer at the local party, the decompressor dispatcher is ready to receive compressed messages fromMarch 1, 2002 For aparticular remote party, decompress those messages, and pass them onto the application. The application providesstream-based transport, thecompressordispatcherwithdelimits messagesto be compressed. The encoder in the compressor compresses messages in such a way that the remote decompressor withby parsing theUDVM can decompresscompressed data stream for instances of 0xFF and taking the following actions: Occurs in datacorrectly (providing thatstream: Meaning: 0xFF 00 one 0xFF byte in thecompresseddata stream 0xFF 01 same, but the next byte isnot lost or damaged during transport). When a message is toquoted (could becompressed, the compressor selectsanother 0xFF) : : 0xFF 7F same, but thestatenext 127 bytes are quoted 0xFF 80 touse.0xFF FE reserved 0xFF FF message boundary Theidentifier ofreserved characters are useful for byte stuffing (if a compression algorithm generates compressed data containing theused state MUSTcharacter 0xFF then it should besent along withreplaced by thecompressed messagecharacter 0xFF00 to avoid accidentally inserting a message delimiter into theremote decompressor. Thecompressed data stream). 5. SigComp compressor An important feature of SigCompmessageisthen passedthat if two endpoints cannot agree on a common algorithm with which tounderlying layerssend and receive data, it is possible fortransport totheremote decompressor. [Editor's Note: State identifiers will needcompressor tobe reservedupload bytecode forwell- known decompression algorithms, and an additional state identifier Price, Hannu, et al. [Page 14] INTERNET-DRAFT SigComp February 14 , 2002 will be neededits own choice of algorithm toindicate thatthealgorithmdecompressor. In particular this means that it isbeing uploaded as part ofnot necessary to force all compressors to use thecompressed message.] Uponsame default algorithm; instead each implementer has thearrivalfreedom to pick one ofa SigComp messagethedecompressor dispatcher invokespredefined algorithms or to upload their own if needed. The overall requirement placed on thedecompressorcompressor is thatloadsof transparency, i.e. theUDVM with the indicated state. The message is then decompressed bycompressor MUST NOT send bytecode which cause theUDVM, returnedUDVM tothe decompressor dispatcher, and possibly passedincorrectly decompress a given message. The following more specific requirements are also placed ontothereceiving application. Note that whencompressor (they can be considered particular instances of theUDVMtransparency requirement): * It isinvoked it does not receive any compressed data by default, but instead requests new data explicitly usingRECOMMENDED that the compressor supply aspecific instruction. Therefore,CRC over thedispatcher is responsible for buffering each SigCompuncompressed messageand passing the datatotheensure that successful decompression has occurred. A UDVMwhen itinstruction isrequested. Uncompressed dataprovided to verify this CRC. * If the transport isalso outputted bymessage-based then theUDVM using a specific instruction. Depending oncompressor MUST preserve theparticular application,boundaries between messages. * If thedispatcher decides whether to forward a partially decompressed message immediately totransport is stream-based but theapplication, or to buffer and wait for a completeapplication defines its own internal messageto be successfully decompressed. For a stream-based transport,boundaries, then thedispatcher delimitscompressor SHOULD preserve the boundaries between messages byparsingusing thecompressed data stream for instances of 0xFF"end-of- message" character 0xFFFF reserved by SigComp. Price, Hannu, et al. [Page 14] INTERNET-DRAFT SigComp March 1, 2002 * The compressor MUST NOT exceed the maximum_compressed_size andtakingMUST ensure that thefollowing actions: Occurs in data stream: Action: 0xFFFF Delimit compressedmessage0xFF00 Replace with 0xFF 0xFF01 - 0xFFFE Decompression failurecan be decompressed using no more than the resources available at the remote decompressor. Thereserved character 0xFF00 is usefulreason forbyte stuffing (if a compression algorithm generates compressed data containing the character 0xFF then it should be replaced bypreserving thecharacter 0xFF00 to avoid accidentally insertingmessage boundaries over a stream-based transport is that damage to one compressed messagedelimiter intodoes not affect thecompressed data stream). 5. SigComp compressor An important featuredecompression ofSigComp is that if two endpoints cannot agreesubsequent messages. Moreover, the application typically vetoes state creation requests on acommon algorithm with whichper-message basis. 5.1. Supplying bytecode tosend and receive data, it is possible forthe UDVM A compressor MUST be certain that compressed data can be decompressed before the data is toupload bytecodebe sent, i.e. the UDVM instructions forits own choice of algorithm todecompression MUST be available at the remote decompressor.In particular this meansSeveral options exist for ensuring thatitthis bytecode isnot necessary to force all compressors to use the same default algorithm; instead each implementer has the freedom to pick one ofavailable: 1. Each SigComp message sent from thepredefined algorithms or to upload their own if needed. The overall requirement placed oncompressor contains the necessary UDVM instructions for decompression. 2. By setting up a reliable connection, such as a TCP connection, between a compressoris that of transparency, i.e.and its remote decompressor the UDVM instructions can be transferred and saved as state. 3. If there are predefined UDVM codes for well-known algorithms, a compressorMUST NOTonly needs to sendbytecode which causethe state identifier of that UDVM decompression algorithm code toincorrectly decompress a given message. Price, Hannu, et al. [Page 15] INTERNET-DRAFT SigComp February 14 , 2002its remote decompressor. Thefollowing more specific requirements are also placed on the compressor (theydecompressor canbe considered particular instances of the transparency requirement): * It is RECOMMENDED that the compressor supply a CRC overthen populate theuncompressed messageUDVM locally. In order toensure that successful decompression has occurred. Asave delay for "time-critical" sessions, the UDVMinstruction is providedinstructions should be uploaded prior toverifyany initiation of "time- critical" sessions. 5.2. Compression failure The compressor SHOULD make every effort to successfully compress an application message, but in certain cases thisCRC. * Ifmight not be possible (particularly if a low maximum_compressed_size has been set by thetransportapplication). In this case a "compression failure" ismessage-based then the compressor MUST preservecalled. Reasons for compression failure include theboundaries between messages.following: *If the transport is stream-based but the application defines its own internalA compressed or uncompressed messageboundaries, then the compressor SHOULD preserveexceeds theboundaries between messagesmaximum size defined byusingthe"end-of- message" character 0xFFFF reserved by SigComp.application. * Thecompressor MUST achieve the minimum_compression_ratio and MUST ensure that the message can be decompressed using no more than themaximum_compressed_size is exceeded for a certain message. * Insufficient resources are available at the compressor or at the remote decompressor.The reason for preserving the message boundaries overIf a compression failure occurs when compressing astream-based transport is that damage to one compressedmessagedoes not affect the decompression of subsequent messages. Moreover,then theapplication typically vetoes state creation requests on a per-message basis. Note that SigComp also reservescompressor informs thecharacter 0xFF00 over a stream- based transport, and replaces every instance of 0xFF00 with 0xFF before decompressing the data. This ensures that arbitrary compression algorithms can be used over a stream-based transport, provided that every instance of 0xFF in the compressed data stream is identified and replaced with 0xFF00. This "byte-stuffing" scheme prevents the compression algorithm from inserting a message delimiter into the data stream where one is not required. 5.1. Types of compression algorithm Any of the following classes of compression algorithm may be useful for particular applications: * Generic compressor (for example [DEFLATE] or a similar algorithm). * Protocol-aware compressor offering excellent performance for one particular type of data (for example the text messages generated by [SIP]). * Hybrid compressor with similar performance to [DEFLATE] for generic datadispatcher andsuperior performance for certain types of data.takes no further action. The Price, Hannu, et al. [Page16]15] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002Provided that the uncompressed data can be reconstructed atdispatcher MUST report this failure to theUDVM usingapplication. The application may then try other methods to deliver theavailable memorymessage. 6. State handling andCPU cycles, implementers have freedom to use a compression algorithmstate announcement This chapter defines the behavior oftheir choice. 5.2. Supplying bytecode totheUDVM A compressor MUST be certain that compressed data can be decompressed beforeSigComp state handler. The function of thedatastate handler is tobe sent, i.e. the UDVM instructions for decompression MUST be available atretain information between successive SigComp messages; it is thepeer decompressor. Several options exist for ensuringonly SigComp entity that is capable of thisbytecodefunction, and so it isavailable: 1. Each SigComp message sentof particular importance fromthe compressor contains the necessary UDVM instructions for decompression. 2. By setting up a reliable connection, such asaTCP connection, between a compressorsecurity perspective. 6.1. Storing andits peer decompressorretrieving state To provide security against the malicious insertion or modification of SigComp messages, the UDVMinstructions can be transferred and saved as state. 3. If there are predefined UDVM codes for well-known algorithms, a compressor only needs to sendmemory is reset after decompressing each message. This ensures that damaged SigComp messages do not prevent thestate identifiersuccessful decompression of subsequent valid messages. Note however thatUDVM decompression algorithm code to its peer decompressor. The decompressorthe overall compression ratio is often significantly higher if messages canthen populatebe compressed relative to theUDVM locally. In orderinformation stored in previous messages. For this reason it is possible tosave delaycreate "state" information for"time-critical" sessions,access when a later message is being decompressed. Both theUDVM instructions should be uploaded prior to any initiationcreation and access of"time- critical" sessions. 5.3. Compression failure The compressor SHOULD make every effortstate are designed tosuccessfully compress an application message, but in certain cases this might notbepossible (particularly ifsecure against malicious tampering with the compressed data. State can only be created when ahigh minimum_compression_ratiocomplete message has beenset bysuccessfully decompressed, and theapplication). In this casestate handler MUST NOT save state without permission from the application. Upon receiving a"compression failure" is called. Reasonsdecompressed message, the application may supply the state handler with the identity of the sending endpoint. Supplying this identity grants permission forcompression failure includethe state handler to do the following: *A compressed or uncompressed message exceeds the maximum size defined by the application. * The minimum_compression_ratio cannotAn item of state can beachieved for a certain message. * Insufficient resources are available atsaved using thecompressor or atmemory reserved for theremote decompressor. If a compression failure occursspecified endpoint. * Announcement information can be taken into account whencompressing a message thensending SigComp messages to thecompressor informsspecified endpoint. This is especially useful if thedispatcher and takes no further action. The dispatcherapplication has an authentication mechanism that canthen report this failurebe applied to determine whether theapplication.decompressed data is legitimate. Also note that state is not deleted when it is accessed. So even if a malicious user manages to access state information, subsequent messages compressed relative to this state can still be successfully decompressed. Instead, the state handler is responsible for deleting Price, Hannu, et al. [Page17]16] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20026. State handling and capability announcement This chapter definesstate information once it determines that thebehaviorstate will no longer be needed. Each item of state stores theSigCompfollowing information: Name: Type of data: state_identifier 16-byte value statehandler.start 2-byte value state_instruction 2-byte value state length 2-byte value state_value String of bytes Thefunctionstate_identifier must be supplied to retrieve an item of state from the statehandler is to retain information between successive SigComp messages; ithandler. State can be accessed using the UDVM instructions STATE-REFERENCE and STATE-EXECUTE, and can be created using the END-MESSAGE instruction. The state_value is a byte string that contains theonly SigComp entityactual value that iscapablecopied from/to the UDVM memory. The state_length specifies the number ofthis function,bytes contained within state_value, andsostate_start gives the UDVM memory address to which the state_value is copied when it isof particular importance from a security perspective. 6.1. Storing and retrieving state To provide security againstaccessed. Finally, state_instruction specifies themalicious insertionmemory address offalse compressed data,the next UDVMmemoryinstruction to execute when state isreset after each compressed message. This ensures that damaged compressed messages do not prevent the successful decompressionaccessed. The kind ofsubsequent valid messages. Note however thatinformation which is included in theoverall compression ratiostate_value isoften significantly higher if messages can be compressed relativeup to a particular compressor and theinformation storeduploaded instructions inprevious messages. For this reason it is possible to create "state" information for access whenthe remote UDVM. However alater messagecompressor MUST NOT use a state that isbeing decompressed. Bothnot known to be established at thecreationremote decompressor. 6.2. Saving andaccess ofdeleting states The stateare designedhandler for each endpoint is expected tobe secure against malicious tamperingoffer memory to store UDVM-created state. Every remote endpoint that wishes to communicate with thecompressed data. State can onlylocal endpoint expects to becreated whenable to store acomplete message has been successfully decompressed, andfixed amount of state; the number of bytes that it can store is given by the formula UDVM_memory_size * maximum_state_size. Note that each item of state costs (state_length + 22) bytes to store. The state handlerMUST vetokeeps track of which endpoint created each item of state; when a particular endpoint exceeds its allocated memory limit then sufficient items of statecreation request if instructedcreated by theapplication based on the contents of the decompressed message. Thissame endpoint are deleted (oldest state first) until enough memory isespecially useful ifavailable to accommodate the new state. Price, Hannu, et al. [Page 17] INTERNET-DRAFT SigComp March 1, 2002 The applicationhas an authentication mechanism that can be appliedMUST indicate todetermine whetherthedecompressed data is legitimate. Furthermore, a compressor can only access previously createdstateinformationhandler when it no longer wishes to communicate with a particular endpoint, so that the resources taken byproviding an [MD5] hash ofthe corresponding statetocan beaccessed.reclaimed. 6.3. Announcement Theadvantage of using a secure hash to access stateannouncement information isthat it is very difficultused toguessmodify thecorrect hashvaluewithout complete knowledgeofthe state being accessed. Also note that state is not deleted when it is accessed. So even if a malicious user manages to access state information, subsequent messages compressed relativecertain application-defined parameters. Since these parameter values are saved between SigComp messages, they are considered tothis state can stillbesuccessfully decompressed. Instead,part of the overall statehandler is responsible for deleting state information once it determines thatand hence are supplied from thestate will no longer be needed. Each item of state storesUDVM to thefollowing information: Name: Type of data: state_identifier 16-byte value state start 2-byte value state_instruction 2-byte valuestatelength 2-byte value state_value String of bytes Price, Hannu, et al. [Page 18] INTERNET-DRAFT SigComp February 14 , 2002handler. Thestate_identifier must be supplied to retrieve an itemfollowing list ofstate fromparameters is passed to the statehandler. State can be accessedhandler using the appropriate UDVMinstructions STATE-REFERENCE and STATE-EXECUTE, and can be created usinginstruction (namely the END-MESSAGEinstruction. The state_value isinstruction): +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_memory_size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_bit | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_message | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | id_length 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : id_value_1 : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | id_length n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : id_value_n : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : reserved : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Announcement information Price, Hannu, et al. [Page 18] INTERNET-DRAFT SigComp March 1, 2002 If the application does not return abyte string that containsvalid endpoint identifier then theactual value thatannouncement information iscopied from/toautomatically discarded by theUDVM memory. The state_length specifiesstate handler. Otherwise it is passed to thenumbercompressor responsible for sending messages to the given endpoint. The reserved field allows for additional items ofbytes contained within state_value, and state_start givesdata to be added to theUDVM memory address from/to whichannouncement information in future. Note that thestate_value is copied. Finally, state_instructionlength field specifies thememory addresstotal length of thenextannouncement information including the reserved field. As usual, MSBs are stored preceding LSBs. The remaining items of data are explained in greater detail below: 6.3.1. UDVMinstruction to execute when state is accessed.version Thekindnext 2 bytes of the announcement informationwhich is included inspecify whether only thestate_valuebasic version of the UDVM isup to a particular compressor andavailable, or whether an upgraded version of theuploadedUDVM is available offering additional instructionsinetc. The basic version of theremote UDVM. However a compressor MUST not use a state thatUDVM isnot known to be established at the remote decompressor. 6.2. Guidelines for saving and deleting states [Editors' Note: Do we need something more?] A decompressor SHOULD NOT delete a state before it is confident enough that the state is not used by a peer compressor any more. 6.3. Capability announcement The capability announcement informationVersion 0, which isused to modifythevalue of certain application-defined parameters. Since these parameter values are saved between SigComp messages, they are considered toversion described in this document. Upgraded versions MUST bepart ofbackwards- compatible with theoverall state and hence are supplied frombasic version in the following sense: * If some UDVMtobytecode reaches thestate handler. IfEND-MESSAGE or DECOMPRESSION- FAILURE instructions when running on Version 0 of thestate handler rejects a state creation requestUDVM, then theaccompanying capability announcementsupgraded version MUST run the bytecode in an identical manner. This condition ensures that all bytecode that is valid for Version 0 of the UDVM will continue to berejected also. Ifvalid for upgraded versions of theunidirectional versionUDVM. However, bytecode that is invalid on Version 0 ofSigCompthe UDVM (i.e. bytecode that produces a decompression failure that isrunning thennot manually triggered) may become valid on upgraded versions. The simplest way to upgrade thecapability announcement informationUDVM in a backwards-compatible manner isautomatically discarded byto add additional UDVM instructions, as this will not affect thestate handler.operation of existing UDVM bytecode. 6.3.2. Memory size and CPU cycles Thefollowing blocknext 6 bytes of data specify new values for the application- defined parametersis passedUDVM_memory_size, cycles_per_bit and cycles_per_message. Note that this data can only be used to increase thestate handler usingamount of resources available at theappropriate UDVM instruction (currentlyremote UDVM. If the data specifies a parameter value that is smaller than the value already possessed by the state handler, the parameter keeps its original value (i.e. theEND-MESSAGE instruction): [Editors' Note: The capabilityannouncementblockdata for this parameter isyet to be finalized. More items may be added in future.]simply ignored). Price, Hannu, et al. [Page 19] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Total length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | overall_memory_size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_bit | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_message | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Requested feedback length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : Requested feedback : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Returned feedback length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : Returned feedback : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Capability announcement block Note that the three 2-byte length fields specifyIn particular, only allowing thelengths ofparameter values to increase means that theentire capabilityannouncementblock, the requested feedback data and the returned feedback data respectively. As usual, MSBs are stored preceding LSBs. The remaining items of data are explained in greater detail below: 6.3.1. UDVM versionmechanism is robust against message loss or reordering. Thefirst 2 bytes of the capability announcement block specify whetherparameters can only be restored to their original values if reset or renegotiated by thebasic versionapplication. 6.3.3. State identifiers The list of state identifiers indicates that theUDVM is available,sending endpoint supports one orwhether an upgraded versionmore optional mechanisms (including well-known decompression algorithms, dictionaries ofthe UDVM is available offering additional instructions etc.common SIP phrases, feedback mechanisms etc.). Thebasic version ofinteger n specifies theUDVM is Version 0, which isnumber of state identifiers to follow. The field id_length_j specifies theversion describedlength in bytes of id_value_j, where acceptable values for id_length_j range from 1 to 16 inclusive. If a value outside thisdocument. Upgraded versions MUST be backwards- compatible withrange is received then thebasic version insubsequent state identifiers are ignored by the state handler. Each id_value_j indicates support for one optional mechanism at the sending endpoint. The optional mechanisms themselves, and their corresponding state identifiers, are beyond the scope of this document. 7. Overview of thefollowing sense: * If someUDVMbytecode reachesDecompression functionality for SigComp is provided by a "Universal Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine much like theEND-MESSAGE or DECOMPRESSION- FAILURE instructions when running on Version 0Java Virtual Machine but with a key difference: it is designed solely for the purpose of running decompression algorithms. The motivation for creating theUDVM, thenUDVM is to provide unlimited flexibility when choosing how to compress a given item of data. Rather than picking one of a small number of pre-negotiated compression algorithms, theupgraded version MUST runimplementer has thebytecode infreedom to select anidentical manner. This condition ensures that all bytecodealgorithm of their choice. The compressed data is then combined with a set of UDVM instructions that allow the original data to be extracted, and the result isvalidoutputted as UDVM bytecode. Since the UDVM is optimized specifically forVersion 0running decompression algorithms, the code size of a typical algorithm is small (often sub 100 bytes). Moreover the UDVMwill continueapproach does not add significant extra processing or memory requirements compared tobe valid for upgraded versionsrunning a fixed pre- programmed decompression algorithm. This chapter describes some basic features of the UDVM, including the well-known variables and instruction operands. Price, Hannu, et al. [Page 20] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002UDVM. However, bytecodeRecall thatis invalid on Version 0the amount of memory available to the UDVM(i.e. bytecode that producesis specified by the application-defined parameter UDVM_memory_size. Any attempt to read memory addresses beyond the overall memory size MUST cause a decompression failurethat is not manually triggered) may become valid on upgraded versions.(see Section 8.2). 7.1. Well-known variables Thesimplest way to upgrade the UDVMfirst few variables ina backwards-compatible manner is to add additionalthe UDVMinstructions, as this will not affectmemory have special tasks, for example specifying theoperationlocation ofexisting UDVM bytecode. 6.3.2. Memory sizethe stack used by the CALL andCPU cycles The next 6 bytesRETURN instructions. Each ofdata specify new values forthese well-known variables is a 2-byte integer. The following list gives theapplication- defined parameters overall_memory_size, cycles_per_bitname of each well-known variable andcycles_per_message. Note that this datathe memory address at which the variable canonlybeused to increasefound: Name: Starting memory address: byte_copy_left 0 byte_copy_right 2 stack_location 4 The MSBs of each variable are always stored before theamountLSBs. So, for example, the MSBs ofresources availablestack_location are stored at Address 4 whilst theremote UDVM. If the data specifies a parameter value thatLSBs are stored at Address 5. The use of each well-known variable issmaller thandescribed in thevalue already possessedfollowing sections of the document. 7.2. Instruction operands Each of the UDVM instructions is followed by 0 or more bytes containing thestate handler,operands required by theparameter keeps its original value (i.e.instruction. To reduce thecapability announcement datacode size of a typical UDVM program, each operand forthis parameter is simply ignored). In particular, only allowing the parameter values to increase means that the announcement mechanisma UDVM instruction isrobust against message loss or reordering.compressed using variable-length encoding. Theparameters can only be restoredaim is totheir originalstore more common operand valuesif reset or renegotiated byusing fewer bits than rarely occurring values. Three different types of operand are available: theapplication. 6.3.3. Requested feedbackliteral, the reference and the multitype. Therequested feedback dataoperand types that follow each UDVM instruction are specified in Chapter 9. The UDVM bytecode for each operand type isprovidedillustrated in Figure 7 to Figure 9, together with theUDVMinteger values represented by theremote compressor. By providing this data, the remote compressor is requestingbytecode. Note that thedata be returned to the compressor via a reverse channel (assuming that one is present). The compressorMSBs incontrol of the reverse channel SHOULD return this data by uploading it intothereturned feedback data block atbytecode are illustrated as preceding theremote UDVM. The data will then be passed backLSBs. Also, any string of bits marked with k consecutive "n"s is tothe remote compressorbe interpreted asexplained below. 6.3.4. Returned feedback The returned feedback data isanitem of feedback data that has successfully returnedinteger N fromthe remote entity. This data is passed0 to 2^k - 1 inclusive (with thelocal compressor (assuming that permission is granted by the application), which can make useMSBs ofitn illustrated asit wishes. Note that a compressor MUST only populate the returned feedback data withpreceding thebit-exact contents of a requested feedback data block previously provided to it.LSBs). Price, Hannu, et al. [Page 21] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20027. Overview of the UDVM Decompression functionality for SigComp is provided by a "Universal Decompressor Virtual Machine" (UDVM).TheUDVM is a virtual machine much likedecoded integer value of theJava Virtual Machine but with a key difference:bytecode can be interpreted in two ways. In some cases it isdesigned solely fortaken to be thepurposeactual value ofrunning decompression algorithms. The motivation for creatingtheUDVMoperand. In other cases it is taken toprovide unlimited flexibility when choosing how to compress a given item of data. Rather than picking one ofbe asmall number of pre-negotiated compression algorithms,memory address at which theimplementer has2-byte operand value can be found (MSBs found at thefreedom to select an algorithm of their choice.specified address, LSBs found at the following address). Thecompressed datalatter case is denoted by memory[X] where X isthen combined with a set of UDVM instructions that allowtheoriginal data to be extracted,address andthe resultmemory[X] isoutputted as UDVM bytecode. SincetheUDVM2- byte value starting at Address X. The simplest operand type isoptimized specifically for running decompression algorithms,thecode size ofliteral (#), which encodes atypical algorithmconstant integer from 0 to 65535 inclusive. A literal operand may require between 1 and 3 bytes depending on its value. Bytecode: Operand value: Range: 0nnnnnnn N 0 - 127 10nnnnnn nnnnnnnn N 0 - 16383 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 Figure 7: Bytecode for a literal (#) operand The second operand type issmall (often sub 100 bytes). MoreovertheUDVM approach does not add significant extra processing or memory requirements comparedreference ($), which is always used torunningaccess afixed pre- programmed decompression algorithm. This chapter describes some basic features of the UDVM, including2-byte value located elsewhere in thememory allocation, well-known variables and instruction parameters. 7.1.UDVMmemory allocationmemory. Thememory available to the UDVMbytecode for a reference operand ispartitioned intodecoded to be anumber of sections, providing space for program code, variables and miscellaneous data: <----- working_memory_size ------> | Fixed values | Variables | Miscellaneous data | Program code | +--------------+-----------+--------------------+--------------+ <--------------------- overall_memory_size --------------------> Figure 6: Memory allocation in the UDVM Recall thatconstant integer from 0 to 65535 inclusive, which is interpreted as theamount ofmemoryavailable toaddress containing theUDVM is specified byactual value of theapplication-defined parameters overall_memory_size, working_memory_start and working_memory_end.operand. Note thatall of these parameters are initialized by the application, but can be renegotiated on the fly using the capabilities announcement mechanism. The memory area from Address (working_memory_start) to Address (working_memory_end) inclusivereference operands canbe used to store arbitrary data (variables, program code, Huffman codes etc.). UDVM instructions are allowed to readalways take values fromor write to any address in this memory area. Price, Hannu, et al. [Page 22] INTERNET-DRAFT SigComp February 14 , 2002 The first part of this memory area is typically used0 tostore a number of 2-byte variables. UDVM instructions can reference these variables using a special instruction parameter65535 inclusive, asdescribed in Section 7.3. The memory area from Addressthey reference 2-byte values. Bytecode: Operand value: Range: 0nnnnnnn memory[2 * N] 0to Address (working_memory_start-1) and from Address (working_memory_end + 1) to Address (overall_memory_size65535 10nnnnnn nnnnnnnn memory[2 * N] 0 -1) inclusive65535 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 8: Bytecode for a reference ($) operand The third kind of operand iswrite-protected, so UDVM instructionsthe multitype (%), which canread from this memory area but cannot writebe used toit. Thisencode both actual values and memoryarea is intendedaddresses. The multitype operand also offers efficient encoding forstoringsmall integer values (both positive and negative) and for powers of 2. Bytecode: Operand value: Range: 00nnnnnn N 0 - 63 01nnnnnn memory[2 * N] 0 - 65535 1000011n 2 ^ (N + 6) 64 , 128 10001nnn 2 ^ (N + 8) 256 , ... , 32768 111nnnnn N + 65504 65504 - 65535 1001nnnn nnnnnnnn N + 61440 61440 - 65535 101nnnnn nnnnnnnn N 0 - 8191 Price, Hannu, et al. [Page 22] INTERNET-DRAFT SigComp March 1, 2002 110nnnnn nnnnnnnn memory[N] 0 - 65535 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 9: Bytecode for a multitype (%) operand 7.3. Byte copying A number of UDVMbytecode that caninstructions require a string of bytes to becompiled. Any attemptcopied toreadand from areas of the UDVM memory. This section defines how the byte copying operation should be performed. In general, the string of bytes is copied in ascending order of memoryaddressesaddress. So if a byte is copied from/to Address n then the next byte is copied from/to Address n + 1. As usual, if a byte is read from an address beyond the overall memory sizeor to write to memory addresses outside the working memory area MUST cause athen decompression failure(see Section 8.3). The first part of the write-protected UDVM memoryoccurs. Note however that if a byte isintended for storing variables whose values no longer need to be modified. The second part ofcopied from/to thewrite-protectedmemoryis intended for storing program code including UDVM instructions and their associated parameters. Note that if an instruction references a variable that has been write-protected, the compiled version ofaddress specified in byte_copy_right, theinstruction will typically run faster than ifbyte copy operation continues by copying thereferenced variable lies innext byte from/to theworkingmemoryarea. 7.2. Well-known variables The first few variablesaddress specified in byte_copy_left. This is useful for setting up a "circular buffer" within the UDVMmemory have special tasks, for example specifyingmemory. Note that thelocationstring of bytes is copied on a purely byte-by-byte basis. In particular, some of thestack usedlater bytes to be copied may themselves have been written into the UDVM memory by theCALL and RETURN instructions. Each of these well-known variablesbyte copying operation currently being performed. Equally, it is possible for a2-byte integer. The following list givesbyte copying operation to overwrite thename of each well-known variable andinstruction that called thememory address at whichbyte copy. If this occurs then thevariable canbyte copying operation MUST befound: Name: Startingcompleted as if the original instruction were still in place in the UDVM memoryaddress:(this also applies if byte_copy_left0or byte_copy_right2 stack_location 4 The MSBs of each variablearealways stored beforeoverwritten). 8. Decompressing a SigComp message This chapter lists theLSBs. So, for example,steps involved in theMSBsdecompression ofstack_location are stored at Address 4 whilsta single SigComp message. 8.1. Invoking theLSBs are stored at Address 5. The use of each well-known variable is described inUDVM Whenever thefollowing sectionsdispatcher receives a message to be decompressed, it invokes a new instance of thedocument.UDVM. The UDVM_memory_size is initialized using the corresponding application-defined parameter. The following steps are then taken: 1.) The number of remaining CPU cycles is set equal to the application-defined parameter cycles_per_message. Price, Hannu, et al. [Page 23] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20027.3. Instruction parameters EachNotes: The amount of compressed data available to the UDVMinstructionsisfollowed by 0 or more bytes containingexactly one compressed message. If theparameters required bytransport is stream-based then SigComp uses theinstruction. To reducereserved byte string 0xFFFF to delimit thecode size ofcompressed messages: the dispatcher takes the data between atypical UDVM program, each parameter forpair of neighboring reserved byte strings to be aUDVM instruction issingle compressedusing variable-length encoding.message. Theaimreserved byte string itself is not considered tostore more common parameter values using fewer bits than rarely occurring values. Three different typesbe part ofparameter are available: the literal,thereference and the multitype. The parameter types that follow each UDVM instruction are specified in Chapter 9.compressed message. TheUDVM bytecode for each parameter typecompressed data isillustrated in Figure 7not provided toFigure 9, together withtheinteger values representedUDVM by default. Instead, thebytecode. Note that the MSBs in the bytecode are illustrated as precedingUDVM requests compressed data using theLSBs. Also, any string of bits marked with k consecutive "n"sINPUT instructions (useful when running over a stream-based transport since there is no need tobe interpreted as an integer N from 0 to 2^k - 1 inclusive (with the MSBs of n illustrated as precedingwait for theLSBs).entire compressed message before decompression can begin). Thedecoded integer valuedispatcher MUST NOT make more than one compressed message available to a given instance of thebytecode can be interpreted in two ways.UDVM. Insome cases itparticular, the dispatcher MUST NOT concatenate two messages to form a single compressed message. This istakenbecause compressed messages are typically padded with trailing zero bits so that they are a whole number of bytes long. Concatenating two messages would cause these padding bits to be incorrectly interpreted as compressed data. 2.) Next, theactual value ofinstructions contained within theparameter. In other cases it is taken to be aUDVM memoryaddress at which the 2-byte parameter value can be found (MSBs foundare executed beginning at the address specifiedaddress, LSBs found atby thefollowing address).state as per Section 4.2. Notes: Thelatter caseinstructions are executed consecutively unless otherwise indicated (for example when the UDVM encounters a JUMP instruction). If the next instruction to be executed lies outside the available memory then decompression failure occurs (see Section 8.2). 3.) Each time an instruction isdenotedexecuted the number of available CPU cycles is decreased bymemory[X] where Xthe amount specified in Chapter 9. Additionally, if the UDVM requests n bits of compressed data (using one of the INPUT instructions) then the number of available CPU cycles is increased by n * cycles_per_bit. Notes: This means that theaddress and memory[X]total number of CPU cycles available for processing a compressed message is given by the2- byte value starting at Address X.formula: maximum_cycles = cycles_per_message + message_size * cycles_per_bit Thesimplest parameter typereason that this total is not allocated to theliteral (#), which encodes a constant integer from 0 to 65535 inclusive. A literal parameter may require between 1 and 3 bytes depending on its value. Bytecode: Parameter value: Range: 0nnnnnnn N 0 - 127 10nnnnnn nnnnnnnn N 0 - 16383 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 Figure 7: Bytecode for a literal (#) parameter The second parameter typeUDVM when it isthe reference ($), whichinvoked isalways used to access a 2-byte value located elsewhere inthat the UDVMmemory. The bytecode for a reference parameter is decodedcan begin tobedecompress aconstant integer from 0 to 65535 inclusive, which is interpreted as the memory address containing the actual value of the parameter. Notemessage thatreference parameters can always take values from 0 to 65535 inclusive, as they reference 2-byte values.has Price, Hannu, et al. [Page 24] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002Bytecode: Parameter value: Range: 0nnnnnnn memory[2 * N] 0 - 65535 10nnnnnn nnnnnnnn memory[2 * N] 0 - 65535 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 8: Bytecode for a reference ($) parameter The third kind of parameteronly been partially received. So the total message size may not be known when the UDVM is initialized. 4.) The UDVM stops executing instructions when it encounters an END-MESSAGE instruction or if decompression failure occurs. Notes: The UDVM passes uncompressed data to themultitype (%), whichdispatcher using the OUTPUT instruction. The OUTPUT instruction can be used toencode both actual valuesoutput a partially decompressed message; it is a dispatcher decision whether to use the data immediately or whether to buffer andmemory addresses. The multitype parameter also offers efficient encoding for small integer values (both positive and negative) and for powers of 2. Bytecode: Parameter value: Range: 00nnnnnn N 0 - 63 01nnnnnn memory[2 * N] 0 - 65535 1000011n 2 ^ (N + 6) 64 , 128 10001nnn 2 ^ (N + 8) 256 , ... , 32768 111nnnnn N + 65504 65504 - 65535 1001nnnn nnnnnnnn N + 61440 61440 - 65535 101nnnnn nnnnnnnn N 0 - 8191 110nnnnn nnnnnnnn memory[N] 0 - 65535 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 9: Bytecode for a multitype (%) parameter 7.4. Byte copying A number ofwait until the entire message has been decompressed. The UDVMinstructions require a string of bytes to be copiedpasses state creation requests toand from areas oftheUDVM memory.state handler using the END-MESSAGE instruction. Thissection defines howmeans that it is only possible to make a state creation request once thebyte copying operation should be performed. In general,message has been decompressed, which is necessary since thestringapplication typically determines the validity ofbytes is copied in ascending orderthese requests based on the contents ofmemory address. So ifthe decompressed message. 8.2. Decompression failure If abytecompressed message given to the UDVM iscopied from/to Address ncorrupted (either accidentally or maliciously) then thenext byte is copied from/to Address n + 1. As usual, ifUDVM may terminate with abyte is read from an address beyonddecompression failure. Reasons for decompression failure include theoverall memory sizefollowing: * A compressed oris written to an address outside the working memory area then decompression failure occurs. Note however that if a byte is copied from/to the memory address specified in byte_copy_right,uncompressed message exceeds thebyte copy operation continuesmaximum size defined bycopyingthenext byte from/toapplication. * The UDVM exceeds thememory address specified in byte_copy_left. This is usefulavailable CPU cycles forsetting updecompressing a"circular buffer" within themessage. * The UDVMmemory. Note that the string of bytes is copied onattempts to read apurely byte-by-byte basis. In particular, some ofmemory address beyond thelater bytes tooverall memory size. * An unknown instruction type is encountered. * An unknown operand type is encountered. * An instruction is encountered that cannot becopied may themselves have been written intoprocessed successfully by the UDVMmemory by(for example a RETURN instruction when no CALL instruction has previously been encountered). * The UDVM attempts to access non-existent state. * A manual decompression failure is triggered using thebyte copying operation currently being performed.DECOMPRESSION-FAILURE instruction. Price, Hannu, et al. [Page 25] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002Equally, it is possible for a byte copying operation to overwrite the instruction that called the byte copy.Ifthisa decompression failure occurs when decompressing a message then thebyte copying operation MUST be completed as if the original instruction were still in place in theUDVMmemory (this also applies if byte_copy_left or byte_copy_right are overwritten). 8. Decompressing a SigComp message This chapter listsinforms thesteps involved indispatcher and takes no further action. It is thedecompressionresponsibility ofa single SigComp message. 8.1. Invoking the UDVM Wheneverthe dispatcherreceives a messagetobe decompressed, it invokesdecide how to cope with the decompression failure. In general anew instance ofdispatcher SHOULD discard theUDVM. The overall_memory_sizecompressed message andinitial contents of theany decompressed data that has been outputted. 9. UDVMmemory are initialized using the corresponding application-defined parameters. The following steps are then taken: 1.) The number of remaining CPU cycles isinstruction setequal to the application-defined parameter cycles_per_message. Notes:Theamount of compressed data available to theUDVMis exactly one compressed message. If the transport is stream-based then SigComp uses the reserved byte string 0xFFFFcurrently understands 30 instructions, chosen todelimit the compressed messages: the dispatcher takessupport thedata between a pair of neighboring reserved byte strings to be a single compressed message. The reserved byte string itself is not considered to be partwidest possible range of compression algorithms with thecompressed message. The compressed data is not provided to the UDVM by default. Instead, the UDVM requests compressed data usingminimum possible overhead. Figure 10 lists theINPUTdifferent instructions(useful when running over a stream-based transport since there is no need to wait for the entire compressed message before decompression can begin). Note that in particular, this means that the application MUST define the initial contents ofand theUDVM memorybytecode values used tocontain at least one INPUT instruction. See Section 4.2 for an example of howstore theapplication might initializeinstructions at theUDVM memory.UDVM. Thedispatcher MUST NOT make more than one compressed message available to a given instancecost ofthe UDVM. In particular, the dispatcher MUST NOT concatenate two messages to form a single compressed message. Thiseach instruction in CPU cycles isbecause compressed messages are typically padded with trailing zero bits so that they are a whole number of bytes long. Concatenating two messages would cause these padding bits to be incorrectly interpreted as compressed data.also given: Instruction: Bytecode value: Cost in CPU cycles: DECOMPRESSION-FAILURE 0 1 AND 1 1 OR 2 1 NOT 3 1 ADD 4 1 SUBTRACT 5 1 MULTIPLY 6 1 DIVIDE 7 1 SORT-ASCENDING 8 1 + k * ceiling(log2(k)) SORT-DESCENDING 9 1 + k * ceiling(log2(k)) MD5 10 1 + length LOAD 11 1 MULTILOAD 12 1 + n COPY 13 1 + length COPY-LITERAL 14 1 + length COPY-OFFSET 15 1 + length + offset JUMP 16 1 COMPARE 17 1 CALL 18 1 RETURN 19 1 SWITCH 20 1 + n CRC 21 1 + length END-MESSAGE 22 1 + state length OUTPUT 23 1 + output_length NBO 24 1 INPUT-BYTECODE 25 1 + length INPUT-FIXED 26 1 INPUT-HUFFMAN 27 1 + n STATE-REFERENCE 28 1 + state_length STATE-EXECUTE 29 1 + state length Figure 10: UDVM instructions and corresponding bytecode values Price, Hannu, et al. [Page 26] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20022.) Next, theEach UDVM instruction costs a minimum of 1 CPU cycle. Certain high- level instructionscontained withinmay cost additional cycles depending on theUDVM memory are executed beginning atvalue of one of theaddress specified in first_instruction. Notes:instruction operands. Theinstructions are executed consecutively unless otherwise indicated (for exampleonly exception when calculating theUDVM encounters a JUMP instruction). Ifnumber of CPU cycles is that thenextSTATE-EXECUTE instructionto be executed lies outsidetakes (1 + state_length) cycles even though it does not have a state_length operand; instead theavailable memory then decompression failure occurs (see Section 8.3). 3.) Each time an instructionvalue of state length isexecutedprovided by thenumberstate handler as part ofavailable CPU cycles is decreasedthe state being accessed. All instructions are stored as a single byte to indicate the instruction type, followed by 0 or more bytes containing theamount specified in Chapter 9. Additionally, ifoperands required by theUDVM requests n bits of compressed data (using oneinstruction. The instruction specifies which of theINPUT instructions) then the numberthree operand types ofavailable CPU cyclesSection 7.2 isincreasedused in each case. For example, the ADD instruction is followed byn * cycles_per_bit. Notes: This means thattwo operands as shown below: ADD ($operand_1, %operand_2) When converted into bytecode thetotalnumber ofCPU cycles available for processing a compressed message is givenbytes required by theformula: total_cycles = cycles_per_message + message_size * cycles_per_bit The reason that this total is not allocated toADD instruction depends on theUDVM when it is invoked is thatsize of each operand value, and whether theUDVM can begin to decompresssecond (multitype) operand contains the operand value itself or amessage that has only been partially received. Somemory address where thetotal message size may notactual value of the operand can beknown whenfound. The instruction set available for the UDVMis initialized. 4.)offers a mix of low-level and high-level instructions. TheUDVM stops executinghigh-level instructionswhencan all be emulated using the low-level instructions provided, but given a choice itencounters an END-MESSAGE instruction or if decompression failure occurs. Notes: The UDVM passes uncompressed datais generally preferable tothe dispatcher using the OUTPUT instruction. The OUTPUTuse a single instructioncanrather than a large number of general-purpose instructions. The resulting bytecode will beusedmore compact (leading tooutput a partially decompressed message; it isadispatcher decision whether to use the data immediately or whether to bufferhigher overall compression ratio) andwait untildecompression will typically be faster because theentire message has been decompressed. The UDVM passes state creation requests toimplementation of thestate handler usingcompression-specific instructions can be optimized for theEND-MESSAGE instruction. This means that itUDVM. Each instruction isonly possible to makeexplained in more detail below: 9.1. Mathematical instructions The following instructions provide astate creation request once the message has been decompressed, which is necessary since the application typically determines the validitynumber ofthese requests basedmathematical operations including bit manipulation, arithmetic and sorting. 9.1.1. Bit manipulation The AND, OR and NOT instructions provide simple bit manipulation on 2-byte words. AND ($operand_1, %operand_2) OR ($operand_1, %operand_2) NOT ($operand_1) After thecontents ofoperation is complete, thedecompressed message. 8.2. Successful decompression The END-MESSAGE instruction indicates thatvalue of thecompressed message has been successfully decompressed and passed tofirst operand is overwritten with thedispatcher.result. Note that since this operand is a Price, Hannu, et al. [Page 27] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002thatreference, theactual uncompressed message is outputted beforehand usingmemory address specified by theOUTPUT instruction; this allowsoperand is always overwritten and not theUDVM to output each part ofoperand itself. 9.1.2. Arithmetic The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform arithmetic on 2-byte words. ADD ($operand_1, %operand_2) SUBTRACT ($operand_1, %operand_2) MULTIPLY ($operand_1, %operand_2) DIVIDE ($operand_1, %operand_2) After themessage tooperation is complete, thedispatcher as soon as it has been decompressed. The END-MESSAGE instruction provides two additional pieces of information to the state handler:first operand is overwritten with thestate creation request andresult. Note that in all cases thecapability announcement block. The state creation request mechanismarithmetic operation isdiscussed below: The UDVM may optionally save part of its memoryperformed modulo 2^16. So forretrieval by later messages. However to prevent malicious storage of a large amount of unnecessary state information, the application itself MUST give permission before any state can be created. The state handler typically makes a decision on whether state can be created based on the contents ofexample, subtracting 1 from 0 gives thedecompressed message, particularly ifresult 65535. For themessage contains authentication data that can verify whether or notSUBTRACT instruction thesendersecond operand islegitimate. The END-MESSAGE instruction requestssubtracted from thecreation of state usingfirst. Similarly, for theparameters state start and state length, which together denote a byte string state_value. Provided thatDIVIDE instruction theapplication gives permission, state_valuefirst operand isbyte copied from the UDVM memory (obeyingdivided by therules of Section 7.4) and stored together with a 16-byte state identifiersecond operand. Note thatcan be used to accessif thestate by a later compressed message. To provide security against malicious access,second operand does not divide exactly into theidentifier for any item of state created byfirst operand then theUDVMremainder isderived fromignored. 9.1.3. Sorting The SORT-ASCENDING and SORT-DESCENDING instructions sort lists of 2- byte words. SORT-ASCENDING (%start, %n, %k) SORT-DESCENDING (%start, %n, %k) The start operand specifies the[MD5] hashstarting memory address of thestate_valueblock of data to bestored.sorted. Thestate identifierblock of data itself isconstructed by taking the 16-byte [MD5] hash and replacing all but the first hash_length most significant bytes with zeroes. Note that if hash_length is 16 then the unmodified [MD5] hash is the state identifier. Decompression failure occurs if hash_length is less than the application-defined parameter minimum_hash_size or greater than 16. Ifdivided into n lists each containing k words. The SORT-ASCENDING instruction applies astate identifier already exists (hash collision occurs), the decompressor should check whether the requested state is identicalcertain permutation to theestablished state, and countlists, such that thestate creation requestfirst list is sorted into ascending order (treating each data word assuccessful if thisan integer). The same permutation is applied to all n lists, so lists other than thecase. Iffirst will notthen the state creation request is unsuccessful. The existing state MUST NOTnecessarily bereplaced withsorted into order. For example, therequested statefirst list might contain a set of integers to besaved. This is to avoidsorted, whilst thesituation where a compressed message cannotsecond list might bedecompressed because a needed item of state has been replaced (possibly by a malicious sender). Each itemused to keep track ofstate stores the following information (accessed bythestate_identifier):integers: Before sorting After sorting List 1 List 2 List 1 List 2 8 1 1 2 Price, Hannu, et al. [Page 28] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002Name: Type1 2 1 3 1 3 3 4 3 4 8 1 In the case ofdata: state_identifier 16-byte value state start 2-byte value state_instruction 2-byte value state length 2-byte value state_value Stringtwo words ofbytes Note that state_start, state_length and state_instruction are all parameters fromdata with theEND-MESSAGE instruction, whereas state_identifier and state_value are created as specified above. This state can subsequently be accessed by usingsame value, theSTATE-REFERENCE and STATE-EXECUTE instructions (by providingoriginal ordering of thecorrect state identifier). 8.3. Decompression failure If a compressed message given tolist is preserved. The SORT-DESCENDING instruction behaves as above, except that theUDVMfirst list iscorrupted (either accidentally or maliciously) thensorted into descending order. 9.1.4. MD5 The MD5 instruction calculates an MD5 hash over the specified area of UDVMmay terminate with a decompression failure. Reasons for decompression failure include the following: * A compressed or uncompressed message exceedsmemory. MD5 (%position, %length, %destination) The position and length operands define themaximum size defined bystring of bytes over which theapplication. *MD5 hash is calculated. Byte copying rules are enforced as per Section 7.3. TheUDVM exceedsdestination operand gives theavailable CPU cycles for decompressing a message. *starting address to which the resulting 16-byte hash will be copied. 9.2. Memory management instructions TheUDVM attemptsfollowing instructions are used toread a memory address beyondmanipulate theoverallUDVM memory. Bytes can be copied from one area of memorysize, ortowrite into a memory address outside the workinganother, and areas of memoryarea. * An unknowncan be write-protected to make it easier for UDVM code to be compiled. 9.2.1. LOAD The LOAD instructiontype is encountered. * An unknown parameter type is encountered. * Ansets a 2-byte variable to a certain specified value. The format of a LOAD instruction isencountered that cannotas follows: LOAD (%address, %value) The first operand specifies the starting address of the 2-byte variable, whilst the second operand specifies the value to beprocessed successfully byloaded into this variable. As usual, MSBs are stored before LSBs in the UDVM(for example a RETURNmemory. 9.2.2. MULTILOAD The MULTILOAD instructionwhen no CALL instruction has previously been encountered). * The UDVM attempts to access non-existent state. * A manual decompression failure is triggered using the DECOMPRESSION-FAILURE instruction. If a decompression failure occurs when decompressingsets amessage then the UDVM informs the dispatcher and takes no further action. It iscontiguous block of 2-byte variables to specified values. MULTILOAD (%address, #n, %value_0, ..., %value_n-1) The first operand specifies theresponsibilitystarting address of thedispatcher to decide how to cope withcontiguous variables, whilst the operands value_0 through to value_n-1 specify Price, Hannu, et al. [Page 29] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002decompression failure. In general a dispatcher SHOULD discardthecompressed message and any decompressed data that has been outputted. 9. UDVM instruction setvalues to load into these variables (in the same order as they appear in the instruction). 9.2.3. COPY The COPY instruction is used to copy a string of bytes from one part of the UDVMcurrently understands 28 instructions, chosenmemory tosupportanother. COPY (%position, %length, %destination) The position operand specifies thewidest possible rangememory address ofcompression algorithms withtheminimum possible overhead. Figure 10 listsfirst byte in thedifferent instructionsstring to be copied, and thebytecode values usedlength operand specifies the number of bytes tostorebe copied. The destination operand gives theinstructions ataddress to which theUDVM. The costfirst byte in the string will be copied. Note that byte copying is performed as per the rules ofeachSection 7.3. 9.2.4. COPY-LITERAL A modified version of the COPY instructionin CPU cyclesisalso given: Instruction: Bytecode value: Cost in CPU cycles: DECOMPRESSION-FAILURE 0 1 AND 1 1 OR 2 1 NOT 3 1 ADD 4 1 SUBTRACT 5 1 MULTIPLY 6 1 DIVIDE 7 1 LOAD 8 1 MULTILOAD 9 1 + n WORKING-MEMORY 10 1 COPY 11 1 + lengthgiven below: COPY-LITERAL (%position, %length, $destination) The COPY-LITERAL12 1 + length COPY-OFFSET 13 1 + length + offset JUMP 14 1 COMPARE 15 1 CALL 16 1 RETURN 17 1 SWITCH 18 1 + n CRC 19 1 + length END-MESSAGE 20 1 + state length OUTPUT 21 1 + output_length NBO 22 1 INPUT-BYTECODE 23 1 + length INPUT-FIXED 24 1 INPUT-HUFFMAN 25 1 + n STATE-REFERENCE 26 1 + state_length STATE-EXECUTE 27 1 + state length Figure 10: UDVM instructions and corresponding bytecode values Each UDVMinstructioncostsbehaves as aminimum of 1 CPU cycle. Certain high- level instructions may cost additional cycles depending on the value of one of theCOPY instructionparameters. Price, Hannu, et al. [Page 30] INTERNET-DRAFT SigComp February 14 , 2002 The only exception when calculatingexcept that after copying, thenumber of CPU cyclesdestination operand isthatreplaced with theSTATE-EXECUTE instruction takes (1 + state_length) cycles even though it does not have a state_length parameter; insteadmemory address immediately following thevalue of state length is provided byaddress to which thestate handler as part offinal byte was copied. If thestate being accessed. All instructions are stored as a singlefinal byte was copied toindicatetheinstruction type, followed by 0 or more bytes containingmemory address specified in byte_copy_right, theparameters required bydestination operand is set to theinstruction. The instruction specifies whichmemory address specified in byte_copy_left. 9.2.5. COPY-OFFSET A further version of thethree parameter types of Section 7.3 is used in each case. For example, the ADDCOPY-LITERAL instruction isfollowed by two parameters as showngiven below:ADD ($parameter_1, %parameter_2) When converted into bytecode the number of bytes required by the ADDCOPY-OFFSET (%offset, %length, $destination) The COPY-OFFSET instructiondepends on the sizebehaves as a COPY-LITERAL instruction except that an offset operand is given instead ofeach parameter value, and whether the second (multitype) parameter contains the parameter value itself ora position operand. To derive a suitable position operand, starting at the memory addresswherespecified by destination, theactual valueUDVM counts backwards a total of offset memory addresses. If theparameter canmemory address specified in byte_copy_left is reached, the next memory address is taken to befound.byte_copy_right. The COPY-OFFSET instructionset available for the UDVM offersthen behaves as amix of low-level and high-level instructions. The high-level instructions can allCOPY-LITERAL instruction, taking the position operand to beemulated usingthelow-levellast memory address reached in the above step. Price, Hannu, et al. [Page 30] INTERNET-DRAFT SigComp March 1, 2002 9.3. Program flow instructionsprovided, but given a choice it is generally preferable to use a singleThe following instructions alter the flow of UDVM code. Each instructionrather thanjumps to one of alargenumber ofgeneral-purpose instructions. The resulting bytecode will be more compact (leading tomemory addresses based on ahigher overall compression ratio) and decompression will typically be faster because the implementationcertain specified criterion. Note that all of thecompression-specificinstructionscan be optimized forgive theUDVM. Each instruction is explainedmemory addresses inmore detail below: 9.1. Bit manipulation instructions The AND, OR and NOT instructions provide simple bit manipulation on 2-byte words. AND ($parameter_1, %parameter_2) OR ($parameter_1, %parameter_2) NOT ($parameter_1) Aftertheoperation is complete,form of deltas relative to thevaluememory address of thefirst parameterinstruction. The actual memory address isoverwritten withcalculated as follows: memory_address = (memory_address_of_instruction + delta) modulo 2^16 Note that certain I/O instructions (see Section 9.4) can also alter program flow. 9.3.1. JUMP The JUMP instruction moves program execution to theresult.specified memory address. JUMP (%delta) Note thatsince this parameter isif the address (specified as areference,delta from the address of the JUMP instruction) lies beyond the overall UDVM memory size then decompression failure occurs. 9.3.2. COMPARE The COMPARE instruction compares two operands and then jumps to one of three specified memory addresses depending on the result. COMPARE (%operand_1, %operand_2, %delta_1, %delta_2, %delta_3) If operand_1 < operand_2 then the UDVM continues instruction execution at the (relative) memory address specified by delta 1. If operand_1 = operand_2 then it jumps to theparameter is always overwritten and notaddress specified by delta_2. If operand_1 > operand_2 then it jumps to theparameter itself. 9.2. Arithmeticaddress specified by delta_3. 9.3.3. CALL and RETURN The CALL and RETURN instructions provide support for compression algorithms with a nested structure. CALL (%delta) RETURN TheADD, SUBTRACT, MULTIPLYCALL andDIVIDERETURN instructionsperform arithmetic onmake use of a stack of 2-bytewords.variables stored at the memory address specified by the well-known variable stack_location. The stack contains the following variables: Price, Hannu, et al. [Page 31] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002ADD ($parameter_1, %parameter_2) SUBTRACT ($parameter_1, %parameter_2) MULTIPLY ($parameter_1, %parameter_2) DIVIDE ($parameter_1, %parameter_2) AfterName: Starting memory address: stack_free stack_location stack[0] stack_location + 2 stack[1] stack_location + 4 stack[2] stack_location + 6 : : The MSBs of these variables are stored before theoperation is complete, the first parameter is overwritten with the result. Note thatLSBs inall casesthearithmetic operation is performed modulo 2^16. So for example, subtracting 1 from 0 givesUDVM memory. When theresult 65535. ForUDVM reaches a CALL instruction, it finds theSUBTRACT instructionmemory address of thesecond parameter is subtracted frominstruction immediately following thefirst. Similarly,CALL instruction and copies this 2-byte value into stack[stack_free] ready forthe DIVIDElater retrieval. It then increases stack_free by 1 and continues instruction execution at thefirst parameter is divided(relative) memory address specified by thesecond parameter. Note that if the second parameter does not divide exactly intooperand. When thefirst parameterUDVM reaches a RETURN instruction it decreases stack_free by 1, and then continues instruction execution at theremainder is ignored. 9.3. Memory management instructions The following instructions are used to manipulatebyte position stored in stack[stack_free]. If theUDVM memory. Bytes can be copied fromvariable stack_free is ever increased beyond 65535 or decreased below 0 then a bad compressed message has been received and decompression failure occurs (see Section 8.2). Decompression failure also occurs if oneareaofmemory to another,the above instructions is encountered andareasthe value ofmemory can be write-protected to make it easier for UDVM code to be compiled. 9.3.1. LOADstack_location is smaller than 6 (this prevents the stack from overwriting the well-known variables). 9.3.4. SWITCH TheLOADSWITCH instructionsets a 2-byte variable toperforms acertain specified value. The formatconditional jump based on the value of one of its operands. SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) When aLOADSWITCH instruction isas follows: LOAD (%address, %value) The first parameter specifiesencountered thestarting address ofUDVM reads the2-byte variable, whilstvalue of j. It then continues instruction execution at thesecond parameter(relative) address specified by delta j. If j specifiesthea valueto be loaded into this variable. As usual, MSBs are stored before LSBs in the UDVM memory. 9.3.2. MULTILOADof n or more, a bad compressed message has been received and decompression failure occurs. 9.3.5. CRC TheMULTILOADCRC instructionsetsverifies acontiguous blockstring of bytes using a 2-bytevariables to specified values. MULTILOAD (%address, #n, %value_0, ..., %value_n-1) The first parameter specifies the starting address of the contiguous variables, whilst the parameters value_0 through to value_n-1 specify the values to load into these variables (in the same order as they appear in the instruction).CRC. CRC (%value, %position, %length, %delta) Price, Hannu, et al. [Page 32] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20029.3.3. WORKING-MEMORYTheWORKING-MEMORY instructionactual CRC calculation isused to prevent part ofperformed using theUDVM memory from being modified. This can be very useful when offering UDVM code for compilation. WORKING-MEMORY (%memory_start, %memory_end)generator polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte Frame Check Sequence (FCS) of [RFC-1662]. Theparameters memory_startposition andmemory_end specify the new working memory area for the UDVM. These parameters replacelength operands define theapplication- defined parameters working_memory_start and working_memory_end, but only whilestring of bytes over which thecurrent messageCRC isbeing decompressed. Whenevaluated. Byte copying rules are enforced as per Section 7.3. Important note: Since anew instance of the UDVMCRC calculation isinvoked the working memory areaalways performed over a bitstream, for interoperability it isset bynecessary to define theoriginal application-defined parameters.order in which bits are supplied within each individual byte. In this case the MSBs of the byte MUST be supplied to the CRC calculation before the LSBs. The value operand contains the expected integer value of the 2-byte CRC. Ifmemory_end < memory_start, or iftheparameters reference a memory address beyondcalculated CRC matches theoverall UDVM memory size,expected value thendecompression failure occurs. AftertheWORKING-MEMORY instruction has been encountered,UDVM continues at the following instruction. Otherwise theonly way to write intoUDVM jumps to the (relative) memorywithinaddress specified by delta. 9.4. I/O instructions The following instructions allow theprotected region isUDVM to interface with its environment. Note that in the overall SigComp architecture all of these interfaces pass tocanceltheprotection using another WORKING-MEMORY instruction (ordecompressor dispatcher or toinvoke a new instance oftheUDVM). 9.3.4. COPYstate handler. 9.4.1. END-MESSAGE TheCOPYEND-MESSAGE instructionis used to copy a string of bytes from one part ofsuccessfully terminates the UDVMmemoryand passes state information toanother. COPY (%position, %length, %destination) The position parameter specifies the memory address ofthefirst byte instate handler. END-MESSAGE (%hash_length, %state_start, %state_length, %state_instruction, %announcement_location) Note that thestring to be copied, andactual uncompressed message is outputted separately using thelength parameter specifiesOUTPUT instruction; this conserves memory at thenumber of bytesUDVM because there is no need to buffer an entire uncompressed message before it can becopied. The destination parameter gives the addresspassed towhich the first byte inthestring will be copied.dispatcher. Note thatbyte copyingif the announcement_location operand isperformed as perset to 0 then no announcement information is provided, otherwise it points to therules of Section 7.4. 9.3.5. COPY-LITERAL A modified versionstarting memory address of theCOPY instruction is given below: COPY-LITERAL (%position, %length, $destination) The COPY-LITERAL instruction behavesannouncement information asa COPYper Section 6.3. The END-MESSAGE instructionexcept that after copying, the destination parameter is replaced withrequests thememory address immediately followingcreation of state using theaddress tooperands state start and state length, whichthe finaltogether denote a bytewas copied. Ifstring state_value. Provided that thefinalapplication gives permission, state_value is bytewascopiedtofrom the UDVM memoryaddress(obeying the rules of Section 7.3) and stored together with a 16-byte state identifier that can be used to access the state by a later compressed message. Price, Hannu, et al. [Page 33] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002specified in byte_copy_right, the destination parameter is set to the memory address specified in byte_copy_left. 9.3.6. COPY-OFFSET A further version of the COPY-LITERAL instruction is given below: COPY-OFFSET (%offset, %length, $destination) The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction except that an offset parameter is given instead of a position parameter.Toderive a suitable position parameter, starting atprovide security against malicious access, thememory address specifiedidentifier for any item of state created bydestination,the UDVMcounts backwards a total of offset memory addresses. If the memory address specified in byte_copy_leftisreached,derived from thenext memory address is taken[MD5] hash of the state_value to bebyte_copy_right.stored. TheCOPY-OFFSET instruction then behaves as a COPY-LITERAL instruction,state identifier is constructed by taking theposition parameter to be16-byte [MD5] hash and replacing all but thelast memory address reached infirst hash_length most significant bytes with zeroes. Note that if hash_length is 16 then theabove step. 9.4. Program flow instructionsunmodified [MD5] hash is the state identifier. Decompression failure occurs if hash_length is less than the application-defined parameter minimum_hash_size or greater than 16. If a state identifier already exists (hash collision occurs), the decompressor should check whether the requested state is identical to the established state, and count the state creation request as successful if this is the case. If not then the state creation request is unsuccessful. Thefollowing instructions alterexisting state MUST NOT be replaced with theflow of UDVM code. Each instruction jumpsrequested state toone ofbe saved. This is to avoid the situation where anumber of memory addresses based oncompressed message cannot be decompressed because acertain specified criterion. Note that allneeded item of state has been replaced (possibly by a malicious sender). 9.4.2. DECOMPRESSION-FAILURE The DECOMPRESSION-FAILURE instruction triggers a manual decompression failure. This is useful if theinstructions giveUDVM program discovers that it cannot successfully decompress thememory addresses inmessage (e.g. by using theform of deltas relativeCRC instruction). This instruction has no operands. 9.4.3. OUTPUT The OUTPUT instruction provides successfully decompressed data to the dispatcher. OUTPUT (%output_start, %output_length) The operands define the starting memory address and length of theinstruction. The actual memory address is calculated as follows: memory_address = (memory_address_of_instruction + delta) modulo 2^16 Note that certain I/O instructions (see Section 9.5) can also alter program flow. 9.4.1. JUMP The JUMP instruction moves program executionbyte string to be provided to thespecified memory address. JUMP (%delta)dispatcher. Note thatiftheaddress (specified asOUTPUT instruction can be used to output adelta frompartially decompressed message; each time theaddressinstruction is encountered it appends a byte string to the end of theJUMP instruction) lies beyonddata previously passed to the dispatcher via the OUTPUT instruction. The string of data is byte copied from theoverallUDVM memorysize then decompressionobeying the rules of Section 7.3. Decompression failureoccurs.occurs if the cumulative number of bytes provided to the dispatcher exceeds the application-defined parameter maximum_uncompressed_size. Price, Hannu, et al. [Page 34] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20029.4.2. COMPARE The COMPARE instruction compares two parametersSince there is technically a difference between outputting a 0-byte decompressed message, andthen jumpsnot outputting a decompressed message at all, the OUTPUT instruction needs toone of three specified memory addresses depending ondistinguish between theresult. COMPARE (%parameter_1, %parameter_2, %delta_1, %delta_2, %delta_3) If parameter_1 < parameter_2 thentwo cases. Thus, if the UDVMcontinuesterminates before encountering an OUTPUT instructionexecution at the (relative) memory address specified by delta 1.it is considered not to have outputted a decompressed message. Ifparameter_1 = parameter_2 thenitjumpsencounters one or more OUTPUT instructions, each of which provides 0 bytes of data to theaddress specified by delta_2. If parameter_1 > parameter_2dispatcher, then itjumpsis considered tothe address specified by delta_3. 9.4.3. CALL and RETURN The CALL and RETURN instructions provide support for compression algorithms withhave outputted anested structure. CALL (%delta) RETURN0-byte decompressed message. 9.4.4. NBO TheCALLNBO instruction modifies the order in which compressed bits are passed to the UDVM. As the INPUT-FIXED andRETURNINPUT-HUFFMAN instructionsmake use ofread individual bits from within astack of 2-byte variables stored at the memory address specified bybyte, to avoid ambiguity it is necessary to define thewell-known variable stack_location.order in which these bits are read. Thestack containsdefault operation is to read thefollowing variables: Name: Starting memory address: stack_free stack_location stack[0] stack_location + 2 stack[1] stack_location + 4 stack[2] stack_location + 6 : : TheMSBsof these variables are storedbefore the LSBs, but if the NBO instruction is encountered then the LSBsinare read before theUDVM memory. WhenMSBs. Both cases are illustrated below: MSB LSB MSB LSB MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 Byte 0 Byte 1 Default operation After NBO instruction The NBO instruction can only be used before bitwise compressed data is passed to theUDVM reachesUDVM. Therefore, aCALL instruction,decompression failure occurs if itfinds the memory addressis encountered after an INPUT-FIXED or an INPUT-HUFFMAN instruction has been used. 9.4.5. INPUT-BYTECODE The INPUT-BYTECODE instruction requests a certain number of bytes of compressed data from theinstruction immediately followingdispatcher. INPUT-BYTECODE (%length, %destination, %delta) The length operand indicates theCALL instruction and copies this 2-byte value into stack[stack_free] ready for later retrieval. It then increases stack_free by 1requested number of bytes of compressed data, andcontinues instruction execution atthe(relative)destination operand specifies the starting memory addressspecified byto which they should be copied. Byte copying is performed as per theparameter. Whenrules of Section 7.3. If theUDVM reaches a RETURN instruction it decreases stack_free by 1, and then continuesinstructionexecution atrequests data that lies beyond thebyte position stored in stack[stack_free]. Ifend of thevariable stack_free is ever increased beyond 65535 or decreased below 0 then a badcompressedmessage has been received and decompression failure occurs (see Section 8.3).message, no data is returned. Instead the UDVM moves Price, Hannu, et al. [Page 35] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002Decompressionprogram execution to the memory address specified by the formula (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16. The INPUT-BYTECODE instruction can only be used before bitwise compressed data is passed to the UDVM. Therefore, a decompression failurealsooccurs ifone of the above instructionsit is encounteredand the value of stack_location is smaller than 6 (this prevents the stack from overwriting the well-known variables). 9.4.4. SWITCHafter an INPUT-FIXED or an INPUT- HUFFMAN instruction has been used. 9.4.6. INPUT-FIXED TheSWITCHINPUT-FIXED instructionperformsrequests aconditional jump based on the valuecertain number ofonebits ofits parameters. SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) When a SWITCH instruction is encounteredcompressed data from theUDVM readsdispatcher. INPUT-FIXED (%length, %destination, %delta) The length operand indicates thevaluerequested number ofj. It then continues instruction execution at the (relative) address specified by delta j.bits. Ifj specifies a value of n or more, a bad compressed message has been receivedthis operand does not lie between 1 and 16 inclusive then a decompression failure occurs.9.4.5. CRC The CRC instruction verifies a string of bytes using a 2-byte CRC. CRC (%value, %position, %length, %delta)Theactual CRC calculation is performed usingdestination operand specifies thegenerator polynomial x^16 + x^12 + x^5 + 1,memory address to whichcoincides withthe2-byte Frame Check Sequence (FCS) of [RFC-1662]. The position and length parameters define the string of bytes over whichcompressed data should be copied. Note that theCRC is evaluated. Byte copying rulesrequested bits areenforcedinterpreted asper Section 7.4. Important note: Since a CRC calculation is always performed overabitstream, for interoperability it is necessary2-byte integer ranging from 0 todefine the order in which bits are supplied within each individual byte. In this case2^length - 1. Under default operation the MSBs ofthe byte MUST be supplied to the CRC calculation before the LSBs. The value parameter contains the expectedthis integervalue ofare provided first, but if an NBO instruction has been executed then the2-byte CRC.LSBs are provided first. If thecalculated CRC matches the expected value theninstruction requests data that lies beyond theUDVM continues atend of thefollowing instruction. Otherwisecompressed message, no data is returned. Instead the UDVMjumpsmoves program execution to the(relative)memory address specified bydelta. 9.5. I/O instructions The following instructions allow the UDVM to interface with its environment. Note that intheoverall SigComp architecture allformula (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 9.4.7. INPUT-HUFFMAN The INPUT-HUFFMAN instruction requests a variable number ofthese interfaces pass to the decompressor dispatcher or tobits of compressed data from thestate handler. Price, Hannu, et al. [Page 36] INTERNET-DRAFT SigComp February 14 , 2002 9.5.1. END-MESSAGEdispatcher. TheEND-MESSAGEinstructionsuccessfully terminates the UDVMinitially requests a small number of bits andpasses state information to the state handler. END-MESSAGE (%hash_length, %state_start, %state_length, %state_instruction, %capability_announcement_location) The actions taken bycompares theUDVM upon encounteringresult against a certain criterion; if theEND-MESSAGE instructioncriterion is not met then additional bits aredescribed in Section 8.2. Note also that the capability_announcement_location parameter points to the starting memory address ofrequested until thecapability announcement block of Section 6.3. 9.5.2. DECOMPRESSION-FAILUREcriterion is achieved. TheDECOMPRESSION-FAILUREINPUT-HUFFMAN instructiontriggers a manual decompression failure. Thisisuseful if the UDVM program discoversfollowed by three mandatory operands plus n additional sets of operands. Every additional set contains four operands as shown below: INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, %upper_bound_n, %uncompressed_n) Note thatit cannot successfully decompressif n = 0 then themessage (e.g.INPUT-HUFFMAN instruction is ignored byusingtheCRC instruction). This instruction has no parameters. 9.5.3. OUTPUT The OUTPUTUDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then decompression failure occurs. Price, Hannu, et al. [Page 36] INTERNET-DRAFT SigComp March 1, 2002 In all other cases, the behavior of the INPUT-HUFFMAN instructionprovides successfully decompressedis defined below: 1.) Set j = 1. 2.) Request an additional bits_j compressed bits. Interpret the total (bits_1 + ... + bits_j) bits of compressed datatorequested so far as an integer H, with thedispatcher. OUTPUT (%output_start, %output_length) The parameters definefirst bit to be supplied as thestarting memory addressMSB andlength ofthebyte stringlast bit to beprovided tosupplied as thedispatcher. NoteLSB (note that this is always theOUTPUT instruction can be used to output a partially decompressed message; each timecase, independently of whether the NBO instruction has been used). 3.) If data isencountered it appends a byte string torequested that lies beyond the end of thedata previously passed tocompressed message, terminate thedispatcher viaINPUT-HUFFMAN instruction and move program execution to theOUTPUT instruction.memory address specified by the formula (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 4.) If (H < lower_bound_j) or (H > upper_bound_j) then set j = j + 1. Then go back to Step 2, unless j > n in which case decompression failure occurs. 5.) Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the memory address specified by the destination operand. 9.4.8. STATE-REFERENCE ThestringSTATE-REFERENCE instruction retrieves some previously stored state information. STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, %state_destination) The id_start and id_length operands specify the location ofdata is byte copied fromtheUDVM memory obeyingstate identifier used to retrieve therulesstate information. The state identifier is always 16 bytes long; if id_length is less than 16 then the remaining least significant bytes ofSection 7.4.the identifier are padded with zeroes. Decompression failure occurs if id_length is greater than 16. Decompression failure also occurs if no state information matching thecumulative numberstate identifier can be found. Note that when accessing state information that has been previously created by the UDVM, the state identifier is always taken from an [MD5] hash ofbytes provided tothedispatcher exceedsstate to be retrieved. However this is not necessarily the case for application-definedparameter maximum_uncompressed_size. Since there is technically a difference between outputting a 0-byte decompressed message,state as per Section 3.2. The state_start andnot outputting a decompressed message at all,state_length operands define theOUTPUT instruction needsstarting byte and number of bytes todistinguish betweencopy from thetwo cases. Thus, ifstate_value contained in theUDVM terminates before encountering an OUTPUT instruction itidentified item of state. If more state isconsidered not to have outputted a decompressed message. If it encounters one or more OUTPUT instructions, each of which provides 0 bytes of data to the dispatcher, then itrequested than isconsidered to have outputted a 0-byte decompressed message.actually available then decompression failure occurs. Price, Hannu, et al. [Page 37] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20029.5.4. NBOTheNBOstate_destination operand contains a UDVM memory address. The requested state is byte copied to this memory address using the rules of Section 7.3. 9.4.9. STATE-EXECUTE The STATE-EXECUTE instructionmodifiesretrieves and runs some previously stored state information. STATE-EXECUTE (%id_start, %id_length) The id_start and id_length operands function as per theorder in which compressed bits are passedSTATE- REFERENCE instruction. STATE-EXECUTE is similar to STATE-REQUEST except that it does not require theUDVM. Asamount of state being requested or theINPUT-FIXED and INPUT-HUFFMAN instructions read individual bits from within a byte,proposed destination for the state toavoid ambiguitybe specified explicitly. Instead, itis necessary to definesimply puts theorder in which these bits are read. The default operation is to readstate_value back into theMSBs beforeUDVM memory using theLSBs, but ifoperands state_start and state_length contained as part of theNBO instructionstate information. The entire state_value (all state length bytes of it) isencounteredbyte copied into the memory address specified by state start. The UDVM then jumps to theLSBs(absolute) memory address specified by state_instruction. Note that state start, state length and state_instruction areread beforeall stored together with state_value as part of an item of state information. 10. Security considerations 10.1. Security goals The overall security goal of theMSBs. Both cases are illustrated below: MSB LSB MSB LSB MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 Byte 0 Byte 1 Default operation After NBO instruction The NBO instruction can only be used before bitwise compressed dataSigComp architecture ispassedto not create risks that are in addition to those already present in theUDVM. Therefore, a decompression failure occurs if itapplication protocols. There isencountered after an INPUT-FIXED or an INPUT-HUFFMAN instruction has been used. 9.5.5. INPUT-BYTECODE The INPUT-BYTECODE instruction requests a certain number of bytesno intention for SigComp to enhance the security ofcompressed data fromthedispatcher. INPUT-BYTECODE (%length, %destination, %delta) The length parameter indicatesprotocols, as it always can be circumvented by not using compression. More specifically, therequested numberhigh-level security goals can be described as: -- do not worsen security ofbytesexisting application protocol -- do not create any new security issues -- do not hinder deployment ofcompressed data,application security Price, Hannu, et al. [Page 38] INTERNET-DRAFT SigComp March 1, 2002 10.2. Security risks and mitigations This subsection identifies thedestination parameter specifiespotential security risks associated with thestarting memory address to which they shouldoverall SigComp architecture, and details the proposed solution for each risk. ** Confidentiality risks *** Attacking SigComp by snooping into state of other users State can only becopied. Byte copyingaccessed using a state identifier, which isperformed as per the rulesa (prefix of a) cryptographic hash ofSection 7.4. Iftheinstruction requests data that lies beyondstate being referenced. This implies that theend ofreferencing packet already needs knowledge about thecompressed message, no datastate. To enforce this, a reference length of 72 bits isreturned. Insteaddefined. This also minimizes theUDVM moves program executionprobability of an accidental state collision. Generally, ways to obtain knowledge about thememory address specified bystate identifier (e.g., passive attacks) will also easily provide knowledge about theformula (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16.state referenced, so no new vulnerability results. TheINPUT-BYTECODE instruction can only be used before bitwise compressed data is passedapplication needs to handle state identifiers with theUDVM. Therefore, a decompression failure occurs ifsame care itis encountered after an INPUT-FIXED or an INPUT- HUFFMAN instruction has been used. Price, Hannu, et al. [Page 38] INTERNET-DRAFT SigComp February 14 , 2002 9.5.6. INPUT-FIXED The INPUT-FIXED instruction requests a certain number of bits of compressed data fromwould handle thedispatcher. INPUT-FIXED (%length, %destination, %delta)state itself. ** Integrity risks Thelength parameter indicatesSigComp approach assumes that there is appropriate integrity protection below and/or above therequested number of bits. If this parameter does not lie between 1 and 16 inclusive then a decompression failure occurs. The destination parameter specifiesSigComp layer. However, thememory addressstate establishment mechanism provides additional potential towhichcompromise thecompressed data shouldintegrity of the messages (which, however, would most likely becopied. Note thatdetectable at therequested bits are interpreted as a 2-byte integer ranging from 0application layer). *** Attacking SigComp by faking state or making unauthorized changes to2^length - 1. Under default operation the MSBs of this integer are provided first, butstate State cannot be destroyed or changed by a malicious sender -- it can only add new state. Faking state is only possible if the hash allows intentional collision. ** Availability risks (avoid DoS vulnerabilities) *** Use of SigComp as a tool in a DoS attack to another target SigComp cannot easily be used as anNBO instruction has been executedamplifier in a reflection attack, as it only generates one decompressed message per incoming compressed message. This packet is then handed to theLSBs are provided first. Ifapplication; theinstruction requests data that lies beyondutility as a reflection amplifier is therefore limited by theendutility of thecompressed message, no data is returned. Instead the UDVM moves program executionapplication. However, it must be noted that SigComp can be used to generate larger packets as input to thememory address specified byapplication than have to be sent from theformula (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 9.5.7. INPUT-HUFFMAN The INPUT-HUFFMAN instruction requestsPrice, Hannu, et al. [Page 39] INTERNET-DRAFT SigComp March 1, 2002 malicious sender; this therefore can send smaller packets (at avariable number of bitslower bandwidth) than are delivered to the application. Depending on the reflection characteristics ofcompressed data fromthedispatcher. The instruction initially requestsapplication, this can be considered asmall numbermild form ofbits and comparesamplification. The application MUST limit theresult againstnumber of packets reflected to acertain criterion;potential target -- even ifthe criterion is not met then additional bits are requested until the criterion is achieved. The INPUT-HUFFMAN instructionSigComp isfollowed by three mandatory parameters plus n additional setsused to generate a large amount ofparameters. Every additional set contains four parametersinformation from a small incoming attack packet. *** Attacking SigComp asshown below: INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, %upper_bound_n, %uncompressed_n) Note that if n = 0 thentheINPUT-HUFFMAN instruction is ignoredDoS target bythe UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then decompression failure occurs. In all other cases, the behaviorfilling it with state Excessive state can only be installed by a malicious sender (or a set of malicious senders) with theINPUT-HUFFMAN instruction is defined below: 1.) Set j = 1. 2.) Request an additional bits_j compressed bits. Interpretconsent of thetotal (bits_1 + ... + bits_j) bitsapplication. The system consisting ofcompressed data requested so Price, Hannu, et al. [Page 39] INTERNET-DRAFTSigCompFebruary 14 , 2002 farand application is thus approximately asan integer H, with the first bit to be suppliedvulnerable as theMSB andapplication itself, unless it allows thelast bitinstallation of state from a message where it would not have installed state itself. If this is desirable to increase the compression ratio, the effect can besupplied asmitigated by adding feedback at theLSB (noteapplication level thatthis is always the case, independently ofindicates whether theNBO instruction has been used). 3.) If data isstate requested was actually installed -- This allows a system under attack to gracefully degrade by no longer installing compressor state thatlies beyond the end of the compressed message, terminate the INPUT-HUFFMAN instruction and move program execution to the memory address specifiedis not matched by application state. *** Attacking theformula (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 4.) If (H < lower_bound_j)UDVM by faking state or(H > upper_bound_j) then set j = j + 1. Then go back to Step 2, unless j > n in which case decompression failure occurs. 5.) Copy (H + uncompressed_j - lower_bound_j) modulo 2^16making unauthorized changes to state (See "Integrity risks" above.) *** Attacking thememory address specifiedUDVM bythe destination parameter. 9.5.8. STATE-REFERENCE The STATE-REFERENCE instruction retrieves some previously stored state information. STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, %state_destination)sending it looping code Theid_start and id_length parameters specifyapplication sets an upper limit to thelocationnumber ofthe state identifier"CPU cycles" that can be usedto retrieveper compressed message and per input bit in thestate information.compressed message. Thestate identifierdamage inflicted by sending packets with looping code isalways 16 bytes long;therefore limited, although this may still be substantial ifid_length is less than 16 then the remaining least significant bytesa large number ofthe identifierCPU cycles arepadded with zeroes. Decompression failure occurs if id_length is greater than 16. Decompression failure also occurs if no state information matchingoffered by thestate identifier canUDVM. However, this would befound. Note that when accessing state informationtrue for any decompressor thathas been previously createdcan receive packets from anywhere. 11. IANA considerations The SigComp solution currently requires two identifiers to be assigned by IANA: theUDVM,UDVM_version and the stateidentifier is always taken from an [MD5] hashidentifier. Upgraded versions of thestateUDVM will contain additional instructions tobe retrieved. However this is not necessarily the case for application-defined state as per Section 3.2. The state_start and state_length parameters defineimprove thestarting byte and numberperformance ofbytes to copy fromthestate_value containedoverall SigComp solution; new UDVM_version parameters will be needed inthe identified item of state. If more state is requested than is actually available then decompression failure occurs. The state_destination parameter contains a UDVM memory address. The requested state is byte copied tothismemory address using the rules of Section 7.4.case. 12. Acknowledgements Thanks to Price, Hannu, et al. [Page 40] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 20029.5.9. STATE-EXECUTE The STATE-EXECUTE instruction retrieves and runs some previously stored state information. STATE-EXECUTE (%id_start, %id_length) The id_start and id_length parameters function as per the STATE- REFERENCE instruction. STATE-EXECUTE is similar to STATE-REQUEST except that it does not require the amount of state being requested or the proposed destinationAbigail Surtees (abigail.surtees@roke.co.uk) Mark A West (mark.a.west@roke.co.uk) Lawrence Conroy (lwc@roke.co.uk) Christian Schmidt (christian.schmidt@icn.siemens.de) Max Riegel (maximilian.riegel@icn.siemens.de) Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) Stefan Forsgren (stefan.forsgren@epl.ericsson.se) Krister Svanbro (krister.svanbro@epl.ericsson.se) Miguel Garcia (miguel.a.garcia@ericsson.com) Christopher Clanton (christopher.clanton@nokia.com) Khiem Le (khiem.le@nokia.com) Ka Cheong Leung (kacheong.leung@nokia.com) forthe state to be specified explicitly. Instead, it simply puts the state_value back into the UDVM memory using the parameters state_startvaluable input andstate_length contained as part of the state information. The entire state_value (all state length bytes of it) is byte copied into the memory address specified by state start. The UDVM then jumps to the (absolute) memory address specified by state_instruction. Note that state start, state length and state_instruction are all stored together with state_value as part of an item of state information. 10. Security considerations 10.1 Security goals The overall security goal of the SigComp architecture is to not create risks that are in addition to those already present in the application protocols. There is no intention for SigComp to enhance the security of the protocols, as it always can be circumvented by not using compression. More specifically, the high-level security goals can be described as: -- do not worsen security of existing application protocol -- do not create any new security issues -- do not hinder deployment of application security 10.2 Security risks and mitigations This subsection identifies the potential security risks associated with the overall SigComp architecture, and details the proposed solution for each risk.review. 13. Authors' addresses Richard Price Tel: +44 1794 833681 Email: richard.price@roke.co.uk Roke Manor Research Ltd Romsey, Hants, SO51 0ZN United Kingdom Hans Hannu Tel: +46 920 20 21 84 Email: hans.hannu@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Carsten Bormann Tel: +49 421 218 7024 Email: cabo@tzi.org Universitaet Bremen TZI Postfach 330440 D-28334 Bremen, Germany Jan Christoffersson Tel: +46 920 20 28 40 Email: jan.christoffersson@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Price, Hannu, et al. [Page 41] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002** Confidentiality risks *** Attacking SigComp by snooping into state of other users State can only be accessed using a state identifier, which is a (prefix of a) cryptographic hash of the state being referenced. This implies that the referencing packet already needs knowledge about the state. To enforce this, a reference length of 48 bits is defined. This also minimizes the probability of an accidental state collision. Generally, ways to obtain knowledge about the state identifier (e.g., passive attacks) will also easily provide knowledge about the state referenced, so no new vulnerability results. The application needs to handle state identifiers with the same care it would handle the state itself. ** Integrity risks The SigComp approach assumes that there is appropriate integrity protection below and/or above the SigComp layer. However, the state establishment mechanism provides additional potential to compromise the integrity of the messages (which, however, would most likely be detectable at the application layer). *** Attacking SigComp by faking state or making unauthorized changes to state State cannot be destroyed or changed by a malicious sender -- it can only add new state. Faking state is only possible if the hash allows intentional collision. ** Availability risks (avoid DoS vulnerabilities) *** Use of SigComp as a tool in a DoS attack to another target SigComp cannot easily be used as an amplifier in a reflection attack, as it only generates one decompressed message per incoming compressed message. This packet is then handed to the application; the utility as a reflection amplifier is therefore limited by the utility of the application. However, it must be noted that SigComp can be used to generate larger packets as input to the application than have to be sent from the malicious sender; this therefore can send smaller packets (at a lower bandwidth) than are delivered to the application. Depending on the reflection characteristics of the application, this can be considered a mild form of amplification. The application MUST limit the number of packets reflected to a potential target -- even if SigComp is used to generate a large amount of information from a small incoming attack packet. Price, Hannu, et al. [Page 42] INTERNET-DRAFT SigComp February 14 , 2002 *** Attacking SigComp as the DoS target by filling it with state Excessive state can only be installed by a malicious sender (or a set of malicious senders) with the consent of the application. The system consisting of SigComp and application is thus approximately as vulnerable as the application itself, unless it allows the installation of state from a message where it would not have installed state itself. If this is desirable to increase the compression ratio, the effect can be mitigated by adding feedback at the application level that indicates whether the state requested was actually installed -- This allows a system under attack to gracefully degrade by no longer installing compressor state that is not matched by application state. *** Attacking the UDVM by faking state or making unauthorized changes to state (See "Integrity risks" above.) *** Attacking the UDVM by sending it looping code The application sets an upper limit to the number of "CPU cycles" that can be used per compressed message and per input bit in the compressed message. The damage inflicted by sending packets with looping code is therefore limited, although this may still be substantial if a large number of CPU cycles are offered by the UDVM. However, this would be true for any decompressor that can receive packets from anywhere. 11. IANA considerations The SigComp solution currently requires two identifiers to be assigned by IANA: the UDVM_version and the state identifier. Upgraded versions of the UDVM will contain additional instructions to improve the performance of the overall SigComp solution; new UDVM_version parameters will be needed in this case. Well-known decompression algorithms will also need to be assigned fixed state identifiers. 12. Acknowledgements Thanks to Abigail Surtees (abigail.surtees@roke.co.uk) Mark A West (mark.a.west@roke.co.uk) Lawrence Conroy (lwc@roke.co.uk) Price, Hannu, et al. [Page 43] INTERNET-DRAFT SigComp February 14 , 2002 Christian Schmidt (christian.schmidt@icn.siemens.de) Max Riegel (maximilian.riegel@icn.siemens.de) Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) Stefan Forsgren (stefan.forsgren@epl.ericsson.se) Krister Svanbro (krister.svanbro@epl.ericsson.se) Christopher Clanton (christopher.clanton@nokia.com) Khiem Le (khiem.le@nokia.com) Ka Cheong Leung (kacheong.leung@nokia.com) for valuable input and review. 13. AuthorsĘ addresses Richard Price Tel: +44 1794 833681 Email: richard.price@roke.co.uk Roke Manor Research Ltd Romsey, Hants, SO51 0ZN United Kingdom Hans Hannu Tel: +46 920 20 21 84 Email: hans.hannu@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Carsten Bormann Tel: +49 421 218 7024 Email: cabo@tzi.org Universitaet Bremen TZI Postfach 330440 D-28334 Bremen, Germany Jan Christoffersson Tel: +46 920 20 28 40 Email: jan.christoffersson@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Zhigang Liu Tel: +1 972 894-5935 Email: zhigang.liu@nokia.com Nokia Research Center 6000 Connection Drive Price, Hannu, et al. [Page 44] INTERNET-DRAFT SigComp February 14 , 2002 Irving, TX 75039 USA Jonathan Rosenberg Email: jdrosen@dynamicsoft.com dynamicsoft 72 Eagle Rock Avenue First Floor East Hanover, NJ 07936 14. References [SIP] "SIP: Session Initiation Protocol", Handley et al, RFC 2543, Internet Engineering Task Force, March 1999 [RTSP] "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A. Rao and R. Lanphier, , RFC 2326, April 1998 [HTTP] "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et al.", RFC 2616, June 1999 [SIPsrv] "SIP: Locating SIP Servers", J. Rosenberg, H. Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002, work in progress [DEFLATE] "DEFLATE Compressed Data Format Specification version 1.3", P. Deutsch, RFC 1951, Internet Engineering Task Force, May 1996 [SCTP] "Stream Control Transmission Protocol", Stewart et al, RFC 2960, Internet Engineering Task Force, October 2000 [MD5] "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321, Internet Engineering Task Force, April 1992 [RFC-1662] "PPP in HDLC-like Framing", Simpson et al, Internet Engineering Task Force, July 1994 [RFC-2026] "The Internet Standards Process - Revision 3", Scott Bradner, Internet Engineering Task Force, October 1996 [RFC-2119] "Key words for use in RFCs to Indicate Requirement Levels", Scott Bradner, Internet Engineering Task Force, March 1997 Price, Hannu, et al. [Page 45] INTERNET-DRAFT SigComp February 14 , 2002 Appendix A. Mnemonic language Writing UDVM programs directly in bytecode would be a daunting task, so a simple mnemonic language is provided to facilitate the creation of new decompression algorithms. Most importantly, the language allows the parameters of an instruction to be specified as text names rather than as integer values. If an instruction parameter is given as a text name, it should correspond to exactly one instance of a label, a reserved memory address or an externally defined keyword. A label is simply a text name preceded by a colon, for example: :loop JUMP (loop) For any parameters corresponding to a label, the integer value of the parameter is calculated by the following formula: parameter_value = (instruction_address - label_address) modulo 2^16 Note that the "label address" is simply the memory address of the instruction immediately following the label. In particular, the above example can be rewritten as JUMP (0). A reserved memory address is specified using the "reserve" keyword followed by a text_name and (optionally) an integer value. For example: reserve apples reserve pears (8) reserve bananas LOAD (bananas, 5) For any parameters corresponding to a reserved memory address, the integer value of the parameter is the next free memory address that has not yet been reserved. Starting at this address, the specified number of bytes of memory are then reserved (if no value is given then a total of 2 bytes is reserved). The first instance of a "reserve" keyword begins reserving memory at Address 6 (to avoid overwriting the three well-known variables of Section 7.2). So the above example can be rewritten as LOAD (16, 5). An externally defined keyword is specified outside of the mnemonic language. All of the application-defined parameters are considered to be externally defined keywords and can be referenced in the mnemonic code (useful for adapting the code based on the available memory or CPU cycles). The following additional keywords can also be used: Price, Hannu, et al. [Page 46] INTERNET-DRAFT SigComp February 14 , 2002 Keyword: Corresponding value: byte_copy_left 0 byte_copy_right 2 stack_location 4 reserved_end See below bytecode_length See below total_length See below The keyword reserved_end specifies the highest reserved memory address for the entire mnemonic code (taking into account all the occasions where memory is reserved). The keyword bytecode_length specifies the total size of the bytecode corresponding to the mnemonic code. Any instances of bytecode_length are initially replaced with 3 bytes of zeroes, and then are filled in after the remainder of the bytecode has been generated. Similarly, the keyword total_length specifies the total amount of memory required at the UDVM including bytecode and reserved memory addresses. A complete description of the mnemonic language and how it should be translated into bytecode is given below: Instructions: Instruction names are given in capitals. Replace each name with the corresponding 1-byte value as per Chapter 9. $: When appended to the front of an instruction parameter then the parameter is a memory address rather than a direct value. This symbol is mandatory for reference parameters, optional for multitype parameters and disallowed for literals. Integers: Instruction parameters can be given in the form of decimal integers. They are converted into the shortest bytecode capable of representing the integer by the rules of Section 7.3. Text references: Instruction parameters can also be given in the form of lowercase names. These names should match exactly one label, reserved memory address or externally defined keyword as described above. Labels: Label names are given as a colon followed by lowercase text. They are deleted when converting the mnemonics to bytecode. Reserved memory: Memory addresses are reserved using the "reserve" keyword. The line containing the reserve keyword Price, Hannu, et al. [Page 47] INTERNET-DRAFT SigComp February 14 , 2002 is deleted when converting to bytecode. .LSB: When appended to the end of a text name, the integer value corresponding to the name is increased by 1. This is useful for addressing the LSBs of a 2-byte variable. 0b, 0d: Bytecode values can be specified directly in binary or decimal via the appropriate prefix. The direct bytecode continues until a character occurs that is not an integer or whitespace. Whitespace: All whitespace (plus brackets and commas) just delimit the instructions. Delete. Comments: These are indicated by a semicolon and continue to the end of the line. Delete. Once the mnemonic code has been converted into bytecode, it can be executed by copying the bytecode into the UDVM memory beginning at the first memory address that has not been reserved by an instance of the "reserve" keyword. Program execution is assumed to begin at this address. Note that further to the rules outlined above, well-written mnemonic code will also have the following properties: * Any instance of a memory address will be specified as a text reference rather than an integer value. This ensures that the mnemonic code is portable. * The mnemonic code will not write to any memory address except those reserved by the "reserve" keyword. This ensures that the code can be compiled. Appendix B. Example application-defined parameters This appendix gives some example values for each of the application- defined parameters. These values are geared towards the compression of a signaling protocol such as [SIP]. Note that all of the proposed values are fixed and not negotiated between the two instances of the application invoking SigComp. This is because it is possible for the application invoking the decompressor to receive compressed messages from several different applications, and it is difficult to determine which message corresponds to which application. [SIP] does this using "From:" and "To:" fields in the message itself, but these are not visible until the message has been decompressed. It is simpler just to fix a set of parameter for every instance of the application. Price, Hannu, et al. [Page 48] INTERNET-DRAFT SigComp February 14 , 2002 UDVM_version 0 minimum_compression_ratio 0.5 maximum_compressed_size 65535 maximum_uncompressed_size 65535 minimum_hash_size 6 overall_memory_size 8192 working_memory_start 0 working_memory_end 8191 cycles_per_bit 20 cycles_per_message 2000 first_instruction 6 Note that the parameters overall_memory_size, cycles_per_bit and cycles_per_message can be increased on the fly using the capabilities announcement mechanism. This mechanism is designed to function correctly even when the receiving application is sent compressed messages from several different applications. The initial contents of the UDVM memory also need to be defined. It is not enough simply to initialize the memory containing all zeroes, as the UDVM would be unable to input any compressed data. Instead, for each new compressed message the memory should be initialized containing a simple decompressor capable of extracting the first few bytes of compressed data. These bytes can then be interpreted as a state identifier to retrieve the correct decompression algorithm. As an example, the following mnemonic code can be converted to bytecode and pasted into the UDVM memory beginning at Address 6: reserve state_identifier (6) INPUT-BYTECODE (6, state_identifier, fail) STATE-EXECUTE (state_identifier, 6) :fail DECOMPRESSION-FAILURE Finally, the application can define initial state that is available to the UDVM. Examples of application-defined state include common decompression algorithms, dictionaries of common text phrases etc. Appendix C. Example decompression algorithms This appendix gives examples of decompression algorithms which can be run on the UDVM in the form of bytecode. C.1. Example UDVM code for simple LZ77 decompression The first example gives the code required to decompress data from a very simple LZ77-based algorithm. The UDVM is instructed to interpret a compressed message as a set of 4-byte characters, where each character contains a 2-byte position integer followed by a 2-byte Price, Hannu,Zhigang Liu Tel: +1 972 894-5935 Email: zhigang.liu@nokia.com Nokia Research Center 6000 Connection Drive Irving, TX 75039 USA Jonathan Rosenberg Email: jdrosen@dynamicsoft.com dynamicsoft 72 Eagle Rock Avenue First Floor East Hanover, NJ 07936 14. References [SIP] "SIP: Session Initiation Protocol", Handley etal. [Page 49] INTERNET-DRAFT SigComp February 14al, RFC 2543, Internet Engineering Task Force, March 1999 [RTSP] "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A. Rao and R. Lanphier, ,2002 length integer. Taken together these integers point to a previously received text stringRFC 2326, April 1998 [HTTP] "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et al.", RFC 2616, June 1999 [SIPsrv] "SIP: Locating SIP Servers", J. Rosenberg, H. Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002, work in progress [DEFLATE] "DEFLATE Compressed Data Format Specification version 1.3", P. Deutsch, RFC 1951, Internet Engineering Task Force, May 1996 [SCTP] "Stream Control Transmission Protocol", Stewart et al, RFC 2960, Internet Engineering Task Force, October 2000 [MD5] "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321, Internet Engineering Task Force, April 1992 [RFC-1662] "PPP inthe UDVM memory, which is then copied to the end of the uncompressed message. Since the compressor can only send references to strings already presentHDLC-like Framing", Simpson et al, Internet Engineering Task Force, July 1994 [RFC-2026] "The Internet Standards Process - Revision 3", Scott Bradner, Internet Engineering Task Force, October 1996 [RFC-2119] "Key words for use inthe UDVM memory, before the first message is decompressed the memory must be initialized with a static dictionary containing the 256 ASCII characters. The algorithm write-protects the memory containing the UDVM instructions used to decompress each character, so that they can easily be compiled to improve the speed of decompression. A 2-byte CRC over the uncompressed message is appended to the end of the compressed message,RFCs toverify that correct decompression has occurred. The algorithm also requests that the contents of the UDVM memory be saved using the state request mechanism, so that it can be retrieved by sending the appropriate 6-byte hash. reserve byte_copy_left reserve byte_copy_right reserve uncompressed_start reserve uncompressed_end reserve uncompressed_length reserve position reserve length reserve static_dictionary (256) reserve circular_buffer (2048) WORKING-MEMORY (uncompressed_start, reserved_end) MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary, circular_buffer, 0, 0, 0) :unpack_static_dictionary ; The following instructions initialize the static dictionary. COPY-LITERAL (position.LSB, 1, $uncompressed_start) ADD ($position, 1) COMPARE ($position, 256, unpack_static_dictionary, next_character, 0) :next_character INPUT-FIXED (16, position, fail) INPUT-FIXED (16, length, end_of_message) COPY-LITERAL ($position, $length, $uncompressed_end) ADD ($uncompressed_length, $length) JUMP (next_character) :failIndicate Requirement Levels", Scott Bradner, Internet Engineering Task Force, March 1997 Price, Hannu, et al. [Page50]42] INTERNET-DRAFT SigCompFebruary 14 ,March 1, 2002DECOMPRESSION-FAILURE :end_of_message CRC ($position, $uncompressed_start, $uncompressed_length, fail) OUTPUT ($uncompressed_start, $uncompressed_length) END-MESSAGE (6, 0, total_length, next_character, 0)AppendixD.A. Document history - October 19, 2001, version 00 First version. The draft describes the current ideas, from people involved in the ROHC WG, of how to perform compression of application signaling messages. - October 31, 2001, version 01 Second version. Additional section, 5.2.1, which describes when a message identifier can be reused. - November 21, 2001, version 02 Third version. Section 6 has been moved to a separate draft. The third version describes a modular solution, providing flexibility for implementers to decide which functions they want to integrate. - January 28, 2002, version 03 Fourth version. SigComp version 02 is divided into this draft, a UDVM draft andaan extended operation mechanisms draft. Compressor/decompressor (UDVM) state approach has been introduced for security reasons. - February 14, 2002, version 04 Fifth version. Describes the complete base SigComp solution including the UDVM. - March 1, 2002, version 05 Sixth version. Comments from several authors and contributors have been taken into account. Announcement mechanism has been updated. This Internet-Draft expires inAugustSeptember 2002. Price, Hannu, et al. [Page51]43] ----