draft-ietf-rohc-sigcomp-04.txt  -->   draft-ietf-rohc-sigcomp-05.txt


 
 
 
Network Working Group                  Richard Price, Siemens/Roke Manor
INTERNET-DRAFT                                      Hans Hannu, Ericsson
Expires: September 2002                  Carsten Bormann, TZI/Uni Bremen
                                           Jan Christoffersson, Ericsson
                                                      Zhigang Liu, Nokia
                                         Jonathan Rosenberg, dynamicsoft

                                                           March 1, 2002
 
 
                      Signaling Compression (SigComp) 
                     <draft-ietf-rohc-sigcomp-05.txt>
                                     
    
Status of this memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or cite them other than as "work in progress". 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/lid-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
   This document is a submission of the IETF ROHC WG. Comments should be
   directed to its mailing list, rohc@ietf.org.
    
    
Abstract 
    
   This document defines SigComp, a solution for compressing messages
   generated by text-based application protocols such as SIP [SIP] and
   RTSP [RTSP]. The architecture and pre-requisites of SigComp are
   outlined, along with the format of the SigComp message.
    
   Decompression functionality for the SigComp solution is provided by a 
   "Universal Decompressor Virtual Machine" optimized for the task of 
   running decompression algorithms. The UDVM can be configured to 
   understand the output of many well-known compressors such as 
   [DEFLATE]. 

 
 
 
Price, Hannu, et al.                                            [Page 1]  

INTERNET-DRAFT                  SigComp                   March 1, 2002
 
 
Table of contents 
   
   1.  Introduction
   2.  Terminology
   3.  SigComp Architecture
   4.  SigComp message flow
   5.  SigComp compressor
   6.  State handling and capability announcement
   7.  Overview of the UDVM
   8.  Decompressing a SigComp message
   9.  UDVM instruction set
   10. Security considerations
   11. IANA considerations
   12. Acknowledgements
   13. Authors' addresses
   14. References
   Appendix A. Mnemonic language
   Appendix B. Example application-defined parameters
   Appendix C. Example decompression algorithms
   Appendix D. Document history
 
1.  Introduction 
    
   The Session Initiation Protocol (SIP) [SIP], along with many other 
   application protocols used for multimedia communications such as RTSP 
   [RTSP], is a textual protocol engineered for bandwidth rich links. As 
   a result, the SIP messages have not been optimized in terms of size. 
   Typical SIP messages range from a few hundred bytes to as high as two
   thousand. To date, this has not been a significant problem.
    
   With the planned usage of these protocols in wireless handsets as 
   part of 2.5G and 3G cellular networks, the large size of these 
   messages is problematic. With low-rate IP connectivity, store-and-
   forward delays are significant. Taking into account retransmits, and 
   the multiplicity of messages that are required in some flows, call 
   setup and feature invocation are adversely affected. Therefore, we 
   believe there is merit in reducing these message sizes.  
 
   This document outlines the architecture and pre-requisites of the
   SigComp solution, the format of the SigComp message, algorithm
   upload, and the Universal Decompressor Virtual Machine (UDVM) that
   provides decompression functionality.
    
   SigComp is typically offered to applications as a "shim" layer between the 
   application and the transport. The service provided is that of the 
   underlying transport plus compression. Both connection-oriented and 
   connectionless transports are supported by SigComp. 
    
   This document focuses on the signaling scenario where an end-terminal
   communicates with a proxy. However, SigComp is designed to run over
   both connectionless and connection-oriented transports and hence may
   be applicable to other scenarios with multiple endpoints compressing
   and decompressing data.
    
    
    

 
 
 
2.  Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC-2119]. 
    
   SigComp 
    
     The overall solution for signaling compression, comprising the  
     compressor, decompressor, dispatchers and state handler. 
    
   Application 
    
     For the purpose of this document, an application is a text-based 
     protocol software that: 
    
     a) sends application data to the compressor dispatcher 
     b) receives data from the decompressor dispatcher 
      c) authenticates the sender of a decompressed message and gives
         permission for state information to be saved in the sender's
         name
    
   Application message 
    
     An uncompressed message provided to or from the application. 
    
   Endpoint 
    
     One instance of an application plus a SigComp layer. Each endpoint  
     is capable of sending and/or receiving SigComp messages. 
    
   Endpoint identity 
    
     A unique indicator assigned to each endpoint by the application  
      (for example a URI). The application authenticates the sender of
     a decompressed message, and provides their endpoint identity to the 
     SigComp state handler. 
    
   Transport 
    
      Mechanism for passing data between two endpoints. SigComp is
      capable of sending messages over a wide range of transports
      including TCP, UDP and [SCTP].
    
   Message-based transport 
    
     A transport that carries data as a set of distinct, bounded messages. 
    
   Stream-based transport 
    
      A transport that carries data as a continuous stream with no
      message boundaries. In this case, SigComp reserves the value
      0xFFFF to delimit messages in the compressed stream.
 
 
   Application-defined parameters 
    
      Parameters that must be agreed upon by the local and remote
      endpoints invoking SigComp. Values for the application-defined
      parameters are typically fixed a-priori to meet the requirements
      of a particular signaling application.
    
   SigComp message 
    
      May contain a compressed application message in the form of UDVM
      bytecode. In case of a message-based transport such as UDP, a
      SigComp message corresponds to exactly one datagram. For a
      stream-based transport such as TCP, each SigComp message is
      separated by a reserved 0xFFFF delimiter.
    
   Standalone SigComp message 
    
     A SigComp message that does not include any compressed application  
     data. Certain signaling applications may not allow standalone 
     SigComp messages due to security requirements. 
    
   Compressor

      Entity that invokes an encoder, and keeps track of states that can
      be used for compression. It is responsible for supplying UDVM
      bytecode to the remote decompressor in order for compressed
      data to be decompressed.

   Encoder

      Encodes data according to a particular compression algorithm. The
      encoded data can be decoded by a UDVM.

   Compressor dispatcher

      Entity that receives uncompressed application messages, invokes a
      compressor, and forwards the resulting SigComp messages to a
      remote endpoint.

   Decompressor

      Entity that is responsible for converting a SigComp message into
      uncompressed data. Decompression functionality is provided by the
      UDVM.

   Decompressor dispatcher

      Entity that receives SigComp messages, invokes a decompressor, and
      forwards the decompressed application messages to an application.
    
    
    

 
 
 
   Virtual machine 
    
     A machine architecture designed to be implemented in software 
     (although silicon implementations are of course possible). 
    
   Universal Decompressor Virtual Machine (UDVM) 
    
     The virtual machine described in this document. The UDVM is used  
     for decompression of SigComp messages. 
    
   Bytecode 
    
     Machine code that can be executed by a virtual machine. UDVM 
     bytecode is a combination of UDVM instructions and compressed data. 

 
 
 
   Per-message compression 
    
     Compression that does not reference data from previous messages.  
     SigComp can decompress a message of this type using only the  
     application-defined parameters and the data in the message itself. 
    
   Dynamic compression 
    
     Compression relative to messages sent prior to the current  
     compressed message. SigComp stores and retrieves this data using  
     the state handler. 
    
   State 
    
      Data saved for retrieval by later SigComp messages. An item of
      state typically reflects the contents of the UDVM memory after
      decompressing a message, but state can also be created by the
      compressor or by the application.
    
   State handler 
    
     Entity responsible for storing and accessing state information 
     once permission is granted by the application. 
    
   State identifier 
    
      Reference used to access an item of state previously created by
      the compressor, the decompressor or the application.
    
   CPU cycles 
    
     A measure of the amount of "CPU power" required to execute a UDVM  
     instruction (the simplest UDVM instructions require a single CPU  
     cycle). An upper limit is placed on the number of cycles that can  
     be used to decompress each bit in a compressed message. 


 
 
 
3.  SigComp Architecture 
 

   In the SigComp architecture compression and decompression are
   performed at two communicating entities. SigComp is offered to
   applications as a "shim" layer between the application and the 
   underlying transport, and so these entities are endpoints when viewed 
   from a transport layer perspective. Note however that from the 
   application perspective SigComp is applied on a per-hop basis. 
    
   Figure 1 shows the layout of a communicating endpoint that implements 
   a SigComp layer. The figure does not mandate any particular 
   implementation, but is shown to the reader for the sake of clarity. 
    
   The SigComp layer is further decomposed into the following
   components:

   - A compressor dispatcher: this is the interface from the
     application. The compressor dispatcher receives an application
     message and an identifier for the receiving endpoint. Based on the
     endpoint identity the compressor dispatcher invokes a particular
     compressor, which returns a SigComp message that is forwarded to
     the remote endpoint.

   - A decompressor dispatcher: this is the interface towards the
     application. A SigComp message is received by the decompressor
     dispatcher and a decompressor is invoked. Once the dispatcher has
     received the (decompressed) application data it forwards the
     message to the application.

   - One or more compressors: a distinct compressor is invoked for each
     remote endpoint with which the local application wishes to
     communicate. A compressor receives an (uncompressed) application
     message from the compressor dispatcher, compresses the message,
     and returns a SigComp message to the compressor dispatcher. During
     the compression process the compressor may invoke the state
     handler to restore a previous state or save a new state. Each
     compressor is responsible for providing the remote decompressor
     with suitable UDVM bytecode to reconstruct the original
     application message. Within the compressor, the entity which runs
     the actual compression (minus state management issues) is known as
     the "encoder".

   - One or more decompressors: since SigComp can run over an unsecured
     transport layer, a distinct decompressor must be invoked on a
     per-message basis. A decompressor receives a SigComp message from
     the decompressor dispatcher, decompresses the message, and returns
     the (decompressed) application message to the decompressor
     dispatcher. During the decompression process, the decompressor may
     invoke the state handler to restore a previous state or save a new
     state.

   - State handler: this entity contains enough logic to store and
     retrieve states. State is information that is stored between
     SigComp messages: this data can be saved either by a compressor, a
     decompressor or an application. The saved state may be used for
     (de)compression between a compressor and its peer decompressor.
     For security purposes the state handler must always ask the
     application to grant permission for new states to be saved. State
     creation and retrieval are further described in Chapter 6.
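
   The division of labour between the components listed above can be
   sketched as follows. This is only an illustration, not part of the
   SigComp definition: the class and method names, the
   grant_permission callback and the identity "compression" used in
   place of a real encoder or UDVM are all assumptions made for the
   example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StateHandler:
    """Stores state, but only once the application grants permission."""
    grant_permission: Callable[[str], bool]  # hypothetical application hook
    states: Dict[bytes, bytes] = field(default_factory=dict)

    def save(self, endpoint_id: str, state_id: bytes, data: bytes) -> bool:
        if not self.grant_permission(endpoint_id):
            return False  # application refused, nothing is stored
        self.states[state_id] = data
        return True

    def load(self, state_id: bytes) -> Optional[bytes]:
        return self.states.get(state_id)


class Compressor:
    """Placeholder compressor; one instance exists per remote endpoint."""

    def __init__(self, endpoint_id: str) -> None:
        self.endpoint_id = endpoint_id
        self.messages_sent = 0

    def compress(self, message: bytes) -> bytes:
        self.messages_sent += 1
        return message  # identity transform stands in for a real encoder


class CompressorDispatcher:
    """Selects a compressor based on the receiving endpoint identity."""

    def __init__(self) -> None:
        self.compressors: Dict[str, Compressor] = {}

    def send(self, message: bytes, endpoint_id: str) -> bytes:
        if endpoint_id not in self.compressors:
            self.compressors[endpoint_id] = Compressor(endpoint_id)
        return self.compressors[endpoint_id].compress(message)


class Decompressor:
    """Placeholder decompressor; a real one would run a UDVM."""

    def decompress(self, sigcomp_message: bytes) -> bytes:
        return sigcomp_message


class DecompressorDispatcher:
    """Invokes a fresh decompressor for every received SigComp message."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self.deliver = deliver
        self.invocations = 0

    def receive(self, sigcomp_message: bytes) -> None:
        decompressor = Decompressor()  # distinct instance per message
        self.invocations += 1
        self.deliver(decompressor.decompress(sigcomp_message))


if __name__ == "__main__":
    delivered: List[bytes] = []
    out = CompressorDispatcher().send(b"INVITE ...", "sip:proxy.example.com")
    DecompressorDispatcher(delivered.append).receive(out)
    print(delivered)
```

   Keeping one compressor per remote endpoint while creating a new
   decompressor per message mirrors the asymmetry described in the
   list: the compressor knows whom it is talking to, whereas the
   receiver cannot trust the origin of a message until the application
   has authenticated it.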
    
    





 
 
 
               +---------------------------------------------+
               |                                             |
               |                 Application                 |
               |                                             |
               +---------------------------------------------+
                      |                    |         ^
            Message & |           Endpoint |         | Decompressed
             endpoint |           identity |         | message
             identity |                    |         |
                      |                    |         |
       +-- -- -- -- --|-- -- -- -- -- -- --|-- -- -- |-- -- -- -- --+
       |              |                    |         |              |
                      v                    v         |
       |    +--------------+  +--------------+  +--------------+    |
    SigComp |              |  |              |  |              | SigComp
    message |  Compressor  |  |    State     |  | Decompressor | message
    <-------|  dispatcher  |  |   handler    |  |  dispatcher  |<-------
       |    |              |  |              |  |              |    |
            +--------------+  +--------------+  +--------------+
       |           ^  ^          ^  ^  ^  ^          ^  ^           |
                   |  |          |  |  |  |          |  |
       |           |  |          |  |  |  |          |  |           |
                   |  |          |  |  |  |          |  |
       |           v  |          |  |  |  |          v  |           |
            +--------------+     |  |  |  |     +--------------+
       |    | Compressor 1 |     |  |  |  |     |Decompressor 1|    |
            |              |<----+  |  |  +---->|              |
       |    |  (Encoder)   |        |  |        |    (UDVM)    |    |
            |              |        |  |        |              |
       |    +--------------+        |  |        +--------------+    |
                      |             |  |                |
       |              v             |  |                v           |
            +--------------+        |  |        +--------------+
       |    | Compressor 2 |        |  |        |Decompressor 2|    |
            |              |<-------+  +------->|              |
       |    |  (Encoder)   |                    |    (UDVM)    |    |
            |              |                    |              |
       |    +--------------+                    +--------------+    |

       |                        SigComp layer                       |
       +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- +

    Figure 1: High-level architectural overview of one SigComp endpoint
    


 
 
 
   Note that it is possible for SigComp to decompress messages from
   multiple endpoints at different physical locations in a network, as
   the architecture is designed to prevent data from one endpoint
   interfering with data from a different endpoint. A consequence of
   this design choice is that it is difficult for a malicious user to
   disrupt SigComp operation by inserting false compressed messages on
   the transport layer.

   The actual functionality of each decompressor in the architecture of
   Figure 1 is handled by invoking an instance of the Universal
   Decompressor Virtual Machine (UDVM). Figure 2 gives a more detailed
   view of a UDVM, including all of the interfaces between the UDVM and
   its environment.
    
    
   +----------------+                                 +----------------+ 
   |                |     Request compressed data     |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |     Provide compressed data     |                | 
   |                |                                 |   Dispatcher   | 
   |                |                                 |                | 
   |                |    Output decompressed data     |                |
   |                |-------------------------------->|                | 
   |                |                                 |                | 
   |                |                                 +----------------+ 
   |      UDVM      | 
   |                |                                 +----------------+ 
   |                |    Request state information    |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |    Provide state information    |                | 
   |                |                                 |     State      | 
   |                |                                 |    Handler     | 
   |                |   Make state creation request   |                | 
   |                |-------------------------------->|                | 
   |                | Forward capability announcement |                |
   |                |                                 |                | 
   +----------------+                                 +----------------+ 
    
         Figure 2: Interfaces between the UDVM and its environment 
    
   Note that for simplicity, the UDVM indicates when it requires 
   additional compressed data or state information using an explicit 
   instruction. It then pauses and waits for the information to be 
   supplied before continuing with the next instruction. This prevents 
   the arrival of more data from interfering with the operation of the 
   UDVM (e.g. by accidentally overwriting UDVM memory that is currently 
   in use). 
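
   The request-and-pause behaviour described above can be illustrated
   with a Python generator. This is a toy sketch only: the "INPUT" and
   "OUTPUT" opcodes, the event names and the run() driver are invented
   for the example and are not the UDVM instruction set.

```python
from typing import Iterable, Tuple


def toy_udvm(program: Iterable[Tuple[str, int]]):
    """Execute (opcode, size) pairs, pausing whenever data is needed."""
    memory = bytearray()
    for opcode, size in program:
        if opcode == "INPUT":
            # Explicit request: execution stops here until the
            # dispatcher supplies the compressed data via send().
            data = yield ("need_compressed_data", size)
            memory.extend(data[:size])
        elif opcode == "OUTPUT":
            # Hand the first `size` bytes of memory to the dispatcher.
            yield ("output", bytes(memory[:size]))


def run(program, compressed: bytes) -> bytes:
    """Play the role of the dispatcher for the toy machine."""
    vm = toy_udvm(program)
    outputs = []
    position = 0
    try:
        event = next(vm)
        while True:
            kind, value = event
            if kind == "need_compressed_data":
                chunk = compressed[position:position + value]
                position += value
                event = vm.send(chunk)  # resume with the requested data
            else:
                outputs.append(value)
                event = next(vm)
    except StopIteration:
        pass  # the program has finished
    return b"".join(outputs)


if __name__ == "__main__":
    print(run([("INPUT", 5), ("OUTPUT", 5)], b"hello"))
```

   Because new data only enters memory at the point where the machine
   asked for it, data arriving at other times cannot overwrite memory
   that is currently in use.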
    
    
    

 
 
 
3.1.  Requirements on application 
    
   From an application perspective the SigComp layer typically appears as a new 
   transport, with similar behavior to the original transport used to 
   carry uncompressed data (for example SigComp/UDP behaves similarly to 
   native UDP). 
    


 
 
 
   If the application wishes to mix SigComp messages with other types of
   data (e.g. uncompressed data, or SigComp data for a different
   application) on the same transport then the transport must
   distinguish between the different types of data. This means that a
   new port will need to be reserved or discovered for the SigComp
   messages destined for a particular application. For example SIP uses
   port 5060 for TCP and port 5061 for TLS/TCP, so it could similarly
   reserve another port for SigComp/TCP.

   In the interests of security, a new interface is required to the
   signaling application in order to leverage the authentication
   functions built into the application itself. When the application
   receives a decompressed message it determines the identity of the
   sending endpoint and supplies this information to the state handler.
    
3.2.  Application-defined parameters 
    
   When an application invokes SigComp, a number of parameters are
   provided by the application to control the maximum size of compressed
   messages, the UDVM memory size etc. The local and remote applications
   that wish to communicate MUST initially agree on a common set of
   values for these parameters.
    
   Note that the majority of application-defined parameters are set to
   fixed values for a particular signaling application. However,
   endpoints implementing SigComp will typically have a wide range of
   capabilities, each offering a different amount of working memory,
   processing power and so on. In order to support this wide variation
   in endpoint capabilities, SigComp includes a mechanism for modifying
   the following application-defined parameters on the fly:

   UDVM_version
   UDVM_memory_size
   cycles_per_bit
   cycles_per_message
   Initial state

   The SigComp announcement mechanism is described further in Section
   6.3.

   The advantage of building the announcement mechanism into SigComp is
   that it avoids the need for any form of negotiation to be performed
   by the application itself. Instead, it is sufficient to initialize
   all of the application-defined parameters to fixed values and modify
   them later using SigComp itself.

   Each application-defined parameter is described below.

   Note that unless otherwise indicated, all of the parameters can be
   stored as 2-byte integers.
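
   As an illustration of the 2-byte storage noted above, the sketch
   below packs four of the parameters into unsigned 16-bit fields. The
   field order, the big-endian byte order and the helper names are
   assumptions made for the example; this document does not define
   such a layout here.

```python
import struct
from typing import Dict

# Parameter names taken from this section; the ordering is arbitrary.
PARAMETER_NAMES = (
    "UDVM_version",
    "UDVM_memory_size",
    "cycles_per_bit",
    "cycles_per_message",
)


def pack_parameters(values: Dict[str, int]) -> bytes:
    """Encode each parameter as an unsigned 2-byte integer.

    ">H" is big-endian unsigned 16-bit (byte order is an assumption).
    struct.error is raised if a value does not fit in 2 bytes.
    """
    return b"".join(struct.pack(">H", values[name]) for name in PARAMETER_NAMES)


def unpack_parameters(blob: bytes) -> Dict[str, int]:
    """Decode the layout produced by pack_parameters()."""
    fields = struct.unpack(">" + "H" * len(PARAMETER_NAMES), blob)
    return dict(zip(PARAMETER_NAMES, fields))


if __name__ == "__main__":
    # Example values are hypothetical, not recommended defaults.
    example = {
        "UDVM_version": 0,
        "UDVM_memory_size": 4096,
        "cycles_per_bit": 16,
        "cycles_per_message": 1000,
    }
    blob = pack_parameters(example)
    print(len(blob), unpack_parameters(blob) == example)
```

   A consequence of the 2-byte representation is that no parameter
   stored this way can exceed 65535.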
    
   UDVM_version 
    
     The UDVM_version parameter specifies the level of functionality 
     available at the UDVM. The basic version of the UDVM (Version 0) 
     is defined in this document. 
    
   maximum_expansion_size

      The maximum_expansion_size parameter prevents the generation of
      excessively large SigComp messages. If set to 0 then the parameter
      is ignored by SigComp; for any other value, if an uncompressed
      message is k bytes long, the corresponding SigComp message must be
      no larger than (k + maximum_expansion_size). Note that any value
      other than 0 bans the creation of standalone SigComp messages
      (i.e. messages that do not contain a compressed application
      message).
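
      The maximum_expansion_size rule can be expressed as a small
      check. The function name and the use of None to represent a
      standalone SigComp message are assumptions made for this sketch.

```python
from typing import Optional


def within_expansion_limit(
    sigcomp_len: int,
    uncompressed_len: Optional[int],
    maximum_expansion_size: int,
) -> bool:
    """Return True if a SigComp message respects maximum_expansion_size.

    uncompressed_len is None for a standalone SigComp message, i.e. one
    that carries no compressed application message (an encoding chosen
    for this sketch only).
    """
    if maximum_expansion_size == 0:
        # A value of 0 means the parameter is ignored.
        return True
    if uncompressed_len is None:
        # Any non-zero value bans standalone SigComp messages.
        return False
    # A k-byte message may grow by at most maximum_expansion_size bytes.
    return sigcomp_len <= uncompressed_len + maximum_expansion_size


if __name__ == "__main__":
    print(within_expansion_limit(516, 500, 16))
    print(within_expansion_limit(517, 500, 16))
```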
    
   maximum_compressed_size 
    
     The maximum_compressed_size parameter limits the size of one  
     compressed message. SigComp rejects any message larger than the 
     specified value. 
    
   maximum_uncompressed_size 
    
     The maximum_uncompressed_size parameter limits the size of one  
     uncompressed message. SigComp rejects any message larger than the 
     specified value. 
    
   minimum_hash_size 
    
     The minimum_hash_size parameter specifies the minimum size of the 
     state identifier when creating new state information. This value  
     needs to be sufficiently large to prevent malicious users from  
     guessing a state identifier by brute force. 
    
   UDVM_memory_size

     The UDVM_memory_size parameter specifies the total number of
     bytes in the UDVM memory.
    
   working_memory_start 
    
     The working_memory_start parameter specifies the start of the UDVM  
     memory area that can be modified. Memory addresses below this 
     value are considered read-only by the UDVM. 
    
   working_memory_end 
    
     The working_memory_end parameter specifies the end of the UDVM  
     memory area that can be modified. Memory addresses above this 
     value are considered read-only by the UDVM. 
    
   cycles_per_bit 
    
     The cycles_per_bit parameter specifies the number of "CPU cycles"
     that can be used to decompress a single bit of data. One CPU cycle
     typically corresponds to a single UDVM instruction, although some
     of the high-level instructions may require additional cycles.
    
   cycles_per_message 
    
     The cycles_per_message parameter specifies the number of additional
     CPU cycles made available at the start of a compressed message.
     These cycles can be useful for decompression algorithms that
     upload additional data on a per-message basis, for example a new
     set of Huffman codes as with [DEFLATE].
    
     The maximum number of "CPU cycles" available for each
     compressed message is specified by the following formula:

     maximum_cycles = message_size * cycles_per_bit + cycles_per_message
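
     A minimal sketch of this calculation is given below. It assumes
     that message_size is measured in bits (the text does not state
     the unit explicitly, but the parameter is named cycles_per_bit);
     the function name is hypothetical.

```python
def maximum_cycles(message_size_bits: int,
                   cycles_per_bit: int,
                   cycles_per_message: int) -> int:
    """Cycle budget for one compressed message.

    Assumption: message_size_bits is the size of the compressed
    message expressed in bits.
    """
    return message_size_bits * cycles_per_bit + cycles_per_message


# A 100-byte (800-bit) message with 16 cycles per bit and 1000
# additional cycles per message.
budget = maximum_cycles(800, 16, 1000)
```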
    
   maximum_state_size

     The maximum_state_size parameter specifies the maximum amount of
     state information that can be saved by a local endpoint, for each
     remote endpoint with which it communicates. Note that the amount
     of state information is expressed as a multiple of the parameter
     UDVM_memory_size, because an item of state generally reflects the
     contents of the UDVM memory.

   Initial state

     The application can also store useful information in the form of
     state.
     This predefined state is used to offer a range of well-known 
     decompression algorithms to the compressor, which can choose to 
     avoid uploading bytecode for a new algorithm if it supports one of 
     the well-known algorithms. Each item of initial state can be made 
     mandatory for every instance of the application, or it can be made 
     optional (in which case support for the relevant state will need to  
     be advertised before the state can be used). 
    
    
4.  SigComp message flow 
    
   This chapter describes the SigComp message flow and exchange of
   compressed messages.
    
   In the architecture of Figure 1, this chapter describes the operation of 
   the compressor and decompressor dispatcher. 
    
4.1.  Message exchange 
    
   The local SigComp layer may send compressed data to a remote SigComp
   layer, and the local SigComp layer may also receive compressed data.
   Note however that compression in one direction does not necessarily
   imply compression in the reverse direction. Furthermore, even in the
   case that there are two unidirectional compressed flows between two
   SigComp layers, there is no need to use the same compression
   algorithm at both compressors.
    
4.2.  SigComp message format

   In every SigComp message the first few bytes are interpreted as a
   state identifier that accesses some previously stored state
   information.

   This state information includes all of the data needed to decompress
   the SigComp message, including the decompression algorithm that will
   be applied to the remainder of the message, as well as any additional
   information that is required (e.g. one or more previously received
   messages if dynamic compression is in use).

   The format of the basic SigComp message is given in Figure 4:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1   1   1   1   1 |  length   |
   +---+---+---+---+---+---+---+---+
   |                               |
   :   state_identifier (n-bytes)  :
   |                               |
   +---+---+---+---+---+---+---+---+
   |                               |
   :   Remaining SigComp message   :
   |                               |
   +---+---+---+---+---+---+---+---+

    Figure 4: Basic SigComp message

   The length field is a 3-bit value (MSBs before LSBs) that indicates
   the length of the state identifier. The actual size n of the state
   identifier is calculated as follows:

                   n  =  minimum_hash_size + length - 1

   The state identifier is then extracted from the SigComp message and
   the corresponding state is executed as defined by the STATE-EXECUTE
   instruction of Chapter 9.

   If the length value is set to 0 then no state is accessed; instead
   the entire SigComp message is copied into the UDVM memory beginning
   at Address 6, and then executed starting from Address 6.

   All other addresses in the UDVM memory are initialized to 0.

   Decompression failure occurs if the SigComp message is too short to
   contain the expected state identifier, or if the requested state does
   not exist. See Section 8.2 for further details.
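
   The header processing described above could be sketched as follows.
   This is an illustration under stated assumptions, not a normative
   parser: the names parse_sigcomp_header and DecompressionFailure are
   hypothetical, the five leading 1 bits are assumed to occupy the most
   significant bits of the first byte, and treating any other prefix as
   a failure is an assumption of this sketch.

```python
from typing import Tuple


class DecompressionFailure(Exception):
    """Hypothetical exception standing in for decompression failure."""


def parse_sigcomp_header(message: bytes,
                         minimum_hash_size: int) -> Tuple[bytes, bytes]:
    """Split a SigComp message into (state_identifier, remainder).

    When the length field is 0 no state is accessed and the identifier
    is returned as an empty byte string.
    """
    if not message:
        raise DecompressionFailure("empty message")
    first = message[0]
    # Assumption: the five 1 bits are the most significant bits.
    if first >> 3 != 0b11111:
        raise DecompressionFailure("unexpected header prefix")
    length = first & 0b111  # 3-bit length field
    if length == 0:
        # The entire message is treated as bytecode to be copied into
        # UDVM memory at Address 6; it is returned unchanged here.
        return b"", message
    n = minimum_hash_size + length - 1
    if len(message) < 1 + n:
        raise DecompressionFailure("message too short for identifier")
    return message[1:1 + n], message[1 + n:]
```

   Looking up the returned identifier in the state handler (and failing
   if the state does not exist) is left to the caller in this sketch.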

 
 
 
4.3.  Interfaces to and from the compressor dispatcher

   When the application provides a message to be compressed, it MUST
   also provide an "endpoint identity" that distinguishes the endpoint
   from other endpoints.

   The exact format of the endpoint identity is unimportant, provided
   that distinct endpoints have distinct endpoint identities.

   The SigComp layer contains one compressor for each remote endpoint
   with which the local application is communicating; the dispatcher
   forwards each new application message to the appropriate compressor
   (invoking a new compressor if a new endpoint identity is
   encountered).

   Note that the application MUST indicate to the compressor dispatcher
   when it no longer wishes to communicate with a particular endpoint,
   so that the resources taken by the corresponding compressor can be
   reclaimed.
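
   One possible shape for such a dispatcher is sketched below. The
   class and method names are hypothetical, and a compressor is
   modelled simply as a callable from bytes to bytes; SigComp itself
   does not define this interface.

```python
from typing import Callable, Dict, Hashable

Compressor = Callable[[bytes], bytes]


class CompressorDispatcher:
    """Keeps one compressor per endpoint identity (illustrative only)."""

    def __init__(self, new_compressor: Callable[[], Compressor]) -> None:
        self._new_compressor = new_compressor
        self._compressors: Dict[Hashable, Compressor] = {}

    def compress(self, endpoint_id: Hashable, message: bytes) -> bytes:
        # Invoke a new compressor the first time an identity is seen.
        if endpoint_id not in self._compressors:
            self._compressors[endpoint_id] = self._new_compressor()
        return self._compressors[endpoint_id](message)

    def close(self, endpoint_id: Hashable) -> None:
        # Called when the application no longer wishes to communicate
        # with this endpoint, so the compressor can be reclaimed.
        self._compressors.pop(endpoint_id, None)

    def active_endpoints(self) -> int:
        return len(self._compressors)
```

   Any hashable value can serve as the endpoint identity here, which
   reflects the statement that its exact format is unimportant.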
    
4.4.  Interfaces to and from the decompressor dispatcher

   To ensure that SigComp can run over an unsecure transport layer, the
   decompressor dispatcher invokes a new decompressor for each new
   SigComp message. Resources for the decompressor are released as soon
   as the message is decompressed.

   Upon the arrival of a SigComp message the decompressor dispatcher
   invokes an instance of the UDVM and loads it with the indicated state
   as per Section 4.2. The message is then decompressed by the UDVM,
   returned to the decompressor dispatcher, and passed on to the
   receiving application.

   Note that when the UDVM is invoked it does not receive any compressed
   data by default, but instead requests new data explicitly using a
   specific instruction. Therefore, the dispatcher is responsible for
   buffering each SigComp message and passing the data to the UDVM when
   it is requested. If the UDVM requests additional compressed data that
   is not yet available then it pauses and waits until enough data has
   been received by the dispatcher.

   Uncompressed data is also outputted by the UDVM using a specific
   instruction. Note that the UDVM has no awareness of whether the
   underlying transport is message-based or stream-based, and so it
   always outputs uncompressed data as a stream. It is the
   responsibility of the dispatcher to provide the uncompressed message
   to the application in the expected form (i.e. as a stream or as a set
   of distinct, bounded messages).

   For a stream-based transport, the dispatcher delimits messages by
   parsing the compressed data stream for instances of 0xFF and taking
   the following actions:

   Occurs in data stream:     Meaning:

   0xFF 00                    one 0xFF byte in the data stream
   0xFF 01                    same, but the next byte is quoted (could
                              be another 0xFF)
      :                                           :
   0xFF 7F                    same, but the next 127 bytes are quoted
   0xFF 80 to 0xFF FE         reserved
   0xFF FF                    message boundary

   The reserved characters are useful for byte stuffing (if a
   compression algorithm generates compressed data containing the
   character 0xFF then it should be replaced by the character 0xFF00 to
   avoid accidentally inserting a message delimiter into the compressed
   data stream).
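
   The delimiting rules in the table above might be implemented along
   the following lines. This is a sketch of one reading of the table:
   it assumes that "quoted" bytes are copied through without being
   examined for 0xFF, and that a reserved code is treated as an error.
   The function name split_stream is hypothetical.

```python
from typing import List


def split_stream(data: bytes) -> List[bytes]:
    """Split a stream into SigComp messages, undoing the 0xFF escapes.

    Bytes after the last message boundary are ignored here; a real
    dispatcher would keep them buffered until more data arrives.
    """
    messages: List[bytes] = []
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != 0xFF:
            current.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated escape sequence")
        code = data[i + 1]
        i += 2
        if code == 0xFF:
            # 0xFF FF: message boundary.
            messages.append(bytes(current))
            current = bytearray()
        elif code <= 0x7F:
            # 0xFF 00..7F: one 0xFF byte, then `code` quoted bytes.
            quoted = data[i:i + code]
            if len(quoted) < code:
                raise ValueError("truncated quoted run")
            current.append(0xFF)
            current += quoted
            i += code
        else:
            # 0xFF 80..FE: reserved (treated as an error in this sketch).
            raise ValueError("reserved escape code")
    return messages
```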
    
    
5.  SigComp compressor

   An important feature of SigComp is that if two endpoints cannot agree
   on a common algorithm with which to send and receive data, it is
   possible for the compressor to upload bytecode for its own choice of
   algorithm to the decompressor. In particular this means that it is
   not necessary to force all compressors to use the same default
   algorithm; instead each implementer has the freedom to pick one of
   the predefined algorithms or to upload their own if needed.

   The overall requirement placed on the compressor is that of
   transparency, i.e. the compressor MUST NOT send bytecode which causes
   the UDVM to incorrectly decompress a given message.

   The following more specific requirements are also placed on the
   compressor (they can be considered particular instances of the
   transparency requirement):

   *    It is RECOMMENDED that the compressor supply a CRC over the
        uncompressed message to ensure that successful decompression has
        occurred. A UDVM instruction is provided to verify this CRC.

   *    If the transport is message-based then the compressor MUST
        preserve the boundaries between messages.

   *    If the transport is stream-based but the application defines its
        own internal message boundaries, then the compressor SHOULD
        preserve the boundaries between messages by using the "end-of-
        message" character 0xFFFF reserved by SigComp.

   *    The compressor MUST NOT exceed the maximum_compressed_size and
        MUST ensure that the message can be decompressed using no more
        than the resources available at the remote decompressor.

   The reason for preserving the message boundaries over a stream-based
   transport is that damage to one compressed message does not affect
   the decompression of subsequent messages. Moreover, the application
   typically vetoes state creation requests on a per-message basis.
    
5.1.  Supplying bytecode to the UDVM

   A compressor MUST be certain that compressed data can be decompressed
   before the data is to be sent, i.e. the UDVM instructions for
   decompression MUST be available at the remote decompressor. Several
   options exist for ensuring that this bytecode is available:

   1. Each SigComp message sent from the compressor contains the
      necessary UDVM instructions for decompression.

   2. By setting up a reliable connection, such as a TCP connection,
      between a compressor and its remote decompressor the UDVM
      instructions can be transferred and saved as state.

   3. If there are predefined UDVM codes for well-known algorithms, a
      compressor only needs to send the state identifier of that UDVM
      decompression algorithm code to its remote decompressor. The
      decompressor can then populate the UDVM locally.

   In order to save delay for "time-critical" sessions, the UDVM
   instructions should be uploaded prior to any initiation of "time-
   critical" sessions.
    
5.2.  Compression failure

   The compressor SHOULD make every effort to successfully compress an
   application message, but in certain cases this might not be possible
   (particularly if a low maximum_compressed_size has been set by the
   application). In this case a "compression failure" is called.
   Reasons for compression failure include the following:

   *    A compressed or uncompressed message exceeds the maximum size
        defined by the application.

   *    The maximum_compressed_size is exceeded for a certain message.

   *    Insufficient resources are available at the compressor or at the
        remote decompressor.

   If a compression failure occurs when compressing a message then the
   compressor informs the dispatcher and takes no further action. The
   dispatcher MUST report this failure to the application. The
   application may then try other methods to deliver the message.
    
    
6.  State handling and state announcement

   This chapter defines the behavior of the SigComp state handler. The
   function of the state handler is to retain information between
   successive SigComp messages; it is the only SigComp entity that is
   capable of this function, and so it is of particular importance from
   a security perspective.
 
6.1.  Storing and retrieving state

   To provide security against the malicious insertion or modification
   of SigComp messages, the UDVM memory is reset after decompressing
   each message. This ensures that damaged SigComp messages do not
   prevent the successful decompression of subsequent valid messages.

   Note however that the overall compression ratio is often
   significantly higher if messages can be compressed relative to the
   information stored in previous messages. For this reason it is
   possible to create "state" information for access when a later
   message is being decompressed.

   Both the creation and access of state are designed to be secure
   against malicious tampering with the compressed data. State can only
   be created when a complete message has been successfully
   decompressed, and the state handler MUST NOT save state without
   permission from the application.
    
   Upon receiving a decompressed message, the application may supply the
   state handler with the identity of the sending endpoint. Supplying
   this identity grants permission for the state handler to do the
   following:

   *    An item of state can be saved using the memory reserved for the
        specified endpoint.

   *    Announcement information can be taken into account when sending
        SigComp messages to the specified endpoint.

   This is especially useful if the application has an authentication
   mechanism that can be applied to determine whether the decompressed
   data is legitimate.
    
   Also note that state is not deleted when it is accessed. So even if a
   malicious user manages to access state information, subsequent
   messages compressed relative to this state can still be successfully
   decompressed. Instead, the state handler is responsible for deleting
   state information once it determines that the state will no longer be
   needed.
    
   Each item of state stores the following information:

   Name:                      Type of data:

   state_identifier           16-byte value
   state_start                2-byte value
   state_instruction          2-byte value
   state_length               2-byte value
   state_value                String of bytes
    
   The state_identifier must be supplied to retrieve an item of state
   from the state handler. State can be accessed using the UDVM
   instructions STATE-REFERENCE and STATE-EXECUTE, and can be created
   using the END-MESSAGE instruction.

   The state_value is a byte string that contains the actual value that
   is copied from/to the UDVM memory. The state_length specifies the
   number of bytes contained within state_value, and state_start gives
   the UDVM memory address to which the state_value is copied when it is
   accessed.

   Finally, state_instruction specifies the memory address of the next
   UDVM instruction to execute when state is accessed.

   The kind of information which is included in the state_value is up to
   a particular compressor and the uploaded instructions in the remote
   UDVM. However a compressor MUST NOT use a state that is not known to
   be established at the remote decompressor.
    
6.2. Saving and access of deleting states 
 
   The state are designed handler for each endpoint is expected to be secure 
   against malicious tampering offer memory to 
   store UDVM-created state. Every remote endpoint that wishes to 
   communicate with the compressed data. State can only local endpoint expects to be created when able to store a complete message has been successfully 
   decompressed, and 
   fixed amount of state; the number of bytes that it can store is given 
   by the formula UDVM_memory_size * maximum_state_size. 
    
   Note that each item of state costs (state_length + 22) bytes to 
   store. 
    
   The state handler MUST veto keeps track of which endpoint created each item of 
   state; when a particular endpoint exceeds its allocated memory limit 
   then sufficient items of state creation 
   request if instructed created by the application based on the contents of the 
   decompressed message. This same endpoint are 
   deleted (oldest state first) until enough memory is especially useful if available to 
   accommodate the new state. 
    



 
 
 
Price, Hannu, et al.                                           [Page 17] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   The application 
   has an authentication mechanism that can be applied MUST indicate to determine 
   whether the decompressed data is legitimate. 
    
   Furthermore, a compressor can only access previously created state 
   information handler when it no longer 
   wishes to communicate with a particular endpoint, so that the 
   resources taken by providing an [MD5] hash of the corresponding state to can be accessed. reclaimed. 
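
   The per-endpoint accounting described in this section can be
   sketched as follows. This is a minimal illustration only: the class
   name, the method names and the decision to refuse an item that is
   larger than the whole budget are assumptions, not part of SigComp.
   The budget is supplied by the caller (the draft gives it as
   UDVM_memory_size * maximum_state_size).

```python
from collections import deque

# Per-item overhead: state_identifier (16) + state_start (2)
# + state_instruction (2) + state_length (2) = 22 bytes.
STATE_OVERHEAD = 22


class EndpointStateStore:
    """Hypothetical store for the state created by one endpoint."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.items = deque()  # (state_identifier, cost), oldest first

    def save(self, state_identifier, state_value):
        cost = len(state_value) + STATE_OVERHEAD
        if cost > self.budget:
            # Assumption: an item that can never fit is not stored.
            return False
        # Delete the oldest state first until the new item fits.
        while self.used + cost > self.budget:
            _, old_cost = self.items.popleft()
            self.used -= old_cost
        self.items.append((state_identifier, cost))
        self.used += cost
        return True

    def release(self):
        """Reclaim everything when the application drops the endpoint."""
        self.items.clear()
        self.used = 0
```

   With a budget of 100 bytes, three items of 20 bytes each cost 42
   bytes apiece, so saving the third item deletes the first.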
     
6.3.  Announcement 
    
   The announcement information is used to modify certain
   application-defined parameters. Since these parameter values are
   saved between SigComp messages, they are considered to be part of the
   overall state and hence are supplied from the UDVM to the state
   handler.

   The following list of parameters is passed to the state handler using
   the appropriate UDVM instruction (namely the END-MESSAGE
   instruction):
    
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |            length             |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |         UDVM_version          |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |       UDVM_memory_size        |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |        cycles_per_bit         |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |      cycles_per_message       |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |              n                | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |          id_length 1          | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :          id_value_1           : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
                    :         : 
                    :         : 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |          id_length n          | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :          id_value_n           : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :           reserved            : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
         Figure 5: Announcement information 
    

 
 
 
   If the application does not return a valid endpoint identifier then
   the announcement information is automatically discarded by the state
   handler. Otherwise it is passed to the compressor responsible for
   sending messages to the given endpoint.

   The reserved field allows for additional items of data to be added to
   the announcement information in future.

   Note that the length field specifies the total length of the
   announcement information including the reserved field. As usual, MSBs
   are stored preceding LSBs.

   The remaining items of data are explained in greater detail below:
    
6.3.1.  UDVM version

   The next 2 bytes of the announcement information specify whether only
   the basic version of the UDVM is available, or whether an upgraded
   version of the UDVM is available offering additional instructions
   etc.

   The basic version of the UDVM is Version 0, which is the version
   described in this document. Upgraded versions MUST be backwards-
   compatible with the basic version in the following sense:

   *    If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION-
        FAILURE instructions when running on Version 0 of the UDVM, then
        the upgraded version MUST run the bytecode in an identical
        manner.

   This condition ensures that all bytecode that is valid for Version 0
   of the UDVM will continue to be valid for upgraded versions of the
   UDVM. However, bytecode that is invalid on Version 0 of the UDVM
   (i.e. bytecode that produces a decompression failure that is not
   manually triggered) may become valid on upgraded versions.

   The simplest way to upgrade the UDVM in a backwards-compatible manner
   is to add additional UDVM instructions, as this will not affect the
   operation of existing UDVM bytecode.
    
6.3.2.  Memory size and CPU cycles 
    
   The next 6 bytes of data specify new values for the application-
   defined parameters UDVM_memory_size, cycles_per_bit and
   cycles_per_message.

   Note that this data can only be used to increase the amount of
   resources available at the remote UDVM. If the data specifies a
   parameter value that is smaller than the value already possessed by
   the state handler, the parameter keeps its original value (i.e. the
   announcement data for this parameter is simply ignored).

   In particular, only allowing the parameter values to increase means
   that the announcement mechanism is robust against message loss or
   reordering.

   The parameters can only be restored to their original values if reset
   or renegotiated by the application.
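
   The "increase only" rule amounts to taking the maximum of the stored
   and the announced value for each parameter. The sketch below is one
   possible way to express it; the function name and the use of a
   dictionary keyed by parameter name are illustrative assumptions.

```python
def apply_announcement(current, announced):
    """Return the parameters after an announcement; values never shrink."""
    return {
        name: max(value, announced.get(name, value))
        for name, value in current.items()
    }
```

   Because max() is commutative and idempotent, applying two
   announcements in either order, or applying one of them twice, gives
   the same result. This is why the mechanism tolerates loss and
   reordering.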
    
6.3.3.  State identifiers 
    
   The list of state identifiers indicates that the sending endpoint
   supports one or more optional mechanisms (including well-known
   decompression algorithms, dictionaries of common SIP phrases,
   feedback mechanisms etc.).

   The integer n specifies the number of state identifiers to follow.
   The field id_length_j specifies the length in bytes of id_value_j,
   where acceptable values for id_length_j range from 1 to 16 inclusive.
   If a value outside this range is received then the subsequent state
   identifiers are ignored by the state handler.
    
   Each id_value_j indicates support for one optional mechanism at the 
   sending endpoint. The optional mechanisms themselves, and their 
   corresponding state identifiers, are beyond the scope of this 
   document. 
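
   A parser for the announcement information of Figure 5 might look
   like the sketch below. Several details are assumptions made for the
   example because the text does not state them: each id_length field
   is taken to occupy 2 bytes (the figure is 16 bits wide), each
   id_value is taken to follow without padding, and the length field
   is taken to count itself. The function name and the returned
   dictionary keys are hypothetical.

```python
import struct


def parse_announcement(data):
    """Parse the Figure 5 layout under the assumptions stated above."""
    # Six 2-byte fields, MSBs first: length, UDVM_version,
    # UDVM_memory_size, cycles_per_bit, cycles_per_message, n.
    length, version, memory_size, per_bit, per_message, n = \
        struct.unpack_from(">6H", data, 0)
    if length > len(data):
        raise ValueError("announcement information is truncated")
    offset = 12
    identifiers = []
    for _ in range(n):
        if offset + 2 > length:
            break
        (id_length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        # A length outside 1..16 (or one that overruns the block)
        # causes this and all subsequent identifiers to be ignored.
        if not 1 <= id_length <= 16 or offset + id_length > length:
            break
        identifiers.append(bytes(data[offset:offset + id_length]))
        offset += id_length
    return {
        "UDVM_version": version,
        "UDVM_memory_size": memory_size,
        "cycles_per_bit": per_bit,
        "cycles_per_message": per_message,
        "state_identifiers": identifiers,
    }
```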
    
    
7.  Overview of the UDVM

   Decompression functionality for SigComp is provided by a "Universal
   Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine
   much like the Java Virtual Machine but with a key difference: it is
   designed solely for the purpose of running decompression algorithms.

   The motivation for creating the UDVM is to provide unlimited
   flexibility when choosing how to compress a given item of data.
   Rather than picking one of a small number of pre-negotiated
   compression algorithms, the implementer has the freedom to select an
   algorithm of their choice. The compressed data is then combined with
   a set of UDVM instructions that allow the original data to be
   extracted, and the result is outputted as UDVM bytecode.

   Since the UDVM is optimized specifically for running decompression
   algorithms, the code size of a typical algorithm is small (often sub
   100 bytes). Moreover the UDVM approach does not add significant extra
   processing or memory requirements compared to running a fixed pre-
   programmed decompression algorithm.

   This chapter describes some basic features of the UDVM, including the
   well-known variables and instruction operands.

   Recall that the amount of memory available to the UDVM is specified
   by the application-defined parameter UDVM_memory_size. Any attempt to
   read memory addresses beyond the overall memory size MUST cause a
   decompression failure (see Section 8.2).
    
7.1.  Well-known variables 
    
   The first few variables in the UDVM memory have special tasks, for
   example specifying the location of the stack used by the CALL and
   RETURN instructions. Each of these well-known variables is a 2-byte
   integer.

   The following list gives the name of each well-known variable and the
   memory address at which the variable can be found:
    
   Name:           Starting memory address: 
    
   byte_copy_left             0 
   byte_copy_right            2 
   stack_location             4 
    
   The MSBs of each variable are always stored before the LSBs. So, for
   example, the MSBs of stack_location are stored at Address 4 whilst
   the LSBs are stored at Address 5.

   The use of each well-known variable is described in the following
   sections of the document.
    
7.2.  Instruction operands 
    
   Each of the UDVM instructions is followed by 0 or more bytes 
   containing the operands required by the instruction.

   To reduce the code size of a typical UDVM program, each operand for a
   UDVM instruction is compressed using variable-length encoding. The
   aim is to store more common operand values using fewer bits than
   rarely occurring values.

   Three different types of operand are available: the literal, the
   reference and the multitype. The operand types that follow each UDVM
   instruction are specified in Chapter 9.

   The UDVM bytecode for each operand type is illustrated in Figure 7 to
   Figure 9, together with the integer values represented by the
   bytecode.

   Note that the MSBs in the bytecode are illustrated as preceding the
   LSBs. Also, any string of bits marked with k consecutive "n"s is to
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the
   MSBs of n illustrated as preceding the LSBs).

   The decoded integer value of the bytecode can be interpreted in two
   ways. In some cases it is taken to be the actual value of the
   operand. In other cases it is taken to be a memory address at which
   the 2-byte operand value can be found (MSBs found at the specified
   address, LSBs found at the following address). The latter case is
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X.

   The simplest operand type is the literal (#), which encodes a
   constant integer from 0 to 65535 inclusive. A literal operand may
   require between 1 and 3 bytes depending on its value.
    
   Bytecode:                  Operand value:      Range: 
    
   0nnnnnnn                        N                   0 - 127 
   10nnnnnn nnnnnnnn               N                   0 - 16383 
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
    
               Figure 7: Bytecode for a literal (#) operand 
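
   The three rows of Figure 7 can be decoded with a few bit masks. The
   sketch below is illustrative only; the function name and the choice
   to return the number of bytes consumed alongside the value are
   assumptions made for the example.

```python
def decode_literal(code, pos=0):
    """Return (value, bytes_consumed) for a literal (#) operand."""
    first = code[pos]
    if first & 0x80 == 0x00:      # 0nnnnnnn
        return first, 1
    if first & 0xC0 == 0x80:      # 10nnnnnn nnnnnnnn
        return ((first & 0x3F) << 8) | code[pos + 1], 2
    if first == 0xC0:             # 11000000 nnnnnnnn nnnnnnnn
        return (code[pos + 1] << 8) | code[pos + 2], 3
    raise ValueError("byte does not start a literal operand of Figure 7")
```

   For example, the single byte 0x7F decodes to 127, while the value
   65535 needs the 3-byte form 0xC0 0xFF 0xFF.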
    
   The second operand type is the reference ($), which is always used to
   access a 2-byte value located elsewhere in the UDVM memory. The
   bytecode for a reference operand is decoded to be a constant integer
   from 0 to 65535 inclusive, which is interpreted as the memory address
   containing the actual value of the operand.

   Note that reference operands can always take values from 0 to 65535
   inclusive, as they reference 2-byte values.

   Bytecode:                  Operand value:      Range:

   0nnnnnnn                        memory[2 * N]       0 - 65535
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535
    
              Figure 8: Bytecode for a reference ($) operand 
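
   A reference operand is decoded in two steps: first the address is
   recovered from the bytecode, then the 2-byte value at that address
   is read with the MSBs first. The following sketch shows both steps;
   the function names are hypothetical, and an out-of-range address is
   simply allowed to raise IndexError as a stand-in for decompression
   failure.

```python
def read16(memory, address):
    """Return memory[X]: the 2-byte value at address, MSBs first."""
    return (memory[address] << 8) | memory[address + 1]


def decode_reference(code, pos, memory):
    """Return (value, bytes_consumed) for a reference ($) operand."""
    first = code[pos]
    if first & 0x80 == 0x00:      # 0nnnnnnn -> memory[2 * N]
        address, used = 2 * first, 1
    elif first & 0xC0 == 0x80:    # 10nnnnnn nnnnnnnn -> memory[2 * N]
        address, used = 2 * (((first & 0x3F) << 8) | code[pos + 1]), 2
    elif first == 0xC0:           # 11000000 nnnnnnnn nnnnnnnn -> memory[N]
        address, used = (code[pos + 1] << 8) | code[pos + 2], 3
    else:
        raise ValueError("byte does not start a reference operand")
    return read16(memory, address), used
```

   Under this encoding the single byte 0x02 refers to memory[4], which
   is where the well-known variable stack_location is stored.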
    
   The third kind of operand is the multitype (%), which can be used to
   encode both actual values and memory addresses. The multitype operand
   also offers efficient encoding for small integer values (both
   positive and negative) and for powers of 2.
    
   Bytecode:                  Operand value:      Range: 
    
   00nnnnnn                        N                   0 - 63 
   01nnnnnn                        memory[2 * N]       0 - 65535 
   1000011n                        2 ^ (N + 6)        64 , 128 
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768 
   111nnnnn                        N + 65504       65504 - 65535 
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535 
   101nnnnn nnnnnnnn               N                   0 - 8191
   110nnnnn nnnnnnnn               memory[N]           0 - 65535
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
              Figure 9: Bytecode for a multitype (%) operand 
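
   The rows of Figure 9 can be told apart by the value of the first
   byte, as the sketch below illustrates. The function name is
   hypothetical, and first-byte values 0x82 to 0x85, which do not
   appear in the figure, are rejected here as an assumption.

```python
def decode_multitype(code, pos, memory):
    """Return (value, bytes_consumed) for a multitype (%) operand."""
    def mem16(address):
        # memory[X]: 2-byte value at address, MSBs first.
        return (memory[address] << 8) | memory[address + 1]

    b = code[pos]
    if b < 0x40:                  # 00nnnnnn: N
        return b, 1
    if b < 0x80:                  # 01nnnnnn: memory[2 * N]
        return mem16(2 * (b & 0x3F)), 1
    if b == 0x80:                 # 10000000 + 2 bytes: N
        return (code[pos + 1] << 8) | code[pos + 2], 3
    if b == 0x81:                 # 10000001 + 2 bytes: memory[N]
        return mem16((code[pos + 1] << 8) | code[pos + 2]), 3
    if 0x86 <= b <= 0x87:         # 1000011n: 2 ^ (N + 6)
        return 1 << ((b & 0x01) + 6), 1
    if 0x88 <= b <= 0x8F:         # 10001nnn: 2 ^ (N + 8)
        return 1 << ((b & 0x07) + 8), 1
    if 0x90 <= b <= 0x9F:         # 1001nnnn nnnnnnnn: N + 61440
        return (((b & 0x0F) << 8) | code[pos + 1]) + 61440, 2
    if 0xA0 <= b <= 0xBF:         # 101nnnnn nnnnnnnn: N
        return ((b & 0x1F) << 8) | code[pos + 1], 2
    if 0xC0 <= b <= 0xDF:         # 110nnnnn nnnnnnnn: memory[N]
        return mem16(((b & 0x1F) << 8) | code[pos + 1]), 2
    if b >= 0xE0:                 # 111nnnnn: N + 65504
        return (b & 0x1F) + 65504, 1
    raise ValueError("first byte is not listed in Figure 9")
```

   The values 65504 to 65535 produced by the 111nnnnn form correspond
   to -32 to -1 when a 2-byte value is read as a signed integer, which
   is how small negative numbers fit into a single byte.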
    
7.3.  Byte copying 
    
   A number of UDVM instructions require a string of bytes to be copied
   to and from areas of the UDVM memory. This section defines how the
   byte copying operation should be performed.

   In general, the string of bytes is copied in ascending order of
   memory address. So if a byte is copied from/to Address n then the
   next byte is copied from/to Address n + 1. As usual, if a byte is
   read from an address beyond the overall memory size then
   decompression failure occurs.

   Note however that if a byte is copied from/to the memory address
   specified in byte_copy_right, the byte copy operation continues by
   copying the next byte from/to the memory address specified in
   byte_copy_left. This is useful for setting up a "circular buffer"
   within the UDVM memory.

   Note that the string of bytes is copied on a purely byte-by-byte
   basis. In particular, some of the later bytes to be copied may
   themselves have been written into the UDVM memory by the byte copying
   operation currently being performed.

   Equally, it is possible for a byte copying operation to overwrite the
   instruction that called the byte copy. If this occurs then the byte
   copying operation MUST be completed as if the original instruction
   were still in place in the UDVM memory (this also applies if
   byte_copy_left or byte_copy_right are overwritten).
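
   The copying rules of this section can be sketched as a simple loop.
   The function name and argument order are illustrative assumptions.
   The sketch reads byte_copy_left (Address 0) and byte_copy_right
   (Address 2) once before copying starts, which is one way to make
   the copy behave as if those variables were never overwritten, and
   it wraps after the byte at byte_copy_right has been copied, as the
   text above describes. An out-of-range address raises IndexError as
   a stand-in for decompression failure.

```python
def copy_bytes(memory, source, destination, length):
    """Copy length bytes within memory, one byte at a time."""
    # Snapshot the well-known variables (2 bytes each, MSBs first).
    left = (memory[0] << 8) | memory[1]
    right = (memory[2] << 8) | memory[3]

    def advance(address):
        # After the byte at byte_copy_right, continue at byte_copy_left.
        return left if address == right else address + 1

    for _ in range(length):
        memory[destination] = memory[source]
        source = advance(source)
        destination = advance(destination)
```

   Because each byte is written before the next one is read, copying 4
   bytes from Address 8 to Address 10 when only 2 bytes have been
   initialized repeats the 2-byte pattern, which is the behavior that
   dictionary-based decompression algorithms rely on.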
    
    
8.  Decompressing a SigComp message 
    
   This chapter lists the steps involved in the decompression of a
   single SigComp message.

8.1.  Invoking the UDVM

   Whenever the dispatcher receives a message to be decompressed, it
   invokes a new instance of the UDVM. The UDVM_memory_size is
   initialized using the corresponding application-defined parameter.
   The following steps are then taken:
    
   1.)   The number of remaining CPU cycles is set equal to the 
   application-defined parameter cycles_per_message. 
    

 
 
 
   Notes:

   The amount of compressed data available to the UDVM is exactly one
   compressed message. If the transport is stream-based then SigComp
   uses the reserved byte string 0xFFFF to delimit the compressed
   messages: the dispatcher takes the data between a pair of neighboring
   reserved byte strings to be a single compressed message. The reserved
   byte string itself is not considered to be part of the compressed
   message.

   The compressed data is not provided to the UDVM by default. Instead,
   the UDVM requests compressed data using the INPUT instructions
   (useful when running over a stream-based transport since there is no
   need to wait for the entire compressed message before decompression
   can begin).

   The dispatcher MUST NOT make more than one compressed message
   available to a given instance of the UDVM. In particular, the
   dispatcher MUST NOT concatenate two messages to form a single
   compressed message. This is because compressed messages are typically
   padded with trailing zero bits so that they are a whole number of
   bytes long. Concatenating two messages would cause these padding bits
   to be incorrectly interpreted as compressed data.

   2.)   Next, the instructions contained within the UDVM memory are
   executed beginning at the address specified by the state as per
   Section 4.2.

   Notes:

   The instructions are executed consecutively unless otherwise
   indicated (for example when the UDVM encounters a JUMP instruction).

   If the next instruction to be executed lies outside the available
   memory then decompression failure occurs (see Section 8.2).

   3.)   Each time an instruction is executed the number of available
   CPU cycles is decreased by the amount specified in Chapter 9.
   Additionally, if the UDVM requests n bits of compressed data (using
   one of the INPUT instructions) then the number of available CPU
   cycles is increased by n * cycles_per_bit.

   Notes:

   This means that the total number of CPU cycles available for
   processing a compressed message is given by the formula:
    
    maximum_cycles = cycles_per_message + message_size * cycles_per_bit 
    
   The simplest parameter type reason that this total is not allocated to the literal (#), which encodes a 
   constant integer from 0 to 65535 inclusive. A literal parameter may 
   require between 1 and 3 bytes depending on its value. 
    
   Bytecode:                  Parameter value:         Range: 
    
   0nnnnnnn                        N                   0 - 127 
   10nnnnnn nnnnnnnn               N                   0 - 16383 
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
    
              Figure 7: Bytecode for a literal (#) parameter 
    
   The second parameter type UDVM when it is the reference ($), which 
   invoked is always used 
   to access a 2-byte value located elsewhere in that the UDVM memory. The 
   bytecode for a reference parameter is decoded can begin to be decompress a constant 
   integer from 0 to 65535 inclusive, which is interpreted as the memory 
   address containing the actual value of the parameter. 
    
   Note message that reference parameters can always take values from 0 to 65535 
   inclusive, as they reference 2-byte values. has 

 
 
 
Price, Hannu, et al.                                           [Page 24] 

INTERNET-DRAFT                  SigComp              February 14 ,                   March 1, 2002 
 
 
   Bytecode:                  Parameter value:         Range: 
    
   0nnnnnnn                        memory[2 * N]       0 - 65535 
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535 
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
             Figure 8: Bytecode for a reference ($) parameter 
    
   The third kind of parameter 
 
 
   only been partially received. So the total message size may not be 
   known when the UDVM is initialized. 
    
   4.)   The UDVM stops executing instructions when it encounters an 
   END-MESSAGE instruction or if decompression failure occurs. 
    
   Notes: 
    
   The UDVM passes uncompressed data to the dispatcher using the OUTPUT
   instruction. The OUTPUT instruction can be used to output a partially
   decompressed message; it is a dispatcher decision whether to use the
   data immediately or whether to buffer and wait until the entire
   message has been decompressed.

   The UDVM passes state creation requests to the state handler using
   the END-MESSAGE instruction. This means that it is only possible to
   make a state creation request once the message has been decompressed,
   which is necessary since the application typically determines the
   validity of these requests based on the contents of the decompressed
   message.
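
   The cycle accounting of steps 1 and 3 above can be sketched as
   follows. This is only an illustration: the class name CycleBudget and
   its method names are hypothetical, and treating a cost larger than
   the remaining budget as decompression failure is an assumption about
   how "exceeds the available CPU cycles" would be checked.

```python
class CycleBudget:
    """Tracks the CPU cycles available to one UDVM instance (hypothetical helper)."""

    def __init__(self, cycles_per_message, cycles_per_bit):
        self.cycles_per_bit = cycles_per_bit
        # Step 1: the budget starts at cycles_per_message only.
        self.available = cycles_per_message

    def charge(self, cost):
        """Deduct the cost of one executed instruction (see Figure 10)."""
        if cost > self.available:
            # Assumption: running out of cycles is a decompression failure.
            raise RuntimeError("decompression failure: CPU cycles exceeded")
        self.available -= cost

    def on_input(self, n_bits):
        """Credit cycles when the UDVM requests n bits of compressed data."""
        self.available += n_bits * self.cycles_per_bit


def maximum_cycles(cycles_per_message, message_size_bits, cycles_per_bit):
    """Total cycles granted once the whole message has been requested."""
    return cycles_per_message + message_size_bits * cycles_per_bit


budget = CycleBudget(cycles_per_message=100, cycles_per_bit=2)
budget.on_input(80)  # the UDVM requests all 80 bits of a 10-byte message
```

   Because cycles are credited only as input is requested, the budget
   never depends on knowing the total message size in advance.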
    
8.2.  Decompression failure 
    
   If a compressed message given to the UDVM is corrupted (either
   accidentally or maliciously) then the UDVM may terminate with a
   decompression failure.

   Reasons for decompression failure include the following:

   *    A compressed or uncompressed message exceeds the maximum size
        defined by the application.

   *    The UDVM exceeds the available CPU cycles for decompressing a
        message.

   *    The UDVM attempts to read a memory address beyond the overall
        memory size.

   *    An unknown instruction type is encountered.

   *    An unknown operand type is encountered.

   *    An instruction is encountered that cannot be processed
        successfully by the UDVM (for example a RETURN instruction when
        no CALL instruction has previously been encountered).

   *    The UDVM attempts to access non-existent state.

   *    A manual decompression failure is triggered using the
        DECOMPRESSION-FAILURE instruction.

   If a decompression failure occurs when decompressing a message then
   the UDVM informs the dispatcher and takes no further action. It is
   the responsibility of the dispatcher to decide how to cope with the
   decompression failure. In general a dispatcher SHOULD discard the
   compressed message and any decompressed data that has been outputted.
    
9.  UDVM instruction set

   The UDVM currently understands 30 instructions, chosen to support the
   widest possible range of compression algorithms with the minimum
   possible overhead.

   Figure 10 lists the different instructions and the bytecode values
   used to store the instructions at the UDVM. The cost of each
   instruction in CPU cycles is also given:
    
   Instruction:     Bytecode value:   Cost in CPU cycles: 
    
   DECOMPRESSION-FAILURE     0          1 
   AND                       1          1 
   OR                        2          1 
   NOT                       3          1 
   ADD                       4          1 
   SUBTRACT                  5          1 
   MULTIPLY                  6          1 
   DIVIDE                    7          1 
   SORT-ASCENDING            8          1 + k * ceiling(log2(k)) 
   SORT-DESCENDING           9          1 + k * ceiling(log2(k)) 
   MD5                       10         1 + length 
   LOAD                      11         1 
   MULTILOAD                 12         1 + n 
   COPY                      13         1 + length 
   COPY-LITERAL              14         1 + length 
   COPY-OFFSET               15         1 + length + offset 
   JUMP                      16         1 
   COMPARE                   17         1 
   CALL                      18         1 
   RETURN                    19         1 
   SWITCH                    20         1 + n 
   CRC                       21         1 + length 
   END-MESSAGE               22         1 + state_length
   OUTPUT                    23         1 + output_length
   NBO                       24         1
   INPUT-BYTECODE            25         1 + length
   INPUT-FIXED               26         1
   INPUT-HUFFMAN             27         1 + n
   STATE-REFERENCE           28         1 + state_length
   STATE-EXECUTE             29         1 + state_length
    
      Figure 10: UDVM instructions and corresponding bytecode values 
    

 
 
 
   Each UDVM instruction costs a minimum of 1 CPU cycle. Certain
   high-level instructions may cost additional cycles depending on the
   value of one of the instruction operands.

   The only exception when calculating the number of CPU cycles is that
   the STATE-EXECUTE instruction takes (1 + state_length) cycles even
   though it does not have a state_length operand; instead the value of
   state_length is provided by the state handler as part of the state
   being accessed.
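
   As an illustration of the cost column of Figure 10, the lookup below
   computes the cost of an instruction from its operands. It is a sketch
   only: the function names and the keyword-argument convention are
   hypothetical, and ceiling(log2(k)) is assumed to be evaluated for
   k >= 1.

```python
def ceil_log2(k):
    """ceiling(log2(k)) for k >= 1, using integer arithmetic only."""
    return (k - 1).bit_length()


def instruction_cost(name, **operands):
    """Return the CPU cycle cost of one instruction, following Figure 10."""
    if name in ("SORT-ASCENDING", "SORT-DESCENDING"):
        k = operands["k"]
        return 1 + k * ceil_log2(k)
    if name in ("MD5", "COPY", "COPY-LITERAL", "CRC", "INPUT-BYTECODE"):
        return 1 + operands["length"]
    if name == "COPY-OFFSET":
        return 1 + operands["length"] + operands["offset"]
    if name in ("MULTILOAD", "SWITCH", "INPUT-HUFFMAN"):
        return 1 + operands["n"]
    if name == "OUTPUT":
        return 1 + operands["output_length"]
    if name in ("END-MESSAGE", "STATE-REFERENCE", "STATE-EXECUTE"):
        # For STATE-EXECUTE the state handler supplies state_length.
        return 1 + operands["state_length"]
    return 1  # every other instruction costs a single cycle
```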
    
   All instructions are stored as a single byte to indicate the
   instruction type, followed by 0 or more bytes containing the operands
   required by the instruction. The instruction specifies which of the
   three operand types of Section 7.2 is used in each case. For example,
   the ADD instruction is followed by two operands as shown below:

   ADD ($operand_1, %operand_2)

   When converted into bytecode the number of bytes required by the ADD
   instruction depends on the size of each operand value, and whether
   the second (multitype) operand contains the operand value itself or a
   memory address where the actual value of the operand can be found.

   The instruction set available for the UDVM offers a mix of low-level
   and high-level instructions. The high-level instructions can all be
   emulated using the low-level instructions provided, but given a
   choice it is generally preferable to use a single instruction rather
   than a large number of general-purpose instructions. The resulting
   bytecode will be more compact (leading to a higher overall
   compression ratio) and decompression will typically be faster because
   the implementation of the compression-specific instructions can be
   optimized for the UDVM.

   Each instruction is explained in more detail below:
    
9.1.  Mathematical instructions 
    
   The following instructions provide a number of mathematical
   operations including bit manipulation, arithmetic and sorting.
    
9.1.1.  Bit manipulation 
    
   The AND, OR and NOT instructions provide simple bit manipulation on 
   2-byte words. 
    
   AND ($operand_1, %operand_2) 
   OR ($operand_1, %operand_2) 
   NOT ($operand_1) 
    
   After the operation is complete, the value of the first operand is
   overwritten with the result. Note that since this operand is a
   reference, the value stored at the memory address specified by the
   operand is overwritten, and not the operand itself.
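
   The effect of a reference first operand can be sketched as follows.
   The helper names (read_word, write_word, udvm_and, udvm_or, udvm_not)
   are hypothetical; the sketch assumes the operands have already been
   decoded, so operand_1 is passed as a memory address and operand_2 as
   a plain 2-byte value.

```python
def read_word(memory, address):
    """Read a 2-byte value from UDVM memory, MSBs first."""
    return (memory[address] << 8) | memory[address + 1]


def write_word(memory, address, value):
    """Write a 2-byte value to UDVM memory, MSBs first."""
    memory[address] = (value >> 8) & 0xFF
    memory[address + 1] = value & 0xFF


def udvm_and(memory, operand_1_address, operand_2):
    result = read_word(memory, operand_1_address) & operand_2
    write_word(memory, operand_1_address, result)


def udvm_or(memory, operand_1_address, operand_2):
    result = read_word(memory, operand_1_address) | operand_2
    write_word(memory, operand_1_address, result)


def udvm_not(memory, operand_1_address):
    # Mask to 16 bits because Python integers are unbounded.
    result = ~read_word(memory, operand_1_address) & 0xFFFF
    write_word(memory, operand_1_address, result)


memory = bytearray(16)
write_word(memory, 4, 0x0F0F)
udvm_and(memory, 4, 0x00FF)  # the word at address 4 becomes 0x000F
```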
    
9.1.2.  Arithmetic 
    
   The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform 
   arithmetic on 2-byte words. 
    
   ADD ($operand_1, %operand_2) 
   SUBTRACT ($operand_1, %operand_2) 
   MULTIPLY ($operand_1, %operand_2) 
   DIVIDE ($operand_1, %operand_2) 
    
   After the operation is complete, the first operand is overwritten
   with the result.

   Note that in all cases the arithmetic operation is performed modulo
   2^16. So for example, subtracting 1 from 0 gives the result 65535.

   For the SUBTRACT instruction the second operand is subtracted from
   the first. Similarly, for the DIVIDE instruction the first operand is
   divided by the second operand. Note that if the second operand does
   not divide exactly into the first operand then the remainder is
   ignored.
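
   The modulo 2^16 behaviour can be sketched as follows. The function
   names are hypothetical, the operands are assumed to be already
   decoded 2-byte values, and treating division by zero as decompression
   failure is an assumption that the text above does not state.

```python
MODULUS = 1 << 16  # all arithmetic is performed modulo 2^16


def add(operand_1, operand_2):
    return (operand_1 + operand_2) % MODULUS


def subtract(operand_1, operand_2):
    # The second operand is subtracted from the first.
    return (operand_1 - operand_2) % MODULUS


def multiply(operand_1, operand_2):
    return (operand_1 * operand_2) % MODULUS


def divide(operand_1, operand_2):
    if operand_2 == 0:
        # Assumption: not specified in the text above.
        raise RuntimeError("decompression failure: division by zero")
    return operand_1 // operand_2  # the remainder is ignored
```

   For example, subtract(0, 1) yields 65535 and divide(7, 2) yields 3.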
    
9.1.3.  Sorting 
    
   The SORT-ASCENDING and SORT-DESCENDING instructions sort lists of 2-
   byte words. 
    
   SORT-ASCENDING (%start, %n, %k) 
   SORT-DESCENDING (%start, %n, %k) 
    
   The start operand specifies the starting memory address of the block
   of data to be sorted. The block of data itself is divided into n
   lists each containing k words. The SORT-ASCENDING instruction applies
   a certain permutation to the lists, such that the first list is
   sorted into ascending order (treating each data word as an integer).
   The same permutation is applied to all n lists, so lists other than
   the first will not necessarily be sorted into order.
    
   For example, the first list might contain a set of integers to be
   sorted, whilst the second list might be used to keep track of the
   integers:

      Before sorting              After sorting

   List 1        List 2        List 1        List 2

      8             1             1             2
      1             2             1             3
      1             3             3             4
      3             4             8             1

   In the case of two words of data with the same value, the original
   ordering of the list is preserved.

   The SORT-DESCENDING instruction behaves as above, except that the
   first list is sorted into descending order.
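
   The shared permutation can be sketched as follows. The function name
   sort_lists is hypothetical and the lists are assumed to have been
   read out of UDVM memory already. The sketch relies on Python's
   sorted() being stable, which preserves the original ordering of equal
   words.

```python
def sort_lists(lists, descending=False):
    """Sort the first list and apply the same permutation to every list."""
    k = len(lists[0])
    # Indices of the first list in sorted order; sorted() is stable, so
    # equal words keep their original relative ordering.
    order = sorted(range(k), key=lambda i: lists[0][i], reverse=descending)
    return [[lst[i] for i in order] for lst in lists]


list_1 = [8, 1, 1, 3]
list_2 = [1, 2, 3, 4]
sorted_1, sorted_2 = sort_lists([list_1, list_2])
```

   With the input lists of the example above, sorted_1 is [1, 1, 3, 8]
   and sorted_2 is [2, 3, 4, 1].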
    
9.1.4.  MD5 
    
   The MD5 instruction calculates an MD5 hash over the specified area of
   UDVM memory.

   MD5 (%position, %length, %destination)

   The position and length operands define the string of bytes over
   which the MD5 hash is calculated. Byte copying rules are enforced as
   per Section 7.3.

   The destination operand gives the starting address to which the
   resulting 16-byte hash will be copied.
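
   A minimal sketch of the MD5 instruction follows, using the hashlib
   module of the Python standard library. The function name udvm_md5 is
   hypothetical, and for brevity the sketch assumes that neither the
   input string nor the output wraps according to the byte copying rules
   of Section 7.3.

```python
import hashlib


def udvm_md5(memory, position, length, destination):
    """Hash memory[position : position + length] into 16 bytes at destination."""
    if position + length > len(memory) or destination + 16 > len(memory):
        raise RuntimeError("decompression failure: address beyond memory size")
    digest = hashlib.md5(bytes(memory[position:position + length])).digest()
    memory[destination:destination + 16] = digest


memory = bytearray(64)
memory[0:3] = b"abc"
udvm_md5(memory, 0, 3, 16)  # the 16-byte hash now occupies addresses 16 to 31
```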
    
9.2.  Memory management instructions 
    
   The following instructions are used to manipulate the UDVM memory.
   Bytes can be copied from one area of memory to another, and areas of
   memory can be write-protected to make it easier for UDVM code to be
   compiled.
    
9.2.1.  LOAD 
    
   The LOAD instruction sets a 2-byte variable to a certain specified
   value. The format of a LOAD instruction is as follows:

   LOAD (%address, %value)

   The first operand specifies the starting address of the 2-byte
   variable, whilst the second operand specifies the value to be loaded
   into this variable. As usual, MSBs are stored before LSBs in the UDVM
   memory.
    
9.2.2.  MULTILOAD 
    
   The MULTILOAD instruction sets a contiguous block of 2-byte variables
   to specified values.

   MULTILOAD (%address, #n, %value_0, ..., %value_n-1)

   The first operand specifies the starting address of the contiguous
   variables, whilst the operands value_0 through to value_n-1 specify
   the values to load into these variables (in the same order as they
   appear in the instruction).
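
   LOAD and MULTILOAD can be sketched as follows. The function names are
   hypothetical and the operands are assumed to be already decoded; the
   operand n is implied by the number of values passed.

```python
def load(memory, address, value):
    """Store a 2-byte value at address, MSBs before LSBs."""
    memory[address] = (value >> 8) & 0xFF
    memory[address + 1] = value & 0xFF


def multiload(memory, address, values):
    """Store each value in turn into consecutive 2-byte variables."""
    for i, value in enumerate(values):
        load(memory, address + 2 * i, value)


memory = bytearray(16)
load(memory, 0, 0x1234)
multiload(memory, 2, [0x0001, 0xABCD, 0xFFFF])
```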
    
9.2.3.  COPY 
    
   The COPY instruction is used to copy a string of bytes from one part
   of the UDVM memory to another.

   COPY (%position, %length, %destination)

   The position operand specifies the memory address of the first byte
   in the string to be copied, and the length operand specifies the
   number of bytes to be copied.

   The destination operand gives the address to which the first byte in
   the string will be copied.

   Note that byte copying is performed as per the rules of Section 7.3.
    
9.2.4.  COPY-LITERAL 
    
   A modified version of the COPY instruction is given below:

   COPY-LITERAL (%position, %length, $destination)

   The COPY-LITERAL instruction behaves as a COPY instruction except
   that after copying, the destination operand is replaced with the
   memory address immediately following the address to which the final
   byte was copied. If the final byte was copied to the memory address
   specified in byte_copy_right, the destination operand is set to the
   memory address specified in byte_copy_left.
    
9.2.5.  COPY-OFFSET 
    
   A further version of the COPY-LITERAL instruction is given below:

   COPY-OFFSET (%offset, %length, $destination)

   The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction
   except that an offset operand is given instead of a position operand.

   To derive a suitable position operand, starting at the memory address
   specified by destination, the UDVM counts backwards a total of offset
   memory addresses. If the memory address specified in byte_copy_left
   is reached, the next memory address is taken to be byte_copy_right.

   The COPY-OFFSET instruction then behaves as a COPY-LITERAL
   instruction, taking the position operand to be the last memory
   address reached in the above step.
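
   COPY-LITERAL and COPY-OFFSET can be sketched as follows. The function
   names are hypothetical. The sketch assumes that the wrap from
   byte_copy_right to byte_copy_left applies to both the source and the
   destination address, and it returns the updated destination instead
   of writing it back through the reference operand.

```python
def next_address(address, byte_copy_left, byte_copy_right):
    """Address following 'address', wrapping at byte_copy_right."""
    return byte_copy_left if address == byte_copy_right else address + 1


def previous_address(address, byte_copy_left, byte_copy_right):
    """Address preceding 'address', wrapping at byte_copy_left."""
    return byte_copy_right if address == byte_copy_left else address - 1


def copy_literal(memory, position, length, destination, bcl, bcr):
    """Copy byte by byte; return the address following the final byte."""
    for _ in range(length):
        memory[destination] = memory[position]
        position = next_address(position, bcl, bcr)
        destination = next_address(destination, bcl, bcr)
    return destination


def copy_offset(memory, offset, length, destination, bcl, bcr):
    """Count back 'offset' addresses from destination, then copy."""
    position = destination
    for _ in range(offset):
        position = previous_address(position, bcl, bcr)
    return copy_literal(memory, position, length, destination, bcl, bcr)


memory = bytearray(b"ab" + bytes(6))
new_destination = copy_offset(memory, 2, 6, 2, 0, 7)
```

   Because the copy is performed one byte at a time, the later source
   bytes in this example are bytes that the same operation has just
   written, which repeats the two-byte pattern across the buffer.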
    
    

 
 
 
9.3.  Program flow instructions

   The following instructions alter the flow of UDVM code. Each
   instruction jumps to one of a number of memory addresses based on a
   certain specified criterion. Note that all of the instructions give
   the memory addresses in the form of deltas relative to the memory
   address of the instruction. The actual memory address is calculated
   as follows:

   memory_address = (memory_address_of_instruction + delta) modulo 2^16

   Note that certain I/O instructions (see Section 9.4) can also alter
   program flow.
    
9.3.1.  JUMP 
    
   The JUMP instruction moves program execution to the specified memory
   address.

   JUMP (%delta)

   Note that if the address (specified as a delta from the address of
   the JUMP instruction) lies beyond the overall UDVM memory size then
   decompression failure occurs.
    
9.3.2.  COMPARE 
    
   The COMPARE instruction compares two operands and then jumps to one 
   of three specified memory addresses depending on the result. 
    
   COMPARE (%operand_1, %operand_2, %delta_1, %delta_2, %delta_3) 
    
   If operand_1 < operand_2 then the UDVM continues instruction
   execution at the (relative) memory address specified by delta_1. If
   operand_1 = operand_2 then it jumps to the address specified by
   delta_2. If operand_1 > operand_2 then it jumps to the address
   specified by delta_3.
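
   The delta calculation and the three-way branch of COMPARE can be
   sketched as follows. The function names and the memory_size argument
   are hypothetical, and the operands are assumed to be already decoded.

```python
def resolve_delta(instruction_address, delta, memory_size):
    """Turn a delta into an absolute address, modulo 2^16."""
    address = (instruction_address + delta) % 65536
    if address >= memory_size:
        raise RuntimeError("decompression failure: address beyond memory size")
    return address


def compare(instruction_address, operand_1, operand_2,
            delta_1, delta_2, delta_3, memory_size):
    """Return the address at which instruction execution continues."""
    if operand_1 < operand_2:
        delta = delta_1
    elif operand_1 == operand_2:
        delta = delta_2
    else:
        delta = delta_3
    return resolve_delta(instruction_address, delta, memory_size)
```

   Since the sum is taken modulo 2^16, a backward jump can be expressed
   as a large delta: resolve_delta(100, 65532, 1024) gives address 96.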
    
9.3.3.  CALL and RETURN 
    
   The CALL and RETURN instructions provide support for compression 
   algorithms with a nested structure. 
    
   CALL (%delta) 
    
   RETURN 
    
   The CALL and RETURN instructions make use of a stack of 2-byte
   variables stored at the memory address specified by the well-known
   variable stack_location. The stack contains the following variables:

   Name:           Starting memory address:

   stack_free            stack_location
   stack[0]              stack_location + 2
   stack[1]              stack_location + 4
   stack[2]              stack_location + 6
      :                       :

   The MSBs of these variables are stored before the LSBs in the UDVM
   memory.

   When the UDVM reaches a CALL instruction, it finds the memory address
   of the instruction immediately following the CALL instruction and
   copies this 2-byte value into stack[stack_free] ready for later
   retrieval. It then increases stack_free by 1 and continues
   instruction execution at the (relative) memory address specified by
   the operand.

   When the UDVM reaches a RETURN instruction it decreases stack_free by
   1, and then continues instruction execution at the byte position
   stored in stack[stack_free].

   If the variable stack_free is ever increased beyond 65535 or
   decreased below 0 then a bad compressed message has been received and
   decompression failure occurs (see Section 8.2).

   Decompression failure also occurs if one of the above instructions is
   encountered and the value of stack_location is smaller than 6 (this
   prevents the stack from overwriting the well-known variables).
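
   The stack handling described above can be sketched as follows. This
   is an illustration under stated assumptions: the UDVM memory is
   modelled as a bytearray, stack_location is passed in as an argument
   rather than read from the well-known variables, and the helper names
   and the DecompressionFailure exception are hypothetical.

```python
class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


def read_word(memory, address):
    # 2-byte variables are stored with the MSBs before the LSBs.
    return (memory[address] << 8) | memory[address + 1]


def write_word(memory, address, value):
    memory[address] = (value >> 8) & 0xFF
    memory[address + 1] = value & 0xFF


def call(memory, stack_location, return_address, instruction_address, delta):
    """Push return_address and return the address to continue at."""
    if stack_location < 6:
        raise DecompressionFailure("stack would overlap well-known variables")
    stack_free = read_word(memory, stack_location)
    if stack_free == 65535:
        raise DecompressionFailure("stack_free increased beyond 65535")
    # stack[i] starts at stack_location + 2 + 2 * i.
    write_word(memory, stack_location + 2 + 2 * stack_free, return_address)
    write_word(memory, stack_location, stack_free + 1)
    return (instruction_address + delta) % 65536


def ret(memory, stack_location):
    """Pop the most recent return address and return it."""
    if stack_location < 6:
        raise DecompressionFailure("stack would overlap well-known variables")
    stack_free = read_word(memory, stack_location)
    if stack_free == 0:
        raise DecompressionFailure("stack_free decreased below 0")
    stack_free -= 1
    write_word(memory, stack_location, stack_free)
    return read_word(memory, stack_location + 2 + 2 * stack_free)
```

   The sketch does not check that the stack stays inside the UDVM
   memory; a real implementation would need to handle that case too.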
    
9.3.4.  SWITCH 
    
   The SWITCH instruction performs a conditional jump based on the value
   of one of its operands.

   SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1)

   When a SWITCH instruction is encountered the UDVM reads the value of
   j. It then continues instruction execution at the (relative) address
   specified by delta_j.

   If j specifies a value of n or more, a bad compressed message has
   been received and decompression failure occurs.
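
   A minimal sketch of the SWITCH selection rule is given below. The
   operand n is taken to be the length of the deltas list; the function
   name and the DecompressionFailure exception are assumptions made for
   the example.

```python
class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


def switch_target(instruction_address, j, deltas):
    """Return the absolute address selected by SWITCH.

    deltas holds delta_0 .. delta_n-1, so n == len(deltas).
    """
    if j >= len(deltas):
        # j specifies a value of n or more.
        raise DecompressionFailure("SWITCH index out of range")
    return (instruction_address + deltas[j]) % 65536
```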
    
9.3.5.  CRC 
    
   The CRC instruction verifies a string of bytes using a 2-byte CRC.

   CRC (%value, %position, %length, %delta)

   The actual CRC calculation is performed using the generator
   polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte
   Frame Check Sequence (FCS) of [RFC-1662].

   The position and length operands define the string of bytes over
   which the CRC is evaluated. Byte copying rules are enforced as per
   Section 7.3.

   Important note: Since a CRC calculation is always performed over a
   bitstream, for interoperability it is necessary to define the order
   in which bits are supplied within each individual byte. In this case
   the MSBs of the byte MUST be supplied to the CRC calculation before
   the LSBs.

   The value operand contains the expected integer value of the 2-byte
   CRC. If the calculated CRC matches the expected value then the UDVM
   continues at the following instruction. Otherwise the UDVM jumps to
   the (relative) memory address specified by delta.
    
9.4.  I/O instructions 
    
   The following instructions allow the UDVM to interface with its
   environment. Note that in the overall SigComp architecture all of
   these interfaces pass to the decompressor dispatcher or to the state
   handler.
    
9.4.1.  END-MESSAGE 
    
   The END-MESSAGE instruction successfully terminates the UDVM and
   passes state information to the state handler.

   END-MESSAGE (%hash_length, %state_start, %state_length,
   %state_instruction, %announcement_location)

   Note that the actual uncompressed message is outputted separately
   using the OUTPUT instruction; this conserves memory at the UDVM
   because there is no need to buffer an entire uncompressed message
   before it can be passed to the dispatcher.

   Note that if the announcement_location operand is set to 0 then no
   announcement information is provided, otherwise it points to the
   starting memory address of the announcement information as per
   Section 6.3.

   The END-MESSAGE instruction requests the creation of state using the
   operands state_start and state_length, which together denote a byte
   string state_value. Provided that the application gives permission,
   state_value is byte copied from the UDVM memory (obeying the rules of
   Section 7.3) and stored together with a 16-byte state identifier that
   can be used to access the state by a later compressed message.
    

 
 
 
   To provide security against malicious access, the identifier for any
   item of state created by the UDVM is derived from the [MD5] hash of
   the state_value to be stored. The state identifier is constructed by
   taking the 16-byte [MD5] hash and replacing all but the first
   hash_length most significant bytes with zeroes. Note that if
   hash_length is 16 then the unmodified [MD5] hash is the state
   identifier. Decompression failure occurs if hash_length is less than
   the application-defined parameter minimum_hash_size or greater than
   16.
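
   The construction of the state identifier can be sketched as follows.
   The function name and the DecompressionFailure exception are
   assumptions made for the example; the MD5 computation uses the
   hashlib module from the Python standard library.

```python
import hashlib


class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


def state_identifier(state_value, hash_length, minimum_hash_size):
    """Return the 16-byte state identifier for state_value."""
    if hash_length < minimum_hash_size or hash_length > 16:
        raise DecompressionFailure("hash_length out of range")
    digest = hashlib.md5(state_value).digest()  # always 16 bytes
    # Keep the first hash_length bytes and zero the remainder.
    return digest[:hash_length] + bytes(16 - hash_length)
```

   With hash_length set to 16 the function returns the unmodified MD5
   hash; with a smaller value the trailing bytes of the identifier are
   zero.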
    
   If a state identifier already exists (hash collision occurs), the 
   decompressor should check whether the requested state is identical to 
   the established state, and count the state creation request as 
   successful if this is the case. 
    
   If not then the state creation request is unsuccessful. The existing
   state MUST NOT be replaced with the requested state to be saved. This
   is to avoid the situation where a compressed message cannot be
   decompressed because a needed item of state has been replaced
   (possibly by a malicious sender).
    
9.4.2.  DECOMPRESSION-FAILURE 
    
   The DECOMPRESSION-FAILURE instruction triggers a manual decompression
   failure. This is useful if the UDVM program discovers that it cannot
   successfully decompress the message (e.g. by using the CRC
   instruction).
    
   This instruction has no operands. 
    
9.4.3.  OUTPUT 
    
   The OUTPUT instruction provides successfully decompressed data to the 
   dispatcher. 
    
   OUTPUT (%output_start, %output_length) 
    
   The operands define the starting memory address and length of the
   byte string to be provided to the dispatcher. Note that the OUTPUT
   instruction can be used to output a partially decompressed message;
   each time the instruction is encountered it appends a byte string to
   the end of the data previously passed to the dispatcher via the
   OUTPUT instruction.

   The string of data is byte copied from the UDVM memory obeying the
   rules of Section 7.3.

   Decompression failure occurs if the cumulative number of bytes
   provided to the dispatcher exceeds the application-defined parameter
   maximum_uncompressed_size.
    

 
 
 
   Since there is technically a difference between outputting a 0-byte
   decompressed message, and not outputting a decompressed message at
   all, the OUTPUT instruction needs to distinguish between the two
   cases. Thus, if the UDVM terminates before encountering an OUTPUT
   instruction it is considered not to have outputted a decompressed
   message. If it encounters one or more OUTPUT instructions, each of
   which provides 0 bytes of data to the dispatcher, then it is
   considered to have outputted a 0-byte decompressed message.
    
9.4.4.  NBO 
    
   The NBO instruction modifies the order in which compressed bits are
   passed to the UDVM.

   As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual
   bits from within a byte, to avoid ambiguity it is necessary to define
   the order in which these bits are read. The default operation is to
   read the MSBs before the LSBs, but if the NBO instruction is
   encountered then the LSBs are read before the MSBs. Both cases are
   illustrated below:
    
    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB 
    
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
        Byte 0          Byte 1              Byte 0          Byte 1 
    
           Default operation            After NBO instruction 
    
   The NBO instruction can only be used before bitwise compressed data
   is passed to the UDVM. Therefore, a decompression failure occurs if
   it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN
   instruction has been used.
    
9.4.5.  INPUT-BYTECODE 
    
   The INPUT-BYTECODE instruction requests a certain number of bytes of 
   compressed data from the instruction immediately following dispatcher. 
    
   INPUT-BYTECODE (%length, %destination, %delta) 
    
   The length operand indicates the requested number of bytes of
   compressed data, and the destination operand specifies the starting
   memory address to which they should be copied. Byte copying is
   performed as per the rules of Section 7.3.

   If the instruction requests data that lies beyond the end of the
   compressed message, no data is returned. Instead the UDVM moves
   program execution to the memory address specified by the formula
   (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16.

   The INPUT-BYTECODE instruction can only be used before bitwise
   compressed data is passed to the UDVM. Therefore, a decompression
   failure occurs if it is encountered after an INPUT-FIXED or an INPUT-
   HUFFMAN instruction has been used.
    
9.4.6.  INPUT-FIXED 
    
   The INPUT-FIXED instruction requests a certain number of bits of
   compressed data from the dispatcher.

   INPUT-FIXED (%length, %destination, %delta)

   The length operand indicates the requested number of bits. If this
   operand does not lie between 1 and 16 inclusive then a decompression
   failure occurs.

   The destination operand specifies the memory address to which the
   compressed data should be copied. Note that the requested bits are
   interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under
   default operation the MSBs of this integer are provided first, but if
   an NBO instruction has been executed then the LSBs are provided
   first.

   If the instruction requests data that lies beyond the end of the
   compressed message, no data is returned. Instead the UDVM moves
   program execution to the memory address specified by the formula
   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16.
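
   The two bit orders can be illustrated with the sketch below. The
   BitReader class, its method names and the DecompressionFailure
   exception are assumptions made for the example. A return value of
   None stands for the case where the requested data lies beyond the end
   of the compressed message, at which point the caller would jump using
   delta.

```python
class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


class BitReader:
    """Reads bits from compressed data in default or NBO order."""

    def __init__(self, data, nbo=False):
        self.data = data
        self.nbo = nbo      # True once an NBO instruction has been executed
        self.bit_pos = 0    # number of bits consumed so far

    def _next_bit(self):
        byte = self.data[self.bit_pos // 8]
        offset = self.bit_pos % 8
        # Default: MSBs of each byte first. After NBO: LSBs first.
        shift = offset if self.nbo else 7 - offset
        self.bit_pos += 1
        return (byte >> shift) & 1

    def input_fixed(self, length):
        if not 1 <= length <= 16:
            raise DecompressionFailure("length must be between 1 and 16")
        if self.bit_pos + length > 8 * len(self.data):
            return None  # beyond end of message: caller jumps via delta
        value = 0
        for i in range(length):
            bit = self._next_bit()
            if self.nbo:
                value |= bit << i            # LSBs of the integer first
            else:
                value = (value << 1) | bit   # MSBs of the integer first
        return value
```

   For the single byte 0xB4 (binary 10110100), reading 3 bits and then
   5 bits gives 5 and 20 under default operation, and 4 and 22 after
   NBO.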
    
9.4.7.  INPUT-HUFFMAN 
    
   The INPUT-HUFFMAN instruction requests a variable number of bits of
   compressed data from the dispatcher. The instruction initially
   requests a small number of bits and compares the result against a
   certain criterion; if the criterion is not met then additional bits
   are requested until the criterion is achieved.

   The INPUT-HUFFMAN instruction is followed by three mandatory operands
   plus n additional sets of operands. Every additional set contains
   four operands as shown below:

   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1,
   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n,
   %upper_bound_n, %uncompressed_n)

   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by
   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then
   decompression failure occurs.
    

 
 
 
   In all other cases, the behavior of the INPUT-HUFFMAN instruction is
   defined below:

   1.)   Set j = 1.

   2.)   Request an additional bits_j compressed bits. Interpret the
   total (bits_1 + ... + bits_j) bits of compressed data requested so
   far as an integer H, with the first bit to be supplied as the MSB and
   the last bit to be supplied as the LSB (note that this is always the
   case, independently of whether the NBO instruction has been used).

   3.)   If data is requested that lies beyond the end of the compressed
   message, terminate the INPUT-HUFFMAN instruction and move program
   execution to the memory address specified by the formula
   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16.
    
   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j +  
   1. Then go back to Step 2, unless j > n in which case decompression 
   failure occurs. 
    
   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the 
   memory address specified by the destination operand. 
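
   Steps 1 to 5 can be sketched as follows. The function name, the
   END_OF_INPUT sentinel and the DecompressionFailure exception are
   assumptions made for the example. The compressed data is modelled as
   an iterator over individual bits (0 or 1) already delivered in the
   order the dispatcher supplies them, and each additional set is a
   tuple (bits, lower_bound, upper_bound, uncompressed).

```python
class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


END_OF_INPUT = object()  # caller should jump using delta


def input_huffman(bits, sets):
    """Decode one value; returns None if n = 0 (instruction ignored)."""
    if not sets:
        return None
    if sets[0][0] == 0 or sum(s[0] for s in sets) > 16:
        raise DecompressionFailure("invalid bits_j operands")
    h = 0
    for nbits, lower, upper, uncompressed in sets:
        # Step 2: request bits_j more bits; first bit supplied is the MSB.
        for _ in range(nbits):
            bit = next(bits, None)
            if bit is None:
                return END_OF_INPUT  # Step 3
            h = (h << 1) | bit
        # Step 4: test H against the bounds of set j.
        if lower <= h <= upper:
            # Step 5: value to copy to the destination operand.
            return (h + uncompressed - lower) % 65536
    # j > n without a match.
    raise DecompressionFailure("no matching Huffman code")
```

   As a usage example, the sets [(1, 0, 0, 65), (1, 2, 3, 66)] describe
   a code in which the bit string 0 decodes to 65, 10 decodes to 66 and
   11 decodes to 67.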
    
9.4.8.  STATE-REFERENCE 
    
   The STATE-REFERENCE instruction retrieves some previously stored
   state information.

   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length,
   %state_destination)

   The id_start and id_length operands specify the location of the state
   identifier used to retrieve the state information. The state
   identifier is always 16 bytes long; if id_length is less than 16 then
   the remaining least significant bytes of the identifier are padded
   with zeroes.

   Decompression failure occurs if id_length is greater than 16.
   Decompression failure also occurs if no state information matching
   the state identifier can be found.

   Note that when accessing state information that has been previously
   created by the UDVM, the state identifier is always taken from an
   [MD5] hash of the state to be retrieved. However this is not
   necessarily the case for application-defined state as per Section
   3.2.

   The state_start and state_length operands define the starting byte
   and number of bytes to copy from the state_value contained in the
   identified item of state. If more state is requested than is actually
   available then decompression failure occurs.

 
 
 
   The state_destination operand contains a UDVM memory address. The
   requested state is byte copied to this memory address using the rules
   of Section 7.3.
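
   The lookup performed by STATE-REFERENCE can be sketched as follows.
   This is an illustration under stated assumptions: the state handler
   is modelled as a dict mapping 16-byte identifiers to state_value byte
   strings, the UDVM memory is a bytearray, the byte copying rules of
   Section 7.3 are replaced by a plain slice copy, and the function name
   and the DecompressionFailure exception are hypothetical.

```python
class DecompressionFailure(Exception):
    """Hypothetical exception standing in for 'decompression failure'."""


def state_reference(store, memory, id_start, id_length,
                    state_start, state_length, state_destination):
    """Copy part of a stored state_value into the UDVM memory."""
    if id_length > 16:
        raise DecompressionFailure("id_length greater than 16")
    # Pad the identifier to 16 bytes with trailing zeroes.
    partial = bytes(memory[id_start:id_start + id_length])
    identifier = partial + bytes(16 - id_length)
    state_value = store.get(identifier)
    if state_value is None:
        raise DecompressionFailure("no matching state")
    if state_start + state_length > len(state_value):
        raise DecompressionFailure("more state requested than available")
    chunk = state_value[state_start:state_start + state_length]
    memory[state_destination:state_destination + state_length] = chunk
```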
    
9.4.9.  STATE-EXECUTE 
    
   The STATE-EXECUTE instruction retrieves and runs some previously
   stored state information.

   STATE-EXECUTE (%id_start, %id_length)

   The id_start and id_length operands function as per the STATE-
   REFERENCE instruction.

   STATE-EXECUTE is similar to STATE-REFERENCE except that it does not
   require the amount of state being requested or the proposed
   destination for the state to be specified explicitly. Instead, it
   simply puts the state_value back into the UDVM memory using the
   operands state_start and state_length contained as part of the state
   information.

   The entire state_value (all state_length bytes of it) is byte copied
   into the memory address specified by state_start. The UDVM then jumps
   to the (absolute) memory address specified by state_instruction.

   Note that state_start, state_length and state_instruction are all
   stored together with state_value as part of an item of state
   information.
    
    
10. Security considerations 
    
10.1.  Security goals 
    
   The overall security goal of the SigComp architecture is to not
   create risks that are in addition to those already present in the
   application protocols. There is no intention for SigComp to enhance
   the security of the protocols, as it always can be circumvented by
   not using compression. More specifically, the high-level security
   goals can be described as:

   -- do not worsen security of existing application protocol

   -- do not create any new security issues

   -- do not hinder deployment of application security
    
    
    
    
    

 
 
 
 
 
10.2.  Security risks and mitigations 
    
   This subsection identifies the potential security risks associated
   with the overall SigComp architecture, and details the proposed
   solution for each risk.
    
    
   ** Confidentiality risks 
    
   *** Attacking SigComp by snooping into state of other users 
    
   State can only be accessed using a state identifier, which is a
   (prefix of a) cryptographic hash of the state being referenced. This
   implies that the referencing packet already needs knowledge about the
   state. To enforce this, a reference length of 72 bits is defined.
   This also minimizes the probability of an accidental state collision.

   Generally, ways to obtain knowledge about the state identifier (e.g.,
   passive attacks) will also easily provide knowledge about the state
   referenced, so no new vulnerability results.

   The application needs to handle state identifiers with the same care
   it would handle the state itself.
    
   ** Integrity risks 
    
   The SigComp approach assumes that there is appropriate integrity
   protection below and/or above the SigComp layer. However, the state
   establishment mechanism provides additional potential to compromise
   the integrity of the messages (which, however, would most likely be
   detectable at the application layer).
    
   *** Attacking SigComp by faking state or making unauthorized changes
   to state
    
   State cannot be destroyed or changed by a malicious sender -- it can 
   only add new state. Faking state is only possible if the hash allows 
   intentional collision. 
    
   ** Availability risks (avoid DoS vulnerabilities) 
    
   *** Use of SigComp as a tool in a DoS attack to another target 
    
   SigComp cannot easily be used as an NBO instruction has been executed amplifier in a reflection attack, 
   as it only generates one decompressed message per incoming compressed 
   message. This packet is then handed to the LSBs are provided 
   first. 
    
   If application; the instruction requests data that lies beyond utility 
   as a reflection amplifier is therefore limited by the end utility of the 
   compressed message, no data is returned. Instead the UDVM moves 
   program execution 
   application. 
    
   However, it must be noted that SigComp can be used to generate larger 
   packets as input to the memory address specified by application than have to be sent from the formula 
   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 
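    
   The INPUT-FIXED rules above can be illustrated with a short Python 
   sketch. It is not part of the SigComp specification: the names 
   BitReader and input_fixed are hypothetical, bits are assumed to be 
   taken from each byte most-significant-bit first, and the 2-byte 
   result is assumed to be stored MSB first at the destination address. 

```python
class BitReader:
    """Hands out bits of a compressed message (hypothetical helper)."""

    def __init__(self, data: bytes, lsb_first: bool = False):
        self.data = data
        self.pos = 0                # index of the next unread bit
        self.lsb_first = lsb_first  # stands in for "an NBO instruction was executed"

    def read(self, length: int):
        """Return `length` bits as an integer, or None if the message is exhausted."""
        if not 1 <= length <= 16:
            raise ValueError("decompression failure: length must be 1..16")
        if self.pos + length > 8 * len(self.data):
            return None             # request lies beyond the end: no data returned
        value = 0
        for i in range(length):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1  # assumption: MSB-first within a byte
            self.pos += 1
            if self.lsb_first:
                value |= bit << i           # first bit supplied is the LSB
            else:
                value = (value << 1) | bit  # first bit supplied is the MSB
        return value


def input_fixed(reader, memory, instruction_address, length, destination, delta):
    """Return the jump target if the data ran out, else None (fall through)."""
    value = reader.read(length)
    if value is None:
        return (instruction_address + delta) % 2**16
    memory[destination] = value >> 8        # assumption: MSB stored first
    memory[destination + 1] = value & 0xFF
    return None
```

   A return value of None means that execution simply continues with 
   the next instruction; any other value is the address computed from 
   the delta parameter. 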
    
    
9.5.7.  INPUT-HUFFMAN 
    
   The INPUT-HUFFMAN instruction requests a variable number of bits of 
   compressed data from the dispatcher. The instruction initially 
   requests a small number of bits and compares the result against a 
   certain criterion; if the criterion is not met then additional bits 
   are requested until the criterion is achieved. 
    
   The INPUT-HUFFMAN instruction is followed by three mandatory 
   parameters plus n additional sets of parameters. Every additional set 
   contains four parameters as shown below: 
    
   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, 
   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, 
   %upper_bound_n, %uncompressed_n) 
    
   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by 
   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then 
   decompression failure occurs. 
    
   In all other cases, the behavior of the INPUT-HUFFMAN instruction is 
   defined below: 
    
   1.)   Set j = 1. 
    
   2.)   Request an additional bits_j compressed bits. Interpret the 
   total (bits_1 + ... + bits_j) bits of compressed data requested so 
   far as an integer H, with the first bit to be supplied as the MSB and 
   the last bit to be supplied as the LSB (note that this is always the 
   case, independently of whether the NBO instruction has been used). 
    
   3.)   If data is requested that lies beyond the end of the compressed 
   message, terminate the INPUT-HUFFMAN instruction and move program 
   execution to the memory address specified by the formula 
   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 
    
   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j + 
   1. Then go back to Step 2, unless j > n in which case decompression 
   failure occurs. 
    
   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the 
   memory address specified by the destination parameter. 
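    
   The five steps of INPUT-HUFFMAN can be sketched in Python as 
   follows. This is an illustrative sketch only: the function name, the 
   exception classes and the use of an iterator of bits as the source 
   of compressed data are assumptions made for the example. The caller 
   is assumed to store the returned value at the destination address, 
   and to jump to (instruction address + delta) modulo 2^16 when 
   EndOfCompressedData is raised. 

```python
class DecompressionFailure(Exception):
    """Raised where the text says that decompression failure occurs."""


class EndOfCompressedData(Exception):
    """Raised when bits are requested beyond the end of the message (Step 3)."""


def input_huffman(bit_source, sets):
    """
    bit_source: iterator yielding the compressed bits (0 or 1) in order.
    sets: list of n tuples (bits_j, lower_bound_j, upper_bound_j, uncompressed_j).
    Returns the decoded 2-byte value, or None if n = 0 (instruction ignored).
    """
    if not sets:
        return None
    if sets[0][0] == 0 or sum(s[0] for s in sets) > 16:
        raise DecompressionFailure("invalid bits_j parameters")

    h = 0
    # Steps 1 and 4: j runs from 1 to n
    for bits_j, lower_bound, upper_bound, uncompressed in sets:
        # Step 2: request bits_j more bits; the first bit supplied is the MSB of H
        for _ in range(bits_j):
            bit = next(bit_source, None)
            if bit is None:
                raise EndOfCompressedData()   # Step 3
            h = (h << 1) | bit
        # Steps 4 and 5: stop as soon as H falls inside the current range
        if lower_bound <= h <= upper_bound:
            return (h + uncompressed - lower_bound) % 2**16
    raise DecompressionFailure("j > n: no range matched")
```

   For example, with the hypothetical code table [(1, 0, 0, 97), 
   (1, 2, 2, 98), (1, 6, 7, 99)], the single bit 0 decodes to 97 and 
   the bits 1, 1, 1 decode to 100. 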
    
9.5.8.  STATE-REFERENCE 
    
   The STATE-REFERENCE instruction retrieves some previously stored 
   state information. 
    
   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, 
   %state_destination) 
    
   The id_start and id_length parameters specify the location of the 
   state identifier used to retrieve the state information. The state 
   identifier is always 16 bytes long; if id_length is less than 16 then 
   the remaining least significant bytes of the identifier are padded 
   with zeroes. 
    
   Decompression failure occurs if id_length is greater than 16. 
   Decompression failure also occurs if no state information matching 
   the state identifier can be found. 
    
   Note that when accessing state information that has been previously 
   created by the UDVM, the state identifier is always taken from an 
   [MD5] hash of the state to be retrieved. However this is not 
   necessarily the case for application-defined state as per Section 
   3.2. 
    
   The state_start and state_length parameters define the starting byte 
   and number of bytes to copy from the state_value contained in the 
   identified item of state. If more state is requested than is actually 
   available then decompression failure occurs. 
    
   The state_destination parameter contains a UDVM memory address. The 
   requested state is byte copied to this memory address using the rules 
   of Section 7.4. 
    
    
9.5.9.  STATE-EXECUTE 
    
   The STATE-EXECUTE instruction retrieves and runs some previously 
   stored state information. 
    
   STATE-EXECUTE (%id_start, %id_length) 
    
   The id_start and id_length parameters function as per the 
   STATE-REFERENCE instruction. 
    
   STATE-EXECUTE is similar to STATE-REQUEST except that it does not 
   require the amount of state being requested or the proposed 
   destination for the state to be specified explicitly. Instead, it 
   simply puts the state_value back into the UDVM memory using the 
   parameters state_start and state_length contained as part of the 
   state information. 
    
   The entire state_value (all state_length bytes of it) is byte copied 
   into the memory address specified by state_start. The UDVM then jumps 
   to the (absolute) memory address specified by state_instruction. 
    
   Note that state_start, state_length and state_instruction are all 
   stored together with state_value as part of an item of state 
   information. 
 
    
10. Security considerations 
    
10.1 Security goals 
    
   The overall security goal of the SigComp architecture is to not 
   create risks that are in addition to those already present in the 
   application protocols. There is no intention for SigComp to enhance 
   the security of the protocols, as it always can be circumvented by 
   not using compression. More specifically, the high-level security 
   goals can be described as: 
    
   -- do not worsen security of existing application protocol 
    
   -- do not create any new security issues 
    
   -- do not hinder deployment of application security 
    
    
10.2 Security risks and mitigations 
    
   This subsection identifies the potential security risks associated 
   with the overall SigComp architecture, and details the proposed 
   solution for each risk. 
    
    
   ** Confidentiality risks 
    
   *** Attacking SigComp by snooping into state of other users 
    
   State can only be accessed using a state identifier, which is a 
   (prefix of a) cryptographic hash of the state being referenced. This 
   implies that the referencing packet already needs knowledge about the 
   state. To enforce this, a reference length of 48 bits is defined. 
   This also minimizes the probability of an accidental state collision. 
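    
   As an illustration of why a referencing packet must already know the 
   state, the following Python sketch derives a state identifier as a 
   prefix of an MD5 hash. It is not a normative definition: the function 
   name is hypothetical, and it is assumed here that the hash is taken 
   over the state bytes alone and that the leading bytes of the digest 
   form the prefix. 

```python
import hashlib


def state_identifier(state_value: bytes, length: int = 6) -> bytes:
    """Return the first `length` bytes of the MD5 hash of the state.

    6 bytes corresponds to the 48-bit reference length; 16 bytes is the
    full MD5 digest.
    """
    if not 6 <= length <= 16:
        raise ValueError("identifier length must be between 6 and 16 bytes")
    return hashlib.md5(state_value).digest()[:length]
```

   A sender that has never seen the state cannot compute this value 
   other than by guessing among 2^48 possibilities. 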
    
   Generally, ways to obtain knowledge about the state identifier (e.g., 
   passive attacks) will also easily provide knowledge about the state 
   referenced, so no new vulnerability results. 
    
   The application needs to handle state identifiers with the same care 
   it would handle the state itself. 
    
   ** Integrity risks 
    
   The SigComp approach assumes that there is appropriate integrity 
   protection below and/or above the SigComp layer. However, the state 
   establishment mechanism provides additional potential to compromise 
   the integrity of the messages (which, however, would most likely be 
   detectable at the application layer). 
    
   *** Attacking SigComp by faking state or making unauthorized changes 
   to state 
    
   State cannot be destroyed or changed by a malicious sender -- it can 
   only add new state. Faking state is only possible if the hash allows 
   intentional collision. 
    
   ** Availability risks (avoid DoS vulnerabilities) 
    
   *** Use of SigComp as a tool in a DoS attack on another target 
    
   SigComp cannot easily be used as an amplifier in a reflection attack, 
   as it only generates one decompressed message per incoming compressed 
   message. This packet is then handed to the application; the utility 
   as a reflection amplifier is therefore limited by the utility of the 
   application. 
    
   However, it must be noted that SigComp can be used to generate larger 
   packets as input to the application than have to be sent by the 
   malicious sender; the sender can therefore send smaller packets (at a 
   lower bandwidth) than are delivered to the application. Depending on 
   the reflection characteristics of the application, this can be 
   considered a mild form of amplification. The application MUST limit 
   the number of packets reflected to a potential target -- even if 
   SigComp is used to generate a large amount of information from a 
   small incoming attack packet. 

 
 
 
   *** Attacking SigComp as the DoS target by filling it with state 
    
   Excessive state can only be installed by a malicious sender (or a set 
   of malicious senders) with the consent of the application. The system 
   consisting of SigComp and application is thus approximately as 
   vulnerable as the application itself, unless it allows the 
   installation of state from a message where it would not have 
   installed state itself. 
    
   If this is desirable to increase the compression ratio, the effect 
   can be mitigated by adding feedback at the application level that 
   indicates whether the state requested was actually installed. This 
   allows a system under attack to gracefully degrade by no longer 
   installing compressor state that is not matched by application state. 
    
   *** Attacking the UDVM by faking state or making unauthorized changes 
   to state 
    
    (See "Integrity risks" above.) 
    
   *** Attacking the UDVM by sending it looping code 
    
   The application sets an upper limit to the number of "CPU cycles" 
   that can be used per compressed message and per input bit in the 
   compressed message. The damage inflicted by sending packets with 
   looping code is therefore limited, although this may still be 
   substantial if a large number of CPU cycles are offered by the UDVM. 
   However, this would be true for any decompressor that can receive 
   packets from anywhere. 
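    
   One way a UDVM implementation might enforce such a limit is sketched 
   below in Python. The class and method names are hypothetical, and the 
   way the two limits are combined (a fixed allowance per message plus 
   an allowance for every input bit consumed) is an assumption made for 
   this example rather than a rule stated in this section. The default 
   values are the example parameters cycles_per_message and 
   cycles_per_bit from Appendix B. 

```python
class CycleLimitExceeded(Exception):
    """Raised when a message uses more cycles than the application allows."""


class CycleBudget:
    def __init__(self, cycles_per_message=2000, cycles_per_bit=20):
        self.cycles_per_bit = cycles_per_bit
        self.limit = cycles_per_message  # allowance granted for the message itself
        self.used = 0

    def bits_consumed(self, count):
        """Extend the allowance for each bit of compressed data consumed."""
        self.limit += count * self.cycles_per_bit

    def spend(self, cycles=1):
        """Charge for executed instructions; abort once the allowance is spent."""
        self.used += cycles
        if self.used > self.limit:
            raise CycleLimitExceeded("decompression failure: cycle limit reached")
```

   With such a budget, looping code that consumes no input is stopped 
   after a bounded amount of work regardless of what the bytecode does. 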
    
    
11. IANA considerations 
    
   The SigComp solution currently requires two identifiers to be 
   assigned by IANA: the UDVM_version and the state identifier. 
    
   Upgraded versions of the UDVM will contain additional instructions to 
   improve the performance of the overall SigComp solution; new 
   UDVM_version parameters will be needed in this case. 
    
   Well-known decompression algorithms will also need to be assigned 
   fixed state identifiers. 
    
    
12. Acknowledgements 
    
   Thanks to  
        
            Abigail Surtees (abigail.surtees@roke.co.uk)  
            Mark A West (mark.a.west@roke.co.uk)  
            Lawrence Conroy (lwc@roke.co.uk)  
            Christian Schmidt (christian.schmidt@icn.siemens.de)  
            Max Riegel (maximilian.riegel@icn.siemens.de)  
            Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) 
            Stefan Forsgren (stefan.forsgren@epl.ericsson.se)  
            Krister Svanbro (krister.svanbro@epl.ericsson.se)  
            Miguel Garcia (miguel.a.garcia@ericsson.com) 
            Christopher Clanton (christopher.clanton@nokia.com)  
            Khiem Le (khiem.le@nokia.com)  
            Ka Cheong Leung (kacheong.leung@nokia.com)  
    
   for valuable input and review. 
    
    
13. Authors' addresses 
    
   Richard Price         Tel: +44 1794 833681 
   Email:                richard.price@roke.co.uk 
    
   Roke Manor Research Ltd 
   Romsey, Hants, SO51 0ZN 
   United Kingdom 
    
    
   Hans Hannu            Tel: +46 920 20 21 84 
   Email:                hans.hannu@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
   Carsten Bormann       Tel: +49 421 218 7024 
   Email:                cabo@tzi.org 
    
   Universitaet Bremen TZI 
   Postfach 330440 
   D-28334 Bremen, Germany 
    
    
   Jan Christoffersson   Tel: +46 920 20 28 40 
   Email:                jan.christoffersson@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
   Zhigang Liu           Tel: +1 972 894-5935 
   Email:                zhigang.liu@nokia.com 
    
   Nokia Research Center 
   6000 Connection Drive 
   Irving, TX 75039 
   USA 
    
    
   Jonathan Rosenberg 
   Email:                jdrosen@dynamicsoft.com 
    
   dynamicsoft 
   72 Eagle Rock Avenue 
   First Floor 
   East Hanover, NJ 07936 
    
 
14. References 
 
   [SIP]       "SIP: Session Initiation Protocol", Handley et al,  
               RFC 2543, Internet Engineering Task Force, March 1999 
    
   [RTSP]      "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.  
               Rao and R. Lanphier, RFC 2326, April 1998 
    
   [HTTP]      "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et  
               al., RFC 2616, June 1999 
    
   [SIPsrv]    "SIP: Locating SIP Servers", J. Rosenberg, H.  
               Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002,  
               work in progress 
    
   [DEFLATE]   "DEFLATE Compressed Data Format Specification version  
               1.3", P. Deutsch, RFC 1951, Internet Engineering Task  
               Force, May 1996 
    
   [SCTP]      "Stream Control Transmission Protocol", Stewart et al,  
               RFC 2960, Internet Engineering Task Force, October 2000 
    
   [MD5]       "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,  
               Internet Engineering Task Force, April 1992 
    
   [RFC-1662]  "PPP in HDLC-like Framing", Simpson et al, RFC 1662,  
               Internet Engineering Task Force, July 1994 
    
   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott 
               Bradner, RFC 2026, Internet Engineering Task Force, 
               October 1996 
    
   [RFC-2119]  "Key words for use in RFCs to Indicate Requirement 
               Levels", Scott Bradner, RFC 2119, Internet Engineering 
               Task Force, March 1997 
    




 
 
 
Appendix A. Mnemonic language 
    
   Writing UDVM programs directly in bytecode would be a daunting task, 
   so a simple mnemonic language is provided to facilitate the creation 
   of new decompression algorithms. Most importantly, the language 
   allows the parameters of an instruction to be specified as text names 
   rather than as integer values. 
    
   If an instruction parameter is given as a text name, it should 
   correspond to exactly one instance of a label, a reserved memory 
   address or an externally defined keyword. A label is simply a text 
   name preceded by a colon, for example: 
    
   :loop 
   JUMP (loop) 
    
   For any parameters corresponding to a label, the integer value of the 
   parameter is calculated by the following formula: 
    
    parameter_value = (label_address - instruction_address) modulo 2^16 
    
   Note that the "label address" is simply the memory address of the 
   instruction immediately following the label. In particular, the above 
   example can be rewritten as JUMP (0). 
    
   A reserved memory address is specified using the "reserve" keyword 
   followed by a text_name and (optionally) an integer value. For 
   example: 
    
   reserve apples 
   reserve pears (8) 
   reserve bananas 
   LOAD (bananas, 5) 
    
   For any parameters corresponding to a reserved memory address, the 
   integer value of the parameter is the next free memory address that 
   has not yet been reserved. Starting at this address, the specified 
   number of bytes of memory are then reserved (if no value is given 
   then a total of 2 bytes is reserved). 
    
   The first instance of a "reserve" keyword begins reserving memory at 
   Address 6 (to avoid overwriting the three well-known variables of 
   Section 7.2). So the above example can be rewritten as LOAD (16, 5). 
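    
   The address assignment performed by the "reserve" keyword can be 
   sketched in Python as follows. The function name and the list-based 
   input format are hypothetical and chosen only for this example; the 
   starting address of 6 and the default size of 2 bytes are taken from 
   the text above. 

```python
def allocate_reserved(reservations, first_address=6, default_size=2):
    """Map each reserved name to the next free memory address.

    reservations: list of (name, size) pairs in source order, where size
    is None when no integer value follows the name.
    """
    addresses = {}
    next_free = first_address
    for name, size in reservations:
        addresses[name] = next_free
        next_free += default_size if size is None else size
    return addresses


example = allocate_reserved([("apples", None), ("pears", 8), ("bananas", None)])
print(example["bananas"])  # prints 16, matching LOAD (16, 5)
```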
    
   An externally defined keyword is specified outside of the mnemonic 
   language. All of the application-defined parameters are considered to 
   be externally defined keywords and can be referenced in the mnemonic 
   code (useful for adapting the code based on the available memory or 
   CPU cycles). The following additional keywords can also be used: 
    
    

 
 
 
   Keyword:                 Corresponding value: 
    
   byte_copy_left                 0 
   byte_copy_right                2 
   stack_location                 4 
   reserved_end               See below 
   bytecode_length            See below 
   total_length               See below 
    
   The keyword reserved_end specifies the highest reserved memory 
   address for the entire mnemonic code (taking into account all the 
   occasions where memory is reserved). 
    
   The keyword bytecode_length specifies the total size of the bytecode 
   corresponding to the mnemonic code. Any instances of bytecode_length 
   are initially replaced with 3 bytes of zeroes, and then are filled in 
   after the remainder of the bytecode has been generated. 
    
   Similarly, the keyword total_length specifies the total amount of 
   memory required at the UDVM including bytecode and reserved memory 
   addresses. 
    
   A complete description of the mnemonic language and how it should be 
   translated into bytecode is given below: 
    
   Instructions:     Instruction names are given in capitals. Replace  
                     each name with the corresponding 1-byte value as  
                     per Chapter 9. 
    
   $:                When prefixed to an instruction parameter, the  
                     parameter is a memory address rather than a  
                     direct value. This symbol is mandatory for  
                     reference parameters, optional for multitype  
                     parameters and disallowed for literals. 
    
   Integers:         Instruction parameters can be given in the form of 
                     decimal integers. They are converted into the 
                     shortest bytecode capable of representing the 
                     integer by the rules of Section 7.3. 
    
   Text references:  Instruction parameters can also be given in the  
                     form of lowercase names. These names should match 
                     exactly one label, reserved memory address or 
                     externally defined keyword as described above. 
    
   Labels:           Label names are given as a colon followed by  
                     lowercase text. They are deleted when converting  
                     the mnemonics to bytecode. 
    
   Reserved memory:  Memory addresses are reserved using the "reserve" 
                     keyword. The line containing the reserve keyword 
                     is deleted when converting to bytecode. 
    
   .LSB:             When appended to the end of a text name, the  
                     integer value corresponding to the name is  
                     increased by 1. This is useful for addressing the  
                     LSBs of a 2-byte variable. 
    
   0b, 0d:           Bytecode values can be specified directly in  
                     binary or decimal via the appropriate prefix. The  
                     direct bytecode continues until a character occurs  
                     that is not an integer or whitespace. 
    
   Whitespace:       All whitespace (plus brackets and commas) just  
                     delimit the instructions. Delete. 
    
   Comments:         These are indicated by a semicolon and continue 
                     to the end of the line. Delete. 
    
   Once the mnemonic code has been converted into bytecode, it can be 
   executed by copying the bytecode into the UDVM memory beginning at 
   the first memory address that has not been reserved by an instance of 
   the "reserve" keyword. Program execution is assumed to begin at this 
   address. 
    
   Note that further to the rules outlined above, well-written mnemonic 
   code will also have the following properties: 
    
   *    Any instance of a memory address will be specified as a text 
        reference rather than an integer value. This ensures that the 
        mnemonic code is portable. 
    
   *    The mnemonic code will not write to any memory address except 
        those reserved by the "reserve" keyword. This ensures that the 
        code can be compiled. 
    
    
Appendix B. Example application-defined parameters 
    
   This appendix gives some example values for each of the application-
   defined parameters. These values are geared towards the compression 
   of a signaling protocol such as [SIP]. 
    
   Note that all of the proposed values are fixed and not negotiated 
   between the two instances of the application invoking SigComp. This 
   is because it is possible for the application invoking the 
   decompressor to receive compressed messages from several different 
   applications, and it is difficult to determine which message 
   corresponds to which application. [SIP] does this using "From:" and 
   "To:" fields in the message itself, but these are not visible until 
   the message has been decompressed. It is simpler just to fix a set of 
   parameters for every instance of the application. 
    
   UDVM_version                    0 
   minimum_compression_ratio       0.5 
   maximum_compressed_size         65535 
   maximum_uncompressed_size       65535 
   minimum_hash_size               6 
   overall_memory_size             8192 
   working_memory_start            0 
   working_memory_end              8191 
   cycles_per_bit                  20 
   cycles_per_message              2000 
   first_instruction               6 
    
   Note that the parameters overall_memory_size, cycles_per_bit and 
   cycles_per_message can be increased on the fly using the capabilities 
   announcement mechanism. This mechanism is designed to function 
   correctly even when the receiving application is sent compressed 
   messages from several different applications. 
    
   The initial contents of the UDVM memory also need to be defined. It 
   is not enough simply to initialize the memory to all zeroes, as the 
   UDVM would then be unable to input any compressed data. Instead, for 
   each new compressed message the memory should be initialized with a 
   simple decompressor capable of extracting the first few bytes of 
   compressed data. These bytes can then be interpreted as a state 
   identifier to retrieve the correct decompression algorithm. 
    
   As an example, the following mnemonic code can be converted to 
   bytecode and pasted into the UDVM memory beginning at Address 6: 
    
   reserve state_identifier (6) 
   INPUT-BYTECODE (6, state_identifier, fail) 
   STATE-EXECUTE (state_identifier, 6) 
   :fail 
   DECOMPRESSION-FAILURE 
    
   Finally, the application can define initial state that is available 
   to the UDVM. Examples of application-defined state include common 
   decompression algorithms, dictionaries of common text phrases etc. 
    
    
Appendix C. Example decompression algorithms 
    
   This appendix gives examples of decompression algorithms which can be 
   run on the UDVM in the form of bytecode. 
    
C.1.  Example UDVM code for simple LZ77 decompression 
    
   The first example gives the code required to decompress data from a 
   very simple LZ77-based algorithm. The UDVM is instructed to interpret 
   a compressed message as a set of 4-byte characters, where each 
   character contains a 2-byte position integer followed by a 2-byte 

 
 
 
Price, Hannu, 
 
 
   Zhigang Liu           Tel: +1 972 894-5935 
   Email:                zhigang.liu@nokia.com 
    
   Nokia Research Center 
   6000 Connection Drive 
   Irving, TX 75039 
   USA 
    
    
   Jonathan Rosenberg 
   Email:                jdrosen@dynamicsoft.com 
    
   dynamicsoft 
   72 Eagle Rock Avenue 
   First Floor 
   East Hanover, NJ 07936 
 
14. References 
 
   [SIP]       "SIP: Session Initiation Protocol", Handley et al.                                           [Page 49] 

INTERNET-DRAFT                  SigComp              February 14 al,  
               RFC 2543, Internet Engineering Task Force, March 1999 
    
   [RTSP]      "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.  
               Rao and R. Lanphier, , 2002 
 
 
   length integer. Taken together these integers point to a previously 
   received text string RFC 2326, April 1998 
    
   [HTTP]      "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et  
               al.", RFC 2616, June 1999 
    
   [SIPsrv]    "SIP: Locating SIP Servers", J. Rosenberg, H.  
               Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002,  
               work in progress 
    
   [DEFLATE]   "DEFLATE Compressed Data Format Specification version  
               1.3", P. Deutsch, RFC 1951, Internet Engineering Task  
               Force, May 1996 
    
   [SCTP]      "Stream Control Transmission Protocol", Stewart et al,  
               RFC 2960, Internet Engineering Task Force, October 2000 
    
   [MD5]       "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,  
               Internet Engineering Task Force, April 1992 
    
   [RFC-1662]  "PPP in the UDVM memory, which is then copied to the 
   end of the uncompressed message. 
    
   Since the compressor can only send references to strings already 
   present HDLC-like Framing", Simpson et al, Internet  
               Engineering Task Force, July 1994 
    
   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott 
               Bradner, Internet Engineering Task Force, October 1996 
    
   [RFC-2119]  "Key words for use in the UDVM memory, before the first message is decompressed 
   the memory must be initialized with a static dictionary containing 
   the 256 ASCII characters. 
    
   The algorithm write-protects the memory containing the UDVM 
   instructions used to decompress each character, so that they can 
   easily be compiled to improve the speed of decompression. 
    
   A 2-byte CRC over the uncompressed message is appended to the end of 
   the compressed message, RFCs to verify that correct decompression has 
   occurred. The algorithm also requests that the contents of the UDVM 
   memory be saved using the state request mechanism, so that it can be 
   retrieved by sending the appropriate 6-byte hash. 
    
   reserve byte_copy_left 
   reserve byte_copy_right 
   reserve uncompressed_start 
   reserve uncompressed_end 
   reserve uncompressed_length 
   reserve position 
   reserve length 
   reserve static_dictionary (256) 
   reserve circular_buffer (2048) 
         
   WORKING-MEMORY (uncompressed_start, reserved_end) 
   MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary, 
   circular_buffer, 0, 0, 0) 
    
   :unpack_static_dictionary 
    
   ; The following instructions initialize the static dictionary. 
    
   COPY-LITERAL (position.LSB, 1, $uncompressed_start) 
   ADD ($position, 1) 
   COMPARE ($position, 256, unpack_static_dictionary, next_character, 0) 
    
   :next_character 
    
   INPUT-FIXED (16, position, fail) 
   INPUT-FIXED (16, length, end_of_message) 
   COPY-LITERAL ($position, $length, $uncompressed_end) 
   ADD ($uncompressed_length, $length) 
   JUMP (next_character) 
    
   :fail Indicate Requirement 
               Levels", Scott Bradner, Internet Engineering Task Force, 
               March 1997 

 
 
 
Price, Hannu, et al.                                           [Page 50] 42] 

INTERNET-DRAFT                  SigComp              February 14 ,                   March 1, 2002 
 
 
   DECOMPRESSION-FAILURE 
    
   :end_of_message 
    
   CRC ($position, $uncompressed_start, $uncompressed_length, fail) 
   OUTPUT ($uncompressed_start, $uncompressed_length) 
   END-MESSAGE (6, 0, total_length, next_character, 0) 
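   The decompression loop above can be approximated outside the UDVM. 
   The following Python fragment is only an illustrative sketch, not 
   part of the SigComp specification. It assumes that positions are 
   offsets from the start of the static dictionary rather than absolute 
   UDVM addresses, that 2-byte values are sent most significant byte 
   first, and that the buffer never wraps. The name decompress is 
   hypothetical, and the trailing CRC is returned unchecked because the 
   CRC algorithm is not restated in this section. 

```python
import struct
from typing import Tuple

# Static dictionary: one entry for each of the 256 byte values.
STATIC_DICTIONARY = bytes(range(256))


def decompress(compressed: bytes) -> Tuple[bytes, int]:
    """Sketch of the position/length decompressor (hypothetical helper).

    Returns the uncompressed message and the trailing 2-byte CRC value.
    The CRC is NOT verified here (assumption: the caller checks it).
    """
    # The input is a run of 4-byte (position, length) pairs followed by
    # a 2-byte CRC; any other length corresponds to the :fail branch.
    if len(compressed) < 2 or (len(compressed) - 2) % 4 != 0:
        raise ValueError("decompression failure: truncated input")

    memory = bytearray(STATIC_DICTIONARY)
    body, crc_bytes = compressed[:-2], compressed[-2:]

    for position, length in struct.iter_unpack(">HH", body):
        # Copy byte by byte so that a reference may overlap the bytes
        # it is currently producing.
        for i in range(length):
            if position + i >= len(memory):
                raise ValueError("decompression failure: bad reference")
            memory.append(memory[position + i])

    (crc,) = struct.unpack(">H", crc_bytes)
    # Everything after the static dictionary is the uncompressed message.
    return bytes(memory[len(STATIC_DICTIONARY):]), crc
```

   In this sketch a single byte is sent as a reference of length 1 into 
   the static dictionary, and longer strings are sent as references to 
   previously decompressed data. 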
 
 
Appendix A. Document history 
 
   - October 19, 2001, version 00 
    
   First version. The draft describes the current ideas, from people    
   involved in the ROHC WG, of how to perform compression of  
   application signaling messages. 
    
   - October 31, 2001, version 01 
    
   Second version. Additional section, 5.2.1, which describes when a  
   message identifier can be reused. 
    
   - November 21, 2001, version 02 
    
   Third version. Section 6 has been moved to a separate draft. The  
   third version describes a modular solution, providing flexibility  
   for implementers to decide which functions they want to integrate. 
    
   - January 28, 2002, version 03  
       
   Fourth version. SigComp version 02 is divided into this draft, a UDVM 
   draft and an extended operation mechanisms draft. The 
   compressor/decompressor (UDVM) state approach has been introduced 
   for security reasons. 
    
   - February 14, 2002, version 04 
    
   Fifth version. Describes the complete base SigComp solution including 
   the UDVM. 
    
   - March 1, 2002, version 05 
    
   Sixth version. Comments from several authors and contributors have 
   been taken into account. Announcement mechanism has been updated. 
    
    
    
    
    
    
    
    
    
    
    
    
    
    
 
   This Internet-Draft expires in September 2002. 

 
 
 

----