draft-ietf-rohc-sigcomp-03.txt  -->   draft-ietf-rohc-sigcomp-04.txt

view Side-By-Side changes

 
 
 
Network Working Group                        H. Hannu (Editor), Ericsson                  Richard Price, Siemens/Roke Manor 
INTERNET-DRAFT                              J. Christoffersson,                                      Hans Hannu, Ericsson  
Expires: July August 2002                                     C. Clanton, Nokia
                                                   S. Forsgren.                     Carsten Bormann, TZI/Uni Bremen 
                                           Jan Christoffersson, Ericsson
                                                            K. Le, Nokia
                                                         K. Leung, Nokia
                                                           Z. 
                                                      Zhigang Liu, Nokia
                                            R. Price, Siemens/Roke Manor
                                               J. 
                                         Jonathan Rosenberg, Dynamicsoft
                                                    K. Svanbro, Ericsson
                                                        January 28, dynamicsoft 
  
                                                       February 14, 2002 
 
 
                      Signaling Compression (SigComp)
                    <draft-ietf-rohc-sigcomp-03.txt> 
                     <draft-ietf-rohc-sigcomp-04.txt> 
                                     
    
Status of this memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or cite them other than as "work in progress". 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/lid-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
   This document is a submission of the IETF ROHC WG. Comments should be 
   directed to its mailing list, rohc@cdt.luth.se. 
    
    
Abstract 
    
   This document defines one out of two parts for SigComp, a basic solution for compressing messages 
   generated by protocols, text-based protocols such as SIP, called
   SigComp. SIP [SIP] and RTSP [RTSP]. 
   The architecture and pre-requisites of SigComp is are outlined, along 
   with the format of the SigComp message. 
    
   Decompression functionality for the SigComp solution is provided by a 
   "Universal Decompressor Virtual Machine" optimized for the task of 
   running decompression algorithms. The UDVM can be configured to 
   understand the output of many well-known compressors such as 
   [DEFLATE]. 

 
 
 
Price, Hannu, et al.                                            [Page 1]  

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


0. Document History

   - October 19, 2001, version 00

   First version. The draft describes the current ideas, from people
   involved in the ROHC WG, of how to perform compression of
   application signaling messages.

   - October 31, 2001, version 01

   Second version. Additional section, 5.2.1, which describes when a
   message identifier can be reused.

   - November 21, 2001, version 02

   Third version. Section 6 has been moved to a separate draft. The
   third version describes a modular solution, providing flexibility
   for implementers to decide which functions they want to integrate.

   - January 28, 2002, version 03

   Fourth version. SigComp version 02 is divided into this draft, a UDVM
   draft and a extended operation mechanisms draft.
   Compressor/decompressor (UDVM) state approach has been introduced for
   security reasons. 
 
 
Table of contents 
   
   1.  Introduction..................................................3  Introduction..................................................2 
   2.  Terminology...................................................3 
   3.  The  SigComp Architecture......................................5 Architecture..........................................5 
   4.  Applicability of SigComp......................................6  SigComp message flow..........................................11 
   5.  SigComp operation.............................................6
   5.1.  Pre-requisites..............................................6
   5.2.  Compression states..........................................7
   5.3.  Saving and deleting states..................................7 compressor............................................15 
   6.  SigComp message...............................................8  State handling and capability announcement....................18 
   7.  Capability exchange...........................................8  Overview of the UDVM..........................................22 
   8.  Security considerations......................................10  Decompressing a SigComp message...............................26 
   9.  IANA considerations..........................................10  UDVM instruction set..........................................30 
   10. Acknowledgements.............................................10 Security considerations.......................................41 
   11. AuthorsĘ addresses...........................................10 IANA considerations...........................................43 
   12. References...................................................12











Hannu, et al.                                                   [Page 2]

INTERNET-DRAFT                  SigComp               January 28 , 2002 Acknowledgements..............................................43 
   13. AuthorsĘ addresses............................................44 
   14. References....................................................45 
   Appendix A. Mnemonic language.....................................46 
   Appendix B. Example application-defined parameters................48 
   Appendix C. Example decompression algorithms......................49 
   Appendix D. Document history......................................51 
 
1.  Introduction 
    
   The Session Initiation Protocol (SIP) [SIP], along with many other 
   application protocols used for multimedia communications, communications such as RTSP 
   [RTSP], are is a textual protocols protocol engineered for bandwidth rich links. As 
   a result, these the SIP messages have not been optimized for
   message in terms of size. 
   Typical SIP messages are from a few hundred bytes to as high as 2000. 
   To date, this has not been a significant problem.  
    
   With the planned usage of these protocols in wireless handsets as 
   part of 2.5G and 3G cellular networks, the large size of these 
   messages is problematic. With low-rate IP connectivity, store-and-
   forward delays are significant. Taking into account retransmits, and 
   the multiplicity of messages that are required in some flows, call 
   setup and feature invocation are adversely affected. Therefore, we 
   believe there is a merit in reducing these message sizes.

   From the view of external users, SigComp is only to provide
   compression service. The service provided is that of the underlying
   transport plus compression.  
 
   This document outline outlines the architecture and pre-requisites of SigComp, the 
   SigComp solution including the capability announcement and UDVM 
   algorithm upload, along with the format of the SigComp message.

   This document does not specify 
    
   SigComp is typically offered to applications as a negotiation or capability exchange
   mechanism. The "shim" layer 
   between the application that applies SigComp should also negotiate and the usage transport. The service provided is 
   that of SigComp including capability exchange.

   Compression/decompression algorithms functionality for the underlying transport plus compression. 
    
   This document focuses on the signaling scenario where an endpoint 
   sends and receives data to/from an outbound/inbound proxy. However, 
   SigComp is
   provided by [UDVM]. designed to run over both connectionless and connection-
   oriented transports and hence may be applicable to other scenarios 
   with multiple endpoints compressing and decompressing data. 

 
 
 
Price, Hannu, et al.                                            [Page 2] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
2.  Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC-2119].

   Universal Decompressor Virtual Machine (UDVM)

     The virtual machine described in [UDVM]. 
    
   SigComp 
    
     The UDVM overall solution for signaling compression, comprising the  
     compressor, decompressor, dispatchers and state handler. 
    
   Application 
    
     For the purpose of this document, an application is used a text-based 
     protocol software that: 
    
     a) sends application data to the compressor dispatcher 
     b) receives data from the decompressor dispatcher 
     c) decides whether state information may be saved by SigComp 
    
   Transport 
    
     Mechanism for
     decompression passing data between two instances of an application.  
     SigComp messages. The UDVM is capable of
     interoperating with sending messages over a wide range of compression algorithms.

   Decompressor

     The decompressor invokes the UDVM. It is responsible for supplying
     the UDVM with compressed data  
     transports including TCP, UDP and make decompressed data available
     for the application.







Hannu, et al.                                                   [Page 3]

INTERNET-DRAFT                  SigComp               January 28 , 2002


   Encoder

     Encodes [SCTP]. 
    
   Message-based transport 
    
     A transport that carries data according to as a (compression) algorithm into UDVM
     readable code. The encoded set of distinct, bounded  
     messages. 
    
   Stream-based transport 
    
     A transport that carries data can be decoded by as a UDVM provided continuous stream with no  
     message boundaries. In this case, SigComp reserves the needed instructions.

   Compressor

     The compressor invokes character  
     0xFFFF to delimit messages in the encoder, and keeps track of states compressed stream. 
    
   Application-defined parameters 
    
     Parameters that
     can must be used for compression. Responsible for supplying UDVM
     instructions agreed upon by the applications invoking  
     SigComp. Depending on the situation these parameters might be fixed  
     a-priori or negotiated. 
    
   Application message 
    
     An uncompressed message, as provided from or to the peer decompressor in order for compressed
     data application, 
     which is to be decompressed.

   State

     Data saved either compressed by a compressor or a decompressor. The the compressor. When delivered 
     from the decompressor the data
     reflects has passed through the contents of a UDVM memory.

   State index

     Reference decompression 
     process and is referred to a state saved either by a compressor as decompressed data or a
     decompressor.

   Application

     Invokes the decompressed 
     message. 
    

 
 
 
Price, Hannu, et al.                                            [Page 3] 

INTERNET-DRAFT                  SigComp compressor and decompressor.              February 14 , 2002 
 
 
   SigComp message 
    
     May contain a compressed application message in the form of UDVM  
     bytecode. In case of a message oriented transport mechanism, message- based transport, such as UDP, a  
     SigComp message corresponds to exactly one (UDP) packet. datagram. For a
     stream oriented  
     stream-based transport, such as TCP, each SigComp message is  
     separated by a 0xFFFF delimiter, see [UDVM].

   Application data (message)

     Data, also referred to as message, which delimiter. 
    
   Compressor 
    
     The compressor invokes the encoder, and keeps track of states that 
     can be used for compression. It is responsible for supplying UDVM 
     bytecode to be
     compressed by the compressor. When delivered from the remote decompressor
     the in order for compressed 
     data has passed through the decompression process and is
     referred to as decompressed be decompressed. 
    
   Encoder 
    
     Encodes data or decompressed message.















Hannu, et al.                                                   [Page 4]

INTERNET-DRAFT according to a (compression) algorithm into UDVM 
     bytecode. The encoded data can be decoded by a UDVM. 
    
   Compressor dispatcher 
    
     A layer that receives uncompressed application messages, invokes a 
     compressor, and forwards the SigComp               January 28 , 2002


3. messages to a remote SigComp 
     layer. 
    
   Decompressor 
    
     The decompressor is responsible for converting a SigComp Architecture

   Compression and decompression message  
     into uncompressed data. Decompression functionality is performed at two communicating end-
   points, see Figure 1.

          +--------------------+                 +--------------------+
          |    Compressor 1    |                 | provided by  
     the UDVM. 
    
   Decompressor 1  |
          |    +---------+     |                 |     +---------+    |
          |    | Encoder |     |                 |     | dispatcher 
    
     A layer that receives SigComp messages, invokes a decompressor, and 
     forwards the decompressed application messages to an application. 
    
   Virtual machine 
    
     A machine architecture designed to be implemented in software 
     (although silicon implementations are of course possible). 
    
   Universal Decompressor Virtual Machine (UDVM) 
    
     The virtual machine described in this document. The UDVM   |    |
          |    +---------+     |                 |     +---------+    |
          +--------------------+                 +--------------------+
               ^   App. ^    |                     ^     | App.  ^
               |   data |    |                     |     v is used  
     for decompression of SigComp messages. 
    
   Bytecode 
    
     Machine code that can be executed by a virtual machine. UDVM 
     bytecode is a combination of UDVM instructions and compressed data. 

 
 
 
Price, Hannu, et al.                                            [Page 4] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Per-message compression 
    
     Compression that does not reference data  |
          +----|-------------|-+                 +-|-------------|----+
          |    |             | | from previous messages.  
     SigComp     | |             |    |
          |    | Application v | can decompress a message     | | Application |    |
          |    v             +---------->----------+             v    |
          |   +-------+        |                 |        +-------+   |
          |   | of this type using only the  
     application-defined parameters and the data in the message itself. 
    
   Dynamic compression 
    
     Compression relative to messages sent prior to the current  
     compressed message. SigComp stores and retrieves this data using  
     the state handler. 
    
   State |        | 
    
     Data transport  |        | State |   |
          |   +-------+        |                 |        +-------+   |
          |    ^             +----------<----------+             ^    |
          |    |             | | saved for retrieval by later SigComp     | ^             |    |
          |    |             | |     message     | |             |    |
          +----|-------------|-+                 +-|-------------|----+
               |   App. ^    |                     |     | App.  |
               v   data |    v                     |     v messages. The data  v
          +--------------------+                 +--------------------+
          |   Decompressor 2   |                 |     Compressor 2   |
          |    +---------+     |                 |     +---------+    |
          |    |  
     typically reflects the contents of the UDVM   |     |                 |     | Encoder |    |
          |    +---------+     |                 |     +---------+    |
          +--------------------+                 +--------------------+ memory after 
     decompressing a message, but state can also be saved by the 
     compressor or by the application. 
    
   State identifier 
    
     Reference used to access an item of state previously saved by the  
     compressor, the decompressor or the application. 
    
   CPU cycles 
    
     A measure of the amount of "CPU power" required to execute a UDVM  
     instruction (the simplest UDVM instructions require a single CPU  
     cycle). An upper limit is placed on the number of cycles that can  
     be used to decompress each bit in a compressed message. 
    
    
3.  SigComp Architecture 
 

   In the SigComp architecture compression and decompression is 
   performed at two communicating endpoints. Figure 1 shows the layout 
   of a communicating endpoint that implements a SigComp layer. The 
   figure does not mandate any particular implementation, but is shown 
   to the reader for the sake of clarity. 
    
   SigComp is typically offered to applications as a "shim" layer 
   between the application and the transport. Note however that for 
   certain applications the compressed SigComp message may be passed 
   back to the application itself for additional processing before 
   transmission. For example, the application may wish to apply 
   encryption to the compressed message before handing it to the 
   transport. 
    
   The SigComp layer is common for several text based protocol 
   applications (identified as Application 1 and Application 2 in Figure 
   1). These applications are not part of the SigComp layer. 

 
 
 
Price, Hannu, et al.                                            [Page 5] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The SigComp layer is further decomposed in the following components: 
    
   - A compressor dispatcher: this is the interface from the  
     applications. Application messages are received by the compressor  
     dispatcher, and based on the application requirements, the  
     compressor dispatcher invokes a particular compressor to achieve  
     the desired compression ratio using the allocated processing and 
     memory resources. The compressor returns a SigComp message that is  
     forwarded to the remote SigComp peer. 
    
   - A decompressor dispatcher: this is the interface towards the  
     applications. A SigComp message is received by the decompressor  
     dispatcher and an instance of the UDVM is invoked. Once the  
     dispatcher has received the (decompressed) application data it  
     determines the target application and forward the message to it. 
    
   - One or more compressors: the compressors each contain an algorithm  
     to perform the compression. A compressor receives an (uncompressed)  
     application message from the compressor dispatcher, compresses the  
     message, and returns a SigComp message to the compressor  
     dispatcher. During the compression process, the compressor may  
     invoke the state handler to restore a previous state or save a new  
     one. The compressor is responsible for providing the remote  
     decompressor with suitable UDVM bytecode to reconstruct the  
     original application message. Within the compressor, the entity  
     which runs the actual compression algorithm (minus state management 
     issues) is known as the "encoder". 
    
   - One or more decompressors: the decompressors contain the needed 
     UDVM to perform the decompression. The decompressor receives a   
     SigComp message from the decompressor dispatcher, decompress the  
     message, and returns the (decompressed) application message to the   
     decompressor dispatcher. During the decompression process, the  
     decompressor may invoke the state handler to restore a previous  
     state or save a new one. 
    
   - State handler: this entity contains enough logic to store and  
     retrieve states. A state is data that is stored between SigComp  
     messages: this data can be saved either by a compressor, a  
     decompressor or an application. The saved state may be used for  
     (de)compression between a compressor and its peer decompressor.  
     The state handler is also responsible for asking the application to  
     grant permission for states to be saved by the state handler. State  
     parameters and retrieval of states are further described in Chapter  
     6. 
    
    





 
 
 
Price, Hannu, et al.                                            [Page 6] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
               +-----------------+        +-----------------+ 
               |                 |        |                 | 
               |  Application 1  |<-+  +->|  Application 2  | 
               |                 |  |  |  |                 | 
               |                 |  |  |  |                 | 
               +-----------------+  |  |  +-----------------+ 
                      |   ^         |  |          | ^ 
                      |   |  Application msgs.    | | 
                      | +-------------------------+ | 
       +-- -- -- -- --| | | - -- -- -- -- -- -- --  |-- -- -- -- --+ 
       |              | | +-----------------------+ |              | 
                      v v           |  |          | | 
       |    +--------------+        |  |       +--------------+    | 
    SigComp |              |        |  |       |              | SigComp 
    message |  Compressor  |        |  |       | Decompressor | message 
    <-------|  dispatcher  |        |  |       |  dispatcher  |<-------    
       |    |              |        |  |       |              |    | 
            +--------------+        |  |       +--------------+ 
       |           ^  ^             |  |            ^  ^           | 
                   |  |             |  |            |  | 
       |           |  |             |  |            |  |           | 
                   |  |             |  |            |  | 
       |           v  |             |  |            v  |           | 
            +--------------+        v  |       +--------------+ 
       |    | Compressor 1 |    +---------+    |Decompressor 1|    | 
            |              |<-->|  State  |<-->|              | 
       |    |  (Encoder)   |    | Handler |    |    (UDVM)    |    | 
            |              |    +---------+    |              | 
       |    +--------------+           |       +--------------+    | 
                      |                |               | 
       |              v                |               v           | 
            +--------------+           v       +--------------+ 
       |    | Compressor 2 |    +---------+    |Decompressor 2|    | 
            |              |<-->|  State  |<-->|              | 
       |    |  (Encoder)   |    | Handler |    |    (UDVM)    |    | 
            |              |    +---------+    |              | 
       |    +--------------+                   +--------------+    | 
    
       |                       SigComp layer                       | 
       +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --+ 
    
     Figure 1: High-level architectural overview of one SigComp peer. 
    
   Note that it is possible to decompress messages from multiple 
   compressors at different physical locations in a network. The 
   architecture is designed to prevent data from one compressor 
   interfering with data from a different compressor. A consequence of 
   this design choice is that it is difficult for a malicious user to 
   disrupt decompressor operation by inserting false compressed messages 
   on the transport. 


 
 
 
Price, Hannu, et al.                                            [Page 7] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The decompressors in Figure 1 should be viewed as containers for 
   UDVMs; the actual decompressor functionality is handled by invoking 
   an instance of the UDVM. 
    
   Figure 2 gives a more detailed view of a UDVM, including all of the 
   interfaces between the UDVM and its environment. 
    
    
   +----------------+                                 +----------------+ 
   |                |     Request compressed data     |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |     Provide compressed data     |                | 
   |                |                                 |   Dispatcher   | 
   |                |                                 |                | 
   |                |    Output uncompressed data     |                | 
   |                |-------------------------------->|                | 
   |                |                                 |                | 
   |                |                                 +----------------+ 
   |      UDVM      | 
   |                |                                 +----------------+ 
   |                |    Request state information    |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |    Provide state information    |                | 
   |                |                                 |     State      | 
   |                |                                 |    Handler     | 
   |                |   Make state creation request   |                | 
   |                |-------------------------------->|                |
   |                | Forward capability announcement |                | 
   |                |                                 |                | 
   +----------------+                                 +----------------+ 
    
         Figure 2: Interfaces between the UDVM and its environment 
    
   Note that for simplicity, the UDVM indicates when it requires 
   additional compressed data or state information using an explicit 
   instruction. It then pauses and waits for the information to be 
   supplied before continuing with the next instruction. This prevents 
   the arrival of more data from interfering with the operation of the 
   UDVM (e.g. by accidentally overwriting UDVM memory that is currently 
   in use). 
    
3.1.  Requirements on application 
    
   From an application perspective the SigComp layer typically appears 
   as a new transport, with similar behavior to the original transport 
   used to carry uncompressed data (for example SigComp/UDP behaves 
   similarly to native UDP). 
    


 
 
 
Price, Hannu, et al.                                            [Page 8] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   If the application wishes to mix SigComp messages with other types of 
   data (e.g. uncompressed data) on the same transport then the 
   transport must distinguish between the two types of data. For UDP and 
   TCP this means that a new port will need to be reserved for 
   compressed data. For example [SIP] uses port 5060 for TCP and port 
   5061 for TLS/TCP, so it could similarly reserve another port for 
   SigComp/TCP. 
    
   In the interests of security, a new interface is required to the 
   signaling application in order to leverage the authentication 
   functions built into the application itself. For each decompressed 
   message that is accompanied by a state creation request, the state 
   handler needs to find out whether the application considers the 
   message to be legitimate. If the decompressed message is considered 
   to be invalid then the state handler cannot create the requested 
   state information. This interface is marked on the architecture of 
   Figure 1.  
    
3.2.  Application-defined parameters 
    
   When an application invokes SigComp, a number of parameters are 
   provided by the application to control the maximum size of compressed 
   messages, the UDVM memory size etc. The two instances of the 
   application that wish to communicate MUST initially agree on a common 
   set of values for these parameters. 
    
   Note that if a reverse channel is available then SigComp can perform 
   an internal "capability announcement" to indicate that additional 
   memory or CPU cycles are available. This means that it is generally 
   sufficient to set fixed values for each application-defined parameter 
   (there is no need to provide an external, application-specific 
   negotiation mechanism). 
    
   Each application-defined parameter is described below. Appendix B 
   discusses how each of the parameters affects SigComp operation in 
   greater detail, and recommends default values for the parameters. 
    
   UDVM_version 
    
     The UDVM_version parameter specifies the level of functionality 
     available at the UDVM. The basic version of the UDVM (Version 0) 
     is defined in this document. 
    
   minimum_compression_ratio 
    
     The minimum_compression_ratio parameter prevents the generation of 
     excessively large SigComp messages. For an n byte uncompressed  
     message, the corresponding SigComp message must be no larger than 
     (n / minimum_compression_ratio) rounded down to the nearest byte. 
     Note that this parameter can be less than 1, (in which case a 
     certain amount of message expansion is allowed) or 0 (in which case  

 
 
 
Price, Hannu, et al.                                            [Page 9] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
     no minimum_compression_ratio needs to be met). Any value other than  
     0 bans the creation of standalone SigComp messages (i.e. messages 
     that do not contain a compressed application message). 
    
   maximum_compressed_size 
    
     The maximum_compressed_size parameter limits the size of one  
     compressed message. SigComp rejects any message larger than the 
     specified value. 
    
   maximum_uncompressed_size 
    
     The maximum_uncompressed_size parameter limits the size of one  
     uncompressed message. SigComp rejects any message larger than the 
     specified value. 
    
   minimum_hash_size 
    
     The minimum_hash_size parameter specifies the minimum size of the 
     state identifier when creating new state information. This value  
     needs to be sufficiently large to prevent malicious users from  
     guessing a state identifier by brute force. 
    
   overall_memory_size 
    
     The overall_memory_size parameter specifies the total number of 
     bytes in the UDVM memory. 
    
   working_memory_start 
    
     The working_memory_start parameter specifies the start of the UDVM  
     memory area that can be modified. Memory addresses below this 
     value are considered read-only by the UDVM. 
    
   working_memory_end 
    
     The working_memory_end parameter specifies the end of the UDVM  
     memory area that can be modified. Memory addresses above this 
     value are considered read-only by the UDVM. 
    
   cycles_per_bit 
    
     The cycles_per_bit parameter specifies the number of "CPU cycles"  
     that can be used to decompress a single bit of data. One CPU cycle  
     typically corresponds to a single UDVM instruction, although some  
     of the high-level instructions may require additional cycles. 
    
   cycles_per_message 
    
     The cycles_per_message parameter specifies the number of additional  
     CPU cycles made available at the start of a compressed message.  

 
 
 
Price, Hannu, et al.                                           [Page 10] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
     These cycles can be useful when decompressing algorithms that  
     upload additional data on a per-message basis, for example a new  
     set of Huffman codes as with [DEFLATE]. 
    
     The total number of "CPU cycles" available for each compressed  
     message is specified by the following formula: 
    
     total_cycles = message_size * cycles_per_bit + cycles_per_message 
    
   first_instruction 
    
     The first_instruction parameter specifies the memory address of the 
     first instruction to be executed when the UDVM is initialized. 
    
   Initial memory contents 
    
     When the UDVM is invoked its memory is reset to contents defined by  
     the application. This code is executed for every SigComp message 
     (so typically it performs a simple task such as extracting the  
     first n bytes from the message and interpreting them as a state  
     identifier). 
    
   Initial state 
    
     As well as deciding the initial contents of the UDVM memory, the  
     application can also store useful information in the form of state.  
     This predefined state is used to offer a range of well-known 
     decompression algorithms to the compressor, which can choose to 
     avoid uploading bytecode for a new algorithm if it supports one of 
     the well-known algorithms. Each item of initial state can be made 
     mandatory for every instance of the application, or it can be made 
     optional (in which case support for the relevant state will need to  
     be advertised before the state can be used). 
    
4.  SigComp message flow 
    
   This chapter describes the SigComp message flow, including the 
   initialization, capability announcement and exchange of compressed 
   messages. 
    
   In the architecture of Figure 1, this chapter describes the operation 
   of the compressor and decompressor dispatcher. 
    
4.1.  Message exchange 
    
   The local SigComp layer may send compressed data to a remote SigComp 
   layer, and the local SigComp layer may also receive compressed data. 
   However, compression in one direction does not necessarily imply 
   compression in the reverse direction. Furthermore, even in the case 
   that there are two unidirectional compressed flows between two 


 
 
 
Price, Hannu, et al.                                           [Page 11] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   SigComp layer, there is no need to use the same compression algorithm 
   at both compressors. 
    
4.1.1.  Operation for each compressor-decompressor pair 
 
   An endpoint that wants to send compressed data to a remote party must 
   initialize a SigComp layer at the local party prior to its use, so 
   that the decompressor dispatcher in the remote endpoint assigns a 
   decompressor to be used and the UDVM is loaded with the compression 
   algorithm. The process is described in Figure 3. 
    
    
             +--------------+                  +--------------+ 
             |              |                  |              | 
             |  Endpoint A  |                  |  Endpoint B  | 
             |              |                  |              | 
             +--------------+                  +--------------+ 
                    |                                 | 
                    |                                 | 
                    |        SigComp Discovery        | 
                    |<------------------------------->| 
                    |                                 | 
                    |        SigComp Request          | 
                    |-------------------------------->| 
                    |                                 | 
                    |                                 | 
                    |     Capabilities Announcement   | 
                    |<--------------------------------| 
                    |                                 | 
                    |                                 | 
                    |           UDVM Upload           | 
                    |-------------------------------->| 
                    |                                 | 
                    |        Compressed Messages      | 
                    |- - - -  - -  - - - - - - - - - >| 
    
             Figure 3: Compressor-decompressor pair operation 
    
   The SigComp discovery mechanism itself is outside the scope of this 
   specification. 
    
   The following three paragraphs specify a message flow for discovering 
   the capabilities of Endpoint B and (if necessary) uploading a new 
   decompression algorithm to this endpoint. Note that if an 
   application-defined default algorithm is available at all endpoints 
   then Endpoint A can immediately begin to compress messages and the 
   following stages may be skipped. 
    
   Endpoint A may send a SigComp Request message to Endpoint B. The 
   SigComp Request message is a request to initialize a SigComp layer 
   for the application at Endpoint B and to know B's capabilities, i.e. 

 
 
 
Price, Hannu, et al.                                           [Page 12] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   the parameters in Section 3.2. If Endpoint A uses a UDVM 
   decompression algorithm which only requires the default application-
   defined parameters, then this step may be omitted. 
    
   Endpoint B SHOULD answer the SigComp Request message with a 
   Capabilities Announcement message, which includes the SigComp 
   parameters that constrain the operation of the UDVM. 
    
   Once Endpoint A has received the Capabilities Announcement message, 
   it chooses a suitable compression algorithm that B is able to 
   decompress and sends a message containing the UDVM decompression 
   algorithm (unless Endpoint B already has the algorithm available). 
                          
   At this point, Endpoint B contains enough information to start 
   decompressing messages received from the application at Endpoint A. 
    
4.1.2.  Bi-directional initialization 
    
   In scenarios where both endpoints decide to compress data in each of 
   the directions, a double initialization process must be done prior to 
   start with the normal operation. 
    
   The double initialization process is comprised of two of the above 
   initialization processes, one in each direction, as described in 
   Section 4.1.1. 
    
4.2.  SigComp message format 
    
   The basic SigComp message consists of a block of UDVM bytecode, the 
   first n bytes of which are interpreted as a state identifier that 
   accesses some previously stored state information. 
    
   This state information comprises the decompression algorithm that 
   will be used to decompress the remainder of the SigComp message, as 
   well as any needed additional information (e.g. one or more 
   previously received messages if dynamic compression is in use). 
    
   A decompressor dispatcher MUST be able to separate two SigComp 
   messages; in the case of UDP a SigComp message corresponds exactly to 
   one UDP datagram. For TCP each 0xFFFF delimiter is followed by a new 
   SigComp message. 
    
   The format of the basic SigComp message is given in Figure 4: 
    
   [Editors' Note: Specific SigComp messages such as the SigComp 
   Request, the Capabilities Announcement and the UDVM Upload may need 
   to be defined. A state identifier could be reserved for each specific 
   type of message, just as state identifiers are reserved for each 
   well-known algorithm.]  
    
    

 
 
 
Price, Hannu, et al.                                           [Page 13] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   state_identifier (n-bytes)  :  
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :    Remaining UDVM bytecode    : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    
    Figure 4: Basic SigComp message 
    
   Note that n is the application-defined parameter minimum_hash_size, 
   an example value for which is given in Appendix B. 
    
   Note also that the state information is loaded into the UDVM memory 
   and executed as defined by the following piece of UDVM bytecode: 
    
   reserve state_identifier (n) 
   INPUT-BYTECODE (n, state_identifier, fail) 
   STATE-EXECUTE (state_identifier, n) 
   :fail 
   DECOMPRESSION-FAILURE 
    
   If the UDVM memory is initialized containing the above bytecode then 
   the state identifier will automatically be extracted from the SigComp 
   message and the corresponding state information will be accessed and 
   executed. 
    
4.3.  Interfaces to and from the dispatcher 
    
   Once the remote party has initialized the SigComp layer at the local 
   party, the decompressor dispatcher is ready to receive compressed 
   messages from a particular remote party, decompress those messages, 
   and pass them onto the application. 
    
   The application provides the compressor dispatcher with messages to 
   be compressed. The encoder in the compressor compresses messages in 
   such a way that the remote decompressor with the UDVM can decompress 
   the data correctly (providing that the compressed data is not lost or 
   damaged during transport). 
    
   When a message is to be compressed, the compressor selects the state 
   to use. The identifier of the used state MUST be sent along with the 
   compressed message to the remote decompressor. The SigComp message is 
   then passed on to underlying layers for transport to the remote 
   decompressor. 
    
   [Editor's Note: State identifiers will need to be reserved for well-
   known decompression algorithms, and an additional state identifier 

 
 
 
Price, Hannu, et al.                                           [Page 14] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   will be needed to indicate that the algorithm is being uploaded as 
   part of the compressed message.] 
    
   Upon the arrival of a SigComp message the decompressor dispatcher 
   invokes the decompressor that loads the UDVM with the indicated 
   state. The message is then decompressed by the UDVM, returned to the 
   decompressor dispatcher, and possibly passed on to the receiving 
   application. 
    
   Note that when the UDVM is invoked it does not receive any compressed 
   data by default, but instead requests new data explicitly using a 
   specific instruction. Therefore, the dispatcher is responsible for 
   buffering each SigComp message and passing the data to the UDVM when 
   it is requested. 
    
   Uncompressed data is also outputted by the UDVM using a specific 
   instruction. Depending on the particular application, the dispatcher 
   decides whether to forward a partially decompressed message 
   immediately to the application, or to buffer and wait for a complete 
   message to be successfully decompressed. 
    
   For a stream-based transport, the dispatcher delimits messages by 
   parsing the compressed data stream for instances of 0xFF and taking 
   the following actions: 
    
   Occurs in data stream:     Action: 
    
   0xFFFF                     Delimit compressed message 
   0xFF00                     Replace with 0xFF 
   0xFF01 - 0xFFFE            Decompression failure 
    
   The reserved character 0xFF00 is useful for byte stuffing (if a 
   compression algorithm generates compressed data containing the 
   character 0xFF then it should be replaced by the character 0xFF00 to 
   avoid accidentally inserting a message delimiter into the compressed 
   data stream). 
    
5.  SigComp compressor  
 
   An important feature of SigComp is that if two endpoints cannot agree 
   on a common algorithm with which to send and receive data, it is 
   possible for the compressor to upload bytecode for its own choice of 
   algorithm to the decompressor. In particular this means that it is 
   not necessary to force all compressors to use the same default 
   algorithm; instead each implementer has the freedom to pick one of 
   the predefined algorithms or to upload their own if needed. 
    
   The overall requirement placed on the compressor is that of 
   transparency, i.e. the compressor MUST NOT send bytecode which cause 
   the UDVM to incorrectly decompress a given message. 
    

 
 
 
Price, Hannu, et al.                                           [Page 15] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The following more specific requirements are also placed on the 
   compressor (they can be considered particular instances of the 
   transparency requirement): 
    
   *    It is RECOMMENDED that the compressor supply a CRC over the 
        uncompressed message to ensure that successful decompression has 
        occurred. A UDVM instruction is provided to verify this CRC. 
    
   *    If the transport is message-based then the compressor MUST 
        preserve the boundaries between messages. 
    
   *    If the transport is stream-based but the application defines its 
        own internal message boundaries, then the compressor SHOULD 
        preserve the boundaries between messages by using the "end-of-
        message" character 0xFFFF reserved by SigComp. 
    
   *    The compressor MUST achieve the minimum_compression_ratio and  
        MUST ensure that the message can be decompressed using no more 
        than the resources available at the remote decompressor. 
    
   The reason for preserving the message boundaries over a stream-based 
   transport is that damage to one compressed message does not affect 
   the decompression of subsequent messages. Moreover, the application 
   typically vetoes state creation requests on a per-message basis. 
    
   Note that SigComp also reserves the character 0xFF00 over a stream-
   based transport, and replaces every instance of 0xFF00 with 0xFF 
   before decompressing the data. This ensures that arbitrary 
   compression algorithms can be used over a stream-based transport, 
   provided that every instance of 0xFF in the compressed data stream is 
   identified and replaced with 0xFF00. This "byte-stuffing" scheme 
   prevents the compression algorithm from inserting a message delimiter 
   into the data stream where one is not required. 
    
5.1.  Types of compression algorithm 
    
   Any of the following classes of compression algorithm may be useful 
   for particular applications: 
    
   *    Generic compressor (for example [DEFLATE] or a similar 
        algorithm). 
    
   *    Protocol-aware compressor offering excellent performance for  
        one particular type of data (for example the text messages  
        generated by [SIP]). 
    
   *    Hybrid compressor with similar performance to [DEFLATE] for 
        generic data and superior performance for certain types of data. 
    



 
 
 
Price, Hannu, et al.                                           [Page 16] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Provided that the uncompressed data can be reconstructed at the UDVM 
   using the available memory and CPU cycles, implementers have freedom 
   to use a compression algorithm of their choice. 
    
5.2.  Supplying bytecode to the UDVM 
    
   A compressor MUST be certain that compressed data can be decompressed 
   before the data is to be sent, i.e. the UDVM instructions for 
   decompression MUST be available at the peer decompressor. Several 
   options exist for ensuring that this bytecode is available: 
    
   1. Each SigComp message sent from the compressor contains the  
      necessary UDVM instructions for decompression. 
    
   2. By setting up a reliable connection, such as a TCP connection,  
      between a compressor and its peer decompressor the UDVM  
      instructions can be transferred and saved as state. 
    
   3. If there are predefined UDVM codes for well-known algorithms, a  
      compressor only needs to send the state identifier of that UDVM  
      decompression algorithm code to its peer decompressor. The  
      decompressor can then populate the UDVM locally.  
    
   In order to save delay for "time-critical" sessions, the UDVM 
   instructions should be uploaded prior to any initiation of "time-
   critical" sessions. 
    
5.3.  Compression failure 
    
   The compressor SHOULD make every effort to successfully compress an 
   application message, but in certain cases this might not be possible 
   (particularly if a high minimum_compression_ratio has been set by the 
   application). In this case a "compression failure" is called.  
    
   Reasons for compression failure include the following: 
    
   *    A compressed or uncompressed message exceeds the maximum size 
        defined by the application. 
    
   *    The minimum_compression_ratio cannot be achieved for a certain 
        message. 
    
   *    Insufficient resources are available at the compressor or at the 
        remote decompressor. 
    
   If a compression failure occurs when compressing a message then the 
   compressor informs the dispatcher and takes no further action. The 
   dispatcher can then report this failure to the application. 
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 17] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
6.  State handling and capability announcement 
    
   This chapter defines the behavior of the SigComp state handler. The 
   function of the state handler is to retain information between 
   successive SigComp messages; it is the only SigComp entity that is 
   capable of this function, and so it is of particular importance from 
   a security perspective. 
 
6.1.  Storing and retrieving state 
    
   To provide security against the malicious insertion of false 
   compressed data, the UDVM memory is reset after each compressed 
   message. This ensures that damaged compressed messages do not prevent 
   the successful decompression of subsequent valid messages. 
   Note however that the overall compression ratio is often 
   significantly higher if messages can be compressed relative to the 
   information stored in previous messages. For this reason it is 
   possible to create "state" information for access when a later 
   message is being decompressed. 
    
   Both the creation and access of state are designed to be secure 
   against malicious tampering with the compressed data. State can only 
   be created when a complete message has been successfully 
   decompressed, and the state handler MUST veto a state creation 
   request if instructed by the application based on the contents of the 
   decompressed message. This is especially useful if the application 
   has an authentication mechanism that can be applied to determine 
   whether the decompressed data is legitimate. 
    
   Furthermore, a compressor can only access previously created state 
   information by providing an [MD5] hash of the state to be accessed. 
   The advantage of using a secure hash to access state information is 
   that it is very difficult to guess the correct hash value without 
   complete knowledge of the state being accessed. 
    
   Also note that state is not deleted when it is accessed. So even if a 
   malicious user manages to access state information, subsequent 
   messages compressed relative to this state can still be successfully 
   decompressed. Instead, the state handler is responsible for deleting 
   state information once it determines that the state will no longer be 
   needed. 
    
   Each item of state stores the following information: 
    
   Name:                      Type of data: 
    
   state_identifier           16-byte value 
   state start                2-byte value 
   state_instruction          2-byte value 
   state length               2-byte value 
   state_value                String of bytes 

 
 
 
Price, Hannu, et al.                                           [Page 18] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The state_identifier must be supplied to retrieve an item of state 
   from the state handler. State can be accessed using the UDVM 
   instructions STATE-REFERENCE and STATE-EXECUTE, and can be created 
   using the END-MESSAGE instruction. 
    
   The state_value is a byte string that contains the actual value that 
   is copied from/to the UDVM memory. The state_length specifies the 
   number of bytes contained within state_value, and state_start gives 
   the UDVM memory address from/to which the state_value is copied. 
    
   Finally, state_instruction specifies the memory address of the next 
   UDVM instruction to execute when state is accessed. 
    
   The kind of information which is included in the state_value is up to 
   a particular compressor and the uploaded instructions in the remote 
   UDVM. However a compressor MUST not use a state that is not known to 
   be established at the remote decompressor. 
    
6.2. Guidelines for saving and deleting states 
 
   [Editors' Note: Do we need something more?] 
    
   A decompressor SHOULD NOT delete a state before it is confident 
   enough that the state is not used by a peer compressor any more.  
      
6.3.  Capability announcement 
    
   The capability announcement information is used to modify the value 
   of certain application-defined parameters. Since these parameter 
   values are saved between SigComp messages, they are considered to be 
   part of the overall state and hence are supplied from the UDVM to the 
   state handler.  
    
   If the state handler rejects a state creation request then the 
   accompanying capability announcements MUST be rejected also. 
    
   If the unidirectional version of SigComp is running then the 
   capability announcement information is automatically discarded by the 
   state handler. 
    
   The following block of parameters is passed to the state handler 
   using the appropriate UDVM instruction (currently the END-MESSAGE 
   instruction): 
    
   [Editors' Note: The capability announcement block is yet to be 
   finalized. More items may be added in future.] 
    
    
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 19] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |       Total length        |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |       UDVM_version        |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |    overall_memory_size    |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |      cycles_per_bit       |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |    cycles_per_message     |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      | Requested feedback length |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |                           | 
      :    Requested feedback     : 
      |                           |             
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      | Returned feedback length  |  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
      |                           | 
      :     Returned feedback     : 
      |                           |             
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   Figure 5: Capability announcement block 
    
   Note that the three 2-byte length fields specify the lengths of the 
   entire capability announcement block, the requested feedback data and 
   the returned feedback data respectively. As usual, MSBs are stored 
   preceding LSBs. 
    
   The remaining items of data are explained in greater detail below: 
    
6.3.1.  UDVM version 
    
   The first 2 bytes of the capability announcement block specify 
   whether only the basic version of the UDVM is available, or whether 
   an upgraded version of the UDVM is available offering additional 
   instructions etc. 
    
   The basic version of the UDVM is Version 0, which is the version 
   described in this document. Upgraded versions MUST be backwards-
   compatible with the basic version in the following sense: 
    
   *    If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION-
        FAILURE instructions when running on Version 0 of the UDVM, then 
        the upgraded version MUST run the bytecode in an identical 
        manner. 
    
   This condition ensures that all bytecode that is valid for Version 0 
   of the UDVM will continue to be valid for upgraded versions of the 

 
 
 
Price, Hannu, et al.                                           [Page 20] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   UDVM. However, bytecode that is invalid on Version 0 of the UDVM 
   (i.e. bytecode that produces a decompression failure that is not 
   manually triggered) may become valid on upgraded versions. 
    
   The simplest way to upgrade the UDVM in a backwards-compatible manner 
   is to add additional UDVM instructions, as this will not affect the 
   operation of existing UDVM bytecode. 
    
6.3.2.  Memory size and CPU cycles 
    
   The next 6 bytes of data specify new values for the application-
   defined parameters overall_memory_size, cycles_per_bit and 
   cycles_per_message. 
    
   Note that this data can only be used to increase the amount of 
   resources available at the remote UDVM. If the data specifies a 
   parameter value that is smaller than the value already possessed by 
   the state handler, the parameter keeps its original value (i.e. the 
   capability announcement data for this parameter is simply ignored). 
    
   In particular, only allowing the parameter values to increase means 
   that the announcement mechanism is robust against message loss or 
   reordering. 
    
   The parameters can only be restored to their original values if reset 
   or renegotiated by the application. 
    
6.3.3.  Requested feedback 
    
   The requested feedback data is provided to the UDVM by the remote 
   compressor. By providing this data, the remote compressor is 
   requesting that the data be returned to the compressor via a reverse 
   channel (assuming that one is present). 
    
   The compressor in control of the reverse channel SHOULD return this 
   data by uploading it into the returned feedback data block at the 
   remote UDVM. The data will then be passed back to the remote 
   compressor as explained below. 
    
6.3.4.  Returned feedback 
    
   The returned feedback data is an item of feedback data that has 
   successfully returned from the remote entity. This data is passed to 
   the local compressor (assuming that permission is granted by the 
   application), which can make use of it as it wishes. 
    
   Note that a compressor MUST only populate the returned feedback data 
   with the bit-exact contents of a requested feedback data block 
   previously provided to it. 
    
    

 
 
 
Price, Hannu, et al.                                           [Page 21] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
7.  Overview of the UDVM 
    
   Decompression functionality for SigComp is provided by a "Universal 
   Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine 
   much like the Java Virtual Machine but with a key difference: it is 
   designed solely for the purpose of running decompression algorithms. 
    
   The motivation for creating the UDVM is to provide unlimited 
   flexibility when choosing how to compress a given item of data. 
   Rather than picking one of a small number of pre-negotiated 
   compression algorithms, the implementer has the freedom to select an 
   algorithm of their choice. The compressed data is then combined with 
   a set of UDVM instructions that allow the original data to be 
   extracted, and the result is outputted as UDVM bytecode. 
    
   Since the UDVM is optimized specifically for running decompression 
   algorithms, the code size of a typical algorithm is small (often sub 
   100 bytes). Moreover the UDVM approach does not add significant extra 
   processing or memory requirements compared to running a fixed pre-
   programmed decompression algorithm. 
    
   This chapter describes some basic features of the UDVM, including the 
   memory allocation, well-known variables and instruction parameters. 
    
7.1.  UDVM memory allocation 
    
   The memory available to the UDVM is partitioned into a number of 
   sections, providing space for program code, variables and 
   miscellaneous data: 
    
                  <----- working_memory_size ------> 
    
   | Fixed values | Variables | Miscellaneous data | Program code | 
   +--------------+-----------+--------------------+--------------+ 
    
   <--------------------- overall_memory_size --------------------> 
    
                  Figure 6: Memory allocation in the UDVM 
    
   Recall that the amount of memory available to the UDVM is specified 
   by the application-defined parameters overall_memory_size, 
   working_memory_start and working_memory_end. Note that all of these
   parameters are initialized by the application, but can be 
   renegotiated on the fly using the capabilities announcement 
   mechanism. 
    
   The memory area from Address (working_memory_start) to Address 
   (working_memory_end) inclusive can be used to store arbitrary data 
   (variables, program code, Huffman codes etc.). UDVM instructions are 
   allowed to read from or write to any address in this memory area. 
    

 
 
 
Price, Hannu, et al.                                           [Page 22] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The first part of this memory area is typically used to store a 
   number of 2-byte variables. UDVM instructions can reference these 
   variables using a special instruction parameter as described in 
   Section 7.3. 
    
   The memory area from Address 0 to Address (working_memory_start - 1) 
   and from Address (working_memory_end + 1) to Address 
   (overall_memory_size - 1) inclusive is write-protected, so UDVM 
   instructions can read from this memory area but cannot write to it. 
   This memory area is intended for storing UDVM bytecode that can be 
   compiled. 
    
   Any attempt to read memory addresses beyond the overall memory size 
   or to write to memory addresses outside the working memory area MUST 
   cause a decompression failure (see Section 8.3). 
    
   The first part of the write-protected UDVM memory is intended for 
   storing variables whose values no longer need to be modified. The 
   second part of the write-protected memory is intended for storing 
   program code including UDVM instructions and their associated 
   parameters. Note that if an instruction references a variable that 
   has been write-protected, the compiled version of the instruction 
   will typically run faster than if the referenced variable lies in the 
   working memory area. 
    
7.2.  Well-known variables 
    
   The first few variables in the UDVM memory have special tasks, for 
   example specifying the location of the stack used by the CALL and 
   RETURN instructions. Each of these well-known variables is a 2-byte 
   integer. 
    
   The following list gives the name of each well-known variable and the 
   memory address at which the variable can be found: 
    
   Name:           Starting memory address: 
    
   byte_copy_left             0 
   byte_copy_right            2 
   stack_location             4 
    
   The MSBs of each variable are always stored before the LSBs. So, for 
   example, the MSBs of stack_location are stored at Address 4 whilst 
   the LSBs are stored at Address 5. 
    
   The use of each well-known variable is described in the following 
   sections of the document. 
    
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 23] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
7.3.  Instruction parameters 
    
   Each of the UDVM instructions is followed by 0 or more bytes 
   containing the parameters required by the instruction. 
    
   To reduce the code size of a typical UDVM program, each parameter for 
   a UDVM instruction is compressed using variable-length encoding. The 
   aim is to store more common parameter values using fewer bits than 
   rarely occurring values. 
    
   Three different types of parameter are available: the literal, the 
   reference and the multitype. The parameter types that follow each 
   UDVM instruction are specified in Chapter 9. 
    
   The UDVM bytecode for each parameter type is illustrated in Figure 7 
   to Figure 9, together with the integer values represented by the 
   bytecode. 
    
   Note that the MSBs in the bytecode are illustrated as preceding the 
   LSBs. Also, any string of bits marked with k consecutive "n"s is to 
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the 
   MSBs of n illustrated as preceding the LSBs). 
    
   The decoded integer value of the bytecode can be interpreted in two 
   ways. In some cases it is taken to be the actual value of the 
   parameter. In other cases it is taken to be a memory address at which 
   the 2-byte parameter value can be found (MSBs found at the specified 
   address, LSBs found at the following address). The latter case is 
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X. 
    
   The simplest parameter type is the literal (#), which encodes a 
   constant integer from 0 to 65535 inclusive. A literal parameter may 
   require between 1 and 3 bytes depending on its value. 
    
   Bytecode:                  Parameter value:         Range: 
    
   0nnnnnnn                        N                   0 - 127 
   10nnnnnn nnnnnnnn               N                   0 - 16383 
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
    
              Figure 7: Bytecode for a literal (#) parameter 
    
   The second parameter type is the reference ($), which is always used 
   to access a 2-byte value located elsewhere in the UDVM memory. The 
   bytecode for a reference parameter is decoded to be a constant 
   integer from 0 to 65535 inclusive, which is interpreted as the memory 
   address containing the actual value of the parameter. 
    
   Note that reference parameters can always take values from 0 to 65535 
   inclusive, as they reference 2-byte values. 

 
 
 
Price, Hannu, et al.                                           [Page 24] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Bytecode:                  Parameter value:         Range: 
    
   0nnnnnnn                        memory[2 * N]       0 - 65535 
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535 
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
             Figure 8: Bytecode for a reference ($) parameter 
    
   The third kind of parameter is the multitype (%), which can be used 
   to encode both actual values and memory addresses. The multitype 
   parameter also offers efficient encoding for small integer values 
   (both positive and negative) and for powers of 2. 
    
   Bytecode:                  Parameter value:         Range: 
    
   00nnnnnn                        N                   0 - 63 
   01nnnnnn                        memory[2 * N]       0 - 65535 
   1000011n                        2 ^ (N + 6)        64 , 128 
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768 
   111nnnnn                        N + 65504       65504 - 65535 
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535 
   101nnnnn nnnnnnnn               N                   0 - 8191 
   110nnnnn nnnnnnnn               memory[N]           0 - 65535 
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
             Figure 9: Bytecode for a multitype (%) parameter 
    
7.4.  Byte copying 
    
   A number of UDVM instructions require a string of bytes to be copied 
   to and from areas of the UDVM memory. This section defines how the 
   byte copying operation should be performed. 
    
   In general, the string of bytes is copied in ascending order of 
   memory address. So if a byte is copied from/to Address n then the 
   next byte is copied from/to Address n + 1. As usual, if a byte is 
   read from an address beyond the overall memory size or is written to 
   an address outside the working memory area then decompression failure 
   occurs. 
    
   Note however that if a byte is copied from/to the memory address 
   specified in byte_copy_right, the byte copy operation continues by 
   copying the next byte from/to the memory address specified in 
   byte_copy_left. This is useful for setting up a "circular buffer" 
   within the UDVM memory. 

   Note that the string of bytes is copied on a purely byte-by-byte 
   basis. In particular, some of the later bytes to be copied may 
   themselves have been written into the UDVM memory by the byte copying 
   operation currently being performed. 

 
 
 
Price, Hannu, et al.                                           [Page 25] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
    
   Equally, it is possible for a byte copying operation to overwrite the 
   instruction that called the byte copy. If this occurs then the byte 
   copying operation MUST be completed as if the original instruction 
   were still in place in the UDVM memory (this also applies if 
   byte_copy_left or byte_copy_right are overwritten). 
    
8.  Decompressing a SigComp message 
    
   This chapter lists the steps involved in the decompression of a 
   single SigComp message. 
    
8.1.  Invoking the UDVM 
    
   Whenever the dispatcher receives a message to be decompressed, it 
   invokes a new instance of the UDVM. The overall_memory_size and 
   initial contents of the UDVM memory are initialized using the 
   corresponding application-defined parameters. The following steps are 
   then taken: 
    
   1.)   The number of remaining CPU cycles is set equal to the 
   application-defined parameter cycles_per_message. 
    
   Notes: 
    
   The amount of compressed data available to the UDVM is exactly one 
   compressed message. If the transport is stream-based then SigComp 
   uses the reserved byte string 0xFFFF to delimit the compressed 
   messages: the dispatcher takes the data between a pair of neighboring 
   reserved byte strings to be a single compressed message. The reserved 
   byte string itself is not considered to be part of the compressed 
   message. 
    
   The compressed data is not provided to the UDVM by default. Instead, 
   the UDVM requests compressed data using the INPUT instructions 
   (useful when running over a stream-based transport since there is no 
   need to wait for the entire compressed message before decompression 
   can begin). Note that in particular, this means that the application 
   MUST define the initial contents of the UDVM memory to contain at 
   least one INPUT instruction. See Section 4.2 for an example of how 
   the application might initialize the UDVM memory. 
    
   The dispatcher MUST NOT make more than one compressed message 
   available to a given instance of the UDVM. In particular, the 
   dispatcher MUST NOT concatenate two messages to form a single 
   compressed message. This is because compressed messages are typically 
   padded with trailing zero bits so that they are a whole number of 
   bytes long. Concatenating two messages would cause these padding bits 
   to be incorrectly interpreted as compressed data. 
    


 
 
 
Price, Hannu, et al.                                           [Page 26] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   2.)   Next, the instructions contained within the UDVM memory are 
   executed beginning at the address specified in first_instruction. 
    
   Notes:    
    
   The instructions are executed consecutively unless otherwise 
   indicated (for example when the UDVM encounters a JUMP instruction). 
    
   If the next instruction to be executed lies outside the available 
   memory then decompression failure occurs (see Section 8.3). 
    
   3.)   Each time an instruction is executed the number of available 
   CPU cycles is decreased by the amount specified in Chapter 9. 
   Additionally, if the UDVM requests n bits of compressed data (using 
   one of the INPUT instructions) then the number of available CPU 
   cycles is increased by n * cycles_per_bit. 
    
   Notes: 
    
   This means that the total number of CPU cycles available for 
   processing a compressed message is given by the formula: 
    
     total_cycles = cycles_per_message + message_size * cycles_per_bit 
    
   The reason that this total is not allocated to the UDVM when it is 
   invoked is that the UDVM can begin to decompress a message that has 
   only been partially received. So the total message size may not be 
   known when the UDVM is initialized. 
    
   4.)   The UDVM stops executing instructions when it encounters an 
   END-MESSAGE instruction or if decompression failure occurs. 
    
   Notes: 
    
   The UDVM passes uncompressed data to the dispatcher using the OUTPUT 
   instruction. The OUTPUT instruction can be used to output a partially 
   decompressed message; it is a dispatcher decision whether to use the 
   data immediately or whether to buffer and wait until the entire 
   message has been decompressed. 
    
   The UDVM passes state creation requests to the state handler using 
   the END-MESSAGE instruction. This means that it is only possible to 
   make a state creation request once the message has been decompressed, 
   which is necessary since the application typically determines the 
   validity of these requests based on the contents of the decompressed 
   message. 
    
8.2.  Successful decompression 
    
   The END-MESSAGE instruction indicates that the compressed message has 
   been successfully decompressed and passed to the dispatcher. Note 

 
 
 
Price, Hannu, et al.                                           [Page 27] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   that the actual uncompressed message is outputted beforehand using 
   the OUTPUT instruction; this allows the UDVM to output each part of 
   the message to the dispatcher as soon as it has been decompressed. 
    
   The END-MESSAGE instruction provides two additional pieces of 
   information to the state handler: the state creation request and the 
   capability announcement block. The state creation request mechanism 
   is discussed below: 
    
   The UDVM may optionally save part of its memory for retrieval by 
   later messages. However to prevent malicious storage of a large 
   amount of unnecessary state information, the application itself MUST 
   give permission before any state can be created. The state handler 
   typically makes a decision on whether state can be created based on 
   the contents of the decompressed message, particularly if the message 
   contains authentication data that can verify whether or not the 
   sender is legitimate. 
    
   The END-MESSAGE instruction requests the creation of state using the 
   parameters state start and state length, which together denote a byte 
   string state_value. Provided that the application gives permission, 
   state_value is byte copied from the UDVM memory (obeying the rules of 
   Section 7.4) and stored together with a 16-byte state identifier that 
   can be used to access the state by a later compressed message. 
    
   To provide security against malicious access, the identifier for any 
   item of state created by the UDVM is derived from the [MD5] hash of 
   the state_value to be stored. The state identifier is constructed by 
   taking the 16-byte [MD5] hash and replacing all but the first 
   hash_length most significant bytes with zeroes. Note that if 
   hash_length is 16 then the unmodified [MD5] hash is the state 
   identifier. Decompression failure occurs if hash_length is less than 
   the application-defined parameter minimum_hash_size or greater than 
   16. 
    
   If a state identifier already exists (hash collision occurs), the 
   decompressor should check whether the requested state is identical to 
   the established state, and count the state creation request as 
   successful if this is the case. 
    
   If not then the state creation request is unsuccessful. The existing 
   state MUST NOT be replaced with the requested state to be saved. This 
   is to avoid the situation where a compressed message cannot be 
   decompressed because a needed item of state has been replaced 
   (possibly by a malicious sender). 
    
   Each item of state stores the following information (accessed by the 
   state_identifier): 
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 28] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Name:                      Type of data: 
    
   state_identifier           16-byte value 
   state start                2-byte value 
   state_instruction          2-byte value 
   state length               2-byte value 
   state_value                String of bytes  
    
   Note that state_start, state_length and state_instruction are all 
   parameters from the END-MESSAGE instruction, whereas state_identifier 
   and state_value are created as specified above. 
    
   This state can subsequently be accessed by using the STATE-REFERENCE 
   and STATE-EXECUTE instructions (by providing the correct state 
   identifier). 
    
8.3.  Decompression failure 
    
   If a compressed message given to the UDVM is corrupted (either 
   accidentally or maliciously) then the UDVM may terminate with a 
   decompression failure. 
    
   Reasons for decompression failure include the following: 
    
   *    A compressed or uncompressed message exceeds the maximum size 
        defined by the application. 
    
   *    The UDVM exceeds the available CPU cycles for decompressing a 
        message. 
    
   *    The UDVM attempts to read a memory address beyond the overall 
        memory size, or to write into a memory address outside the 
        working memory area. 
    
   *    An unknown instruction type is encountered. 
    
   *    An unknown parameter type is encountered. 
    
   *    An instruction is encountered that cannot be processed 
        successfully by the UDVM (for example a RETURN instruction when 
        no CALL instruction has previously been encountered). 
    
   *    The UDVM attempts to access non-existent state. 
    
   *    A manual decompression failure is triggered using the 
        DECOMPRESSION-FAILURE instruction. 
    
   If a decompression failure occurs when decompressing a message then 
   the UDVM informs the dispatcher and takes no further action. It is 
   the responsibility of the dispatcher to decide how to cope with the 


 
 
 
Price, Hannu, et al.                                           [Page 29] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   decompression failure. In general a dispatcher SHOULD discard the 
   compressed message and any decompressed data that has been outputted. 
    
9.  UDVM instruction set 
    
   The UDVM currently understands 28 instructions, chosen to support the 
   widest possible range of compression algorithms with the minimum 
   possible overhead. 
    
   Figure 10 lists the different instructions and the bytecode values 
   used to store the instructions at the UDVM. The cost of each 
   instruction in CPU cycles is also given: 
    
   Instruction:     Bytecode value:   Cost in CPU cycles: 
    
   DECOMPRESSION-FAILURE     0          1 
   AND                       1          1 
   OR                        2          1 
   NOT                       3          1 
   ADD                       4          1 
   SUBTRACT                  5          1 
   MULTIPLY                  6          1 
   DIVIDE                    7          1 
   LOAD                      8          1 
   MULTILOAD                 9          1 + n 
   WORKING-MEMORY            10         1 
   COPY                      11         1 + length 
   COPY-LITERAL              12         1 + length 
   COPY-OFFSET               13         1 + length + offset 
   JUMP                      14         1 
   COMPARE                   15         1 
   CALL                      16         1 
   RETURN                    17         1 
   SWITCH                    18         1 + n 
   CRC                       19         1 + length 
   END-MESSAGE               20         1 + state length 
   OUTPUT                    21         1 + output_length 
   NBO                       22         1 
   INPUT-BYTECODE            23         1 + length 
   INPUT-FIXED               24         1 
   INPUT-HUFFMAN             25         1 + n 
   STATE-REFERENCE           26         1 + state_length 
   STATE-EXECUTE             27         1 + state length 
    
      Figure 10: UDVM instructions and corresponding bytecode values 
    
   Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high-
   level instructions may cost additional cycles depending on the value 
   of one of the instruction parameters. 
    


 
 
 
Price, Hannu, et al.                                           [Page 30] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   The only exception when calculating the number of CPU cycles is that 
   the STATE-EXECUTE instruction takes (1 + state_length) cycles even 
   though it does not have a state_length parameter; instead the value 
   of state length is provided by the state handler as part of the state 
   being accessed. 
    
   All instructions are stored as a single byte to indicate the 
   instruction type, followed by 0 or more bytes containing the 
   parameters required by the instruction. The instruction specifies 
   which of the three parameter types of Section 7.3 is used in each 
   case. For example, the ADD instruction is followed by two parameters 
   as shown below: 
    
   ADD ($parameter_1, %parameter_2) 
    
   When converted into bytecode the number of bytes required by the ADD 
   instruction depends on the size of each parameter value, and whether 
   the second (multitype) parameter contains the parameter value itself 
   or a memory address where the actual value of the parameter can be 
   found. 
    
   The instruction set available for the UDVM offers a mix of low-level 
   and high-level instructions. The high-level instructions can all be 
   emulated using the low-level instructions provided, but given a 
   choice it is generally preferable to use a single instruction rather 
   than a large number of general-purpose instructions. The resulting 
   bytecode will be more compact (leading to a higher overall 
   compression ratio) and decompression will typically be faster because 
   the implementation of the compression-specific instructions can be 
   optimized for the UDVM. 
    
   Each instruction is explained in more detail below: 
    
9.1.  Bit manipulation instructions 
    
   The AND, OR and NOT instructions provide simple bit manipulation on 
   2-byte words. 
    
   AND ($parameter_1, %parameter_2) 
   OR ($parameter_1, %parameter_2) 
   NOT ($parameter_1) 
    
   After the operation is complete, the value of the first parameter is 
   overwritten with the result. Note that since this parameter is a 
   reference, the memory address specified by the parameter is always 
   overwritten and not the parameter itself. 
    
9.2.  Arithmetic instructions 
    
   The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform 
   arithmetic on 2-byte words. 

 
 
 
Price, Hannu, et al.                                           [Page 31] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
    
   ADD ($parameter_1, %parameter_2) 
   SUBTRACT ($parameter_1, %parameter_2) 
   MULTIPLY ($parameter_1, %parameter_2) 
   DIVIDE ($parameter_1, %parameter_2) 
    
   After the operation is complete, the first parameter is overwritten 
   with the result. 
    
   Note that in all cases the arithmetic operation is performed modulo 
   2^16. So for example, subtracting 1 from 0 gives the result 65535. 
    
   For the SUBTRACT instruction the second parameter is subtracted from 
   the first. Similarly, for the DIVIDE instruction the first parameter 
   is divided by the second parameter. Note that if the second parameter 
   does not divide exactly into the first parameter then the remainder 
   is ignored. 
    
9.3.  Memory management instructions 
    
   The following instructions are used to manipulate the UDVM memory. 
   Bytes can be copied from one area of memory to another, and areas of 
   memory can be write-protected to make it easier for UDVM code to be 
   compiled. 
    
9.3.1.  LOAD 
    
   The LOAD instruction sets a 2-byte variable to a certain specified 
   value. The format of a LOAD instruction is as follows: 
    
   LOAD (%address, %value) 
    
   The first parameter specifies the starting address of the 2-byte 
   variable, whilst the second parameter specifies the value to be 
   loaded into this variable. As usual, MSBs are stored before LSBs in 
   the UDVM memory. 
    
9.3.2.  MULTILOAD 
    
   The MULTILOAD instruction sets a contiguous block of 2-byte variables 
   to specified values. 
    
   MULTILOAD (%address, #n, %value_0, ..., %value_n-1) 
    
   The first parameter specifies the starting address of the contiguous 
   variables, whilst the parameters value_0 through to value_n-1 specify 
   the values to load into these variables (in the same order as they 
   appear in the instruction). 
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 32] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.3.3.  WORKING-MEMORY 
    
   The WORKING-MEMORY instruction is used to prevent part of the UDVM 
   memory from being modified. This can be very useful when offering 
   UDVM code for compilation. 
    
   WORKING-MEMORY (%memory_start, %memory_end) 
    
   The parameters memory_start and memory_end specify the new working 
   memory area for the UDVM. These parameters replace the application-
   defined parameters working_memory_start and working_memory_end, but 
   only while the current message is being decompressed. When a new 
   instance of the UDVM is invoked the working memory area is set by the 
   original application-defined parameters. 
    
   If memory_end < memory_start, or if the parameters reference a memory 
   address beyond the overall UDVM memory size, then decompression 
   failure occurs. 
    
   After the WORKING-MEMORY instruction has been encountered, the only 
   way to write into UDVM memory within the protected region is to 
   cancel the protection using another WORKING-MEMORY instruction (or to 
   invoke a new instance of the UDVM). 
    
9.3.4.  COPY 
    
   The COPY instruction is used to copy a string of bytes from one part 
   of the UDVM memory to another. 
    
   COPY (%position, %length, %destination) 
    
   The position parameter specifies the memory address of the first byte 
   in the string to be copied, and the length parameter specifies the 
   number of bytes to be copied. 
    
   The destination parameter gives the address to which the first byte 
   in the string will be copied. 
    
   Note that byte copying is performed as per the rules of Section 7.4. 
    
9.3.5.  COPY-LITERAL 
    
   A modified version of the COPY instruction is given below: 
    
   COPY-LITERAL (%position, %length, $destination) 
    
   The COPY-LITERAL instruction behaves as a COPY instruction except 
   that after copying, the destination parameter is replaced with the 
   memory address immediately following the address to which the final 
   byte was copied. If the final byte was copied to the memory address 


 
 
 
Price, Hannu, et al.                                           [Page 33] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   specified in byte_copy_right, the destination parameter is set to the 
   memory address specified in byte_copy_left. 
    
9.3.6.  COPY-OFFSET 
    
   A further version of the COPY-LITERAL instruction is given below: 
    
   COPY-OFFSET (%offset, %length, $destination) 
    
   The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction 
   except that an offset parameter is given instead of a position 
   parameter. 
    
   To derive a suitable position parameter, starting at the memory 
   address specified by destination, the UDVM counts backwards a total 
   of offset memory addresses. If the memory address specified in 
   byte_copy_left is reached, the next memory address is taken to be 
   byte_copy_right. 
    
   The COPY-OFFSET instruction then behaves as a COPY-LITERAL 
   instruction, taking the position parameter to be the last memory 
   address reached in the above step. 
    
9.4.  Program flow instructions 
    
   The following instructions alter the flow of UDVM code. Each 
   instruction jumps to one of a number of memory addresses based on a 
   certain specified criterion. Note that all of the instructions give 
   the memory addresses in the form of deltas relative to the memory 
   address of the instruction. The actual memory address is calculated 
   as follows: 
    
   memory_address = (memory_address_of_instruction + delta) modulo 2^16 
    
   Note that certain I/O instructions (see Section 9.5) can also alter 
   program flow. 
    
9.4.1.  JUMP 
    
   The JUMP instruction moves program execution to the specified memory 
   address. 
    
   JUMP (%delta) 
    
   Note that if the address (specified as a delta from the address of 
   the JUMP instruction) lies beyond the overall UDVM memory size then 
   decompression failure occurs. 
    
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 34] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.4.2.  COMPARE 
    
   The COMPARE instruction compares two parameters and then jumps to one 
   of three specified memory addresses depending on the result. 
    
   COMPARE (%parameter_1, %parameter_2, %delta_1, %delta_2, %delta_3) 
    
   If parameter_1 < parameter_2 then the UDVM continues instruction 
   execution at the (relative) memory address specified by delta 1. If 
   parameter_1 = parameter_2 then it jumps to the address specified by 
   delta_2. If parameter_1 > parameter_2 then it jumps to the address 
   specified by delta_3. 
    
9.4.3.  CALL and RETURN 
    
   The CALL and RETURN instructions provide support for compression 
   algorithms with a nested structure. 
    
   CALL (%delta) 
    
   RETURN 
    
   The CALL and RETURN instructions make use of a stack of 2-byte 
   variables stored at the memory address specified by the well-known 
   variable stack_location. The stack contains the following variables: 
    
   Name:           Starting memory address: 
    
   stack_free            stack_location 
   stack[0]              stack_location + 2 
   stack[1]              stack_location + 4 
   stack[2]              stack_location + 6 
      :                       : 
    
   The MSBs of these variables are stored before the LSBs in the UDVM 
   memory. 
    
   When the UDVM reaches a CALL instruction, it finds the memory address 
   of the instruction immediately following the CALL instruction and 
   copies this 2-byte value into stack[stack_free] ready for later 
   retrieval. It then increases stack_free by 1 and continues 
   instruction execution at the (relative) memory address specified by 
   the parameter. 
    
   When the UDVM reaches a RETURN instruction it decreases stack_free by 
   1, and then continues instruction execution at the byte position 
   stored in stack[stack_free]. 
    
   If the variable stack_free is ever increased beyond 65535 or 
   decreased below 0 then a bad compressed message has been received and 
   decompression failure occurs (see Section 8.3). 

 
 
 
Price, Hannu, et al.                                           [Page 35] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Decompression failure also occurs if one of the above instructions is 
   encountered and the value of stack_location is smaller than 6 (this 
   prevents the stack from overwriting the well-known variables). 
    
9.4.4.  SWITCH 
    
   The SWITCH instruction performs a conditional jump based on the value 
   of one of its parameters. 
    
   SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) 
    
   When a SWITCH instruction is encountered the UDVM reads the value of 
   j. It then continues instruction execution at the (relative) address 
   specified by delta j. 
    
   If j specifies a value of n or more, a bad compressed message has 
   been received and decompression failure occurs. 
    
9.4.5.  CRC 
    
   The CRC instruction verifies a string of bytes using a 2-byte CRC. 
    
   CRC (%value, %position, %length, %delta) 
    
   The actual CRC calculation is performed using the generator 
   polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte 
   Frame Check Sequence (FCS) of [RFC-1662]. 
    
   The position and length parameters define the string of bytes over 
   which the CRC is evaluated. Byte copying rules are enforced as per 
   Section 7.4. 
    
   Important note: Since a CRC calculation is always performed over a 
   bitstream, for interoperability it is necessary to define the order 
   in which bits are supplied within each individual byte. In this case 
   the MSBs of the byte MUST be supplied to the CRC calculation before 
   the LSBs. 
    
   The value parameter contains the expected integer value of the 2-byte 
   CRC. If the calculated CRC matches the expected value then the UDVM 
   continues at the following instruction. Otherwise the UDVM jumps to 
   the (relative) memory address specified by delta. 
    
9.5.  I/O instructions 
    
   The following instructions allow the UDVM to interface with its 
   environment. Note that in the overall SigComp architecture all of 
   these interfaces pass to the decompressor dispatcher or to the state 
   handler. 
    
    

 
 
 
Price, Hannu, et al.                                           [Page 36] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.5.1.  END-MESSAGE 
    
   The END-MESSAGE instruction successfully terminates the UDVM and 
   passes state information to the state handler. 
    
   END-MESSAGE (%hash_length, %state_start, %state_length, 
   %state_instruction, %capability_announcement_location) 
    
   The actions taken by the UDVM upon encountering the END-MESSAGE 
   instruction are described in Section 8.2. Note also that the 
   capability_announcement_location parameter points to the starting 
   memory address of the capability announcement block of Section 6.3. 
    
9.5.2.  DECOMPRESSION-FAILURE 
    
   The DECOMPRESSION-FAILURE instruction triggers a manual decompression 
   failure. This is useful if the UDVM program discovers that it cannot 
   successfully decompress the message (e.g. by using the CRC 
   instruction). 
    
   This instruction has no parameters. 
    
9.5.3.  OUTPUT 
    
   The OUTPUT instruction provides successfully decompressed data to the 
   dispatcher. 
    
   OUTPUT (%output_start, %output_length) 
    
   The parameters define the starting memory address and length of the 
   byte string to be provided to the dispatcher. Note that the OUTPUT 
   instruction can be used to output a partially decompressed message; 
   each time the instruction is encountered it appends a byte string to 
   the end of the data previously passed to the dispatcher via the 
   OUTPUT instruction. 
    
   The string of data is byte copied from the UDVM memory obeying the 
   rules of Section 7.4. 
    
   Decompression failure occurs if the cumulative number of bytes 
   provided to the dispatcher exceeds the application-defined parameter 
   maximum_uncompressed_size. 
    
   Since there is technically a difference between outputting a 0-byte 
   decompressed message, and not outputting a decompressed message at 
   all, the OUTPUT instruction needs to distinguish between the two 
   cases. Thus, if the UDVM terminates before encountering an OUTPUT 
   instruction it is considered not to have outputted a decompressed 
   message. If it encounters one or more OUTPUT instructions, each of 
   which provides 0 bytes of data to the dispatcher, then it is 
   considered to have outputted a 0-byte decompressed message. 

 
 
 
Price, Hannu, et al.                                           [Page 37] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.5.4.  NBO 
    
   The NBO instruction modifies the order in which compressed bits are 
   passed to the UDVM. 
    
   As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual 
   bits from within a byte, to avoid ambiguity it is necessary to define 
   the order in which these bits are read. The default operation is to 
   read the MSBs before the LSBs, but if the NBO instruction is 
   encountered then the LSBs are read before the MSBs. Both cases are 
   illustrated below: 
    
    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB 
    
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
        Byte 0          Byte 1              Byte 0          Byte 1 
    
           Default operation            After NBO instruction 
    
   The NBO instruction can only be used before bitwise compressed data 
   is passed to the UDVM. Therefore, a decompression failure occurs if 
   it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN 
   instruction has been used. 
    
9.5.5.  INPUT-BYTECODE 
    
   The INPUT-BYTECODE instruction requests a certain number of bytes of 
   compressed data from the dispatcher. 
    
   INPUT-BYTECODE (%length, %destination, %delta) 
    
   The length parameter indicates the requested number of bytes of 
   compressed data, and the destination parameter specifies the starting 
   memory address to which they should be copied. Byte copying is 
   performed as per the rules of Section 7.4. 
    
   If the instruction requests data that lies beyond the end of the 
   compressed message, no data is returned. Instead the UDVM moves 
   program execution to the memory address specified by the formula 
   (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16. 
    
   The INPUT-BYTECODE instruction can only be used before bitwise 
   compressed data is passed to the UDVM. Therefore, a decompression 
   failure occurs if it is encountered after an INPUT-FIXED or an INPUT-
   HUFFMAN instruction has been used. 
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 38] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.5.6.  INPUT-FIXED 
    
   The INPUT-FIXED instruction requests a certain number of bits of 
   compressed data from the dispatcher. 
    
   INPUT-FIXED (%length, %destination, %delta) 
    
   The length parameter indicates the requested number of bits. If this 
   parameter does not lie between 1 and 16 inclusive then a 
   decompression failure occurs. 
    
   The destination parameter specifies the memory address to which the 
   compressed data should be copied. Note that the requested bits are 
   interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under 
   default operation the MSBs of this integer are provided first, but if 
   an NBO instruction has been executed then the LSBs are provided 
   first. 
    
   If the instruction requests data that lies beyond the end of the 
   compressed message, no data is returned. Instead the UDVM moves 
   program execution to the memory address specified by the formula 
   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 
    
    
9.5.7.  INPUT-HUFFMAN 
    
   The INPUT-HUFFMAN instruction requests a variable number of bits of 
   compressed data from the dispatcher. The instruction initially 
   requests a small number of bits and compares the result against a 
   certain criterion; if the criterion is not met then additional bits 
   are requested until the criterion is achieved. 
    
   The INPUT-HUFFMAN instruction is followed by three mandatory 
   parameters plus n additional sets of parameters. Every additional set 
   contains four parameters as shown below: 
    
   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, 
   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, 
   %upper_bound_n, %uncompressed_n) 
    
   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by 
   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then 
   decompression failure occurs. 
    
   In all other cases, the behavior of the INPUT-HUFFMAN instruction is 
   defined below: 
    
   1.)   Set j = 1. 
    
   2.)   Request an additional bits_j compressed bits. Interpret the 
   total (bits_1 + ... + bits_j) bits of compressed data requested so 

 
 
 
Price, Hannu, et al.                                           [Page 39] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   far as an integer H, with the first bit to be supplied as the MSB and 
   the last bit to be supplied as the LSB (note that this is always the 
   case, independently of whether the NBO instruction has been used). 
    
   3.)   If data is requested that lies beyond the end of the compressed 
   message, terminate the INPUT-HUFFMAN instruction and move program 
   execution to the memory address specified by the formula 
   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 
    
   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j +  
   1. Then go back to Step 2, unless j > n in which case decompression 
   failure occurs. 
    
   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the 
   memory address specified by the destination parameter. 
    
9.5.8.  STATE-REFERENCE 
    
   The STATE-REFERENCE instruction retrieves some previously stored 
   state information. 
    
   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, 
   %state_destination) 
    
   The id_start and id_length parameters specify the location of the 
   state identifier used to retrieve the state information. The state 
   identifier is always 16 bytes long; if id_length is less than 16 then 
   the remaining least significant bytes of the identifier are padded 
   with zeroes. 
    
   Decompression failure occurs if id_length is greater than 16. 
   Decompression failure also occurs if no state information matching 
   the state identifier can be found. 
    
   Note that when accessing state information that has been previously 
   created by the UDVM, the state identifier is always taken from an 
   [MD5] hash of the state to be retrieved. However this is not 
   necessarily the case for application-defined state as per Section 
   3.2. 
    
   The state_start and state_length parameters define the starting byte 
   and number of bytes to copy from the state_value contained in the 
   identified item of state. If more state is requested than is actually 
   available then decompression failure occurs. 
    
   The state_destination parameter contains a UDVM memory address. The 
   requested state is byte copied to this memory address using the rules 
   of Section 7.4. 
    
    
    

 
 
 
Price, Hannu, et al.                                           [Page 40] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
9.5.9.  STATE-EXECUTE 
    
   The STATE-EXECUTE instruction retrieves and runs some previously 
   stored state information. 
    
   STATE-EXECUTE (%id_start, %id_length) 
    
   The id_start and id_length parameters function as per the STATE-
   REFERENCE instruction. 
    
   STATE-EXECUTE is similar to STATE-REQUEST except that it does not 
   require the amount of state being requested or the proposed 
   destination for the state to be specified explicitly. Instead, it 
   simply puts the state_value back into the UDVM memory using the 
   parameters state_start and state_length contained as part of the 
   state information. 
    
   The entire state_value (all state length bytes of it) is byte copied 
   into the memory address specified by state start. The UDVM then jumps 
   to the (absolute) memory address specified by state_instruction. 
    
   Note that state start, state length and state_instruction are all 
   stored together with state_value as part of an item of state 
   information. 
 
    
10. Security considerations 
    
10.1 Security goals 
    
   The overall security goal of the SigComp architecture is to not 
   create risks that are in addition to those already present in the 
   application protocols. There is no intention for SigComp to enhance 
   the security of the protocols, as it always can be circumvented by 
   not using compression. More specifically, the high-level security 
   goals can be described as: 
    
   -- do not worsen security of existing application protocol 
    
   -- do not create any new security issues 
    
   -- do not hinder deployment of application security 
    
    
10.2 Security risks and mitigations 
    
   This subsection identifies the potential security risks associated 
   with the overall SigComp High-level view.


   The different components in architecture, and details the proposed 
   solution for each risk. 
    
    

 
 
 
Price, Hannu, et al.                                           [Page 41] 

INTERNET-DRAFT                  SigComp architecture are described
   below.              February 14 , 2002 
 
 
   ** Confidentiality risks 
    
   *** Attacking SigComp by snooping into state of other users 
    
   State can only be accessed using a state identifier, which is a 
   (prefix of a) cryptographic hash of the state being referenced. This 
   implies that the referencing packet already needs knowledge about the 
   state. To enforce this, a reference length of 48 bits is defined. 
   This also minimizes the probability of an accidental state collision. 
    
   Generally, ways to obtain knowledge about the state identifier (e.g., 
   passive attacks) will also easily provide knowledge about the state 
   referenced, so no new vulnerability results. 
    
   The application needs to handle state identifiers with the same care 
   it would handle the state itself. 
    
   ** Integrity risks 
    
   The SigComp approach assumes that there is responsible for invoking appropriate integrity 
   protection below and/or above the SigComp compressor
   and decompressor, and establish layer. However, the state 
   establishment mechanism provides additional potential to compromise 
   the applicability integrity of the messages (which, however, would most likely be 
   detectable at the application layer). 
    
   *** Attacking SigComp between
   two communicating end points.

   The compressor by faking state or making unauthorized changes 
   to state 
    
   State cannot be destroyed or changed by a malicious sender -- it can 
   only add new state. Faking state is responsible for providing its peer decompressor
   with only possible if the UDVM instructions for decompression hash allows 
   intentional collision. 
    
   ** Availability risks (avoid DoS vulnerabilities) 
    
   *** Use of messages. For the
   actual compression SigComp as a tool in a DoS attack to another target 
    
   SigComp cannot easily be used as an amplifier in a reflection attack, 
   as it invokes only generates one decompressed message per incoming compressed 
   message. This packet is then handed to the encoder. The compressor also keeps
   track application; the utility 
   as a reflection amplifier is therefore limited by the utility of available states, the 
   application. 
    
   However, it must be noted that may contain useful information for SigComp can be used to generate larger 
   packets as input to the application than have to be sent from the 
   malicious sender; this therefore can send smaller packets (at a lower 
   bandwidth) than are delivered to the compression process. application. Depending on the 
   reflection characteristics of the application, this can be considered 
   a mild form of amplification. The decompressor contains application MUST limit the UDVM, and number 
   of packets reflected to a potential target -- even if SigComp is responsible for providing
   it with input. used 
   to generate a large amount of information from a small incoming 
   attack packet. 

 
 
 
Price, Hannu, et al.                                           [Page 5] 42] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


   A 
 
 
   *** Attacking SigComp as the DoS target by filling it with state is data saved either 
    
   Excessive state can only be installed by a compressor or malicious sender (or a decompressor. The
   data reflects set 
   of malicious senders) with the contents consent of a UDVM memory. The saved state may be
   used for decompression/compression between a compressor and its peer
   decompressor, e.g. compressor 1 and decompressor 1 in Figure 1. the application. The
   same state information is used both for compression and
   decompression. In order to retrieve a certain state its reference
   must be given, denoted state index.
   State parameters and retrieval of states are further described in
   [UDVM].


4. Applicability system 
   consisting of SigComp

   The and application is responsible for establishing the applicability of
   SigComp between thus approximately as 
   vulnerable as the communicating end points.

   This involves finding out application itself, unless it allows the transport (e.g. UDP or TCP) and port 
   installation of state from a message where it would not have 
   installed state itself. 
    
   If this is desirable to
   use for SigComp messages.

   [Editors note: This should be further described in draft XXXX]

   The parameters for the UDVM operation should be exchanged/negotiated
   together with increase the parameters in Section 7.

   [Editors note: This should be further described in draft XXXX]


5. SigComp operation

   The application provides compression ratio, the compressor with data to effect 
   can be compressed.
   The data is compressed mitigated by adding feedback at the encoder in such away application level that 
   indicates whether the peer
   decompressor with the UDVM can decompress the data correctly if the
   compressed data is not tampered with during the transportation.

   When state requested was actually installed -- This 
   allows a message is system under attack to be compressed the gracefully degrade by no longer 
   installing compressor identifies state that is not matched by application state. 
    
   *** Attacking the UDVM by faking state or making unauthorized changes 
   to use. If no state exists, one is created. An indication of 
    
    (See "Integrity risks" above.) 
    
   *** Attacking the used state MUST be sent along with UDVM by sending it looping code 
    
   The application sets an upper limit to the number of "CPU cycles" 
   that can be used per compressed message to and per input bit in the
   peer decompressor. 
   compressed message. The SigComp message damage inflicted by sending packets with 
   looping code is then passed on to
   underlying layers therefore limited, although this may still be 
   substantial if a large number of CPU cycles are offered by the UDVM. 
   However, this would be true for transport any decompressor that can receive 
   packets from anywhere. 
    
    
11. IANA considerations 
    
   The SigComp solution currently requires two identifiers to be 
   assigned by IANA: the peer decompressor.
   Upon UDVM_version and the arrival state identifier. 
    
   Upgraded versions of a SigComp message the decompressor will load the UDVM with will contain additional instructions to 
   improve the indicated state. The message is then decompressed by performance of the UDVM, overall SigComp solution; new 
   UDVM_version parameters will be needed in this case. 
    
   Well-known decompression algorithms will also need to be assigned 
   fixed state identifiers. 
    
    
12. Acknowledgements 
    
   Thanks to  
        
            Abigail Surtees (abigail.surtees@roke.co.uk)  
            Mark A West (mark.a.west@roke.co.uk)  
            Lawrence Conroy (lwc@roke.co.uk)  

 
 
 
Price, Hannu, et al.                                           [Page 43] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
            Christian Schmidt (christian.schmidt@icn.siemens.de)  
            Max Riegel (maximilian.riegel@icn.siemens.de)  
            Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) 
            Stefan Forsgren (stefan.forsgren@epl.ericsson.se)  
            Krister Svanbro (krister.svanbro@epl.ericsson.se)  
            Christopher Clanton (christopher.clanton@nokia.com)  
            Khiem Le (khiem.le@nokia.com)  
            Ka Cheong Leung (kacheong.leung@nokia.com)  
    
   for valuable input and review. 
    
    
13. AuthorsĘ addresses 
    
   Richard Price         Tel: +44 1794 833681 
   Email:                richard.price@roke.co.uk 
    
   Roke Manor Research Ltd 
   Romsey, Hants, SO51 0ZN 
   United Kingdom 
    
    
   Hans Hannu            Tel: +46 920 20 21 84 
   Email:                hans.hannu@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
   Carsten Bormann       Tel: +49 421 218 7024 
   Email:                cabo@tzi.org 
    
   Universitaet Bremen TZI 
   Postfach 330440 
   D-28334 Bremen, Germany 
    
    
   Jan Christoffersson   Tel: +46 920 20 28 40 
   Email:                jan.christoffersson@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
   Zhigang Liu           Tel: +1 972 894-5935 
   Email:                zhigang.liu@nokia.com 
    
   Nokia Research Center 
   6000 Connection Drive 

 
 
 
Price, Hannu, et al.                                           [Page 44] 

INTERNET-DRAFT                  SigComp              February 14 , 2002 
 
 
   Irving, TX 75039 
   USA 
    
    
   Jonathan Rosenberg 
   Email:                jdrosen@dynamicsoft.com 
    
   dynamicsoft 
   72 Eagle Rock Avenue 
   First Floor 
   East Hanover, NJ 07936 
    
 
14. References 
 
   [SIP]       "SIP: Session Initiation Protocol", Handley et al,  
               RFC 2543, Internet Engineering Task Force, March 1999 
    
   [RTSP]      "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.  
               Rao and possible passed on to the receiving application.


5.1. Pre-requisites

   A compressor MUST be certain that compressed data can be decompressed
   before the data is to be sent, i.e. the UDVM instructions R. Lanphier, , RFC 2326, April 1998 
    
   [HTTP]      "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et  
               al.", RFC 2616, June 1999 
    
   [SIPsrv]    "SIP: Locating SIP Servers", J. Rosenberg, H.  
               Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002,  
               work in progress 
    
   [DEFLATE]   "DEFLATE Compressed Data Format Specification version  
               1.3", P. Deutsch, RFC 1951, Internet Engineering Task  
               Force, May 1996 
    
   [SCTP]      "Stream Control Transmission Protocol", Stewart et al,  
               RFC 2960, Internet Engineering Task Force, October 2000 
    
   [MD5]       "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,  
               Internet Engineering Task Force, April 1992 
    
   [RFC-1662]  "PPP in HDLC-like Framing", Simpson et al, Internet  
               Engineering Task Force, July 1994 
    
   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott 
               Bradner, Internet Engineering Task Force, October 1996 
    
   [RFC-2119]  "Key words for
   decompression MUST be available at the peer decompressor. Some
   options exists: use in RFCs to Indicate Requirement 
               Levels", Scott Bradner, Internet Engineering Task Force, 
               March 1997 
    




 
 
 
Price, Hannu, et al.                                           [Page 6] 45] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


   1. Each SigComp message sent from the compressor contains the
      necessary UDVM instructions for decompression.

   2. By setting up a reliable connection, such as a TCP connection,
      between a compressor and its peer decompressor the 
 
 
Appendix A. Mnemonic language 
    
   Writing UDVM
      instructions can programs directly in bytecode would be transferred, and an initial state a daunting task, 
   so a simple mnemonic language is
      established.

   In order provided to save delay for "time-critical" sessions, facilitate the UDVM
   instructions should be sent prior to any initiation creation 
   of "time-
   critical" sessions.


5.2. Compression states

   The UDVM may request new decompression algorithms. Most importantly, the decompressor to save language 
   allows the contents parameters of its
   memory, i.e. request that a state is created based on the information
   in the UDVM memory. an instruction to be specified as text names 
   rather than as integer values. 
    
   If an instruction parameter is given as a UDVM will perform such text name, it should 
   correspond to exactly one instance of a request depends on
   the instructions sent from its peer compressor. See [UDVM] for
   details on how label, a reserved memory 
   address or an externally defined keyword. A label is simply a request text 
   name preceded by a colon, for example: 
    
   :loop 
   JUMP (loop) 
    
   For any parameters corresponding to save a state is performed.

   The kind label, the integer value of information, which the 
   parameter is included in a state, calculated by the following formula: 
    
    parameter_value = (instruction_address - label_address) modulo 2^16 
    
   Note that the "label address" is up to simply the
   implementation memory address of the compressor and 
   instruction immediately following the downloaded instructions to label. In particular, the peer decompressor's UDVM. However, above 
   example can be rewritten as said in Section 5.1. a
   compressor MUST not use a state that JUMP (0). 
    
   A reserved memory address is not known to be established
   at specified using the peer decompressor.


5.3. Saving and deleting states

   Saving states

     If there already exists "reserve" keyword 
   followed by a state for text_name and (optionally) an integer value. For 
   example: 
    
   reserve apples 
   reserve pears (8) 
   reserve bananas 
   LOAD (bananas, 5) 
    
   For any parameters corresponding to a state index, then this state
     MUST not be replaced with reserved memory address, the requested state to be saved. This 
   integer value of the parameter is
     to avoid the next free memory address that a compressed message cannot be decompressed because
     its state 
   has not yet been replaced.

   Deleting states

     A decompressor SHOULD NOT delete reserved. Starting at this address, the specified 
   number of bytes of memory are then reserved (if no value is given 
   then a state before it total of 2 bytes is enough
     confident that reserved). 
    
   The first instance of a "reserve" keyword begins reserving memory at 
   Address 6 (to avoid overwriting the state three well-known variables of 
   Section 7.2). So the above example can be rewritten as LOAD (16, 5). 
    
   An externally defined keyword is not used by a peer compressor anymore.

     [Editors note: I think some rules specified outside of the mnemonic 
   language. All of the application-defined parameters are needed considered to handle deletion of
     states] 
   be externally defined keywords and can be referenced in the mnemonic 
   code (useful for adapting the code based on the available memory or 
   CPU cycles). The following additional keywords can also be used: 
    
    

 
 
 
Price, Hannu, et al.                                           [Page 7] 46] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


6. SigComp message 
 
 
   Keyword:                 Corresponding value: 
    
   byte_copy_left                 0 
   byte_copy_right                2 
   stack_location                 4 
   reserved_end               See below 
   bytecode_length            See below 
   total_length               See below 
    
   The keyword reserved_end specifies the highest reserved memory 
   address for the entire mnemonic code (taking into account all the 
   occasions where memory is reserved). 
    
   The basic SigComp message consist keyword bytecode_length specifies the total size of compressed data, a reference the bytecode 
   corresponding to the state mnemonic code. Any instances of bytecode_length 
   are initially replaced with 3 bytes of zeroes, and then are filled in 
   after the remainder of the bytecode has been generated. 
    
   Similarly, the keyword total_length specifies the total amount of 
   memory required at the decompressor should initialize its UDVM with, including bytecode and
   possible also a CRC. Figure 2, shows a basic SigComp message. reserved memory 
   addresses. 
    
   A decompressor must be able to separate two SigComp messages, in case complete description of UDP a SigComp message corresponds exactly to one UDP packet. For
   TCP each 0xFFFF delimiter, see [UDVM] is followed by a new SigComp
   message.

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | i | c |      Reserved         |
   +---+---+---+---+---+---+---+---+
   |                               |
   :    state_index (n-bytes)      : Present if 'i' is set
   |                               |
   +---+---+---+---+---+---+---+---+
   |                               |
   :            CRC                : Present if 'c' is set
   |                               |
   +---+---+---+---+---+---+---+---+
   |           Possible            |
   :  Compressed data that the mnemonic language and how it should  :
   | be 
   translated into bytecode is given below: 
    
   Instructions:     Instruction names are given in capitals. Replace  
                     each name with the corresponding 1-byte value as input  
                     per Chapter 9. 
    
   $:                When appended to the UDVM |
   +---+---+---+---+---+---+---+---+

   Figure 2. Basic SigComp message

   The CRC is calculated over the uncompressed message.

   The reserved bits are set to zero by front of an instruction  
                     parameter then the compressor.

   In case parameter is a decompressor encounters that memory address 
                     rather than a direct value. This symbol is  
                     mandatory for reference parameters, optional for 
                     multitype parameters and disallowed for literals. 
    
   Integers:         Instruction parameters can be given in the Reserved bits form of 
                     decimal integers. They are not all
   zeros, it should remove m-bytes from converted into the SigComp message following 
                     shortest bytecode capable of representing the CRC according to:

   m-bytes = (bit(2) + bit(3)) * n.

   For 
                     integer by the value rules of n see Section 7.


7. Capability exchange

   Applications that use SigComp MUST agree on 7.3. 
    
   Text references:  Instruction parameters can also be given in the  
                     form of lowercase names. These names should match 
                     exactly one label, reserved memory address or 
                     externally defined keyword as described above. 
    
   Labels:           Label names are given as a colon followed by  
                     lowercase text. They are deleted when converting  
                     the capabilities they
   intend mnemonics to use. This involves exchange/negotiation of the parameters
   below, see also [UDVM]. Preferably the exchange of parameters is done
   simultaneously with bytecode. 
    
   Reserved memory:  Memory addresses are reserved using the negotiation of "reserve" 
                     keyword. The line containing the applicability of SigComp.
   Consider also Section 5.1. reserve keyword 

 
 
 
Price, Hannu, et al.                                           [Page 8] 47] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


   CRC

     Two CRCs are available to use with SigComp.
     CRC_1: C(x) = 1 + x + x^2 + x^8
     CRC_2: C(x) = 1 + x^5 + x^12 + x^16
     [Editors note: 
 
 
                     is it feasible (for implementation) deleted when converting to bytecode. 
    
   .LSB:             When appended to provide two
      CRCs in the spec?]

   For end of a text name, the parameters below consult also [UDVM].

   maximum_compressed_size

     The maximum_compressed_size parameter limits  
                     integer value corresponding to the size of one
     compressed message.

   maximum_uncompressed_size

     The maximum_uncompressed_size parameter limits name is  
                     increased by 1. This is useful for addressing the size  
                     LSBs of one
     uncompressed message.

   minimum_hash_size (n-bytes) a 2-byte variable. 
    
   0b, 0d:           Bytecode values can be specified directly in  
                     binary or decimal via the appropriate prefix. The minimum_hash_size parameter specifies  
                     direct bytecode continues until a character occurs  
                     that is not an integer or whitespace. 
    
   Whitespace:       All whitespace (plus brackets and commas) just  
                     delimit the minimum size instructions. Delete. 
    
   Comments:         These are indicated by a semicolon and continue 
                     to the end of the
     state index that line. Delete. 
    
   Once the mnemonic code has been converted into bytecode, it can be used to reference state. This corresponds
     to 
   executed by copying the n in n-bytes.

   overall_memory_size

     The overall_memory_size parameter specifies bytecode into the total number UDVM memory beginning at 
   the first memory address that has not been reserved by an instance of
     bytes in 
   the UDVM memory.

   working_memory_start

     The working_memory_start parameter specifies "reserve" keyword. Program execution is assumed to begin at this 
   address. 
    
   Note that further to the start rules outlined above, well-written mnemonic 
   code will also have the following properties: 
    
   *    Any instance of a memory address will be specified as a text 
        reference rather than an integer value. This ensures that the UDVM 
        mnemonic code is portable. 
    
   *    The mnemonic code will not write to any memory area address except 
        those reserved by the "reserve" keyword. This ensures that the 
        code can be modified. Memory addresses below this value compiled. 
    
    
Appendix B. Example application-defined parameters 
    
   This appendix gives some example values for each of the application-
   defined parameters. These values are considered read-only by geared towards the compression 
   of a signaling protocol such as [SIP]. 
    
   Note that all of the UDVM.

   working_memory_end

     The working_memory_end parameter specifies proposed values are fixed and not negotiated 
   between the end two instances of the UDVM
     memory area that can be modified. Memory addresses above this value
     are considered read-only by application invoking SigComp. This 
   is because it is possible for the UDVM.

   cycles_per_bit

     The cycles_per_bit parameter specifies application invoking the number of "CPU cycles"
     that can be used 
   decompressor to decompress a single bit of data. One CPU cycle
     typically receive compressed messages from several different 
   applications, and it is difficult to determine which message 
   corresponds to which application. [SIP] does this using "From:" and 
   "To:" fields in the message itself, but these are not visible until 
   the message has been decompressed. It is simpler just to fix a single UDVM instruction, although some set of 
   parameter for every instance of the high-level instructions may require additional cycles. application. 

 
 
 
Price, Hannu, et al.                                           [Page 9] 48] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002 
 
 
   UDVM_version                    0 
   minimum_compression_ratio       0.5 
   maximum_compressed_size         65535 
   maximum_uncompressed_size       65535 
   minimum_hash_size               6 
   overall_memory_size             8192 
   working_memory_start            0 
   working_memory_end              8191 
   cycles_per_bit                  20 
   cycles_per_message

     The cycles_per_message parameter specifies the number of additional
     CPU cycles made available at              2000 
   first_instruction               6 
    
   Note that the start of a compressed message.
     These cycles parameters overall_memory_size, cycles_per_bit and 
   cycles_per_message can be useful when decompressing algorithms that
     download additional data increased on a per-message basis, for example a new
     set of Huffman codes as with [DEFLATE]. the fly using the capabilities 
   announcement mechanism. This mechanism is designed to function 
   correctly even when the receiving application is sent compressed 
   messages from several different applications. 
    
   The total number initial contents of "CPU cycles" available the UDVM memory also need to be defined. It 
   is not enough simply to initialize the memory containing all zeroes, 
   as the UDVM would be unable to input any compressed data. Instead, 
   for each new compressed message is specified by the memory should be initialized 
   containing a simple decompressor capable of extracting the first few 
   bytes of compressed data. These bytes can then be interpreted as a 
   state identifier to retrieve the correct decompression algorithm. 
    
   As an example, the following formula:

     total_cycles = message_size * cycles_per_bit + cycles_per_message

   first_instruction

     The first_instruction parameter specifies mnemonic code can be converted to 
   bytecode and pasted into the UDVM memory address of beginning at Address 6: 
    
   reserve state_identifier (6) 
   INPUT-BYTECODE (6, state_identifier, fail) 
   STATE-EXECUTE (state_identifier, 6) 
   :fail 
   DECOMPRESSION-FAILURE 
    
   Finally, the
     first instruction application can define initial state that is available 
   to the UDVM. Examples of application-defined state include common 
   decompression algorithms, dictionaries of common text phrases etc. 
    
    
Appendix C. Example decompression algorithms 
    
   This appendix gives examples of decompression algorithms which can be executed when 
   run on the UDVM in the form of bytecode. 
    
C.1.  Example UDVM code for simple LZ77 decompression 
    
   The first example gives the code required to decompress data from a 
   very simple LZ77-based algorithm. The UDVM is initialized.


8. Security considerations

   [Editors note: TBW]


9. IANA considerations

   [Editors note: TBW]


10. Acknowledgements

   [Editors note: TBW]


11. AuthorsĘ addresses

   Hans Hannu, Editor
   Phone:  +46 920 20 21 84
   Fax:    +46 920 20 20 99
   E-Mail: hans.hannu@epl.ericsson.se

   Jan Christoffersson
   Phone:  +46 920 20 28 40
   Fax:    +46 920 20 20 99
   E-Mail: jan.christoffersson@epl.ericsson.se

   Stefan Forsgren
   Phone:  +46 920 20 23 39
   Fax:    +46 920 20 20 99
   E-Mail: stefan.forsgren@epl.ericsson.se instructed to interpret 
   a compressed message as a set of 4-byte characters, where each 
   character contains a 2-byte position integer followed by a 2-byte 

 
 
 
Price, Hannu, et al.                                           [Page 10] 49] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


   Krister Svanbro
   Phone:  +46 920 20 20 77
   Fax:    +46 920 20 20 99
   E-Mail: krister.svanbro@epl.ericsson.se

     Box 920
     Ericsson Erisoft AB
     SE-971 28 Lulea, Sweden


   Christopher Clanton
   Phone:  +1 972 894-4886
   Fax:    +1 972 894-4589
   E-mail: christopher.clanton@nokia.com

   Khiem Le
   Phone:  +1 972 894-4882
   Fax:    +1 972 894-4589
   E-mail: khiem.le@nokia.com

   Ka Cheong Leung
   Phone:  +1 972 374-0630
   Fax:    +1 972 894-4589
   E-mail: kacheong.leung@nokia.com

   Zhigang Liu
   Phone:  +1 972 894-5935
   Fax:    +1 972 894-4589
   E-Mail: zhigang.liu@nokia.com

     Nokia Research Center
     6000 Connection Drive Irving, TX 75039, USA


   Richard Price
   Phone:  +44 1794 833681
   E-mail: richard.price@roke.co.uk

     Roke Manor Research Ltd
     Romsey, Hants, SO51 0ZN
     United Kingdom

   Jonathan Rosenberg
   E-mail: jdrosen@dynamicsoft.com

     dynamicsoft
     72 Eagle Rock Avenue
     First Floor
     East Hanover, NJ 07936 
 
 
   length integer. Taken together these integers point to a previously 
   received text string in the UDVM memory, which is then copied to the 
   end of the uncompressed message. 
    
   Since the compressor can only send references to strings already 
   present in the UDVM memory, before the first message is decompressed 
   the memory must be initialized with a static dictionary containing 
   the 256 ASCII characters. 
    
   The algorithm write-protects the memory containing the UDVM 
   instructions used to decompress each character, so that they can 
   easily be compiled to improve the speed of decompression. 
    
   A 2-byte CRC over the uncompressed message is appended to the end of 
   the compressed message, to verify that correct decompression has 
   occurred. The algorithm also requests that the contents of the UDVM 
   memory be saved using the state request mechanism, so that it can be 
   retrieved by sending the appropriate 6-byte hash. 
    
   reserve byte_copy_left 
   reserve byte_copy_right 
   reserve uncompressed_start 
   reserve uncompressed_end 
   reserve uncompressed_length 
   reserve position 
   reserve length 
   reserve static_dictionary (256) 
   reserve circular_buffer (2048) 
         
   WORKING-MEMORY (uncompressed_start, reserved_end) 
   MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary, 
   circular_buffer, 0, 0, 0) 
    
   :unpack_static_dictionary 
    
   ; The following instructions initialize the static dictionary. 
    
   COPY-LITERAL (position.LSB, 1, $uncompressed_start) 
   ADD ($position, 1) 
   COMPARE ($position, 256, unpack_static_dictionary, next_character, 0) 
    
   :next_character 
    
   INPUT-FIXED (16, position, fail) 
   INPUT-FIXED (16, length, end_of_message) 
   COPY-LITERAL ($position, $length, $uncompressed_end) 
   ADD ($uncompressed_length, $length) 
   JUMP (next_character) 
    
   :fail 
    

 
 
 
Price, Hannu, et al.                                           [Page 11] 50] 

INTERNET-DRAFT                  SigComp               January 28              February 14 , 2002


12. References

   [DEFLATE]   "DEFLATE Compressed Data Format Specification 
 
 
   DECOMPRESSION-FAILURE 
    
   :end_of_message 
    
   CRC ($position, $uncompressed_start, $uncompressed_length, fail) 
   OUTPUT ($uncompressed_start, $uncompressed_length) 
   END-MESSAGE (6, 0, total_length, next_character, 0) 
    
    
Appendix D. Document history 
 
   - October 19, 2001, version
               1.3", RFC 1951, P. Deutsch, May 1996.

   [UDVM]      R. Price et. al., Universal Decompressor Virtual Machine
               (UDVM), Internet Draft (work 00 
    
   First version. The draft describes the current ideas, from people    
   involved in progress), the ROHC WG, of how to perform compression of  
   application signaling messages. 
    
   - October 31, 2001, version 01 
    
   Second version. Additional section, 5.2.1, which describes when a  
   message identifier can be reused. 
    
   - November 21, 2001, version 02 
    
   Third version. Section 6 has been moved to a separate draft. The  
   third version describes a modular solution, providing flexibility  
   for implementers to decide which functions they want to integrate. 
    
   - January 2002.
               <draft-ietf-rohc-sigcomp-udvm-00.txt>

   [RTSP]      H. Schulzrinne, A. Rao and R. Lanphier, Real Time
               Streaming Protocol (RTSP), RFC 2326, April 1998.

   [SIP]       M. Handley, H. Schulzrinne, E. Schooler 28, 2002, version 03  
       
   Fourth version. SigComp version 02 is divided into this draft, a UDVM 
   draft and J. Rosenberg,
               SIP: Session Initiation Protocol, RFC 2543, August 2000. a extended operation mechanisms draft. 
   Compressor/decompressor (UDVM) state approach has been introduced for 
   security reasons.  
    
   - February 14, 2002, version 04 
    
   Fifth version. Describes the complete base SigComp solution including 
   the UDVM. 
    
    
    
    
    
    
    
    
    
    
 
   This Internet-Draft expires in July August 2002. 

 
 
 
Price, Hannu, et al.                                           [Page 12] 51] 

----