view Side-By-Side changes
CFRG Working Group T. Krovetz,IntelEditor INTERNET-DRAFTJ. Black, UNRCSU Sacramento Expires April2001 S. Halevi, IBM A. Hevia, U.C. San Diego H. Krawczyk, Technion P. Rogaway, U.C. Davis2005 October20002004 UMAC: Message Authentication Code using Universal Hashing<draft-krovetz-umac-01.txt><draft-krovetz-umac-02.txt> By submitting this Internet-Draft, we certify that any applicable patent or other IPR claims of which we are aware have been disclosed, or will be disclosed, and any of which we become aware will be disclosed, in accordance with RFC 3668 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed athttp://www.ietf.org/ietf/1id-abstracts.txthttp://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This specification describes how to generate an authentication tag(also called a "MAC")using the UMAC message authenticationcode.algorithm. UMAC is designed to be very fast tocompute,compute insoftware,software on contemporaryprocessors.uniprocessors. Measured speeds are as low as1.0 cyclesone cycle per byte.The heart ofUMACis a universal hash function, UHASH, whichrelies on addition of 32-bit and 64-bit numbers and multiplication of16-bit, 32-bit, or 64-bit32-bit numbers, operations well-supported by contemporary machines. To generate the authentication tag on a given message,UHASHa "universal" hash function is applied to the message and key to produce a short,fixed-length,fixed-length hash value, and this hash value is thenXOR-edxor'ed with a key-derived pseudorandom pad. UMAC enjoys a rigorous security analysis and its only internal "cryptographic" use is a block cipher, AES, to generate the pseudorandom pads and internal key material. Krovetzet al.Expires April20012005 [Page0]1] INTERNET-DRAFT UMAC October20002004 Table of Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 3 21.1 Organization . . .Notation and basic operations . . . . . . . . . . . . . . . . . . 4 2.1 Operations on strings . . .5 2 Named parameter sets: UMAC16 and UMAC32. . . . . . . . . . . . .5 2.1 Named parameters. . . 4 2.2 Operations on integers . . . . . . . . . . . . . . . . . . . 52.2 Alternative instantiations . . . .2.3 String-Integer conversion operations . . . . . . . . . . . . 5 2.4 Mathematical operations on strings .6 2.3 Naming convention. . . . . . . . . . . . 6 2.5 ENDIAN-SWAP: Adjusting endian orientation . . . . . . . . .76 3NotationKey andbasic operations . .pad derivation functions . . . . . . . . . . . . . . . . 7 3.1Operations on strings .KDF: Key-derivation function . . . . . . . . . . . . . . . . 7 3.2 PDF: Pad-derivation function . .8 3.2 Operations on integers. . . . . . . . . . . . . . 8 4 UMAC tag generation . . . . .10 3.3 String-Integer conversion operations. . . . . . . . . . . .10 3.4 Mathematical operations on strings. . . . . . 9 4.1 UMAC Algorithm . . . . . . .11 4 Key and pad derivation functions. . . . . . . . . . . . . . . .12 4.1 KDF: Key derivation function9 4.2 UMAC-32, UMAC-64 and UMAC-96 . . . . . . . . . . . . . . . .12 4.2 PDF: Pad-derivation9 5 UHASH: Universal hash function . . . . . . . . . . . . . . . .13 5 UHASH-32: Universal. 10 5.1 L1-HASH: First-layer hashfunction for a 32-bit word size. . . .15 5.1 NH-32: NH hashing with a 32-bit word size. . . . . . . . .16. . . . 11 5.2L1-HASH-32: First-layerL2-HASH: Second-layer hash . . . . . . . . . . . . . . . .17. 13 5.3POLY: PolynomialL3-HASH: Third-layer hash . . . . . . . . . . . . . . . . .. . 19 5.4 L2-HASH-32: Second-layer hash15 6 Security considerations . . . . . . . . . . . . . . .20 5.5 L3-HASH-32: Third-layer hash. . . . . . 16 6.1 Resistance to cryptanalysis . . . . . . . . . .22 5.6 UHASH-32: Three-layer universal hash. . . . . . 16 6.2 Tag lengths and forging probability . . . . . .23 6 UHASH-16: Universal hash function for a 16-bit word size. . . .24 6.1 NH-16: NH hashing with a 16-bit word size. .. . . . . . . 24 6.2 L1-HASH-16: First-layer hash . . . . . . . . . . . . . . . . 2517 6.3L2-HASH-16: Second-layer hash . . . . . . . . . . . . . . . 27 6.4 L3-HASH-16: Third-layer hash . . . . . . . . . . . . . . . . 28 6.5 UHASH-16: Three-layer universal hash . . . . . . . . . . . . 29 7 UMAC tag generation . . . . . . . . . . . . . . . . . . . . . . . 30 7.1 Interface . . . . . . . . . . . . . . . . . . . . . . . . . 30 7.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 30 8 Security considerations . . . . . . . . . . . . . . . . . . . . . 31 8.1 Resistance to cryptanalysis . . . . . . . . . . . . . . . . 31 8.2 Tag lengths and forging probability . . . . . . . . . . . . 31 8.3 Selective-assurance authentication . . . . . . . . . . . . . 33 8.4Nonce considerations . . . . . . . . . . . . . . . . . . . .34 8.518 6.4 Guarding against replay attacks . . . . . . . . . . . . . .35 919 7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .36 10 References20 Appendix - Test vectors . . . . . . . . . . . . . . . . . . . . . . 20 References . . . . .36 11 Author contact information. . . . . . . . . . . . . . . . . . .37 A Suggested application programming interface (API). . . . . 21 Author contact information . . .38 B Reference code and test vectors. . . . . . . . . . . . . . . . .39. 21 Krovetzet al.Expires April20012005 [Page1]2] INTERNET-DRAFT UMAC October20002004 1 IntroductionThis specification describes how to generate an authentication tag (also called a "MAC") using theUMACmessage authentication code. Typically the authentication tag will be transmitted along with a message andis anonce to allow the receiver of themessageto verify the message's authenticity. Generation and verification of theauthenticationtag depends on the message, the nonce, and on a secret key (typically, shared by sender and receiver). UMAC isalgorithm (MAC) designed for high performance. It has been rigorously proven to bevery fastsecure and there are no intellectual property claims made tocompute,any ideas used insoftware, on contemporary processors.its design. Theheartoutput of UMAC is auniversal hash function, UHASH, which relies on addition and multiplication of 16-bit, 32-bit, and 64-bit numbers. These operations are supported well by contemporary machines. For many applications, especially ones with short-lived authentication needs, sufficient speedstring called a tag. UMAC isalready obtained by algorithms such as HMAC-SHA1 [2, 9]designed to produce 32-, 64- or 96-bit tags, depending on theCBC-MAC of a block cipher [1, 8]. Butuser's preference, with 64 bits recommended forthemostspeed-demanding applications, UMAC may be a better choice: An optimized implementation ofapplications. When UMACcan achieve peak performance which is about an order of magnitude faster than what can be achieved with HMACproduces 32-, 64- orCBC-MAC. Moreover, UMAC offers a tradeoff between forging probability and speed (it is possible to trade forging96-bit tags, the probabilityfor speed). UMAC has been designed sothatcomputing the prefix ofan attacker could produce a correct tagcan be done faster than computing the entire tag. This feature allowsfora receiver to verify the authenticity of aany messageto various levelsofassurance depending onitsneeds and resources. Finally, UMAC enjoys better analytical security properties than many other constructions. Closely associated to this specification arechoosing is about 1/2^30, 1/2^60 or 1/2^90, respectively. These probabilities remains thepapers [3, 4, 10, 11]. See those paperssame fordescriptions ofeach new forgery attempt by theideas which underlie this algorithm, for performance data, andattacker. Our analysis has shown that doing any better would imply that an effective attack exists forproofs ofthecorrectness and maximal forging probability ofAdvanced Encryption Standard (AES). Hence, assuming AES is strong, so is UMAC.TheSecurity analysis of UMACalgorithms describedis inthe papers [3, 4][2, 5]. UMAC performs best in environments where 32-bit quantities are"parameterized". This means that various low-level choices, likeefficiently multiplied into 64-bit results. In producing 64-bit tags on an Intel Pentium 4 using SSE2 instructions, which do two of these multiplications in parallel, UMAC processes messages at a peak rate of about one CPU cycle per byte, with theendian conventionpeak being achieved on messages of around four kilobytes and longer. On theunderlying cryptographic primitive, have not been fixed. One must choose values for these parameters beforePentium III, without theauthentication tag generated byuse of SSE parallelism, UMAC(forachieves agiven message, key, and nonce) becomes fully-defined. In this document we provide two collectionspeak ofparameter settings, and have named the sets UMAC16 and UMAC32. The parameter sets have been chosen basedtwo cycles per byte. On shorter messages UMAC still performs well: around four cycles per byte onexperimentation256 byte messages andprovide good performanceunder two cycles per byte on 1500 byte messages. The time to produce awide variety of processors. UMAC1632-bit tag isdesigneda little more than half that needed toexcel on processors which provide small-scale SIMD parallelism of the type foundproduce a 64-bit tag, while 96-bit tags take about one-and-a-half times as long. UMAC is a MAC inIntel's MMXthe style of Wegman andMotorola's AltiVec instruction sets, while UMAC32Carter [3, 6]. A fast "universal" hash function isdesignedused todo well on processors Krovetz et al. Expires April 2001 [Page 2] INTERNET-DRAFT UMAC October 2000hash an input message into a short string. This short string is then encrypted by xor'ing withgood 32- and 64- bit support. UMAC32 may take advantage of SIMD parallelisma pseudorandom string, resulting infuture processors. UMAC has been designed to allow implementations which accommodate "on-line" authentication. This means that pieces ofthemessage may be presented toUMACat different times (but in correct order) and an on-line implementation will be able to process the message correctly without the need to buffer more than a few dozen bytes of the message. For simplicity, the algorithms in this specification are presented as iftag. Security depends on theentire message being authenticated were available at once. The ideas which underlie UMAC go back to Wegman and Carter [12]. Thesender and receiversharesharing a randomly-chosen secretkey (the MAC key) which determines: * The key for a "universal hash function". This hash function is "non-cryptographic", in the sense that it does not need to have any cryptographic "hardness" property. Rather, it needs to satisfy some combinatorial property, which can be proven to hold without relying on unproven hardness assumptions. The concept of a universalhash function(family) is due to [5]. * The key for aand pseudorandomfunction.string. This iswhere one needs a cryptographic hardness assumption. The pseudorandom function may be obtained (for example) fromachieved by using ablock cipher or cryptographickeyed hashfunction. The concept of a pseudorandomfunction(family) is due to [6]. To authenticate a message, Msg, one first applies the universal hash function, resulting in a string which is typically much shorter than the original message. Theh and pseudorandom functionis applied to a nonce, and the result is used in the manner of a Vernam cipher: the authenticationf. A tag is thus generated by performing thexor of the output from the hash function and the output from the pseudorandom function. Thus, an authenticationcomputation tagis generated as AuthTag=f(Nonce)f_k(nonce) xorh(Msg). Here fh_k(message) where k isthe pseudorandom functiona secret key sharedbetween theby sender andthereceiver, andhnonce is auniversal hash function sharedvalue that changes with each generated tag. The receiver needs to know which nonce was used by thesender and the receiver. In UMAC, a shared key is usedsender, so some method of synchronizing nonces needs tokeybe used. This can be done by explicitly sending thepseudorandom function f,nonce along with the message andthen f is used for both tag generation and internally to generate all of the bits needed by the universal hash function. For a general discussion oftag, or agreeing upon thespeed and assurance advantages of this approach see, for example, [3, 7]. The universal hash function that weuseis called UHASH. It combinesof some other non-repeating value such as Krovetzet al.Expires April20012005 [Page 3] INTERNET-DRAFT UMAC October2000 several software-optimized algorithms into a multi-layered structure.2004 message number. Thealgorithm is moderately complex. Some of this complexity comes from extensive speed optimizations. For the pseudorandom function we use the block cipher of the Advanced Encryption Standard (AES). (Atnonce need not be kept secret, but care needs to be taken to ensure that, over thetimelifetime ofthis working draft,theAES definition processUMAC key, a different nonce isstill in progress. Here AES refers to the final blok cipher defined by this process.) Any block cipherused withthe same block-length (128 bits) and key-length (128 bits) could triviallyeach message. Optimized source code, performance data and papers concerning UMAC can besubstituted in placefound at http://www.cs.ucdavis.edu/~rogaway/umac/. 2 Notation and basic operations The specification ofwhat we call AES. With slightly more effort one can defineUMACusing a pseudorandom function other than a block cipher. One unusual featureinvolves the manipulation of both strings and numbers. String variables are denoted with an initial upper-case letter, whereas numeric variables are denoted in all lower case. The algorithms of UMACis that authentication-tag generation depends onare denoted in all upper-case letters. Simple functions, like those for string-length and string-xor, are written in all lower case. Whenever anonce (in addition to depending onvariable is followed by an underscore ("_"), themessage and key) Itunderscore isimperative thatintended to denote a subscript, with thenonce notsubscripted expression needing to bereused when generating authentication tags underevaluated to resolve thesame key. Thusmeaning of thenonce will normally be implemented by a counter, though any other wayvariable. For example, if i=2, then M_{2 * i} refers toachieve a non- repeating value (or almost certainly non-repeating value) is acceptable. This document specifies the procedure for generating the authentication tag from the message, key and nonce. The exact way in whichthemessage, nonce and authentication tagvariable M_4. 2.1 Operations on strings Messages to be hashed aretransmitted between sender and receiver is not specified here. It is the responsibilityviewed as strings ofthe particular applications using UMACbits which get zero- padded tospecify how the message, nonce and tag are transmitted. For example,anapplication may choose to send the three values concatenated by some encoding scheme while others may choose not to transmitappropriate byte-length. Once thenonce atmessage is padded, allif itstrings are viewed as strings of bytes. A "byte" isknown to both parties (e.g., when the noncean 8-bit string. The following notation isa shared stateused todetect replaymanipulate these strings. bytelength(S): The length ofmessages), or to send only partstring S in bytes. bitlength(S): The length ofthe bitsstring S in bits. zeroes(n): The string made of n zero-bytes. S xor T: The string which is thenonce. Section 8 discusses security considerations that are important for the proper understanding and usebitwise exclusive-or ofUMAC. To the authors' knowledge no ideas utilized in UMACS and T. Strings S and T always havebeen or will be patented. To the best oftheauthors' knowledge, it should be possible to use UMAC immediately, without any intellectual property concerns. Public-domain reference code for UMACsame length. S and T: The string which isavailable fromtheUMAC homepage: http://www.cs.ucdavis.edu/~rogaway/umac/ Other information, like timing databitwise conjunction of S andpapers, are distributed fromT. Strings S and T always have the sameURL. Krovetz et al. Expires April 2001 [Page 4] INTERNET-DRAFT UMAC October 2000 1.1 Organizationlength. S[i]: Theresti-th byte ofthis document is organized as follows: In Section 2 parameters of the named parameter sets UMAC16 and UMAC32 are described. In Section 3 we introduce the basic notations used throughout the rest of the document. Section 4 describes the methods used for generating the Vernam pad and the pseudorandom strings needed internally for hashing. In Sections 5 and 6 the universal hash function is described. Finally, in Section 7 we describe how all these components fit together in the UMAC construction. Some readers may prefer to read sections 4-7 backwards, in order to get a top-down description. Section 8 describes some security considerations in the use of UMAC. 2 Named parameter sets: UMAC16 and UMAC32 As described in [3, 4], a concrete instantiation of UMAC requires the setting of many parameters. We have chosen two sets of values for all of these parameters which allow for good performance on a wide variety of processors. For maximum performance we offer UMAC16 which is designed to exploit the vector-parallel instructions on the Intel MMX and Motorola AltiVec instruction sets. For good performance on processors which support 32- and 64-bit quantities well, we offer UMAC32. 2.1 Named parameters Throughoutthealgorithms described in this document, we have integrated moststring S (indices begin at 1). S[i..j]: The substring ofthe parameters required for a concrete UMAC instantiation as unnamed numeric constants. However, we have named six parameters and assign them the following values depending on whether one wishes to use UMAC16 or UMAC32. UMAC16 UMAC32 ------ ------ WORD-LEN 2 4 UMAC-OUTPUT-LEN 8 8 L1-KEY-LEN 1024 1024 UMAC-KEY-LEN 16 16 ENDIAN-FAVORITE LITTLE LITTLE L1-OPERATIONS-SIGN SIGNED UNSIGNED Here we give a brief explanationS consisting ofthe role each named parameter plays.bytes i through j. Krovetzet al.Expires April20012005 [Page5]4] INTERNET-DRAFT UMAC October2000 WORD-LEN: Specifies2004 S || T: The string S concatenated with string T. zeropad(S,n): The string S, padded with zero-bits to thesize in bytesnearest positive multiple ofa "word". UMAC will be significantly faster in execution ifn bytes. Formally, zeropad(S,n) = S || T, where T is theexecuting machine supports well certain operations on datatypesshortest string ofthis size. Notezero-bits (possibly empty) so thatWORD-LENS || T isnot necessarily the native wordsize of the target machine (andnon-empty and 8n divides bitlength(S || T). 2.2 Operations onsome machines a smaller value turns out to be preferable). UMAC-OUTPUT-LEN: Specifies the length of the authentication tag generated by UMAC, in bytes. L1-KEY-LEN: Specifies the "block length," in bytes, on which the hash-function initially operates. This much storage (and then some) will be needed in the run-time environmentintegers Standard notation is used forUMAC's internal keys. UMAC-KEY-LEN: Specifies the length in bytes of the user-sup- plied UMAC key. ENDIAN-FAVORITE: Specifies which endian-orientation will be fol- lowed in the reading of data to be hashed. This need not be equal to the native endianess of any specific machine running UMAC. L1-OPERATIONS-SIGN: Specifies whether the strings manipulated in the hash-functionmost mathematical operations, such as "*" for multiplication, "+" for addition and "mod" for modular reduction. Some less standard notations are defined here. a^i: The integer a raised tobe initially consid- ered as signed or unsigned integers. 2.2 Alternative instantiations Although this document only specifies two named parameter sets,thenamed parameters could be altered to suit specific authentication needs which are not adequately served by either UMAC16i-th power. ceil(x): The smallest integer greater than orUMAC32. Below, we list alternatives that are supported by this specification for each of the named parameters. WORD-LEN ::= 2equal to x. prime(n): The largest prime number less than 2^n. The prime numbers used in UMAC are: +-----+--------------------+---------------------------------------+ |4 UMAC-OUTPUT-LEN ::= 1n |2prime(n) [Decimal] |...prime(n) [Hexadecimal] |31+-----+--------------------+---------------------------------------+ | 36 | 2^36 - 5 | 0x0000000F FFFFFFFB |32 L1-KEY-LEN ::= 32| 64 |1282^64 - 59 |2560xFFFFFFFF FFFFFFC5 |...|2^28 UMAC-KEY-LEN ::= 16128 |32 ENDIAN-FAVORITE ::= BIG2^128 - 159 |LITTLE L1-OPERATIONS-SIGN ::= SIGNED0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFF61 |UNSIGNED Roughly speaking, doubling UMAC-OUTPUT-LEN approximately doubles execution time+-----+--------------------+---------------------------------------+ 2.3 String-Integer conversion operations Conversion between strings andsquares (ie. decreases)integers is done using theprobability of MAC Krovetz et al. Expires April 2001 [Page 6] INTERNET-DRAFT UMAC October 2000 forgery. Setting ENDIAN-FAVORITE to BIG causes UMAC to perform better on big-endian processors ratherfollowing functions. Each function treats initial bits as more significant thanlittle-endian processors. Setting L1-OPERATIONS-SIGN to UNSIGNED slightly increases UMAC security atlater ones. bit(S,n): Returns theexpense of complicating implementations on systems which do not support unsigned integers well. This effectively disallowsinteger 1 if theuse of Intel's MMX instructions which only support signed integers. Finally, increasing L1-KEY-LEN tends to speed tag generation on large messages, but requires more memory for processing and could potentially slow the processor by overflowing its cache. 2.3 Naming convention A concise shorthand may be used to specify an instance of UMAC. The word "UMAC" followed by up to six parameters specifies unambiguously an instance of UMAC. If only a prefixn-th bit of thesix parameters are written, itstring S isimplicitly specified that those missing parameters take on default values listed below. The format of1, otherwise returns theshorthandinteger 0 (indices begin at 1). str2uint(S): The non-negative integer whose binary representation is"UMAC-w/l/n/k/s/e", and the meaning oftheletters (and their defaults)string S. More formally, if S isas follows: w = WORD-LEN (4) l = UMAC-OUTPUT-LEN (8) n = L1-KEY-LEN (1024) k = UMAC-KEY-LEN (16) s = L1-OPERATIONS-SIGN (UNSIGNED) et bits long then str2uint(S) =ENDIAN-FAVORITE (LITTLE) Some examples UMAC-4/8/1024/16/UNSIGNED/LITTLE (Same as named set "UMAC32" ) UMAC-2/8/1024/16/SIGNED/LITTLE (Same as named set "UMAC16" ) UMAC-4/12 ("UMAC32" with 96-bit output) UMAC-2/8/4096 ("UMAC16" with 4K L1-key and) (unsigned L1-OPERATIONS ) 3 Notation and basic operations The specification of UMAC involves the manipulation of both strings and numbers. String variables are denoted with initial capitals (upper-case), whereas numeric variables are denoted in all lower- case. Global parameters are denoted in all capital letters. Simple functions, like those for string-length and string-xor, are written with all lower-case, while the algorithms of UMAC are named in all upper-case. Whenever a variable is followed by an underscore ("_"), the2^{t-1} * bit(S,1) + 2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t). Krovetzet al.Expires April20012005 [Page7]5] INTERNET-DRAFT UMAC October2000 underscore is intended to denote a subscript, with the subscripted expression needing to be evaluated to resolve the meaning2004 uint2str(n,i): The i-byte string S such that str2uint(S) = n. 2.4 Mathematical operations on strings One of thevariable. For example, if i=2, then M_{2 * i} refers to the variable M_4. We now define some basicprimary operationsfor manipulating strings and numbers,in UMAC is repeated application of addition andfor converting between the two. 3.1 Operationsmultiplication onstrings In this specification, we view the messages to be hashed (as well as the keys used for hashing) as strings of bytes. A "byte" is an 8-bit string. The algorithms have been designed so that they are easily extendable to allow arbitrary bit-strings, if necessary. We use the following notation to manipulate thesestrings.length(S): The length of string S in bytes. zeroes(n): The string made of n zero-bytes. S xor T:Thestring which is the bitwise exclusive-or of Soperations "+_32", "+_64" andT. Strings S"*_64" are defined "S +_32 T" as uint2str(str2uint(S) + str2uint(T) mod 2^32, 4), "S +_64 T" as uint2str(str2uint(S) + str2uint(T) mod 2^64, 8) andT must have"S *_64 T" as uint2str(str2uint(S) * str2uint(T) mod 2^64, 8). These operations correspond well with thesame length. Saddition andT: The stringmultiplication operations which are performed efficiently on registers by modern computers. 2.5 ENDIAN-SWAP: Adjusting endian orientation Message data isthe bitwise conjunction of S and T. Strings S and T must have the same length. S[i]: The i-th byte of the string S (indices begin at 1). S[i..j]: The substring of S consisting of bytes i through j. S || T: Theread little-endian to speed tag generation on little- endian computers. On little-endian processors, this is a free operation. 2.5.1 ENDIAN-SWAP Algorithm Input: S, stringS concatenatedwith length divisible by 4 bytes. Output: T, stringT. zeropad(S,n): The string S, paddedS withzero-bytes toeach 4-byte word endian-reversed. Compute T using thenearest non-zero multiple offollowing algorithm. // // Break S into 4-byte chunks // nbytes. Formally, zeropad(S,n)=S || zeroes(i), where i is the smallest nonnegative integer such that S || zeroes(i) is non-empty and n divides length(S)+i. 3.1.1 ENDIAN-SWAP: Adjusting endian orientation This routine is used to make the data input to UMAC conform to the ENDIAN-FAVORITE global parameter. Krovetz et al. Expires April 2001 [Page 8] INTERNET-DRAFT UMAC October 2000 3.1.1.1 Discussion The most time consuming portion of many UMAC computations involves the reading of key and message data from memory. Because big- and little-endian computers will read these bytes differently, specifying a particular endian-orientation for UMAC could have significant performance ramifications. If necessary, the key-bytes can be preprocessed once during key setup to eliminate the need for their reorientation during performance-critical tag generation. But, message data presumably cannot be preprocessed. Any reorientation needed for each message must be done during tag generation, introducing a significant penalty to computers whose native endian- orientation is opposite to that specified for UMAC. Therefore, UMAC defines a parameter, ENDIAN-FAVORITE, which allows UMAC to be specified to favor big- or little-endian memory conventions. If the parameter is set to favor little-endian computers, then we specify the reversal of the bytes of every word in the input message using the following support function. By reversing the data in the specification, an implementation on a little-endian machine would in fact do nothing but read the input data using native-endian word loads. The loads would automatically reverse the bytes within each word, fulfilling the requirements of the specification. Any other endian reorientation needed to comply with the specification requires an insignificant amount of time during each tag calculation. 3.1.1.2 Interface Function Name: ENDIAN-SWAP Input: S, string with length divisible by WORD-LEN bytes. Output: T, string S with each word endian-reversed. 3.1.1.3 Algorithm Compute T using the following algorithm. // // Break S into word-size chunks // n = length(S) / WORD-LEN Let S_1, S_2, ..., S_n be strings of length WORD-LEN bytes so that S_1bytelength(S) / 4 Let S_1, S_2, ..., S_n be strings of length 4 bytes so that S_1 || S_2 ||..... || S_n = S. // // Byte-reverse each chunk, and build-up TKrovetz et al. Expires April 2001 [Page 9] INTERNET-DRAFT UMAC October 2000// T = <empty string> for i = 1 to n do Let W_1, W_2,..., W_{WORD-LEN}W_3, W_4 be bytes Krovetz Expires April 2005 [Page 6] INTERNET-DRAFT UMAC October 2004 so that W_1 || W_2 ||...W_3 ||W_{WORD-LEN}W_4 = S_i SReversed_i =W_{WORD-LEN}W_4 ||W_{WORD-LEN - 1}W_3 ||...W_2 || W_1 T = T || SReversed_i end for Return T3.2 Operations on integers In this specification, we generally use standard notation for mathematical operations, such as "*" for multiplication, "+" for addition3 Key and"mod" for modular reduction. Some less standard notationspad derivation functions Pseudorandom bits aredefined here. a^i: The integer a raised toneeded internally by UHASH and at theinteger i-th power. lg a: The base-2 logarithmtime ofinteger a. floor(x):tag generation. Thelargest integer less than or equalfollowing two functions use a block cipher tox. ceil(x): The smallest integer greater than or equalgenerate these bits. All references to AES refer tox. prime(n): The largest prime number less than 2^n. The prime numbers used in UMAC are: +-----+--------------------+---------------------------------------+ | x | prime(x) [Decimal] | prime(x) [Hexadecimal] | +-----+--------------------+---------------------------------------+ | 19 | 2^19 - 1 | 0x0007FFFF | | 32 | 2^32 - 5 | 0xFFFFFFFB | | 36 | 2^36 - 5 | 0x0000000F FFFFFFFB | | 64 | 2^64 - 59 | 0xFFFFFFFF FFFFFFC5 | | 128 | 2^128 - 159 | 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFF61 | +-----+--------------------+---------------------------------------+ 3.3 String-Integer conversion operations We convert between strings and integers usingthefollowing functions. Each128-bit key encryption functiontreats initial bits as more significant than later ones. Krovetz et al. Expires April 2001 [Page 10] INTERNET-DRAFT UMAC October 2000 bit(S,n): Returnsof theinteger 1 ifAdvanced Encryption Standard (AES) [1]. 3.1 KDF: Key-derivation function The key-derivation function generates pseudorandom bits used to key then-th bithash functions. 3.1.1 KDF Algorithm Input: K, string of length 16 bytes // key to AES index, an integer in the range 0...7. numbytes, a positive integer. Output: Y, stringS is 1, otherwise returnsof length numbytes bytes. Compute Y using theinteger 0 (indices begin at 1). Here n must be between 1 and the bit- lengthfollowing algorithm. // // Calculate number ofS. str2uint(S): The non-negative integer whose binary representation is the string S. More formally, if S is t bits long then str2uint(S)AES iterations, set indexed starting point // n =2^{t-1} * bit(S,1) + 2^{t-2}ceil(numbytes / 16) B = uint2str((2 *bit(S,2) + ...index +2^{1} * bit(S,t-1)1)^2 +bit(S,t). uint2str(n,i): The i-byte string S such that str2uint(S)index, 1) xor uint2str(90, 1) T =n. If no such string exists then uint2str(n,i) is unde- fined. str2sint(S): The integer whose binary representationB repeated 16 times Y = <empty string> // // Build Y using AES intwo's- complement is the string S. More formally, if S is t bits long then str2sint(S)a feedback mode // for i =-2^{t-1} * bit(S,1) + 2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t). sint2str(n,i): The i-byte string S such that str2sint(S)1 to n do T =n. If no such string exists then sint2str(n,i) is unde- fined. 3.4 Mathematical operations on strings One of the primary operations in the universal hashing part ofT[1..15] || uint2str(i mod 256, 1) T = AES(K, T) Krovetz Expires April 2005 [Page 7] INTERNET-DRAFT UMACis repeated application of additionOctober 2004 Y = Y || T end for Y = Y[1..numbytes] Return Y 3.2 PDF: Pad-derivation function This function takes a key andmultiplication on strings. We use "+_n"a nonce and"*_n" to denote the string operations whichreturns a pseudorandom pad for use in tag generation. A pad of length 4-, 8- or 12-bytes can be generated. Notice that pads generated using nonces that differ only in their last bit (when generating 8-byte pads) or last two bits (when generating 4-byte pads) areequivalent to addition and multiplication modulo 2^n, respectively. These operations correspond exactly withderived from theadditionsame AES encryption. This allows caching andmultiplication operations which are performed efficiently on registers by modern computers. So, when n is 16, 32 or 64, these operations can be preformed by computers very quickly. There are two interpretationssharing a single AES invocation for sequential nonces. 3.2.1 PDF Algorithm Input: K, string of length 16 bytes Nonce, string of length 1 to 16 bytes. padlen, theoperators depending on whether the strings are interpreted as signedinteger 4, 8 orunsigned integers. The global parameter L1-OPERATIONS-SIGN determines which interpretation is made. If strings S12. Output: Y, string of length padlen bytes. Compute Y using the following algorithm. // // Extract andT are interpreted as signed integers (that is, L1-OPERATIONS-SIGN == SIGNED) then "S *_n T" as uint2str(str2sint(S) * str2sint(T)zero low bit(s) of Nonce if needed // if (padlen = 4) index = str2uint(Nonce) mod2^n, n/8), and "S +_n T" as uint2str(str2sint(S) + str2sint(T)4 Nonce = Nonce xor uint2str(index, bytelength(Nonce)) else if (padlen = 8) index = str2uint(Nonce) mod2^n, n/8).2 Nonce = Nonce xor uint2str(index, bytelength(Nonce)) end if // // Make Nonce 16 bytes by appending zeroes if needed // Nonce = Nonce || zeroes(16 - bytelength(Nonce)) // Krovetzet al.Expires April20012005 [Page11]8] INTERNET-DRAFT UMAC October2000 If strings S2004 // Generate subkey, AES and extract indexed substring // K' = KDF(K, 0, 16) Tare interpreted as unsigned integers (that is, L1-OPERATIONS-SIGN == UNSIGNED) then we define "S *_n T" as uint2str(str2uint(S) * str2uint(T) mod 2^n, n/8), and "S +_n T" as uint2str(str2uint(S)= AES(K', Nonce) if (padlength = 4) Y = T[1 +str2uint(T) mod 2^n, n/8). In any case, the number n must be divisible by 8. In this document we use S *_16 T, S *_32 T, S *_64 T, S +_32 T and S +_64 T, corresponding to multiplication of 2,(index*4) .. 4and+ (index*4)] else if (padlength = 8) Y = T[1 + (index*8) .. 8byte numbers, and the addition of 4 and 8 byte numbers.+ (index*8)] else Y = T[1..padlen] end if Return Y 4Key and pad derivation functions UMAC, as describedUMAC tag generation Tag generation for UMAC proceeds by using UHASH (defined inthis document, requires either a 16- or 32-byte key which is used with a key-derivation function (KDF) to produce pseudorandom bits needed withintheuniversalnext section) to hashfunction. 4.1 KDF: Key derivation function Stretchingtheuser-supplied key into pseudorandom bits used internally by UMAC is done with a key-derivation function (KDF). In this section we define a KDF which is efficiently instantiated with a block cipher. The Advanced Encryption Standard (AES) is used in output-feedback modemessage, applying the PDF toproducetherequired bits. If UMAC-KEY-LEN is 16, thennonce and computing the128-bit key/128-bit block-length variantxor ofAES is used, and if UMAC-KEY-LEN is 32, thenthe256-bit key/128-bit block- length variant is used.resulting strings. TheKDF requires an "index" parameter. Usinglength of thesame key, but different indices, generates different pseudorandom outputs. 4.1.1 Interface Function Name: KDFpad and hash can be either 4, 8 or 12 bytes. 4.1 UMAC Algorithm Input: K, string of lengthUMAC-KEY-LEN bytes // key to AES index, a non-negative integer16 bytes. M, string of length less than256. numbytes, a positive integer.2^67 bits. Nonce, string of length 1 to 16 bytes. taglen, the integer 4, 8 or 12. Output:Y,AuthTag, string of lengthnumbytestaglen bytes.Krovetz et al. Expires April 2001 [Page 12] INTERNET-DRAFT UMAC October 2000 4.1.2 AlgorithmComputeYAuthTag using the following algorithm.// // Calculate number of AES iterations, set indexed starting point // nHashedMessage =ceil(numbytes / 16) TUHASH(K, M, taglen) Pad =zeroes(15) || uint2str(index, 1) YPDF(K, Nonce, taglen) AuthTag =<empty string> // // Build Y using AES in a feedback mode //Pad xor HashedMessage Return AuthTag 4.2 UMAC-32, UMAC-64 and UMAC-96 The preceding definition fori = 1 to n do T = AES(K, T) Y = Y || T Y = Y[1..numbytes] Return Y 4.2 PDF: Pad-derivation function The Wegman-Carter MAC scheme used inUMACrequireshas an input parameter "taglen" which specifies theexclusive-orlength ofa pseudorandom string withtag generated by theoutput fromalgorithm. The following aliases define names that make tag length explicit in theuniversalKrovetz Expires April 2005 [Page 9] INTERNET-DRAFT UMAC October 2004 name. UMAC-32(K, M, Nonce) = UMAC(K, M, Nonce, 4) UMAC-64(K, M, Nonce) = UMAC(K, M, Nonce, 8) UMAC-96(K, M, Nonce) = UMAC(K, M, Nonce, 12) 5 UHASH: Universal hashfunction. The pseudorandom string is obtained by applying a pad- derivationfunction(PDF) toUHASH is anonce which, for security reasons, must change with each authentication-tag computation. Nonces may be any number of bytes from 1 to 16, but all nonces inkeyed hash function, which takes as input asingle authentication session must bestring ofequal length. In this section we define a PDF which is efficiently instantiated witharbitrary length, and produces ablock cipher. Again we use AES with either 16-4-, 8- or32-bytes keys depending on the value of UMAC-KEY-LEN. 4.2.1 Discussion The PDF12-byte output. UHASH does its work in three stages, or layers. A message is first hashed by L1-HASH, its output isexclusive-or'd with the result ofthen hashed by L2-HASH, whose output is then hashed by L3-HASH. If theuniversal hash function. AES, however, may provide more or fewer bits per invocationmessage being hashed is no longer thanare needed for this purpose. For example, UMAC- OUTPUT-LEN1024 bytes, then L2-HASH isnormally 8 bytes and AES producesskipped as anoutputoptimization. Because L3-HASH outputs a string whose length is only four bytes long, multiple iterations of16 bytes. It would save processing timethis three-layer hash are used ifhalfa total hash-output longer than four bytes is requested. To reduce memory use, L1-HASH reuses most ofthe AES output bits could be used to generate one tag, and then the second half of the same AES output could be used for the tag of the next message. For this reason, we include an optimization which allows the use of different substringsits key material between iterations. A significant amount ofthe same AES output. This optimizationinternal key isKrovetz et al. Expires April 2001 [Page 13] INTERNET-DRAFT UMAC October 2000 effective only when nonces are sequential. We dorequired for UHASH, but it remains constant soby using the low bits of the noncelong asan index into the AES output, whichUMAC's key is unchanged. It isgenerated using the higher bits ofthenonce which are not used for indexing. This speeds message authentication by reducingimplementor's choice whether to generate theaverage time spent by AES forinternal keys eachauthentication. Note that iftime acounter-variablemessage isusedhashed, or toexploit this optimization,cache them between messages. Please note that UHASH has certain combinatoric properties making it suitable for Wegman-Carter message authentication. UHASH is not a cryptographic hash function andthe variableisstorednot a suitable general replacement for functions like SHA-1. UHASH is presented here inmemory, then the variable must be treated as big-endian. If UMAC- OUTPUT-LENa top-down manner. First UHASH islarger than 16,described, thentwo AES invocations are required to produce a sufficient numbereach ofbits. 4.2.2 Interface Function Name: PDFits component hashes are presented. 5.0.1 UHASH Algorithm Input: K, string of lengthUMAC-KEY-LEN bytes // key for AES Nonce,16 bytes. M, string of length1 to 16 bytes.less than 2^67 bits. outlen, the integer 4, 8 or 12. Output: Y, string of lengthUMAC-OUTPUT-LENoutlen bytes.4.2.3 AlgorithmCompute Y using the following algorithm. // //Make Nonce 16One internal iteration per 4 bytesby prepending zeroesof output //NonceKrovetz Expires April 2005 [Page 10] INTERNET-DRAFT UMAC October 2004 iters =Nonce || zeroes(16 - length(Nonce))outlen / 4 // //If one AES invocation is enoughDefine total key needed formore than oneall iterations using KDF. //PDF invocation. // if (UMAC-OUTPUT-LEN <= 8) then // // Compute number of index bits neededL1Key and L3Key1 both reuse most key between iterations. //iL1Key =floor(16 / UMAC-OUTPUT-LEN) numlowbitsKDF(K, 1, 1024 + (iters - 1) * 16) L2Key =floor(lg(i)) // // Extract index bits and zero low bits of Nonce // nlowbitsnumKDF(K, 2, iters * 24) L3Key1 =str2uint(Nonce) mod 2^numlowbits NonceKDF(K, 3, iters * 64) L3Key2 =Nonce xor uint2str(nlowbitsnum, 16) Krovetz et al. Expires April 2001 [Page 14] INTERNET-DRAFT UMAC October 2000KDF(K, 4, iters * 4) // //Generate subkey, AES andFor each iteration, extractindexed substringkey and three-layer hash. //K'If bytelength(M) <= 1024, then skip L2-HASH. // Y =KDF(K, 128, UMAC-KEY-LEN) T<empty string> for i =AES(K', Nonce) Y1 to iters do L1Key_i =T[ nlowbitsnumL1Key [(i-1) *UMAC-OUTPUT-LEN16 + 1 ..(nlowbitsnum(i-1) * 16 +1)1024] L2Key_i = L2Key [(i-1) *UMAC-OUTPUT-LEN] else // // Repeated AES calls to build length // K_124 + 1 .. i * 24] L3Key1_i =KDF(K, 128, UMAC-KEY-LEN) K_2L3Key1[(i-1) * 64 + 1 .. i * 64] L3Key2_i =KDF(K, 129, UMAC-KEY-LEN)L3Key2[(i-1) * 4 + 1 .. i * 4] A = L1-HASH(L1Key_i, M) if(UMAC-OUTPUT-LEN(bitlength(M) <=16) Ybitlength(L1Key_i)) then B =AES(K_1, Nonce)zeroes(8) || A elseYB =AES(K_1, Nonce) || AES(K_2, Nonce)L2-HASH(L2Key_i, A) end if C = L3-HASH(L3Key1_i, L3Key2_i, B) Y =Y[1..UMAC-OUTPUT-LEN]Y || C end for Return Y5 UHASH-32: Universal5.1 L1-HASH: First-layer hashfunction for a 32-bit word size UHASH is a keyedThe first-layer hashfunction, which takes as inputbreaks the message into 1024 byte chunks and hashes each with astringfunction called NH. The concatenation ofarbitrary length, and produces as outputthese hash values results in a string up to 128 times shorter than the original. 5.1.1 L1-HASH Algorithm Input: K, string offixedlength(such as 8 bytes). The actual output1024 bytes. M, string of lengthdepends on the parameter UMAC-OUTPUT-LEN. UHASH has been shown to be epsilon-ASU ("Almost Strongly Universal"), where epsilon is a small (parameter-dependent) real number. Informally, saying that a keyed hash function is epsilon-ASU means that for any two distinct fixed input strings, the two outputs of the hash function with a random key "look almost like a pair of random strings". The number epsilon measures how non-random the output strings may be. For details, see [3, 4, 11]. UHASH has been designed to be fast by exploiting several architectural features of modern commodity processors. It was specifically designed for use in UMAC. But UHASH is useful beyond that domain, and can be easily adopted for other purposes. UHASH does its work in three layers. First, a hash function called NH [3] is used to compress input messages into strings which are typically many times smallerless thanthe input message. Second, the compressed message is hashed with an optimized "polynomial hash2^67 bits. Krovetzet al.Expires April20012005 [Page15]11] INTERNET-DRAFT UMAC October2000 function" into a fixed-length 16-byte string. Finally, the 16-byte string is hashed using an "inner-product hash" into a2004 Output: Y, string of lengthWORD-LEN bytes. These three layers are repeated (with a modified key) until the outputs total UMAC-OUTPUT-LEN(8 * ceil(bytelength(M)/1024)) bytes.Note: Because the repetitions of the three-layer scheme are independent (aside from sharing some internal key), it follows that each "word" ofCompute Y using thefinal output canfollowing algorithm. // // Break M into 1024 byte chunks (final chunk may becomputed independently. Hence, to compute a prefix of a UMAC tag, one can simply repeat the three-layer scheme fewer times. Thus, computing a prefix of the tag canshorter) // t = ceil(bytelength(M) / 1024) Let M_1, M_2, ..., M_t bedone significantly faster than computingstrings so that M = M_1 || M_2 || ... || M_t, and bytelength(M_i) = 1024 for all 0 < i < t. // // For each chunk, except thewhole tag. 5.1 NH-32:last: endian-adjust, NHhashing with a 32-bit word size The first of the three hash-layers that UHASH uses is the NH hash function [3]. More than any other part of UHASH, NH is designed to be fast on modern processors, because it is where the bulk of the UHASH work is done. The NH universalhashfunction hashes an input string M using a key K by considering M// andKadd bit-length. Use results tobe arrays of integers, each WORD-LEN bytes in length, and performing a sequence of arithmetic operations on them.build Y. // Len = uint2str(1024 * 8, 8) Y = <empty string> for i = 1 to t-1 do ENDIAN-SWAP(M_i) // See[3]endian discussion in section 3.1.1 Y = Y || (NH(K, M_i) +_64 Len) end fordefinitions, proofs and rationale relating// // For the last chunk: pad toNH. The NH-32 algorithm is designed32-byte boundary, endian-adjust, // NH hash and add bit-length. Concatenate the result toperform well on processors which support well multiplications of 32-bit operands into 64-bit results. NH-32Y. // Len = uint2str(bitlength(M_t), 8) M_t = zeropad(M_t, 32) ENDIAN-SWAP(M_t) Y = Y || (NH(K, M_t) +_64 Len) return Y 5.1.2 NH Algorithm Because this routine isalso designedapplied directly toexploit the recent trend of including instructions for small-scale vector parallelism in uniprocessor CPUs. Intel's Streaming SIMD 2 instruction set is a good exampleevery bit ofthis trend. It supports an instruction, which multiplies two pairsinput data, optimized implementation of32-bit operands into two 64-bit results, which can be used by UHASH-32 for accelerated hashing. To accommodate this parallelism, NH-32 accesses data-words in pairs which are 4 words (16 bytes) apart. 5.1.1 Interface Function Name: NH-32it yields great benefit. Input: K, string of lengthL1-KEY-LEN1024 bytes. M, string with length divisible by 32 bytes. Output: Y, string of length 8 bytes. Krovetzet al.Expires April20012005 [Page16]12] INTERNET-DRAFT UMAC October2000 5.1.2 Algorithm2004 Compute Y using the following algorithm. // // Break M and K into 4-byte chunks // t =length(M)bytelength(M) / 4 Let M_1, M_2, ..., M_t be 4-byte strings so that M = M_1 || M_2 ||..... || M_t. Let K_1, K_2, ..., K_t be 4-byte strings so that K_1 || K_2 ||..... || K_t is a prefix of K. // // Perform NH hash on the chunks, pairing words for multiplication // which are 4 apart to accommodate vector-parallelism. // Y = zeroes(8) i = 1 while (i < t) do Y = Y +_64 ((M_{i+0} +_32 K_{i+0}) *_64 (M_{i+4} +_32 K_{i+4})) Y = Y +_64 ((M_{i+1} +_32 K_{i+1}) *_64 (M_{i+5} +_32 K_{i+5})) Y = Y +_64 ((M_{i+2} +_32 K_{i+2}) *_64 (M_{i+6} +_32 K_{i+6})) Y = Y +_64 ((M_{i+3} +_32 K_{i+3}) *_64 (M_{i+7} +_32 K_{i+7})) i = i + 8 end while Return Y 5.2L1-HASH-32: First-layerL2-HASH: Second-layer hashTo limitThe second-layer rehashes thelength of key required inL1-HASH output using a polynomial hash called POLY. If thefirst layeroutput of L1-HASH is long, then POLY is called once on a prefix ofhashing, L1-HASH-32 breakstheinput message into chunks no longer than L1-KEY-LENL1-HASH output andNH hashes each with a key of the same length. 5.2.1 Discussion The NH hash function requires a key which is just as long as the message being hashed. To limit the amount of key used inthen called using different settings on theNHremainder. (This two-step hashinglayer, we use a key of fixed length (defined by the parameter L1-KEY-LEN), and process the message in chunks of this length (or less). The L1-HASH-32 algorithm takes an input message and breaks it into chunksofL1-KEY-LEN bytes (except the last chuck, which may be shorter and may need to be zero-padded to an appropriate length). Each chunk is hashed with NH-32, and the outputs from all the NH invocations are annotated with some length information and concatenated to producethefinal L1-HASH-32 result. Krovetz et al. Expires April 2001 [Page 17] INTERNET-DRAFT UMAC October 2000 If ENDIAN-FAVORITEL1-HASH output isLITTLE, then each word inonly needed if theinputinitial message length isrequired to be endian reversed. 5.2.2 Interface Function Name: L1-HASH-32greater than 16 megabytes.) 5.2.1 L2-HASH Algorithm Input: K, string of lengthL1-KEY-LEN24 bytes. M, string of length less than 2^64 bytes. Output: Y, string of length(8 * ceil(length(M)/L1-KEY-LEN))16 bytes.5.2.3 AlgorithmComputeYy using the following algorithm. Krovetz Expires April 2005 [Page 13] INTERNET-DRAFT UMAC October 2004 // //Break M into L1-KEY-LEN byte chunks (final chunk may be shorter)Extract keys and restrict to special key-sets //tMask64 =ceil(length(M) / L1-KEY-LEN) Let M_1, M_2, ..., M_t be strings so that Muint2str(0x01ffffff01ffffff, 8) Mask128 =M_1 || M_2 || .. || M_t,uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16) k64 = str2uint(K[1..8] andlength(M_i)Mask64) k128 =L1-KEY-LEN for all 0 < i < t.str2uint(K[9..24] and Mask128) // //For each chunk, except the last: endian-adjust, NHIf M is no more than 2^17 bytes, hash under 64-bit prime, // otherwise, hash first 2^17 bytes under 64-bit prime andadd bit-length. Use results to build Y.//Len = uint2str(L1-KEY-LEN * 8, 8) Y = <empty string> for i = 1 to t-1 doremainder under 128-bit prime. // if(ENDIAN-FAVORITE == LITTLE)(bytelength(M) <= 2^17) then //See endian discussion ENDIAN-SWAP(M_i)2^14 64-bit words //in section 3.1.1 Y = Y || (NH-32(K, M_i) +_64 Len)// View M as an array of 64-bit words, and use POLY modulo //For the last chunk: padprime(64) (and with bound 2^64 - 2^32) to32-byte boundary, endian-adjust, // NHhashand add bit-length. Concatenate the result to Y.it. //Leny =uint2str(length(M_t) * 8, 8) M_tPOLY(64, 2^64 - 2^32, k64, M) else M_1 =zeropad(M_t, 32)M[1 .. 2^17] M_2 = M[2^17 + 1 .. bytelength(M)] M_2 = zeropad(M_2 || uint2str(0x80,1), 16) y = POLY(64, 2^64 - 2^32, k64, M_1) y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_2) end if(ENDIAN-FAVORITE == LITTLE) then ENDIAN-SWAP(M_t)Y = uint2str(y, 16) Return Y|| (NH-32(K, M_t) +_64 Len) return Y5.2.2 POLY Algorithm Input: wordbits, The integer 64 or 128. maxwordrange, positive integer less than 2^wordbits. k, integer in the range 0 ... prime(wordbits) - 1. M, string with length divisible by (wordbits / 8) bytes. Output: y, integer in the range 0 ... prime(wordbits) - 1. Compute y using the following algorithm. // // Define constants used for fixing out-of-range words Krovetzet al.Expires April20012005 [Page18]14] INTERNET-DRAFT UMAC October2000 5.3 POLY: Polynomial hash The output from L1-HASH is a string which is shorter than, but still proportional to, that of its input. The POLY hash algorithm takes an arbitrary message and hashes it to a fixed length. 5.3.1 Discussion Polynomial hashing treats an input message as a sequence of coefficients of a polynomial, and the hash-key is the point at which this polynomial is evaluated. The security guarantee assured by polynomial hashing degrades linearly in the length of the message being hashed. If two messages of n words are hashed, then the probability they collide when hashed by POLY with a prime modulus of p is no more than n / p. For more information on the polynomial hashing schemes used in UMAC see [10]. The parameter 'wordbits' specifies the prime modulus used in the polynomial as well as the granularity (length of words) in which the input message should be broken. Because some strings of length wordbits are greater than prime(wordbits), a mechanism is needed to fix words which are not in the range 0 .. prime(wordbits) - 1. To this end, any word larger than 'maxwordrange' is split into two words guaranteed to be in range, and each is hashed by the polynomial hash. 5.3.2 Interface Function Name: POLY Input: wordbits, positive integer divisible by 8. maxwordrange, positive integer less than 2^wordbits. k, integer in the range 0 .. prime(wordbits) - 1. M, string with length divisible by (wordbits / 8) bytes. Output: y, integer in the range 0 .. prime(wordbits) - 1. 5.3.3 Algorithm Compute y using the following algorithm. // // Define constants used for fixing out-of-range words // wordbytes = wordbits / 8 Krovetz et al. Expires April 2001 [Page 19] INTERNET-DRAFT UMAC October 2000 p = prime(wordbits) offset = 2^wordbits - p marker = p - 1 // // Break M into chunks2004 // wordbytes = wordbits / 8 p = prime(wordbits) offset = 2^wordbits - p marker = p - 1 // // Break M into chunks of length wordbytes bytes // n =length(M)bytelength(M) / wordbytes Let M_1, M_2, ..., M_n be strings of length wordbytes bytes so that M = M_1 || M_2 ||..... || M_n // //For eachEach inputword, compare itword m is compared with maxwordrange. Iflargernot smaller // thenhash the words'marker' and (m - offset), both inrange.range, are hashed. // y = 1 for i = 1 to n do m = str2uint(M_i) if (m >= maxwordrange) then y = (k * y + marker) mod p y = (k * y + (m - offset)) mod p else y = (k * y + m) mod p end if end for Return y5.4 L2-HASH-32: Second-layer hash Because L1-HASH may produce quite long strings, and POLY's security guarantee degrades linearly, a scheme is required to allow long strings while ensuring that the collision probability never grows beyond a certain pre-set bound. This is accomplished by dynamically increasing the prime modulus used in the polynomial hashing as the collision probability bound is approached. 5.4.1 Discussion The probability of two n-word messages hashing to the same result when polynomially hashed with prime modulus p is as much as (n / p). To maintain a limit on the maximum collision probability, a scheme is needed to disallow (n / p) growing too large. The scheme used here hashes a number of words n_1 under modulus p_1 until (n_1 / p_1) reaches a critical point. The result of the hash-so-far is prepended to the remaining message needing to be hashed, and the hashing continues, but under a prime modulus p_2 which is substantially larger than p_1. Hashing continues for n_2 more words until (n_2 / Krovetz et al. Expires April 2001 [Page 20] INTERNET-DRAFT UMAC October 2000 p_2) also reaches a critical point, at which time a new larger prime p_3 could be used. Because polynomial hashing under a small prime modulus is often faster than hashing under a large one, this dynamic ramping-up of the polynomial's modulus provides a hash function which is faster on short messages, but still accommodates long ones. The keys used for polynomial hashing are restricted to particular subsets to allow for faster implementations on 32-bit architectures. The restrictions allow an implementor to disregard some potential arithmetic carries during computation. For more information see [10]. 5.4.2 Interface Function Name: L2-HASH-32 Input: K, string of length 24 bytes. M, string of length less than 2^64 bytes. Output: Y, string of length 16 bytes. 5.4.3 Algorithm Compute y using the following algorithm. // // Extract keys and restrict to special key-sets // Mask64 = uint2str(0x01ffffff01ffffff, 8) Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16) k64 = str2uint(K[1..8] and Mask64) k128 = str2uint(K[9..24] and Mask128) // // If M no more than 2^17 bytes, hash under 64-bit prime, // otherwise, hash first 2^17 bytes under 64-bit prime and // remainder under 128-bit prime. // if (length(M) <= 2^17) then // 2^14 64-bit words // // View M as an array of 64-bit words, and use POLY modulo Krovetz et al. Expires April 2001 [Page 21] INTERNET-DRAFT UMAC October 2000 // prime(64) (and with bound 2^64 - 2^32) to hash it. // y = POLY(64, 2^64 - 2^32, k64, M) else M_1 = M[1 .. 2^17] M_2 = M[2^17 + 1 .. length(M)] M_2 = zeropad(M_2 || uint2str(0x80,1), 16) y = POLY(64, 2^64 - 2^32, k64, M_1) y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_2) Y = uint2str(y, 16) Return Y 5.5 L3-HASH-32: Third-layer hash The output from L2-HASH-32 is 16 bytes long. This final hash function hashes the 16-byte string to a fixed length of 4 bytes using a simple inner-product hash with affine translation. A 36-bit prime modulus is used to improve security. 5.5.1 Interface Function Name: L3-HASH-32 Input: K1, string of length 64 bytes. K2, string of length 4 bytes. M, string of length 16 bytes. Output: Y, string of length 4 bytes. 5.5.2 Algorithm Compute Y using the following algorithm. y = 0 // // Break M and K1 into 8 chunks and convert to integers // for i = 1 to 8 do M_i = M [(i - 1) * 2 + 1 .. i * 2] K_i = K1[(i - 1) * 8 + 1 .. i * 8] Krovetz et al. Expires April 2001 [Page 22] INTERNET-DRAFT UMAC October 2000 m_i = str2uint(M_i) k_i = str2uint(K_i) mod prime(36) // // Inner-product hash, extract last 32 bits and affine-translate // y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(36) y = y mod 2^32 Y = uint2str(y, 4) Y = Y xor K2 Return Y 5.6 UHASH-32: Three-layer universal hash The hash functions L1-HASH, L2-HASH and L3-HASH are used together in a straightforward manner. A message is first hashed by L1-HASH, its output is then hashed by L2-HASH, whose output is then hashed by L3-HASH. If the message being hashed is no longer than L1-KEY-LEN bytes, then L2-HASH is skipped as an optimization. Because L3-HASH outputs a string whose length is only WORD-LEN bytes long, multiple iterations of this three-layer hash are used, with different keys each time, until UMAC-OUTPUT-LEN have been generated. To reduce memory requirements, L1-HASH and L3-HASH both reuse most of their key-material between iterations. 5.6.1 Interface Function Name: UHASH-32 Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length UMAC-OUTPUT-LEN bytes. 5.6.2 Algorithm Compute Y using the following algorithm. // // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes // streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN) // Krovetz et al. Expires April 2001 [Page 23] INTERNET-DRAFT UMAC October 2000 // Define total key needed for all iterations using KDF. // L1Key and L3Key1 both reuse most key between iterations. // L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16) L2Key = KDF(K, 1, streams * 24) L3Key1 = KDF(K, 2, streams * 64) L3Key2 = KDF(K, 3, streams * 4) // // For each iteration, extract key and three-layer hash. // If length(M) <= L1-KEY-LEN, then skip L2-HASH. // Y = <empty string> for i = 1 to streams do L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN] L2Key_i = L2Key [(i-1) * 24 + 1 .. i * 24] L3Key1_i = L3Key1[(i-1) * 64 + 1 .. i * 64] L3Key2_i = L3Key2[(i-1) * 4 + 1 .. i * 4] A = L1-HASH-32(L1Key_i, M) if (length(M) <= L1-KEY-LEN) then B = zeroes(8) || A else B = L2-HASH-32(L2Key_i, A) C = L3-HASH-32(L3Key1_i, L3Key2_i, B) Y = Y || C Y = Y[1 .. UMAC-OUTPUT-LEN] Return Y 6 UHASH-16: Universal hash function for a 16-bit word size See Section 5 (UHASH-32) for general discussion of the UHASH algorithm. Each sub-section of Section 6 will note only differences between UHASH-32 and UHASH-16. 6.1 NH-16: NH hashing with a 16-bit word size The NH-16 algorithm is designed to exploit the recent trend of including instructions for small-scale vector parallelism in uniprocessor CPUs. Intel's MMX and Mororola's AltiVec instruction sets are good examples of this trend. Both support single- instruction multiply-add instructions on vectors of 16-bit words which can be used by UHASH-16 for accelerated hashing. To accommodate this parallelism, NH-16 accesses data-words in pairs which are 8 words (16 bytes) apart. Krovetz et al. Expires April 2001 [Page 24] INTERNET-DRAFT UMAC October 2000 6.1.1 Interface Function Name: NH-16 Input: K, string of length L1-KEY-LEN bytes. M, string with length divisible by 32 bytes. Output: Y, string of length 4 bytes. 6.1.2 Algorithm Compute Y using the following algorithm. // // Break M and K into 2-byte chunks // t = length(M) / 2 Let M_1, M_2, ..., M_t be 2-byte strings so that M = M_1 || M_2 || .. || M_t. Let K_1, K_2, ..., K_t be 2-byte strings so that K_1 || K_2 || .. || K_t is a prefix of K. // // Perform NH hash on the chunks, pairing words for multiplication // which are 8 apart to accommodate vector-parallelism. // Y = zeroes(4) i = 1 while (i < t) do Y = Y +_32 ((M_{i+0} +_16 K_{i+0}) *_32 (M_{i+ 8} +_16 K_{i+ 8})) Y = Y +_32 ((M_{i+1} +_16 K_{i+1}) *_32 (M_{i+ 9} +_16 K_{i+ 9})) Y = Y +_32 ((M_{i+2} +_16 K_{i+2}) *_32 (M_{i+10} +_16 K_{i+10})) Y = Y +_32 ((M_{i+3} +_16 K_{i+3}) *_32 (M_{i+11} +_16 K_{i+11})) Y = Y +_32 ((M_{i+4} +_16 K_{i+4}) *_32 (M_{i+12} +_16 K_{i+12})) Y = Y +_32 ((M_{i+5} +_16 K_{i+5}) *_32 (M_{i+13} +_16 K_{i+13})) Y = Y +_32 ((M_{i+6} +_16 K_{i+6}) *_32 (M_{i+14} +_16 K_{i+14})) Y = Y +_32 ((M_{i+7} +_16 K_{i+7}) *_32 (M_{i+15} +_16 K_{i+15})) i = i + 16 Return Y 6.2 L1-HASH-16: First-layer hash To limit the length of key required in the first layer of hashing, L1-HASH-16 breaks the input message into chunks no longer than Krovetz et al. Expires April 2001 [Page 25] INTERNET-DRAFT UMAC October 2000 L1-KEY-LEN bytes and NH hashes each with a key of that same length. 6.2.1 Interface Function Name: L1-HASH-16 Input: K, string of length L1-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length (4 * ceil(length(M)/L1-KEY-LEN)) bytes. 6.2.2 Algorithm Compute Y using the following algorithm. // // Break M into L1-KEY-LEN byte chunks (final chunk may be shorter) // t = ceil(length(M) / L1-KEY-LEN) Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. || M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t. // // For each chunk, except the last: endian-adjust, NH hash // and add bit-length. Use results to build Y. // Len = uint2str(L1-KEY-LEN * 8, 4) Y = <empty string> for i = 1 to t-1 do if (ENDIAN-FAVORITE == LITTLE) then // See endian discussion ENDIAN-SWAP(M_i) // in section 3.1.1 Y = Y || (NH-16(K, M_i) +_32 Len) // // For the last chunk: pad to 32-byte boundary, endian-adjust, // NH hash and add bit-length. Concatenate the result to Y. // Len = uint2str(length(M_t) * 8, 4) M_t = zeropad(M_t, 32) if (ENDIAN-FAVORITE == LITTLE) then ENDIAN-SWAP(M_t) Y = Y || (NH-16(K, M_t) +_32 Len) return Y Krovetz et al. Expires April 2001 [Page 26] INTERNET-DRAFT UMAC October 2000 6.3 L2-HASH-16: Second-layer hash L2-HASH-16 differs from L2-HASH-32 by beginning the ramped hash with a smaller prime modulus. See Section 5.3 for the definition of POLY. 6.3.1 Interface Function Name: L2-HASH-16 Input: K, string of length 28 bytes. M, string of length less than 2^64 bytes. Output: Y, string of length 16 bytes. 6.3.2 Algorithm Compute Y using the following algorithm. // // Extract keys and restrict to special key-sets // Mask32 = uint2str(0x1fffffff, 4) Mask64 = uint2str(0x01ffffff01ffffff, 8) Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16) k_32 = str2uint(K[1..4] and Mask32) k64 = str2uint(K[5..12] and Mask64) k128 = str2uint(K[13..28] and Mask128) // // If M no more than 2^11 bytes, hash under 32-bit prime. // Otherwise, hash under increasingly long primes. // if (length(M) <= 2^11) then // 2^9 32-bit words y = POLY(32, 2^32 - 6, k_32, M) else if (length(M) <= 2^33) then // 2^31 32-bit words M_1 = M[1 .. 2^11] M_2 = M[2^11 + 1 .. length(M)] M_2 = zeropad(M_2 || uint2str(0x80,1), 8) y = POLY(32, 2^32 - 6, k_32, M_1) y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2) else Krovetz et al. Expires April 2001 [Page 27] INTERNET-DRAFT UMAC October 2000 M_1 = M[1 .. 2^11] M_2 = M[2^11 + 1 .. 2^33] M_3 = M[2^33 + 1 .. length(M)] M_3 = zeropad(M || uint2str(0x80,1), 16) y = POLY(32, 2^32 - 6, k_32, M_1) y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2) y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_3) Y = uint2str(y, 16) Return Y 6.4 L3-HASH-16: Third-layer hash The L3-HASH-16 algorithm differs from L3-HASH-32 by hashing under a 19-bit prime modulus (instead of a 36-bit prime modulus) and then returning a 2-byte result (instead of a 4-byte result). 6.4.1 Interface Function Name: L3-HASH-16 Input: K1, string of length 32 bytes. K2, a string of length 2 bytes. M, a string of length 16 bytes. Output: Y, a string of length 2 bytes. 6.4.2 Algorithm Compute Y using the following algorithm. y = 0 // // Break M and K1 into 8 chunks and convert to integers // for i = 1 to 8 do M_i = M[(i - 1) * 2 + 1 .. i * 2] K_i = K1[(i - 1) * 4 + 1 .. i * 4] m_i = str2uint(M_i) k_i = str2uint(K_i) mod prime(19) // Krovetz et al. Expires April 2001 [Page 28] INTERNET-DRAFT UMAC October 2000 // Inner-product hash, extract last 32 bits and affine-translate // y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(19) y = y mod 2^16 Y = uint2str(y, 2) Y = Y xor K2 Return Y 6.5 UHASH-16: Three-layer universal hash The algorithm UHASH-16 differs from UHASH-32 only in the size of its keys generated, and in that it refers to the 16-bit variants of the three-layer hash functions. 6.5.1 Interface Function Name: UHASH-16 Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length UMAC-OUTPUT-LEN bytes. 6.5.2 Algorithm Compute Y using the following algorithm. // // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes // streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN) // // Define total key needed for all iterations using KDF. // L1Key and L3Key1 both reuse most key between iterations. // L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16) L2Key = KDF(K, 1, streams * 28) L3Key1 = KDF(K, 2, streams * 32) L3Key2 = KDF(K, 3, streams * 2) // // For each iteration, extract key and three-layer hash. Krovetz et al. Expires April 2001 [Page 29] INTERNET-DRAFT UMAC October 2000 // If length(M) <= L1-KEY-LEN, then skip L2-HASH. // Y = <empty string> for i = 1 to streams do L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN] L2Key_i = L2Key [(i-1) * 28 + 1 .. i * 28] L3Key1_i = L3Key1[(i-1) * 32 + 1 .. i * 32] L3Key2_i = L3Key2[(i-1) * 2 + 1 .. i * 2] A = L1-HASH-16(L1Key_i, M) if (length(M) <= L1-KEY-LEN) then B = zeroes(12) || A else B = L2-HASH-16(L2Key_i, A) C = L3-HASH-16(L3Key1_i, L3Key2_i, B) Y = Y || C Y = Y[1 .. UMAC-OUTPUT-LEN] Return Y 7 UMAC tag generation Tag generation for UMAC proceeds as follows. Use UHASH to hash the message and apply the PDF to the nonce to produce a string to xor with the UHASH output. The resulting string is the authentication- tag. 7.1 Interface Function Name: UMAC Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Nonce, string of length 1 to 16 bytes. Output: AuthTag, string of length UMAC-OUTPUT-LEN bytes. 7.2 Algorithm Compute AuthTag using the following algorithm. if (WORD-LEN == 2) then HashedMessage = UHASH-16(K, M) else HashedMessage = UHASH-32(K, M) Krovetz et al. Expires April 2001 [Page 30] INTERNET-DRAFT UMAC October 2000 Pad = PDF(K, Nonce) AuthTag = Pad xor HashedMessage Return AuthTag 8 Security considerations As a specification of a message authentication code, this entire document is about security. Here we describe some security considerations important for the proper understanding and use of UMAC. 8.1 Resistance to cryptanalysis The strength of UMAC depends on the strength of its underlying cryptographic functions: the key-derivation function (KDF) and the pad-derivation function (PDF). In this specification it is assumed that both operations are implemented using the Advanced Encryption Standard (AES). However, the full design and specification allow for the replacement of these components. Indeed, it is straightforward to use other block ciphers or other cryptographic objects, such as SHA-1 or HMAC for the realization of the KDF or PDF. The core of the UMAC design, the UHASH function, does not depend on any "cryptographic assumptions": its strength is specified by a purely mathematical property stated in terms of collision probability, and this property is proven in an absolute sense. In this way, the strength of UHASH is guaranteed regardless of future advances in cryptanalysis. The analysis of UMAC [3, 4] shows this scheme to have "provable security", in the sense of modern cryptography, by way of tight reductions. What this means is that an adversarial attack on UMAC which forges with probability significantly exceeding the established collision probability will give rise to an attack of comparable complexity which breaks the AES, in the sense of distinguishing AES from a family of random permutations. This design approach essentially obviates the need for cryptanalysis on UMAC: cryptanalytic efforts might as well focus on AES, the results imply. 8.2 Tag lengths and forging probability A MAC algorithm5.3 L3-HASH: Third-layer hash The output from L2-HASH isused between two parties that share16 bytes long. This final hash function hashes the 16-byte string to asecret MAC key, K. Messages transmitted between these parties are accompanied by authentication tags computedfixed length of 4 bytes. 5.3.1 L3-HASH Algorithm Input: K1, string of length 64 bytes. K2, string of length 4 bytes. M, string of length 16 bytes. Output: Y, string of length 4 bytes. Compute Y usingK and a given nonce. Breakingthe following algorithm. Krovetzet al.Expires April20012005 [Page31]15] INTERNET-DRAFT UMAC October2000 the MAC means that the attacker is able to generate, on its own, a new message2004 y = 0 // // Break M(i.e. one not previously transmitted between the legitimate parties)and K1 into 8 chunks and convert tocompute onintegers // for i = 1 to 8 do M_i = M [(i - 1) * 2 + 1 .. i * 2] K_i = K1[(i - 1) * 8 + 1 .. i * 8] m_i = str2uint(M_i) k_i = str2uint(K_i) mod prime(36) end for // // Inner-product hash, extract last 32 bits and affine-translate // y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(36) y = y mod 2^32 Y = uint2str(y, 4) Y = Y xor K2 Return Y 6 Security considerations As acorrect authentication tag under the key K. This is called a forgery. Note that if the authentication tag is specified to be of length t then the attacker can trivially break the MAC with probability 1/2^t. For this the attacker can just generate any messagespecification ofits choice and tryarandom tag; obviously, the tagmessage authentication code, this entire document iscorrect with probability 1/2^t. By repeated guessesabout security. Here we describe some security considerations important for theattacker can increase linearly its probabilityproper understanding and use ofsuccess. UMAC is designedUMAC. 6.1 Resistance tomake this guessing strategy the best possible attack againstcryptanalysis The strength of UMACas long as the attacker does not invest the computational effort needed to breakdepends on the strength of its underlyingcipher, e.g. AES, used to producecryptographic functions: theone time pads used in UMAC computation. More precisely, underkey-derivation function (KDF) and theassumed strength ofpad-derivation function (PDF). In thiscipher UMAC provides for close-to-optimal security with regards to forgery probability as represented in the next table. -------------------------------------------------------------------- UHASH-OUTPUT-LEN Forging probability Approximate actual forging (bytes)specification both operations are implemented usinga random tag probability in UMAC (using a clever tag) 2 2^-16 2^-15 4 2^-32 2^-30 8 2^-64 2^-60 16 2^-128 2^-120 -------------------------------------------------------------------- Recall thattheparameter UHASH-OUTPUT-LEN specifiesAdvanced Encryption Standard (AES). However, thelengthdesign oftheUMACauthentication tag. The above table states, for example,allows for thecase of an 8-byte tag that the ideal forging probability would be 2^-64 while UMAC would withstand an actual forging probability of 2^-60. Note that under this tag length (which is the default length in UMAC) the probabilityreplacement offorging a message is well under the chance that a randomly guessed DES key is correct. DESthese components. Indeed, it isnow widely seen as vulnerable, but the problem has never been that some extraordinarily lucky attacker might, in a single guess, find the right key. Instead,straightforward to use other block ciphers or other cryptographic objects, such as (properly keyed) SHA-1 or HMAC for theproblem is that large amountsrealization ofcomputation can be thrown attheproblem until, off-line, the attacker findsKDF or PDF. The core of theright key. With UMAC, off-line computation aimed at exceedingUMAC design, theforging probabilityUHASH function, does not depend on any cryptographic assumptions: its strength ishopeless, regardlessspecified by a purely mathematical property stated in terms oftag length, as long as the underlying cipher is not broken. The only way to forgecollision probability, and this property isto interact with the entity that verifiesproven in an absolute sense [2, 5]. In this way, theMAC and to try a huge amountstrength offorgeries before oneUHASH islikely to succeed.guaranteed regardless of future advances in cryptanalysis. ThesystemUHASH function was not designed to provide Krovetzet al.Expires April20012005 [Page32]16] INTERNET-DRAFT UMAC October2000 architecture will determine the extent to which this is possible. In a well-architected system there should not be any high-bandwidth capability for presenting forged MACs2004 cryptographic collision resistance properties, as does SHA-1, anddetermining if they are valid. In particular, the number of authentication failures at the verifying partyso should not belimited. Ifused as alarge numbersubstitute for it. The analysis ofsuch attempts are detected the session keyUMAC [2, 5] shows this scheme to have provable security, inuse should be dropped andtheevent be recorded in an audit log. Let us reemphasize: a forging probabilitysense of1 / 2^60 does not mean that there is an attack that runs in 2^60 time - as long as AES maintains its believed security theremodern cryptography, by way of tight reductions. What this means isno suchthat an adversarial attackfor UMAC. Instead, a 1 / 2^60 forgingon UMAC which forges with probabilitymeans that ifsignificantly exceeding the established collision probability will give rise to anattacker could try out 2^60 MACs, thenattack of comparable complexity which breaks theattacker would probably get one right. ButAES, in thearchitecturesense ofany real system should render this infeasible. One can certainly imagine an attacker havingdistinguishing AES from ahigh bandwidth channel (e.g., 1 Gbit/second or more) over which it can continually present attempted forgeries,family of random permutations. This design approach essentially obviates theattacker being signaled when a correct tagneed for cryptanalysis on UMAC: cryptanalytic efforts might as well focus on AES, the results imply. 6.2 Tag lengths and forging probability A MAC algorithm isfound, but realizing such a scenario inused between two parties that share areal system would representsecret MAC key, K. Messages transmitted between these parties are accompanied by authentication tags computed using K and amajor lapse ingiven nonce. Breaking thesecurity architecture. It should be pointed out that once an attempted forgery is successful, it is entirely possibleMAC means thatall subsequent messages under this key may be forged, too. Thisthe attacker isimportantable tounderstanding in gauging the severitygenerate, on its own, with no knowledge ofa successful forgery. In conclusion, the default 64-bit tags seem appropriate for most security architectures and applications. In cases where whentheconsequences of an authentication failure arekey K, a new message M (i.e. one notextremely severe, and whenpreviously transmitted between thesystem architecture is designedlegitimate parties) and toconscientiously limit the number of forgery attempts beforecompute on M asession is torn down, 32-bitcorrect authenticationtags may be adequate. Fortag under theparanoid, or if an attackerkey K. This isallowedcalled afantastic number of forgery tests, 96- or 128-bits may be utilized. 8.3 Selective-assurance authentication We have already remarked aboutforgery. Note that if theflexibility built into UMAC to useauthenticationtags of various lengths: shorter tags are faster to compute and one needstag is specified totransmit fewer bits, butbe of length t then theforgingattacker can trivially break the MAC with probabilityis higher. There is an additional degree of flexibility built into1/2^t. For this thedesignattacker can just generate any message ofUMAC: even if the sender generatesits choice andtransmitstry a random tag; obviously, the tag is correct with probability 1/2^t. By repeated guesses the attacker can increase linearly its probability of8 bytes, say, a receiver may electsuccess. UMAC is designed toverify onlymake this guessing strategy thefirst 4 bytes ofbest possible attack against UMAC as long as thetag, and computing that 4-byte prefix byattacker does not invest thereceiver will be substantially faster than computing whatcomputational effort needed to break thefullunderlying cipher, AES, used to produce the one time pads used in UMAC computation. More precisely, under the assumed strength of this cipher UMAC provides for close-to-optimal security with regards to forgery probability. An adversary could guess an 8-byte UMAC tagwould be. Indeed when WORD-LEN is 2 one cancorrectly with probability 1/2^64 by simply guessing a random value. The proofs of [2, 5] show that an adversary being morequickly checkclever in tag guessing can do no better than about 1/2^60. This means that for 8-byte tags the2-byte prefixideal forging probability is 2^-64 while UMAC produces an actual forging probability of at most 2^-60. This probability of forging a message is well under thetag thanchance that a randomly guessed DES key is correct. DES is now widely seen as vulnerable, but the4-byte prefix ofproblem has never been that some extraordinarily lucky attacker might, in a single guess, find thetag, one can more quickly checkright key. Instead, the4-byte prefixproblem is that large amounts ofthe tag than thecomputation Krovetzet al.Expires April20012005 [Page33]17] INTERNET-DRAFT UMAC October2000 6-byte prefix of the tag, and so forth. When WORD-LEN is 4 one2004 canmore quickly checkbe thrown at the4-byte prefix ofproblem until, off-line, thetag than an entire 8-byte tag, and so forth. This type of flexibility allows different parties who receive a MAC (as in a multicast setting) to authenticateattacker finds the right key. With UMAC, off-line computation aimed at exceeding the forging probability is hopeless as long as thetransmissionunderlying cipher is not broken. The only way to forge is to interact with theextent deemed necessaryentity that verifies the MAC and to try a huge amount of forgeries before one is likely to succeed. The system architecture will determine the extentconsistent withto which this is possible. In a well-architected system there should not be anycomputational limitshigh-bandwidth capability for presenting forged MACs and determining if they are valid. In particular, the number of authentication failures at thereceiver. Inverifying party should be limited. If ascenario where receivers are allowed to verify short prefixeslarge number oflonger tags, it is even more important that conservative policiessuch attempts arefollowed whendetected the session key in use should be dropped and the event be recorded in an audit log. Let us reemphasize: abad tagforging probability of 1 / 2^60 does not mean that there ispresented to the receiver. Because short prefixes are easieran attack that runs in 2^60 time - toforge than arethe contrary, as longones,as AES maintains its believed security there is no such attack for UMAC. Instead, a 1 / 2^60 forging probability means that if an attackermay attempt to forge short prefixes andcould try out 2^60 MACs, thenleverage information gained from these attacks to forge longer tags. If the attacker can learn which short tags are good and which are bad,the attacker would probably get one right. It should be pointed out that once an attempted forgery is successful, it is possible, in principle, that subsequent messages under this key may beable to learn enoughforged, too. This is important toallow longer forgeries. One salient feature ofunderstand in gauging thesecurity-performance trade-off offered byseverity of a successful forgery, even though no such attack on UMAC isits usability in contexts where performance is severely constrained.known to date. Insuch cases, usingconclusion, 64-bit tags seem appropriate for most security architectures and applications. If one wants more security, at amild-security authentication tagcost of 50% more computation, UMAC can produce 96-bit tags which cannot beof significant value especiallyforged with probability better than 1/2^90. Likewise, if less security is required, with thealternative wouldbenefit of 50% less computation, UMAC can produce 32-bit tags which cannot benot to use authentication at all (a possible such scenario couldforged with probability better than 1/2^30. Great care must bethe high-speed transmission of real-time multimedia streams). Another potential scenario where short and fast-to-computetaken when using 32-bit tags because 1/2^30 forgery probability is considered fairly high. Still, high-speed low-security authentication can beuseful is for fast detection ofapplied usefully on low-value dataforgery intended as a denial of service attack. In this case, even a moderate increase in the attacker's difficulty to produce forgeries may suffice to make the attack worthless for the attacker. Moreover, being able to detect just a portion of attempted forgeries may be enough to identify the attack. 8.4or rapidly-changing key environments. 6.3 Nonce considerationsThe functionUMAC(Section 7)requires a nonce with length in the range 1 to 16 bytes. All nonces in an authentication session must be equal in length. For secure operation, no nonce value should be repeated within the life of a single UMAC session-key. To authenticate messages over a duplex channel (where two parties Krovetz Expires April 2005 [Page 18] INTERNET-DRAFT UMAC October 2004 send messages to each other), a different key could be used for each direction. If the same key is used in both directions, then it is crucial that all nonces be distinct. For example, one party can use even nonces while the other party uses odd ones. The receiving party must verify that the sender is using a nonce of the correct form. This specification does not indicate how nonce values are created, updated, or communicated between the entity producing a tag and the entity verifying a tag. The following exemplify some ofthe Krovetz et al. Expires April 2001 [Page 34] INTERNET-DRAFT UMAC October 2000the possibilities: 1. The nonce is an eight-byte [resp., four-byte] unsigned number, Counter, which is initialized to zero, which is incremented by one following the generation of each authentication tag, and which is always communicated along with the message and the authentication tag. An error occurs at the sender if there is an attempt to authenticate more than 2^64 [resp., 2^32] messages within a session. 2. The nonce is a 16-byte unsigned number, Counter, which is initialized to zero and which is incremented by one following the generation of each authentication tag. The Counter is not explicitly communicated between the sender and receiver. Instead, the two are assumed to communicate over a reliable transport, and each maintains its own counter so as to keep track of what the current nonce value is. 3. The nonce is a 16-byte random value. (Because repetitions in a random n-bit value are expected at around 2^(n/2) trials, the number of messages to be communicated in a session using n-bit nonces should not be allowed to approach 2^(n/2).) We emphasize that the value of the nonce need not be kept secret. When UMAC is used within a higher-level protocol there may already be a field, such as a sequence number, which can be co-opted so as to specify the nonce needed byUMAC.UMAC [5]. The application will then specify how to construct the nonce from this already-existing field.Note that if the nonce starts at zero and is incremented with each message then an attacker can easily ascertain the number of messages which have been sent during a session. If this is information which one wishes to deny the attacker then one might have the sender initialize the nonce to a random value, rather than to zero. Inspecting the current nonce will no longer reveal to the attacker the number of messages which have been sent during this session. This is a computationally cheaper approach than enciphering the nonce. 8.56.4 Guarding against replay attacks A replay attack entails the attacker repeating a message, nonce, and authentication tag. Insystems,many applications, replay attacks may be quitedamaging,damaging andmany applications will want to guard against them.must be prevented. In UMAC, this would normally be done at the receiver by having the receiver check that no nonce value is used twice. On a reliableKrovetz et al. Expires April 2001 [Page 35] INTERNET-DRAFT UMAC October 2000connection, when the nonce is a counter, this is trivial. On an unreliable connection, when the Krovetz Expires April 2005 [Page 19] INTERNET-DRAFT UMAC October 2004 nonce is a counter, one would normally cache some"window"window of recent nonces. Out-of-order message delivery in excess of what the window allows will result in rejecting otherwise valid authentication tags. We emphasize that it is up to the receiver when a givenmessage, nonce and tag(message, nonce, tag) triple will be deemed authentic. Certainly the tag should be valid for the message and nonce, as determined by UMAC, but the message may still be deemed inauthentic because the nonce is detected to be a replay.97 AcknowledgmentsThanks are due to David Balenson and David Carman of NAI Labs, who suggested the advantages of allowing a receiver to verify authentication tags to various forgery probabilities. Thanks are also due toDavid McGrew and ScottFluhrerFluhrer, of CiscoSystems forSystems, played a significant role in improving UMAC by encouraging us toimprove UMACpay more attention to the performanceonof short messages.Phillip Rogaway, JohnBlack, Krovetz, andTed Krovetz were supported inRogaway have received support for this work underRogaway'sNSFCAREER Award CCR-962540, and under MICRO grants 97-150, 98-129,awards 0208842, 0240000, 9624560, and99-103 fundeda gift from Cisco Systems. Funding for the RFC Editor function is currently provided byRSA Data Security, Inc., and ORINCON Corporation. Muchthe Internet Society. Appendix - Test vectors Following are some sample UMAC outputs over a collection ofRogaway's work was carried out during two sabbatical visits to Chiang Mai University. Special thanks to Prof. Darunee Smawatakul for helping to arrange these visits. 10input values. Let K = "abcdefghijklmnop" // A 16-byte UMAC key N = "bcdefghi" // An 8-byte nonce The tags generated by UMAC using key K and nonce N are: Message 32-bit Tag 64-bit Tag 96-bit Tag ------- ---------- ---------- ---------- <empty> EC085847 B9FE492F357C6DF8 3383059D11C13E532BD1E310 'a' * 3 5DA7EE32 0851FF5A9FFA52A0 822CB3E8BB47010BAEC943F8 'a' * 2^10 C8B389F9 9D459891837A7B7D 1738D423A7C728D603BE1725 'a' * 2^15 7B4291BF 2EB480D7EB0EFACA A4C9CC65CFB3A961C5456D6D 'a' * 2^20 A1AB1E5D F45D0F35F15E64D4 7E204387D5E3377F131EF03D 'a' * 2^25 961CA14D C3EAB025C055F3DB 4997FC97E4E8A0709A5842DD 'abc' * 1 CA507696 9FA667FE61D9E4C8 15DB2B4C4564B763303B8E31 'abc' * 500 87347438 D2C26550692E16F1 58BF29E24D93455AE5A05F07 The first column lists a small sample of messages which are strings of repeated ASCII 'a' bytes or 'abc' strings. The remaining columns give in hexadecimal the tags generated when UMAC is called with the corresponding message, nonce N and key K. Krovetz Expires April 2005 [Page 20] INTERNET-DRAFT UMAC October 2004 References Normative References [1]ANSI X9.9, "American NationalFIPS-197, "Advanced Encryption Standardfor Financial Institution Message Authentication (Wholesale)", American Bankers Association, 1981. Revised 1986. [2] M. Bellare, R. Canetti, and H. Krawczyk, "Keyed hash functions(AES)", National Institute of Standards andmessage authentication", Advances in Cryptology - CRYPTO '96, LNCS vol. 1109, pp. 1-15. Full version available from http://www.research.ibm.com/security/keyed-md5.html/ [3]Technology, 2001. Informative References [2] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, "UMAC: Fast and provably secure message authentication", Advances in Cryptology - CRYPTO '99, LNCS vol. 1666, pp.216-233. Full version available from http://www.cs.ucdavis.edu/~rogaway/umac [4] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, "The UMAC message authentication code", work in progress, 2000. Krovetz et al. Expires April 2001 [Page 36] INTERNET-DRAFT UMAC October 2000 To be available from http://www.cs.ucdavis.edu/~rogaway/umac [5]216-233, Springer-Verlag, 1999. [3] L. Carter and M. Wegman, "Universal classes of hash functions", Journal of Computer and System Sciences, 18 (1979), pp. 143-154.[6] O. Goldreich, S. Goldwasser and S. Micali, "How to construct random functions", Journal of the ACM, 33, No. 4 (1986), pp. 210-217. [7][4] S.Halevi and H. Krawczyk, "MMH: Software message authentication in the Gbit/second rates", Fast Software Encryption, LNCS Vol. 1267, Springer-Verlag, pp. 172-189, 1997. [8] ISO/IEC 9797-1, "Information technology - Security techniques - Data integrity mechanism using a cryptographic check function employing a block cipher algorithm", International Organization for Standardization, 1999. [9] H. Krawczyk, M. Bellare,Kent and R.Canetti, "HMAC: Keyed-hashing for message authentication", RFC-2104, February 1997. [10]Atkinson, "IP Encapsulating Security Payload (ESP)", RFC 2406, IETF, 1998. [5] T. Krovetz,and P. Rogaway, "Fast"Software-optimized universal hashingwith small keys and no preprocessing", work in progress, 2000. To be available from http://www.cs.ucdavis.edu/~rogaway/umac [11] T. Krovetz,andP. Rogaway, "Variationally universal hashing", work in progress,message authentication", UMI Dissertation Services, 2000.To be available from http://www.cs.ucdavis.edu/~rogaway/umac [12][6] M. Wegman and L. Carter, "New hash functions and their use in authentication and set equality", Journal of Computer and System Sciences, 22 (1981), pp. 265-279.11Author contact information Authors' Addresses John Black Department of Computer Science University ofNevada Reno NV 89557Colorado Boulder CO 80309 USA EMail:jrb@cs.unr.edujrblack@cs.colorado.edu Shai HaleviKrovetz et al. Expires April 2001 [Page 37] INTERNET-DRAFT UMAC October 2000IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights NY 10598 USA EMail:shaih@watson.ibm.comshaih@alum.mit.edu Krovetz Expires April 2005 [Page 21] INTERNET-DRAFT UMAC October 2004 Alejandro Hevia Department of Computer Science & Engineering University of California at San Diego La Jolla CA 92093 USA EMail: ahevia@cs.ucsd.edu Hugo KrawczykDeprtmentDepartment of Electrical Engineering Technion Haifa 32000 ISRAEL EMail: hugo@ee.technion.ac.il Ted KrovetzIntel Corporation 1900 Prairie City Road FolsomDepartment of Computer Science California State University Sacramento CA9563095819 USA EMail: tdk@acm.org Phillip Rogaway Department of Computer Science University of California Davis CA 95616 USA and Department of Computer Science Faculty of Science Chiang Mai University Chiang Mai 50200 THAILAND EMail: rogaway@cs.ucdavis.eduA Suggested application programming interface (API) /* umac.h */ typedef struct UMAC_CTX *umac_ctx_t; umac_ctx_t umac_alloc(char key[]); /* Dynamically allocate UMAC_CTX struct */ Krovetz et al. Expires April 2001 [Page 38] INTERNET-DRAFT UMAC October 2000 /* initialize variables and generate */ /* subkeys for default parameters. */ int umac_free(umac_ctx_t ctx); /* Deallocate the context structure. */ int umac_set_params(umac_ctx_t ctx, void *params); /* After default initialization, */ /* optionally set parametersFull Copyright Statement Copyright (C) The Internet Society (2004). This document is subject to*/ /* different valuesthe rights, licenses andreset for */ /* new message. */ int umac_update(umac_ctx_t ctx, char *input, long len); /* Incorporate len bytes pointed to by */ /* input into context ctx. */ int umac_final(umac_ctx_t ctx, char tag[], char nonce[]); /* Incorporate nonce valuerestrictions contained in BCP 78, andreturn */ /* tag. Reset ctx for next message. */ int umac(umac_ctx_t ctx, char *input, long len, char tag[], char nonce[]); /* All-in-one (non-incremental) */ /* implementation ofexcept as set forth therein, thefunctions */ /* umac_update() and umac_final(). */ Each routine returns zero if unsuccessful. B Reference codeauthors retain all their rights. This document andtest vectors Seethe information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET Krovetz Expires April 2005 [Page 22] INTERNET-DRAFT UMACWorld Wide Web homepage for reference code and test vectors. http://www.cs.ucdavis.edu/~rogaway/umac/October 2004 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Krovetzet al.Expires April20012005 [Page39]23] ----