Internet-Draft                                               Ryan Moats
draft-ietf-urn-syntax-02.txt                                       AT&T
Expires in six months                                      January 1997

                               URN Syntax

                Filename: draft-ietf-urn-syntax-02.txt

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Uniform Resource Names (URNs) are intended to serve as persistent resource identifiers. This document sets forward the canonical syntax for URNs. A discussion of both existing legacy and new namespaces and requirements for URN presentation and transmission are presented. Finally, there is a discussion of URN equivalence and how to determine it.

1. Introduction

Uniform Resource Names (URNs) are intended to serve as persistent resource identifiers and are designed to make it easy to map other namespaces (which share the properties of URNs) into URN-space. Therefore, the URN syntax provides a means to encode character data in a form that can be sent in existing protocols, transcribed on most keyboards, etc.

Expires 7/31/97                                                [Page 1]
INTERNET DRAFT                 URN Syntax                  January 1997

2. Syntax

All URNs have the following syntax (phrases enclosed in quotes are REQUIRED):

   <URN> ::= "urn:" <NID> ":" <NSS>

where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The leading "urn:" sequence is case-insensitive. The Namespace ID determines the _syntactic_ interpretation of the Namespace Specific String (as discussed in [1]).

RFC 1630 [2] and RFC 1737 [3] each present additional considerations for URN encoding, which have implications as far as limiting syntax. On the other hand, the requirement to support existing legacy naming systems has the effect of broadening syntax. Thus, we discuss the acceptable syntax for both the Namespace Identifier and the Namespace Specific String separately.

2.1 Namespace Identifier Syntax

The following is the syntax for the Namespace Identifier. To (a) be consistent with all potential resolution schemes and (b) not put any undue constraints on any potential resolution scheme, the syntax for the Namespace Identifier is:

   <NID>         ::= <let-num> [ *<let-num-hyp> ]

   <let-num-hyp> ::= <upper> | <lower> | <number> | "-"

   <let-num>     ::= <upper> | <lower> | <number>

   <upper>       ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                     "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                     "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                     "Y" | "Z"

   <lower>       ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                     "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                     "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                     "y" | "z"

   <number>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                     "8" | "9"

This is slightly more restrictive than what is stated in [4] (which allows the characters "." and "+"). Further, the Namespace Identifier is case insensitive, so that "ISBN" and "isbn" refer to the same namespace.

To avoid confusion with the "urn:" identifier, the NID "urn" is reserved and MUST NOT be used.

2.2 Namespace Specific String Syntax

As required by RFC 1737, there is a single canonical representation of the NSS portion of an URN. The format of this single canonical form follows:

   <NSS>         ::= 1*<URN chars>

   <URN chars>   ::= <trans> | "%" <hex> <hex>

   <trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>

   <hex>         ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
                     "a" | "b" | "c" | "d" | "e" | "f"

   <other>       ::= "(" | ")" | "+" | "," | "-" | "." | ":" | "=" |
                     "?" | "@" | ";" | "$" | "_" | "!" | "~" | "*" | "'"

Depending on the rules governing a namespace, valid identifiers in a namespace might contain characters that are not members of the URN character set above (<URN chars>). Such strings MUST be translated into canonical NSS format before using them as protocol elements or otherwise passing them on to other applications. Translation is done by encoding each character outside the URN character set as a sequence of one to six octets using UTF-8 encoding, and the encoding of each of those octets as "%" followed by two characters from the <hex> character set above. The two characters give the hexadecimal representation of that octet.

2.3 Reserved characters

The remaining character set to be discussed is the reserved character set, which contains various characters reserved from normal use. The reserved character set follows, with a discussion on the specifics of why each character is reserved.

The reserved character set is:

   <reserved>    ::= "/" | "%"

2.3.1 The "%" character

The "%" character is reserved in the URN syntax for introducing the escape sequence for an octet. Literal use of the "%" character in a namespace must be encoded using "%25" in URNs for that namespace. The presence of an "%" character in an URN MUST be followed by two characters from the <hex> character set.

Namespaces MAY designate one or more characters from the URN character set as having special meaning for that namespace. If the namespace also uses that character in a literal sense, the character used in a literal sense MUST be encoded with "%" followed by the hexadecimal representation of that octet. Therefore, the process of registering a namespace identifier shall include publication of a definition of which characters have a special meaning to that namespace.

2.3.2 The "/" character

The "/" character is RESERVED for future developments. It might be used for denoting hierarchy to allow for relative URN processing, but the WG has not yet reached consensus on this, so such developments will be documented separately. Meanwhile, namespace developers SHOULD NOT use an unencoded "/", but rather use %-encoding for "/" ("%2F").

2.4 Excluded characters

The following list is included only for the sake of completeness. Any octets/characters on this list are explicitly NOT part of the URN character set, and if used in an URN, MUST be %encoded:

   <excluded>    ::= octets 0-32 (0-20 hex) | "\" | """ | "#" | "&" |
                     "<" | ">" | "[" | "]" | "^" | "`" | "{" | "|" |
                     "}" | octets 127-255 (7F-FF hex)

An URN ends when an octet/character from the excluded character set (<excluded>) is encountered. The character from the excluded character set is NOT part of the URN.

3. Support of existing legacy naming systems and new naming systems

Any namespace (existing or newly-devised) that is proposed as an URN-namespace and fulfills the criteria of URN-namespaces MUST be expressed in this syntax. If names in these namespaces contain characters other than those defined for the URN character set, they MUST be translated into canonical form as discussed in section 2.2.

4. URN presentation and transport

The URN syntax defines the canonical format for URNs and all URN transport and interchanges MUST take place in this format. Further, all URN-aware applications MUST offer the option of displaying URNs in this canonical form to allow for direct transcription (for example by cut and paste techniques). Such applications MAY support display of URNs in a more human-friendly form and may use a character set that includes characters that aren't permitted in URN syntax as defined in this RFC (that is, they may replace %-notation by characters in some extended character set in display to humans).

5. Lexical Equivalence in URNs

For various purposes such as caching, it's often desirable to determine if two URNs are the same without resolving them. The general purpose means of doing so is by testing for "lexical equivalence" as defined below.

Two URNs are lexically equivalent if they are octet-by-octet equal after the following preprocessing:

   1. normalize the case of the leading "urn:" token
   2. normalize the case of the NID
   3. normalize the case of any %-escaping

Note that %-escaping MUST NOT be removed.
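
The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the draft: the function names normalize_urn and lexically_equivalent are hypothetical, the input is assumed to already be in canonical (ASCII) form, and lower case for the "urn:" token and NID and upper case for the hex digits are arbitrary choices, since the draft only requires that case be normalized consistently.

```python
import re


def normalize_urn(urn: str) -> str:
    """Return the form of `urn` used for lexical comparison (hypothetical helper)."""
    # Split at the first two colons: "urn" token, NID, and NSS.
    # The NSS may itself contain ":" characters, so limit the split.
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: %r" % urn)
    # Step 3: normalize the case of the hex digits in each %-escape.
    # The escape itself is kept; it is never decoded or removed.
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), nss)
    # Steps 1 and 2: normalize the case of the "urn:" token and the NID.
    return "urn:" + nid.lower() + ":" + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Compare two URNs after the preprocessing described in section 5."""
    return normalize_urn(a) == normalize_urn(b)
```

Because the %-escapes are never decoded, "urn:foo:a123/456" and "urn:foo:a123%2F456" stay distinct under this comparison, while "urn:foo:a123%2F456" and "URN:FOO:a123%2f456" compare equal.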

Some namespaces may define additional lexical equivalences, such as case-insensitivity of the NSS (or parts thereof). Additional lexical equivalences MUST be documented as part of namespace registration, MUST always have the effect of eliminating some of the false negatives obtained by the procedure above, and MUST NEVER say that two URNs are not equivalent if the procedure above says they are equivalent.

6. Examples of lexical equivalence

The following URN comparisons highlight the lexical equivalence definitions:

   1- URN:foo:a123/456
   2- urn:foo:a123/456
   3- urn:FOO:a123/456
   4- urn:foo:A123/456
   5- urn:foo:a123%2F456
   6- URN:FOO:a123%2f456

URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not lexically equivalent to any of the other URNs of the above set. URNs 5 and 6 are only lexically equivalent to each other.

7. Functional Equivalence in URNs

Functional equivalence is determined by practice within a given namespace and managed by resolvers for that namespace. Namespace registration must include guidance on how to determine functional equivalence for that namespace, i.e. when two URNs are identical within a namespace.

8. Security considerations

This document specifies the syntax for URNs. While some namespace resolvers may assign special meaning to certain of the characters of the Namespace Specific String, any security considerations resulting from such assignment are outside the scope of this document. It is strongly recommended that the process of registering a namespace identifier include any such considerations.

9. Acknowledgments

Thanks to various members of the URN working group and <<your name here!!>> for comments on earlier drafts of this document. This document is partially supported by the National Science Foundation, Cooperative Agreement NCR-9218179.

10. References

Request For Comments (RFC) and Internet Draft documents are available from <URL:ftp://ftp.internic.net> and numerous mirror sites.

   [1] K. R. Sollins, "Requirements and a Framework for URN Resolution Systems," Internet Draft (work in progress), November 1996.

   [2] T. Berners-Lee, "Universal Resource Identifiers in WWW," RFC 1630, June 1994.

   [3] K. Sollins and L. Masinter, "Functional Requirements for Uniform Resource Names," RFC 1737, December 1994.

   [4] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Locators (URL)," Internet Draft (work in progress), December 1996.

11. Editor's address

   Ryan Moats
   AT&T
   15621 Drexel Circle
   Omaha, NE 68135-2358
   USA

   Phone: +1 402 894-9456
   EMail: jayhawk@ds.internic.net

Appendix A. Handling of URNs by URL resolvers/browsers.

The URN syntax has been defined so that URNs can be used in places where URLs are expected. A resolver that conforms to the current URL syntax specification [4] will extract a scheme value of "urn:" rather than a scheme value of "urn:<nid>".

An URN MUST be considered an opaque URL by URL resolvers and passed (with the "urn:" tag) to an URN resolver for resolution. The URN resolver can either be an external resolver that the URL resolver knows of, or it can be functionality built-in to the URL resolver.

To avoid confusion of users, an URL browser SHOULD display the complete URN (including the "urn:" tag) to ensure that there is no confusion between URN namespace identifiers and URL scheme identifiers.

This Internet Draft expires July 31, 1997.