Internet-Draft                                               Ryan Moats
[-draft-ietf-urn-syntax-00.txt-]{+draft-ietf-urn-syntax-01.txt+}       AT&T
Expires in six months                     [-October-]{+November+} 1996

URN Syntax

Filename: [-draft-ietf-urn-syntax-00.txt-]{+draft-ietf-urn-syntax-01.txt+}

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Uniform Resource Names (URNs) are intended to serve as persistent resource identifiers. This document [-presents-]{+sets forward+} the canonical syntax for URNs. Support for both existing legacy and new namespaces is discussed. {+Requirements for+} URN {+presentation and+} transmission [-encoding requirements-] are presented. Finally, there is a discussion of URN equivalence and how to determine it.

1. Introduction

Uniform Resource Names (URNs) are intended to serve as persistent resource identifiers and are designed to make it easy to map other namespaces (which share the properties of URNs) into URN-space. The URN syntax therefore provides a means to encode character data in a form that can be sent in existing protocols, transcribed on most keyboards, etc.

Expires 5/19/97                                                [Page 1]
INTERNET DRAFT                URN Syntax                  November 1996

2. Syntax

All URNs have the following syntax:

   <URN> ::= ["urn:"] <NID> ":" <NSS>

<NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The leading case-insensitive "urn:" sequence is currently optional, as no closure on its definite presence or absence has been reached.

The Namespace ID is used to determine the _syntactic_ interpretation of the Namespace Specific String (as discussed in [1]).

RFC 1737 [2] [-suggests-]{+presents+} additional requirements on URN encoding, which all have implications as far as limiting syntax. On the other hand, the requirement to support existing legacy naming systems has the effect of broadening syntax. Thus, we discuss the acceptable syntax for both the Namespace Identifier and the Namespace Specific String separately.

[-1.1-]{+2.1+} Namespace Identifier Syntax

The following is the syntax for the Namespace Identifier. To (a) be consistent with all potential resolution schemes and (b) not put any undue constraints on any potential resolution scheme, the syntax for the Namespace Identifier is:

   <NID> ::= <letter> [ <let-hyp> ]
   <let-hyp> ::= <letter> | "-" | <let-hyp>
   <letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case

This is slightly more restrictive than what is stated in RFC 1738 [4] (which allows the period "."). Further, the Namespace Identifier is case insensitive, so that "ISBN" and "isbn" refer to the same namespace.

To avoid confusion with the optional "urn:" identifier, the NID "urn" is reserved and may not be used.

[-1.2-]{+2.2+} Namespace Specific String Syntax

As required by RFC 1737, there is a single canonical representation of the NSS portion of a URN.
The format of this single canonical form follows:

   <NSS> ::= <URN chars>*
   <URN chars> ::= <trans> | "%" <hex> <hex>
   <trans> ::= <upper> | <lower> | <number> | <other>
   <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F"
   <upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
   <lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   <number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
   <other> ::= "(" | ")" | "+" | " ":" | "=" | "?" | "@"

Depending on the rules governing a namespace, valid identifiers in a namespace might contain characters that are

[-reserved characters in URI syntax or non-printable ASCII characters. To accommodate the largest set of valid identifiers, the NSS portion of a URN shall use UTF-8 representation of ISO 10646 as its character set. Namespaces that do not currently use ISO 10646/UTF-8 are encouraged to migrate to it. Clients MUST be capable of %encoding the UTF-8 formatted NSS. %encoding, (as discussed in [3]) uses a percent sign "%" immediately followed by two hexadecimal digits (0-9, A-F) giving the binary code for that octet. The rules for %encoding presented in [3] apply with the following exceptions:-]

[-1. [3] states that occurrence of the '/' character in URIs must denote hierarchy, so that partial forms of a URI are possible. This restriction is unenforceable, and relative URLs do not have a scheme prefix, so we allow URNs to contain unescaped occurrences of the '/' character that do not denote hierarchy.-]

[-2. As an optimization when the transport between systems is known to be 8-bit-clean, clients MAY omit the %encoding on 8-bit characters but MUST still %encode the reserved characters below.-]

[-For historic reasons, the characters "#" (%23), "?" (%3F), "%" (%25), "*" (%2A), "!" (%21), "<" (%3C), ">" (%3E), and '"' (%22), are reserved and must be %encoded. Thus client implementers should accept URNs from users in an unencoded form but must encode them before sending them to a resolver. URN resolvers MUST be capable of accepting URNs that have been %encoded for either 8-bit clean or 7-bit transports. %encoding is removed first, then UTF-8 decoding is performed. URN resolvers MUST return identical results from ANY legally encoded form of the URN.-]

[-It should be noted that certain characters in the Namespace Specific String syntax may have special meaning in certain namespaces.-]

{+not members of the URN character set above (<URN chars>). Such strings MUST be translated into canonical NSS format before using them as protocol elements or otherwise passing them on to other applications. Translation is done by encoding each character outside the URN character set as a sequence of one to six octets using UTF-8 encoding, and the encoding of each of those octets as "%" followed by two characters from the <hex> character set above. The two characters give the hexadecimal representation of that octet.+}

{+Namespaces MAY designate one or more characters from the URN character set as having special meaning for that namespace. If the namespace also uses that character in a literal sense as well, the character used in a literal sense must be encoded with "%" followed by the hexadecimal representation of that octet.+}

Therefore, the process of registering a namespace identifier shall include publication of a definition of which characters have a special meaning and how to encode these characters if used in a literal sense.

[-2.-]{+3.+} Support of existing legacy naming systems {+and new naming systems+}

[-To allow for support existing legacy and new naming systems (as required by [2]), the Namespace Specific String shall be considered an "opaque string" in the sense of structure except as mentioned in Section 1. In addition, URN servers should be prepared to accept URNs that do not use ISO 10646/UTF-8 for those namespaces that currently use a different encoding. Note that this is not a general requirement on all resolvers, only resolvers that handle a namespace that is known not to use ISO 10646/UTF-8.-]

{+URN-aware applications MAY accept as input other resource identifiers from existing legacy namespaces. If such identifiers contain characters that are not members of the URN character set specified in section 2.2, the identifier MUST be translated to canonical format as discussed in section 2.2.+}

{+Some existing name spaces that have the properties of the URN-space contain some human-significant components, and these exist in a wide variety of languages. However, URNs are NOT intended to convey information that is significant to humans. While the translation rule in section 2.2 is provided for existing namespaces, new namespaces, as part of their registration documentation, MUST define a discipline for assigning new URNs that does not simplify the generation of human-significant names.+}

[-3. URN encoding for transmission-]

[-Because the NSS of a URN is considered a series of octets of data, encoding for transport is the responsibility of the transport mechanism and is not discussed here. Any mechanism that can handle arbitrary 8-bit data will successfully transport a URN.-]

{+4. URN presentation and transport+}

{+URN-aware applications MAY support "natural" display of URNs which contain characters encoded using "%" notation. However, they MUST provide for display of URNs in canonical form (i.e. in a format suitable for transcription). URNs may only be transported in canonical format.+}

[-4.-]{+5.+} Equivalence in URNs

URNs are considered equivalent if they return the same [-result.-]{+resource.+} For various purposes, such as caching, a test is necessary to determine equivalence without actually resolving the URNs and fetching/comparing the underlying resources. "Lexical equivalence" is a stricter condition than the equivalence described above (functional equivalence).

[-4.1-]{+5.1+} Lexical Equivalence

Lexical equivalence may be determined by comparing two URNs without making any network accesses. Two URNs are lexically equivalent if they are octet-by-octet equal after the following preprocessing:

   [-1. remove any %encoding that might be present-]
   [-2.-]{+1.+} drop any preceding "urn:" token
   [-3.-]{+2.+} normalize the case of the NID

Some namespaces may define additional lexical equivalences, such as case-insensitivity of the NSS (or parts thereof). Additional lexical equivalences MUST be documented as part of namespace registration, MUST always have the effect of eliminating some of the false negatives obtained by the procedure above, and MUST NEVER say that two URNs are not equivalent if the procedure above says they are equivalent.

[-4.2-]{+5.2+} Functional Equivalence

Resolvers determine functional equivalence based on specific rules for the namespace. Therefore, namespace registration must include documentation on how to determine functional equivalence for that namespace.

[-4.3-]{+5.3+} Examples

The following URN comparisons highlight the difference between these types of equivalence:

   urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equiv.
   urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equiv.
   urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equiv. but may be functionally equivalent.

[-5.-]{+6.+} Security considerations

Because of the number of potential namespaces, it must be restated that certain of the characters in the Namespace Specific String may have special meaning to certain namespace resolvers. The process of registering a namespace identifier shall therefore include publication of a definition of which characters have a special [-meaning and how to encode these characters if used in a literal sense.-]{+meaning.+}

[-6.-]{+7.+} Acknowledgments

Thanks to various members of the URN working group and <<your name here!!>> for comments on earlier drafts of this document. This document is partially supported by the National Science Foundation.

[-7.-]{+8.+} References

Request For Comments (RFC) and Internet Draft documents are available from <URL:ftp://ftp.internic.net> and numerous mirror sites.

[1] L. L. Daigle, P. Faltstrom, R. Iannella. "A Framework for the Assignment and Resolution of Uniform Resource [-Names",-]{+Names,"+} Internet Draft (work in progress). June 1996.

[2] K. Sollins, L. Masinter. "Functional Requirements for Uniform Resource [-Names",-]{+Names,"+} RFC 1737. December 1994.

[3] T. Berners-Lee. "Universal Resource Identifiers in [-WWW",-]{+WWW,"+} RFC 1630. June 1994.

[4] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform Resource Locators [-(URL)",-]{+(URL),"+} RFC 1738. December 1994.

[-8. Author's-]{+9. Editor's+} address

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA

Phone: +1 402 894-9456
EMail: jayhawk@ds.internic.net

This Internet Draft expires [-April 1,-]{+May 19,+} 1997.
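
As an informal illustration, the translation rule of section 2.2 and the two-step lexical-equivalence preprocessing of section 5.1 (as worded in the -01 text) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function names and the ASSUMED_TRANS constant are hypothetical, and the punctuation treated as unescaped <other> characters is an assumption made for the example rather than the draft's authoritative list. Namespace-specific equivalences and functional equivalence are out of scope.

```python
import string

# ASSUMPTION: characters that may appear unescaped (<trans>). Letters and
# digits follow the grammar in section 2.2; the punctuation here is an
# illustrative guess at <other>, not the draft's authoritative list.
ASSUMED_TRANS = frozenset(string.ascii_letters + string.digits + "()+-:=?@")


def to_canonical_nss(raw, allowed=ASSUMED_TRANS):
    """Translate a raw identifier into canonical NSS form (section 2.2)."""
    out = []
    for ch in raw:
        if ch in allowed:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each octet as "%"
            # plus two upper-case hex digits (the <hex> set is 0-9, A-F).
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)


def preprocess(urn):
    """Apply the two preprocessing steps of section 5.1."""
    # Step 1: drop a leading "urn:" token (matched case-insensitively).
    if urn[:4].lower() == "urn:":
        urn = urn[4:]
    # Step 2: normalize the case of the NID; the NSS is left untouched.
    nid, sep, nss = urn.partition(":")
    if not sep:
        raise ValueError("expected <NID>:<NSS>, got %r" % urn)
    return nid.lower() + ":" + nss


def lexically_equivalent(a, b):
    """Compare two URNs octet by octet after preprocessing (no network)."""
    return preprocess(a).encode("utf-8") == preprocess(b).encode("utf-8")
```

Applied to the comparisons in section 5.3, lexically_equivalent() reports the first two pairs as equivalent and the third ("isbn:123485829") as not equivalent. Because the NID "urn" is reserved, stripping a leading "urn:" in step 1 cannot be confused with a namespace identifier.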