Network Working Group                                           N. Walsh
Internet-Draft                                    Sun Microsystems, Inc.
Expires: November 6, 2001                                       J. Cowan
                                              Reuters Health Information
                                                               P. Grosso
                                                         Arbortext, Inc.
                                                             May 8, 2001


                 A URN Namespace for Public Identifiers
                      draft-walsh-urn-publicid-01

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 6, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   This document describes a URN namespace that is designed to allow
   Public Identifiers to be expressed in URI syntax.

1. Introduction

   XML[1] external entities have two identifiers: a system identifier
   and a public identifier. The system identifier is a URI, by
   definition, but the public identifier is simply a string.

   Historically, the system identifier of an external entity has been a
   local, or system-specific identifier while the public identifier has
   been a more global, persistent name. Unfortunately, public
   identifiers do not fit neatly into the existing web architecture
   because they are not legal URIs.

   Many new specifications (XSLT, XML Schema, etc.) have the implicit
   or explicit requirement that all external identifiers be URIs. The
   purpose of this namespace is to allow public identifiers to be
   encoded in URNs in a reliable, comparable way.

   This document describes a scheme for representing public identifiers
   as URNs by introducing a public identifier namespace, "publicid".

   This namespace specification is for a formal namespace.

1.1 Public Identifiers

   Any string which consists only of the public identifier characters
   (defined by Production 13 of Extensible Markup Language (XML) 1.0
   Second Edition[1]) is a legal public identifier.

   In addition to the character set restriction, public identifiers
   must be normalized by changing all strings of whitespace (the
   characters #x20, #xD, and #xA, in this context) to single space
   characters (#x20), and removing all leading and trailing whitespace.

   In keeping with this specification's goal of allowing public
   identifiers to be encoded in a reliable, comparable way, this
   specification mandates that public identifiers be normalized before
   encoding them into URNs.

   Throughout this specification, we assume that normalization has
   already been performed.

1.2 Formal Public Identifiers

   SGML[3] defines a restricted subset of public identifier called a
   "Formal Public Identifier" (FPI). FPIs are strings composed from the
   same range of characters as public identifiers, but with an explicit
   internal structure.

   The structure of Formal Public Identifiers is normatively described
   in SGML[3]; we review it here for convenience.

   Most Formal Public Identifiers consist of the following fields, in
   this order: an owner identifier, a public text class, a public text
   description, a public text language or public text designating
   sequence, and an optional public text display version.

   Owner identifiers may begin with "-//" or "+//"; otherwise "//" is
   used to delimit fields in the FPI (with the exception of the public
   text class which is delimited from the public text description by a
   space). In other words, most FPIs look like this:

      owner//class description//language//version

   and most owners begin with "+//" or "-//", although they are not
   required to.

   Here are some example FPIs:

      +//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN//XML
      -//OASIS//DTD DocBook XML V4.1.2//EN
      -//ArborText::prod//DTD Help Navigation Document::19970708//EN
      ISO/IEC 10179:1996//DTD DSSSL Architecture//EN
      ISO 8879:1986//ENTITIES Added Latin 1//EN

   This document describes an algorithm for encoding public identifiers
   into URNs that explicitly allows the structured nature of formal
   public identifiers to be preserved. However, an algorithm for
   correctly identifying a Formal Public Identifier and determining the
   various fields within it is out of scope for this document and not
   necessary for the implementation of this URN namespace.

2. Specification Template

   Namespace ID:

      "publicid" requested.

   Registration Information:

      Registration Version Number: 1
      Registration Date: 2001-05-08

   Declared registrant of the namespace:

      Norman Walsh
      Sun Microsystems, Inc.
      One Network Drive
      MS UBURO2-201
      Burlington, MA 01803-0902
      Norman.Walsh@East.Sun.COM

   Declaration of structure:

      The Namespace Specific String (NSS) for URNs in the "publicid"
      namespace has the following structure:

         urn:publicid:{transcribed-public-identifier}

      Where:

      {transcribed-public-identifier} is the text of the public
      identifier transcribed according to the following rules:

      - A space in the public identifier is transcribed as "+".
        Whitespace normalization must be performed before constructing
        a URN in the "publicid" namespace, therefore adjacent "+"
        characters never occur in URNs in this namespace.

      - The sequence of characters "//" is transcribed as ":".

      - The sequence of characters "::" is transcribed as ";".

      - A literal "+" character is transcribed as %2B.

      - A literal ":" character (except in "::") is transcribed as %3A.

      - A literal "/" character (except in "//") is transcribed as %2F.

      - A literal ";" character is transcribed as %3B.

      - A literal "'" character is transcribed as %27.

      - A literal "?" character is transcribed as %3F.

      - A literal "#" character is transcribed as %23.

      - A literal "%" character is transcribed as %25.

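      The transcription rules above can be sketched in a few lines of
      Python. This is only an illustrative sketch, not part of the
      namespace definition: the function names normalize_public_id and
      publicid_to_urn are hypothetical, and the greedy left-to-right
      matching of "//" and "::" is an assumption, since the rules do
      not say how overlapping runs such as "///" or ":::" are to be
      split.

```python
import re

# Two-character sequences that carry FPI structure. They are checked
# before single characters so that "//" and "::" are never split.
_SEQUENCES = {"//": ":", "::": ";"}

# Single characters that cannot be copied into the URN unchanged.
_SINGLES = {
    " ": "+",
    "+": "%2B",
    ":": "%3A",
    "/": "%2F",
    ";": "%3B",
    "'": "%27",
    "?": "%3F",
    "#": "%23",
    "%": "%25",
}


def normalize_public_id(public_id: str) -> str:
    """Collapse runs of #x20, #xD and #xA to one space, then trim."""
    return re.sub(r"[ \r\n]+", " ", public_id).strip(" ")


def publicid_to_urn(public_id: str) -> str:
    """Transcribe a public identifier into a "publicid" URN.

    Assumption: the input is scanned left to right and the
    two-character rules win whenever they match at the current position.
    """
    text = normalize_public_id(public_id)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in _SEQUENCES:
            out.append(_SEQUENCES[pair])
            i += 2
        else:
            ch = text[i]
            # Characters without a rule are transcribed directly.
            out.append(_SINGLES.get(ch, ch))
            i += 1
    return "urn:publicid:" + "".join(out)
```

      Under these assumptions, publicid_to_urn("-//OASIS//DTD DocBook
      XML V4.1.2//EN") yields
      "urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN". The sketch
      needs no knowledge of FPI fields; it only rewrites characters.
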
      The special rules for "//" and "::" are designed to preserve the
      structured nature of formal public identifiers without requiring
      the translator to have special knowledge of FPI syntax.

      The rules for "+", ":", "/", and ";" are required to preserve
      literal occurrences of these characters in the "publicid" URN
      namespace.

      The remaining characters, " " (space), "'", "?", "#", and "%",
      are the only other legal characters in public identifiers that
      cannot be literally transcribed into a URN by the rules of RFC
      2141[5] and RFC 2396[6].

   Relevant ancillary documentation:

      Extensible Markup Language (XML) Version 1.0 Second Edition[1]

      Standard Generalized Markup Language (SGML)[3]

      Registration procedures for public text owner identifiers[4]

   Identifier uniqueness considerations:

      The identifier uniqueness considerations for URNs in the
      "publicid" namespace are the same as the identifier uniqueness
      considerations for public identifiers.

      Formal Public Identifiers with registered owner identifiers are
      required to be unique. Formal Public Identifiers with
      unregistered owner identifiers, and informal public identifiers,
      may or may not be unique. No enforcement policy can be asserted.

   Identifier persistence considerations:

      The persistence of URNs in the "publicid" namespace is the same
      as the persistence of the corresponding public identifier.

      Because the "publicid" namespace is available for a wide range of
      uses, it cannot be subjected to a uniform persistence policy. As
      a general rule, formal public identifiers with registered owner
      identifiers are more likely to be persistent than informal public
      identifiers or formal public identifiers with unregistered owner
      identifiers.

      One exception to this rule is the "IDN" scheme for producing a
      registered owner identifier from a domain name. That scheme
      contains at least all the weaknesses associated with the
      persistence of domain names.

      It is important to note that a properly registered owner
      identifier can apply any policy desired to the portion of the
      "publicid" URN namespace identified by that owner identifier.

   Process of identifier assignment:

      Identifiers in the "publicid" namespace are assigned by applying
      the conversions described above to a public identifier. In order
      to provide a URN in this namespace for a resource that does not
      have a public identifier, a public identifier must first be
      created (according to the rules for creating public identifiers).
      There is no requirement that a resource have only one public
      identifier.

   Process of identifier resolution:

      Identifiers in the "publicid" namespace may be resolved by the
      same policies and procedures as public identifiers.

      Public identifiers can be resolved in many different ways. Many
      existing systems provide facilities for resolving them by way of
      OASIS TR9401[8] Catalog files. Other systems resolve them by
      mapping each component to a local pathname component. And some
      systems simply "know about" a fixed set of public identifiers.

      In addition, URNs in the "publicid" namespace may be resolvable
      by other mechanisms unique to URIs (such as caches).

   Rules for Lexical Equivalence:

      Whitespace normalization is performed before constructing a URN
      in the "publicid" namespace, so such URNs are lexically
      equivalent if they are lexically identical.

   Conformance with URN Syntax:

      No special considerations. URNs in this namespace conform to both
      RFC 2141 and RFC 2396.

   Validation mechanism:

      None specified.

   Scope:

      Global

3. Examples

   The following examples are not guaranteed to be real. They are
   listed for pedagogical reasons only.

   "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" becomes
   "urn:publicid:ISO%2FIEC+10179%3A1996:DTD+DSSSL+Architecture:EN"

   "ISO 8879:1986//ENTITIES Added Latin 1//EN" becomes
   "urn:publicid:ISO+8879%3A1986:ENTITIES+Added+Latin+1:EN"

   "-//OASIS//DTD DocBook XML V4.1.2//EN" becomes
   "urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN"

   "+//IDN example.org//DTD XML Bookmarks 1.0//EN//XML" becomes
   "urn:publicid:%2B:IDN+example.org:DTD+XML+Bookmarks+1.0:EN:XML"

   "-//ArborText::prod//DTD Help Document::19970708//EN" becomes
   "urn:publicid:-:ArborText;prod:DTD+Help+Document;19970708:EN"

   "foo" becomes "urn:publicid:foo"

   "3+3=6" becomes "urn:publicid:3%2B3=6"

   "-//Acme, Inc.//DTD Book Version 1.0" becomes
   "urn:publicid:-:Acme,+Inc.:DTD+Book+Version+1.0"

4. Security Considerations

   There are no additional security considerations other than those
   normally associated with the use and resolution of URNs in general.

References

   [1]  W3C, XML WG, "Extensible Markup Language (XML) 1.0 Second
        Edition", October 2000, <http://www.w3.org/TR/REC-xml>.

   [2]  JTC 1, SC 2, "ISO (International Organization for
        Standardization) ISO 2022:1994 Information technology --
        Character code structure and extension techniques (fourth
        edition)", 1994.

   [3]  JTC 1, SC 34, "ISO 8879:1986 Information processing -- Text and
        office systems -- Standard Generalized Markup Language (SGML)",
        1986.

   [4]  JTC 1, SC 34, "ISO/IEC 9070:1991 Information technology -- SGML
        support facilities -- Registration procedures for public text
        owner identifiers", 1991.

   [5]  Moats, R., "URN Syntax", RFC 2141, May 1997.

   [6]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
        Resource Identifiers (URI): Generic Syntax", RFC 2396, August
        1998.

   [7]  Alvestrand, H., "Tags for the Identification of Languages", RFC
        3066, January 2001.

   [8]  Grosso, P., "Entity Management: OASIS Technical Resolution
        9401:1997 (Amendment 2 to TR 9401)", September 1997,
        <http://www.oasis-open.org/html/tr9401.html>.

Authors' Addresses

   Norman Walsh
   Sun Microsystems, Inc.
   One Network Drive
   MS UBURO2-201
   Burlington, MA 01803-0902
   US

   EMail: Norman.Walsh@East.Sun.COM


   John Cowan
   Reuters Health Information
   45 West 36th St, 12th Floor
   New York, NY 10018
   US

   EMail: jcowan@reutershealth.com


   Paul Grosso
   Arbortext, Inc.
   1000 Victors Way
   Ann Arbor, MI 48108-2744
   US

   EMail: pgrosso@arbortext.com

Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.