
Network Working Group                                           N. Walsh
Internet-Draft                                    Sun Microsystems, Inc.
Expires: November 6, 2001                                       J. Cowan
                                              Reuters Health Information
                                                               P. Grosso
                                                         Arbortext, Inc.
                                                             May 8, 2001


                 A URN Namespace for Public Identifiers
                         draft-walsh-urn-publicid-01

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 6, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   This document describes a URN namespace that is designed to allow
   Public Identifiers to be expressed in URI syntax.

1. Introduction

   XML[1] external entities have two identifiers: a public identifier
   and a system identifier. The system identifier is a URI, by
   definition, but the public identifier is simply a string.



Walsh, et. al.          Expires November 6, 2001                [Page 1]

Internet-Draft       A URN Namespace for Public Ids             May 2001


   Historically, the system identifier of an external entity has been a
   local, or system-specific identifier while the public identifier has
   been a more global, persistent name.

   Unfortunately, public identifiers do not fit neatly into the
   existing web architecture because they are not legal URIs. Many new
   specifications (XSLT, XML Schema, etc.) have the implicit or
   explicit requirement that all external identifiers be URIs.

   The purpose of this namespace is to allow public identifiers to be
   encoded in URNs in a reliable, comparable way.

   This document describes a scheme for representing public identifiers
   as URNs by introducing a public identifier namespace, "publicid".

   This namespace specification is for a formal namespace. 

1.1 Public Identifiers

   Any string which consists only of the public identifier characters
   (defined by Production 13 of Extensible Markup Language (XML) 1.0
   Second Edition[1]) is a legal public identifier.

   In addition to the character set restriction, public identifiers
   must be normalized by changing all strings of whitespace (the
   characters #x20, #xD, and #xA, in this context) to single space
   characters (#x20), and removing all leading and trailing whitespace.

   In keeping with this specification's goal of allowing public
   identifiers to be encoded in URNs in a reliable, comparable way,
   this specification mandates that public identifiers be normalized
   before encoding them into URNs. Throughout this specification, we
   assume that normalization has already been performed.
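
   The normalization rule can be sketched in Python. This is an
   illustrative sketch, not part of this specification: the helper name
   normalize_public_id is hypothetical, and raising an error for
   characters outside the PubidChar production is a choice made here,
   since this document only says that such strings are not legal
   public identifiers.

```python
import re

# Production 13 of XML 1.0 (PubidChar):
#   #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
_PUBID = re.compile(r"[\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")

# The only whitespace characters a public identifier may contain.
_WHITESPACE_RUN = re.compile(r"[\x20\x0D\x0A]+")


def normalize_public_id(text: str) -> str:
    """Return the normalized form of a public identifier (hypothetical helper)."""
    if _PUBID.fullmatch(text) is None:
        raise ValueError(f"not a legal public identifier: {text!r}")
    # Replace every run of whitespace with one space, then trim both ends.
    return _WHITESPACE_RUN.sub(" ", text).strip(" ")


if __name__ == "__main__":
    print(normalize_public_id("  -//OASIS//DTD\r\n DocBook  XML V4.1.2//EN\n"))
```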

1.2 Formal Public Identifiers

   SGML[3] defines a restricted subset of public identifier called a
   "Formal Public Identifier" (FPI).

   FPIs are strings composed from the same range of characters, but
   with an explicit internal structure. The structure of Formal Public
   Identifiers is normatively described in SGML[3]; we review it here
   for convenience.

   Most Formal Public Identifiers consist of the following fields, in
   this order: an owner identifier, a public text class, a public text
   description, a public text language or public text designating
   sequence, and an optional public text display version.

   Owner identifiers may begin with "-//" or "+//"; otherwise "//" is
   used to delimit fields in the FPI (with the exception of the public
   text class, which is delimited from the public text description by
   a space).

   In other words, most FPIs look like this:

      owner//class description//language//version

   and most owners begin with "+//" or "-//", although they are not
   required to. Here are some example FPIs:

      +//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN//XML
      -//OASIS//DTD DocBook XML V4.1.2//EN
      -//ArborText::prod//DTD Help Navigation Document::19970708//EN
      ISO/IEC 10179:1996//DTD DSSSL Architecture//EN
      ISO 8879:1986//ENTITIES Added Latin 1//EN 

   This document describes an algorithm for encoding public identifiers
   into URNs that explicitly allows the structured nature of formal
   public identifiers to be preserved. However, an algorithm for
   correctly identifying a Formal Public Identifier and determining the
   various fields within it is out of scope for this document and not
   necessary for the implementation of this namespace.

2. Specification Template

   Namespace ID: 

         "publicid" requested.

   Registration Information: 

         Registration Version Number: 1
         Registration Date: 2001-05-08

   Declared registrant of the namespace: 

         Norman Walsh
         Sun Microsystems, Inc.
         One Network Drive MS UBURO2-201
         Burlington, MA
         01803-0902

         Norman.Walsh@East.Sun.COM

   Declaration of structure: 

         The Namespace Specific String (NSS) for URNs in the "publicid"
         namespace has the following structure: 

            urn:publicid:{transcribed-public-identifier} 

         Where {transcribed-public-identifier} is derived from the text
         of the public identifier by copying each character unchanged,
         except as specified by the following rules: 

            -  A space is transcribed as "+". Whitespace normalization
               must be performed before constructing a URN in the
               "publicid" namespace; therefore the sequence of
               characters "++" should never occur in URNs in this
               namespace.
            -  The sequence of characters "//" is transcribed as ":".
            -  The sequence of characters "::" is transcribed as ";".
            -  A literal "+" character is transcribed as %2B.
            -  A literal ":" character (except in "::") is transcribed
               as %3A.
            -  A literal "/" character (except in "//") is transcribed
               as %2F.
            -  A literal ";" character is transcribed as %3B.
            -  A literal "'" character is transcribed as %27.
            -  A literal "?" character is transcribed as %3F.
            -  A literal "#" character is transcribed as %23.
            -  A literal "%" character is transcribed as %25.

         The special rules for "//" and "::" are designed to preserve
         the structured nature of formal public identifiers without
         requiring the translator to have special knowledge of FPI
         syntax.

         The rules for "+", ":", "/", and ";" are required to preserve
         literal occurrences of these characters in the URN.

         The remaining characters, " " (space), "'", "?", "#", and
         "%", are the only other legal characters in public
         identifiers that cannot be literally transcribed into a URN
         by the rules of RFC 2141[5] and RFC 2396[6].
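
         The transcription rules can be sketched in Python as a single
         left-to-right scan. This is a hypothetical helper
         (public_id_to_urn), not a normative algorithm. In particular,
         the rules do not say how overlapping sequences such as "///"
         or ":::" are to be split; this sketch assumes that "//" and
         "::" are matched two characters at a time from the left. It
         also normalizes whitespace itself, so it accepts unnormalized
         input.

```python
import re

# Single-character substitutions from the "publicid" transcription rules.
_SINGLE = {
    " ": "+",
    "+": "%2B",
    ":": "%3A",
    "/": "%2F",
    ";": "%3B",
    "'": "%27",
    "?": "%3F",
    "#": "%23",
    "%": "%25",
}


def public_id_to_urn(public_id: str) -> str:
    """Transcribe a public identifier into a urn:publicid: URN (hypothetical helper)."""
    # Normalize whitespace first, as the namespace requires.
    text = re.sub(r"[\x20\x0D\x0A]+", " ", public_id).strip(" ")
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "//":
            out.append(":")   # field delimiter
            i += 2
        elif pair == "::":
            out.append(";")   # "::" as used in e.g. "ArborText::prod"
            i += 2
        else:
            ch = text[i]
            # Characters without a rule are copied unchanged.
            out.append(_SINGLE.get(ch, ch))
            i += 1
    return "urn:publicid:" + "".join(out)


if __name__ == "__main__":
    for pid in ("ISO/IEC 10179:1996//DTD DSSSL Architecture//EN",
                "+//IDN example.org//DTD XML Bookmarks 1.0//EN//XML",
                "3+3=6"):
        print(pid, "->", public_id_to_urn(pid))
```

         Because spaces are collapsed before "+" is emitted, and a
         literal "+" always becomes %2B, the output of this sketch never
         contains "++", matching the note in the first rule.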

   Relevant ancillary documentation: 

         Extensible Markup Language (XML) Version 1.0 Second Edition[1]
         Standard Generalized Markup Language (SGML)[3]
         Registration procedures for public text owner identifiers[4]

   Identifier uniqueness considerations: 

         The identifier uniqueness considerations for URNs in the
         "publicid" namespace are the same as the identifier uniqueness
         considerations for public identifiers. Formal Public
         Identifiers with registered owner identifiers are required to
         be unique. Formal Public Identifiers with unregistered owner
         identifiers, and informal public identifiers, may or may not
         be unique; no enforcement policy can be asserted.

   Identifier persistence considerations: 

         The persistence of URNs in the "publicid" namespace is the
         same as the persistence of the corresponding public
         identifier. 

         Because the "publicid" namespace is available for a wide range
         of uses, it cannot be subjected to a uniform persistence policy.
         As a general rule, formal public identifiers with registered
         owner identifiers are more likely to be persistent than
         informal public identifiers or formal public identifiers with
         unregistered owner identifiers. 

         One exception to this rule is the "IDN" scheme for producing a
         registered owner identifier from a domain name. That scheme
         contains at least all the weaknesses associated with the
         persistence of domain names. 

         It is important to note that the owner of a properly
         registered owner identifier can apply any policy desired to
         the portion of the "publicid" URN namespace identified by that
         owner identifier.

   Process of identifier assignment: 

         Identifiers in the "publicid" namespace are assigned by
         applying the conversions described above to a public
         identifier. In order to provide a URN in this namespace for a
         resource that does not have a public identifier, one must be
         created (according to the rules for creating public
         identifiers). 

         There is no requirement that a resource have only one public
         identifier.

   Process of identifier resolution: 

         Identifiers in the "publicid" namespace may be resolved by the
         same policies and procedures as public identifiers. Public
         identifiers can be resolved in many different ways. Many
         existing systems provide facilities for resolving them by way
         of OASIS TR9401[8] Catalog files. Other systems resolve them
         by mapping each component to a local pathname component. And
         some systems simply "know about" a fixed set of public
         identifiers. In addition, URNs in the "publicid" namespace may
         be resolvable by other mechanisms unique to URIs (such as
         caches).

   Rules for Lexical Equivalence: 

         Whitespace normalization is performed before constructing a
         URN in the "publicid" namespace, so such URNs are lexically
         equivalent if they are lexically identical.

   Conformance with URN Syntax: 

         No special considerations. URNs in this namespace conform to
         both RFC 2141 and RFC 2396.

   Validation mechanism: 

         None specified. 

   Scope: 

         Global


3. Examples

   The following examples are not guaranteed to be real. They are
   listed for pedagogical reasons only. 

      "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" becomes
      "urn:publicid:ISO%2FIEC+10179%3A1996:DTD+DSSSL+Architecture:EN"

      "ISO 8879:1986//ENTITIES Added Latin 1//EN" becomes
      "urn:publicid:ISO+8879%3A1986:ENTITIES+Added+Latin+1:EN"

      "-//OASIS//DTD DocBook XML V4.1.2//EN" becomes
      "urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN"

      "+//IDN example.org//DTD XML Bookmarks 1.0//EN//XML" becomes
      "urn:publicid:%2B:IDN+example.org:DTD+XML+Bookmarks+1.0:EN:XML"

      "-//ArborText::prod//DTD Help Navigation Document::19970708//EN" becomes
      "urn:publicid:-:ArborText;prod:DTD+Help+Navigation+Document;19970708:EN"

      "foo" becomes
      "urn:publicid:foo"

      "3+3=6" becomes
      "urn:publicid:3%2B3=6"

      "-//Acme, Inc.//DTD Book Version 1.0" becomes
      "urn:publicid:-:Acme,+Inc.:DTD+Book+Version+1.0"

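   Because a literal ":" is always %-encoded and "::" becomes ";", the
   ":" characters after the "urn:publicid:" prefix mark exactly the
   places where the public identifier contained "//". The following
   Python sketch relies on that to recover the fields of a public
   identifier from its URN. It is a hypothetical helper (urn_fields)
   that assumes its input was produced by the rules in Section 2; this
   document does not itself define a reverse mapping. Each field is
   restored by turning "+" back into a space and ";" back into "::"
   before the %-escapes are decoded, so that an escaped "+" (%2B) or
   ";" (%3B) is not misread.

```python
from urllib.parse import unquote

PREFIX = "urn:publicid:"


def urn_fields(urn: str) -> list[str]:
    """Split a publicid URN into the "//"-delimited fields of its public identifier."""
    # "urn" and the namespace ID are case-insensitive (RFC 2141).
    if urn[:len(PREFIX)].lower() != PREFIX:
        raise ValueError(f"not a publicid URN: {urn!r}")
    fields = []
    for part in urn[len(PREFIX):].split(":"):
        # Undo "+" and ";" first; %2B and %3B stay escaped until unquote().
        text = part.replace("+", " ").replace(";", "::")
        fields.append(unquote(text))
    return fields


if __name__ == "__main__":
    urn = "urn:publicid:-:ArborText;prod:DTD+Help+Navigation+Document;19970708:EN"
    fields = urn_fields(urn)
    print(fields)
    print("//".join(fields))
```

   Joining the recovered fields with "//" yields the original public
   identifier, which illustrates that the transcription preserves the
   FPI structure without the translator having to parse it.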

4. Security Considerations

   There are no additional security considerations other than those
   normally associated with the use and resolution of URNs in general. 

References

   [1]  W3C, XML WG, "Extensible Markup Language (XML) 1.0 Second
        Edition", October 2000, <http://www.w3.org/TR/REC-xml>.

   [2]  JTC 1, SC 2, "ISO (International Organization for
        Standardization) ISO 2022:1994 Information technology --
        Character code structure and extension techniques (fourth
        edition).", 1994.

   [3]  JTC 1, SC 34, "ISO 8879:1986 Information processing -- Text and
        office systems -- Standard Generalized Markup Language (SGML)",
        1986.

   [4]  JTC 1, SC 34, "ISO/IEC 9070:1991 Information technology -- SGML
        support facilities -- Registration procedures for public text
        owner identifiers", 1991.

   [5]  Moats, R., "URN Syntax", RFC 2141, May 1997.

   [6]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
        Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

   [7]  Alvestrand, H., "Tags for the Identification of Languages", RFC
        3066, January 2001.

   [8]  Grosso, P., "Entity Management: OASIS Technical Resolution
        9401:1997 (Amendment 2 to TR 9401)", September 1997,
        <http://www.oasis-open.org/html/tr9401.html>.


Authors' Addresses

   Norman Walsh
   Sun Microsystems, Inc.
   One Network Drive MS UBURO2-201
   Burlington, MA  01803-0902
   US

   EMail: Norman.Walsh@East.Sun.COM


   John Cowan
   Reuters Health Information
   45 West 36th St, 12th Floor
   New York, NY  10018
   US

   EMail: jcowan@reutershealth.com


   Paul Grosso
   Arbortext, Inc.
   1000 Victors Way
   Ann Arbor, MI  48108-2744
   US

   EMail: pgrosso@arbortext.com


Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.