draft-fielding-uri-rfc2396bis-01.txt  -->   draft-fielding-uri-rfc2396bis-02.txt




Network Working Group                                     T. Berners-Lee
Internet-Draft                                                   MIT/LCS
Updates: 1738 (if approved)                                  R. Fielding
Obsoletes: 2732, 2396, 1808 (if approved)                   Day Software
Expires: November 21, 2003                                   L. Masinter
                                                                   Adobe
                                                            May 23, 2003


           Uniform Resource Identifier (URI): Generic Syntax
                    draft-fielding-uri-rfc2396bis-02

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   <http://www.ietf.org/ietf/1id-abstracts.txt>.

   The list of Internet-Draft Shadow Directories can be accessed at
   <http://www.ietf.org/shadow.html>.

   This Internet-Draft will expire on November 21, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   A Uniform Resource Identifier (URI) is a compact string of characters
   for identifying an abstract or physical resource.  This document
   defines the generic syntax of a URI, including both absolute and
   relative forms, and guidelines for their use.

   This document defines a grammar that is a superset of all valid URIs,
   such that an implementation can parse the common components of a URI
   reference without knowing the scheme-specific requirements of every
   possible identifier type.  This document does not define a generative
   grammar for all URIs; that task will be performed by the individual
   specifications of each URI scheme.



Berners-Lee, et al.    Expires November 21, 2003                [Page 1]

Internet-Draft             URI Generic Syntax                   May 2003


Editorial Note

   Discussion of this draft and comments to the editors should be sent
   to the uri@w3.org mailing list.  An issues list and version history
   are available at
   <http://www.apache.org/~fielding/uri/rev-2002/issues.html>.

Table of Contents

   1.    Introduction
   1.1   Overview of URIs
   1.1.1 Generic Syntax
   1.1.2 Examples
   1.1.3 URI, URL, and URN
   1.2   Design Considerations
   1.2.1 Transcription
   1.2.2 Separating Identification from Interaction
   1.2.3 Hierarchical Identifiers
   1.3   Syntax Notation
   2.    Characters
   2.1   Encoding of Characters
   2.2   Reserved Characters
   2.3   Unreserved Characters
   2.4   Escaped Characters
   2.4.1 Escaped Encoding
   2.4.2 When to Escape and Unescape
   2.5   Excluded Characters
   3.    Syntax Components
   3.1   Scheme
   3.2   Authority
   3.2.1 User Information
   3.2.2 Host
   3.2.3 Port
   3.3   Path
   3.4   Query
   3.5   Fragment
   4.    Usage
   4.1   URI Reference
   4.2   Relative URI
   4.3   Absolute URI
   4.4   Same-document Reference
   4.5   Suffix Reference
   5.    Relative Resolution
   5.1   Establishing a Base URI
   5.1.1 Base URI within Document Content
   5.1.2 Base URI from the Encapsulating Entity
   5.1.3 Base URI from the Retrieval URI
   5.1.4 Default Base URI
   5.2   Obtaining the Referenced URI
   5.3   Recomposition of a Parsed URI
   5.4   Examples of Relative Resolution
   5.4.1 Normal Examples
   5.4.2 Abnormal Examples
   6.    Normalization and Comparison
   6.1   Equivalence
   6.2   Comparison Ladder
   6.2.1 Simple String Comparison
   6.2.2 Syntax-based Normalization
   6.2.3 Scheme-based Normalization
   6.2.4 Protocol-based Normalization
   6.3   Canonical Form
   7.    Security Considerations
   7.1   Reliability and Consistency
   7.2   Malicious Construction
   7.3   Rare IP Address Formats
   7.4   Sensitive Information
   7.5   Semantic Attacks
   8.    Acknowledgments
         Normative References
         Informative References
         Authors' Addresses
   A.    Collected ABNF for URI
   B.    Parsing a URI Reference with a Regular Expression
   C.    Embedding the Base URI in HTML documents
   D.    Delimiting a URI in Context
   E.    Summary of Non-editorial Changes
   E.1   Additions
   E.2   Modifications from RFC 2396
         Index
         Intellectual Property and Copyright Statements


1. Introduction

   A Uniform Resource Identifier (URI) provides a simple and extensible
   means for identifying a resource.  This specification of URI syntax
   and semantics is derived from concepts introduced by the World Wide
   Web global information initiative, whose use of such identifiers
   dates from 1990 and is described in "Universal Resource Identifiers
   in WWW" [RFC1630], and is designed to meet the recommendations laid
   out in "Functional Recommendations for Internet Resource Locators"
   [RFC1736] and "Functional Requirements for Uniform Resource Names"
   [RFC1737].

   This document obsoletes [RFC2396], which merged "Uniform Resource
   Locators" [RFC1738] and "Relative Uniform Resource Locators"
   [RFC1808] in order to define a single, generic syntax for all URIs.
   It excludes those portions of RFC 1738 that defined the specific
   syntax of individual URI schemes; those portions will be updated as
   separate documents. The process for registration of new URI schemes
   is defined separately by [RFC2717].

   All significant changes from RFC 2396 are noted in Appendix E.

1.1 Overview of URIs

   URIs are characterized as follows:

   Uniform

      Uniformity provides several benefits: it allows different types of
      resource identifiers to be used in the same context, even when the
      mechanisms used to access those resources may differ; it allows
      uniform semantic interpretation of common syntactic conventions
      across different types of resource identifiers; it allows
      introduction of new types of resource identifiers without
      interfering with the way that existing identifiers are used; and,
      it allows the identifiers to be reused in many different contexts,
      thus permitting new applications or protocols to leverage a
      pre-existing, large, and widely-used set of resource identifiers.

   Resource

      Anything that can be named or described can be a resource.
      Familiar examples include an electronic document, an image, a
      service (e.g., "today's weather report for Los Angeles"), and a
      collection of other resources.  A resource is not necessarily
      accessible via the Internet; e.g., human beings, corporations, and
      bound books in a library can also be considered resources.
      Likewise, abstract concepts can be resources, such as the
      operators and operands of a mathematical equation or the types of
      a relationship (e.g., "parent" or "employee").

   Identifier

      An identifier embodies the information required to distinguish
      what is being identified from all other things within its scope of
      identification.

   A URI is an identifier that consists of a sequence of characters
   matching the restricted syntax defined by this specification.  A URI
   can be used to refer to a resource.  This specification does not
   place any limits on the nature of a resource or the reasons why an
   application might wish to refer to a resource.  URIs have a global
   scope and should be interpreted consistently regardless of context,
   but that interpretation may be defined in relation to the user's
   context (e.g., "http://localhost/" refers to a resource that is
   relative to the user's network interface and yet not specific to any
   one user).

1.1.1 Generic Syntax

   Each URI begins with a scheme name, as defined in Section 3.1, that
   refers to a specification for assigning identifiers within that
   scheme. As such, the URI syntax is a federated and extensible naming
   system wherein each scheme's specification may further restrict the
   syntax and semantics of identifiers using that scheme.

   This specification defines those elements of the URI syntax that are
   required of all URI schemes or are common to many URI schemes.  It
   thus defines the syntax and semantics that are needed to implement a
   scheme-independent parsing mechanism for URI references, such that
   the scheme-dependent handling of a URI can be postponed until the
   scheme-dependent semantics are needed.  Likewise, protocols and data
   formats that make use of URI references can refer to this
   specification as defining the range of syntax allowed for all URIs,
   including those schemes that have yet to be defined.

   A parser of the generic URI syntax is capable of parsing any URI
   reference into its major components; once the scheme is determined,
   further scheme-specific parsing can be performed on the components.
   In other words, the URI generic syntax is a superset of the syntax of
   all URI schemes.



1.1.2 Examples

   The following examples illustrate URIs that are in common use.

      ftp://ftp.is.co.za/rfc/rfc1808.txt
         -- ftp scheme for File Transfer Protocol services

      gopher://gopher.tc.umn.edu:70/11/Mailing%20Lists/
         -- gopher scheme for Gopher and Gopher+ Protocol services

      http://www.ietf.org/rfc/rfc2396.txt
         -- http scheme for Hypertext Transfer Protocol services

      mailto:John.Doe@example.com
         -- mailto scheme for electronic mail addresses

      news:comp.infosystems.www.servers.unix
         -- news scheme for USENET news groups and articles

      telnet://melvyl.ucop.edu/
         -- telnet scheme for interactive TELNET services
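
   As an illustration of the scheme-independent parsing described in
   Section 1.1.1, the following Python sketch splits some of the URIs
   above into their major components.  It assumes the regular
   expression published in Appendix B of RFC 2396; the function name
   split_uri and the tuple layout are hypothetical choices made for
   this example only, and the sketch is not a validating parser.

```python
import re

# Regular expression from Appendix B of RFC 2396 for breaking a URI
# reference into its generic components.  Every group is optional, so the
# pattern matches any input string.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent from the reference are returned as None;
    the path is always a string, possibly empty.
    """
    match = URI_REFERENCE.match(reference)
    return (
        match.group(2),  # scheme
        match.group(4),  # authority
        match.group(5),  # path
        match.group(7),  # query
        match.group(9),  # fragment
    )


if __name__ == "__main__":
    for uri in (
        "ftp://ftp.is.co.za/rfc/rfc1808.txt",
        "gopher://gopher.tc.umn.edu:70/11/Mailing%20Lists/",
        "mailto:John.Doe@example.com",
        "news:comp.infosystems.www.servers.unix",
    ):
        print(uri, "->", split_uri(uri))
```

   Only after the scheme has been extracted in this way would an
   implementation apply scheme-specific rules, for example to interpret
   the path of a "mailto" URI as a mailbox address.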


1.1.3 URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term "Uniform Resource Locator" (URL) refers to the subset of URIs
   that, in addition to identifying the resource, provide a means of
   locating the resource by describing its primary access mechanism
   (e.g., its network "location").  The term "Uniform Resource Name"
   (URN) refers to the subset of URIs that are required to remain
   globally unique and persistent even when the resource ceases to exist
   or becomes unavailable.

   An individual scheme does not need to be classified as being just one
   of "name" or "locator".  Instances of URIs from any given scheme may
   have the characteristics of names or locators or both, often
   depending on the persistence and care in the assignment of
   identifiers by the naming authority, rather than any quality of the
   scheme.  This specification deprecates use of the term "URN" for
   anything but URIs in the "urn" scheme [RFC2141].  This specification
   also deprecates the term "URL".

1.2 Design Considerations

1.2.1 Transcription

   The URI syntax has been designed with global transcription as one of
   its main considerations.  A URI is a sequence of characters from a
   very limited set: the letters of the basic Latin alphabet, digits,
   and a few special characters.  A URI may be represented in a variety
   of ways: e.g., ink on paper, pixels on a screen, or a sequence of
   octets in a coded character set.  The interpretation of a URI depends
   only on the characters used and not how those characters are
   represented in a network protocol.

   The goal of transcription can be described by a simple scenario.
   Imagine two colleagues, Sam and Kim, sitting in a pub at an
   international conference and exchanging research ideas.  Sam asks Kim
   for a location to get more information, so Kim writes the URI for the
   research site on a napkin.  Upon returning home, Sam takes out the
   napkin and types the URI into a computer, which then retrieves the
   information to which Kim referred.

   There are several design considerations revealed by the scenario:

   o  A URI is a sequence of characters that is not always represented
      as a sequence of octets.

   o  A URI might be transcribed from a non-network source, and thus
      should consist of characters that are most likely to be able to be
      entered into a computer, within the constraints imposed by
      keyboards (and related input devices) across languages and
      locales.

   o  A URI often needs to be remembered by people, and it is easier for
      people to remember a URI when it consists of meaningful or
      familiar components.

   These design considerations are not always in alignment.  For
   example, it is often the case that the most meaningful name for a URI
   component would require characters that cannot be typed into some
   systems.  The ability to transcribe a resource identifier from one
   medium to another has been considered more important than having a
   URI consist of the most meaningful of components.  In local or
   regional contexts and with improving technology, users might benefit
   from being able to use a wider range of characters; such use is not
   defined in this document.





1.2.2 Separating Identification from Interaction

   A common misunderstanding of URIs is that they are only used to refer
   to accessible resources.  In fact, the URI alone only provides
   identification; access to the resource is neither guaranteed nor
   implied by the presence of a URI.  Instead, an operation (if any)
   associated with a URI reference is defined by the protocol element,
   data format attribute, or natural language text in which it appears.

   Given a URI, a system may attempt to perform a variety of operations
   on the resource, as might be characterized by such words as "denote",
   "access", "update", "replace", or "find attributes".  Such operations
   are defined by the protocols that make use of URIs, not by this
   specification.  However, we do use a few general terms for describing
   common operations on URIs.  URI "resolution" is the process of
   determining an access mechanism and the appropriate parameters
   necessary to dereference a URI; such resolution may require several
   iterations.  Using that access mechanism to perform some action on
   the URI's resource is termed a "dereference" of the URI.

   When URIs are used within information systems to identify sources of
   information, the most common form of URI dereference is "retrieval":
   making use of a URI in order to retrieve a representation of its
   associated resource.  A "representation" is a sequence of octets,
   along with metadata describing those octets, that constitutes a
   record of the state of the resource at the time that the
   representation has been generated.  Retrieval is achieved by a
   process that might include using the URI as a cache key to check for
   a locally cached representation, resolution of the URI to determine
   an appropriate access mechanism (if any), and dereference of the URI
   for the sake of applying a retrieval operation.

   URI references in information systems are designed to be
   late-binding: the result of an access is generally determined at the
   time it is accessed and may vary over time or due to other aspects of
   the interaction. When an author creates a reference to such a
   resource, they do so with the intention that the reference be used in
   the future; what is being identified is not some specific result that
   was obtained in the past, but rather some characteristic that is
   expected to be true for future results.  In such cases, the resource
   referred to by the URI is actually a sameness of characteristics as
   observed over time, perhaps elucidated by additional comments or
   assertions made by the resource provider.

   Although many URI schemes are named after protocols, this does not
   imply that use of such a URI will result in access to the resource
   via the named protocol.  URIs are often used simply for the sake of
   identification.  Even when a URI is used to retrieve a representation
   of a resource, that access might be through gateways, proxies,
   caches, and name resolution services that are independent of the
   protocol associated with the scheme name, and the resolution of some
   URIs may require the use of more than one protocol (e.g., both DNS
   and HTTP are typically used to access an "http" URI's origin server
   when a representation isn't found in a local cache).




Berners-Lee, et al.    Expires November 21, 2003                [Page 8]

Internet-Draft             URI Generic Syntax                   May 2003


1.2.3 Hierarchical Identifiers

   The URI syntax is organized hierarchically, with components listed in
   decreasing order from left to accomplish this
   identification. An individual right.  For some URI scheme may require a single
   charset, define a default charset, or provide a way schemes, the
   visible hierarchy is limited to indicate the
   charset used.  For example, a new scheme "foo" might be defined such
   that any escaped octet itself: everything after
   the scheme component delimiter is keyed considered opaque to URI
   processing. Other URI schemes make the UTF-8 encoding in order hierarchy explicit and visible
   to
   determine generic parsing algorithms.

   The URI syntax reserves the corresponding Unicode character.

   It is expected that a systematic treatment of character encoding
   within URIs will be developed as a future modification slash ("/"), question-mark ("?"), and
   crosshatch ("#") characters for the purpose of this
   specification.

2.2 Reserved Characters

   Many URI include delimiting components consisting of or delimited by, certain
   special characters.  These characters
   that are called "reserved", since
   their usage within significant to the URI component is limited generic parser's hierarchical
   interpretation of an identifier.  In addition to their reserved
   purpose.  If aiding the data for a URI component would conflict with the
   reserved purpose, then
   readability of such identifiers through the conflicting data must consistent use of
   familiar syntax, this uniform representation of hierarchy across
   naming schemes allows scheme-independent references to be escaped before
   forming the URI.

      reserved    = "[" / "]" / ";" / "/" / "?" /
                    ":" / "@" / "&" / "=" / "+" / "$" / ","

   The "reserved" syntax class above refers made
   relative to those characters that are
   allowed within hierarchy.

   An "absolute" URI refers to a URI, but resource independent of the naming
   hierarchy in which may not be allowed the identifier is used.  In contrast, a "relative"
   URI refers to a resource by describing the difference within a
   particular component of
   hierarchical name space between the generic current context and an absolute
   URI syntax; they are used as
   delimiters of the components described in resource.  Section 3.

   Characters in the "reserved" set are not reserved in all contexts.
   The set 4.2 defines a scheme-independent form
   of characters actually reserved within any given relative URI
   component is defined by reference that component. In general, can be used in conjunction with a character is
   reserved if base
   URI of a hierarchical scheme to produce the semantics absolute URI form of that
   reference.

1.3 Syntax Notation

   This document uses the Augmented Backus-Naur Form (ABNF) notation of
   [RFC2234] to define the URI changes if syntax. Although the ABNF defines syntax
   in terms of the character is
   replaced with its escaped US-ASCII encoding.








Berners-Lee, et al.    Expires September 1, 2003               [Page 10]

Internet-Draft character encoding [ASCII], the URI Generic Syntax                 March 2003


2.3 Unreserved Characters

   Data characters that are allowed syntax
   should be interpreted in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set terms of punctuation marks and
   symbols.

      unreserved  = ALPHA / DIGIT / mark

      mark        = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")"

   Unreserved characters can be escaped without changing the semantics
   of character that the URI, but this should not be done unless
   ASCII-encoded octet represents, rather than the URI is being used
   in octet encoding
   itself.  How a context that does not allow the unescaped character to appear. URI normalization processes may unescape sequences is represented in the ranges terms of
   ALPHA (%41-%5A bits and %61-%7A), DIGIT (%30-%39), underscore (%5F), or
   tilde (%7E) without fear of creating a conflict, but unescaping bytes on the
   other mark characters
   wire is usually counterproductive.

2.4 Escape Sequences

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable dependent upon the character encoding of the US-ASCII coded character set, or that
   corresponds protocol used to any US-ASCII character that is disallowed, as
   explained below.

2.4.1 Escaped Encoding

   An escaped octet is encoded as a character triplet, consisting
   transport it, or the charset of the percent character "%" followed document that contains it.

   The following core ABNF productions are used by the two hexadecimal digits
   representing the octet code in . For example, "%20" is the escaped
   encoding for the US-ASCII space character.

      escaped     = "%" HEXDIG HEXDIG


2.4.2 When to Escape this specification as
   defined by Section 6.1 of [RFC2234]: ALPHA, CR, CTL, DIGIT, DQUOTE,
   HEXDIG, LF, OCTET, and Unescape

   A SP. The complete URI syntax is always collected in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.  Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.
   Appendix A.
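
   The requirement that a URI be separated into its components before
   escaped characters are decoded can be illustrated with a minimal
   sketch.  It assumes Python's standard urllib.parse.unquote for the
   decoding step; the function names and the sample path are
   hypothetical and are not part of this specification.

```python
from urllib.parse import unquote


def split_then_unescape(path: str) -> list[str]:
    """Split on the "/" delimiter first, then decode each segment."""
    return [unquote(segment) for segment in path.split("/")]


def unescape_then_split(path: str) -> list[str]:
    """Wrong order: decoding first turns the data octet %2F into a delimiter."""
    return unquote(path).split("/")


# Hypothetical path whose middle segment contains an escaped "/" as data.
path = "reports/a%2Fb/summary"
print(split_then_unescape(path))   # three segments, the middle one is "a/b"
print(unescape_then_split(path))   # four segments: the data "/" became a delimiter
```

   Decoding before splitting changes the number of segments, which is
   the change in semantics the text warns about.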

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URI
   path, but the two are equivalent for an http URI.

   Because the percent "%" character always has the reserved purpose of
   being the escape indicator, it must be escaped as "%25" in order to
   be used as data within a URI.  Implementers should be careful not to
   escape or unescape the same string more than once, since unescaping
   an already unescaped string might lead to misinterpreting a percent
   data character as another escaped character, or vice versa in the
   case of escaping an already escaped string.

2.4.3 Excluded US-ASCII Characters

   Although they are disallowed within the URI syntax, we include here a
   description of those US-ASCII characters that have been excluded and
   the reasons for their exclusion.

   The control characters (CTL) in the US-ASCII coded character set are
   not used within a URI, both because they are non-printable and
   because they are likely to be misinterpreted by some control
   mechanisms.

   The space character (SP) is excluded because significant spaces may
   disappear and insignificant spaces may be introduced when a URI is
   transcribed or typeset or subjected to the treatment of
   word-processing programs.  Whitespace is also used to delimit a URI
   in many contexts.

   The angle-bracket "<" and ">" and double-quote (") characters are
   excluded because they are often used as the delimiters around a URI
   in text documents and protocol fields.  The character "#" is excluded
   because it is used to delimit a URI from a fragment identifier in a
   URI reference (Section 4).  The percent character "%" is excluded
   because it is used for the encoding of escaped characters.

      delims      = "<" / ">" / "#" / "%" / DQUOTE

   Other characters are excluded because gateways and other transport
   agents are known to sometimes modify such characters, or they are
   used as delimiters.

      unwise      = "{" / "}" / "|" / "\" / "^" / "`"

   Data corresponding to excluded characters must be escaped in order to
   be properly represented within a URI.

2. Characters

   A URI consists of a restricted set of characters, primarily chosen
   to aid transcription and usability both in computer systems and in
   non-computer communications.  Characters used conventionally as
   delimiters around a URI are excluded.  The set of URI characters
   consists of digits, letters, and a few graphic symbols chosen from
   those common to most of the character encodings and input facilities
   available to Internet users.

      uric        = reserved / unreserved / escaped

   Within a URI, reserved characters are used to delimit syntax
   components, unreserved characters are used to describe registered
   names, and unreserved, non-delimiting reserved, and escaped
   characters are used to represent strings of data (1*OCTET) within the
   components.

2.1 Encoding of Characters

   As described above (Section 1.3), the URI syntax is defined in terms
   of characters by reference to the US-ASCII encoding of characters to
   octets.  This specification does not mandate the use of any
   particular mapping between its character set and the octets used to
   store or transmit those characters.

   URI characters representing strings of data within a component may,
   if allowed by the component production, represent an arbitrary
   sequence of octets.  For example, portions of a given URI might
   correspond to a filename on a non-ASCII file system, a query on
   non-ASCII data, numeric coordinates on a map, etc.  Some URI schemes
   define a specific encoding of raw data to US-ASCII characters as part
   of their scheme-specific requirements.  Most URI schemes represent
   data octets by the US-ASCII character corresponding to that octet,
   either directly in the form of the character's glyph or by use of an
   escape triplet (Section 2.4).

   When a URI scheme defines a component that represents textual data
   consisting of characters from the Unicode (ISO 10646) character set,
   we recommend that the data be encoded first as octets according to
   the UTF-8 [UTF-8] character encoding, and then escaping any octets
   that are not in the unreserved character set.

2.2 Reserved Characters

   URIs include components and sub-components that are delimited by
   certain special characters.  These characters are called "reserved",
   since their usage within a URI component is limited to their reserved
   purpose within that component.  If data for a URI component would
   conflict with the reserved purpose, then the conflicting data must be
   escaped (Section 2.4) before forming the URI.

      reserved    = "/" / "?" / "#" / "[" / "]" / ";" /
                    ":" / "@" / "&" / "=" / "+" / "$" / ","
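
   The rule that conflicting data must be escaped before forming the
   URI, together with the recommendation to encode textual data as
   UTF-8 first, can be sketched as follows.  This is an illustrative
   sketch only: the name escape_data is hypothetical, and the function
   takes the conservative approach of escaping every octet outside the
   unreserved set (ALPHA / DIGIT / mark), which a real scheme need not
   do for reserved characters it allows as data.

```python
# unreserved = ALPHA / DIGIT / mark, as listed in the unreserved production.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def escape_data(text: str) -> str:
    """Encode text as UTF-8, then escape each octet not in the unreserved set."""
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        # Octets above 0x7F never match an unreserved character, so they
        # are always written as an escape triplet.
        out.append(ch if ch in UNRESERVED else f"%{octet:02X}")
    return "".join(out)


print(escape_data("a&b=c"))      # "&" and "=" are escaped so they cannot act as delimiters
print(escape_data("caf\u00e9"))  # the non-ASCII character becomes two escaped UTF-8 octets
```

   Escaping must be applied to each piece of data exactly once, before
   the pieces are joined with their delimiters.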

   Reserved characters are used as delimiters of the generic URI
   components described in Section 3, as well as within those components
   for delimiting sub-components.  A component's ABNF syntax rule will
   not use the "reserved" production directly; instead, each rule lists
   those reserved characters that are allowed within that component.
   Allowed reserved characters that are not assigned a sub-component
   delimiter role by this specification should be considered reserved
   for special use by whatever software generates the URI (i.e., they
   may be used to delimit or indicate information that is significant to
   interpretation of the identifier, but that significance is outside
   the scope of this specification).  Outside of the URI's origin, a
   reserved character cannot be escaped without fear of changing how it
   will be interpreted; likewise, an escaped octet that corresponds to a
   reserved character cannot be unescaped outside the software that is
   responsible for interpreting it during URI resolution.

   The slash ("/"), question-mark ("?"), and crosshatch ("#") characters
   are reserved in all URI for the purpose of delimiting components that
   are significant to the generic parser's hierarchical interpretation
   of an identifier.  The hierarchical prefix of a URI, wherein the
   slash ("/") character signifies a hierarchy delimiter, extends from
   the scheme (Section 3.1) through to the first question-mark ("?"),
   crosshatch ("#"), or end of the URI string.  In other words, the
   slash ("/") character is not treated as a hierarchical separator
   within the query (Section 3.4) and fragment (Section 3.5) components
   of a URI, but is still considered reserved within those components
   for purposes outside the scope of this specification.

2.3 Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include uppercase and lowercase
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = ALPHA / DIGIT / mark

      mark        = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")"

   Unreserved characters can be escaped without changing the semantics
   of a URI, but this should not be done unless the URI is being used in
   a context that does not allow the unescaped character to appear.  URI
   normalization processes may unescape sequences in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), underscore
   (%5F), or tilde (%7E) without fear of creating a conflict, but
   unescaping the other mark characters is usually counterproductive.

2.4 Escaped Characters

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set or
   corresponds to a US-ASCII character that delimits the component from
   others, is reserved in that component for delimiting sub-components,
   or is excluded from any use within a URI (Section 2.5).

2.4.1 Escaped Encoding

   An escaped octet is encoded as a character triplet, consisting of
   the percent character "%" followed by the two hexadecimal digits
   representing that octet's numeric value.  For example, "%20" is the
   escaped encoding for the US-ASCII space character (SP).  This is
   sometimes referred to as "percent-encoding" the octet.

      escaped     = "%" HEXDIG HEXDIG

   The uppercase hexadecimal digits 'A' through 'F' are equivalent to
   the lowercase digits 'a' through 'f', respectively.  Two URIs that
   differ only in the case of hexadecimal digits used in escaped octets
   are equivalent.  For consistency, we recommend that uppercase digits
   be used by URI generators and normalizers.

2.4.2 When to Escape and Unescape

   Under normal circumstances, the only time that characters within a
   URI string are escaped is during the process of generating the URI
   from its component parts.  Each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics.  The exception is
   when a URI is being used within a context where the unreserved "mark"
   characters might need to be escaped, such as when used for a
   command-line argument or within a single-quoted attribute.

   Once generated, a URI is always in an escaped form.  When a URI is
   resolved, the components significant to that scheme-specific
   resolution process (if any) must be parsed and separated before the
   escaped characters within those components can be safely unescaped.

3. URI Syntactic Components

   The URI syntax is dependent upon the scheme.  In general, absolute
   URIs are written as follows:

      <scheme>:<scheme-specific-part>

   An absolute URI contains the name of the scheme being used (<scheme>)
   followed by a colon (":") and then a string (the
   <scheme-specific-part>) whose interpretation depends on the scheme.

   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URIs.  However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace.  This
   "generic URI" syntax consists of a sequence of four main components:

      <scheme>://<authority><path>?<query>

   each of which, except <scheme>, may be absent from a particular URI.
   For example, some URI schemes do not allow an <authority> component,
   and others do not use a <query> component.

      absolute-URI  = scheme ":" ( hier-part / opaque-part )

   URIs that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.  For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname.  This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem
   pathname.

      hier-part     = [ net-path / abs-path ] [ "?" query ]

      net-path      = "//" authority [ abs-path ]

      abs-path      = "/"  path-segments

   URIs that do not make use of the slash "/" character for separating
   hierarchical components are considered opaque by the generic URI
   parser.

      opaque-part   = uric-no-slash *uric

      uric-no-slash = unreserved / escaped / "[" / "]" / ";" / "?" /
                      ":" / "@" / "&" / "=" / "+" / "$" / ","

   We use the term <path> to refer to both the <abs-path> and
   <opaque-part> constructs, since they are mutually exclusive for any
   given URI and can be parsed as a single component.

3.1 Scheme Component

   Just as there are many different methods of access to resources,
   there are a variety of schemes for identifying such resources.  The
   URI syntax consists of a sequence of components separated by reserved
   characters, with the first component defining the semantics for the
   remainder of the URI string.

   Scheme names consist of a sequence of characters beginning with a
   lower case letter and followed by any combination of lower case
   letters, digits, plus ("+"), period ("."), or hyphen ("-").  For
   resiliency, programs interpreting a URI should treat upper case
   letters as equivalent to lower case in scheme names (e.g., allow
   "HTTP" as well as "http").

      scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

   Relative URI references are distinguished from absolute URI in that
   they do not begin with a scheme name.  Instead, the scheme is
   inherited from the base URI, as described in Section 5.2.

3.2 Authority Component

   Many URI schemes include a top hierarchical element for a naming
   authority, such that the namespace defined by the remainder of the
   URI is governed by that authority.  This authority component is
   typically defined by an Internet-based server or a scheme-specific
   registry of naming authorities.

      authority     = server / reg-name

   The authority component is preceded by a double slash "//" and is
   terminated by the next slash "/", question-mark "?", or by the end of
   the URI.  Within the authority component, the characters ";", ":",
   "@", "?", "/", "[", and "]" are reserved.

   An authority component is not required for a URI scheme to make use
   of relative references.  A base URI without an authority component
   implies that any relative reference will also be without an authority
   component.
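
   The delimiting rule for the authority component (preceded by "//",
   terminated by the next "/" or "?", or by the end of the URI) can be
   sketched as below.  The function name extract_authority and the
   sample URIs are hypothetical.  As an assumption beyond the rule just
   quoted, the sketch also stops at "#", so that a fragment attached in
   a URI reference is not mistaken for part of the authority.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def extract_authority(uri: str):
    """Return the authority component, or None if the URI has none."""
    scheme, colon, rest = uri.partition(":")
    if not colon or not SCHEME.match(scheme):
        return None                 # no valid scheme, so not an absolute URI
    if not rest.startswith("//"):
        return None                 # no double slash, so no authority
    rest = rest[2:]
    end = len(rest)
    for delimiter in "/?#":         # "#" is an added assumption, see above
        position = rest.find(delimiter)
        if position != -1:
            end = min(end, position)
    return rest[:end]


print(extract_authority("http://www.example.com:8080/a/b?x=1"))
print(extract_authority("mailto:user@example.com"))
```

   The result is returned in its escaped form; any further splitting
   into user information, host, and port would happen before
   unescaping.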




3.2.1 Registry-based Naming Authority

   The structure of a registry-based naming authority is specific to
   the URI scheme, but constrained to the allowed characters for an
   authority component.

      reg-name      = 1*( unreserved / escaped / ";" /
                          ":" / "@" / "&" / "=" / "+" / "$" / "," )

3.2.2 Server-based Naming Authority

   URI schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:

      <userinfo>@<host>:<port>

   where <userinfo> may consist of a user name and, optionally,
   scheme-specific information about how to gain authorization to access
   the server.  The parts "<userinfo>@" and ":<port>" may be omitted.  If
   <host> is omitted, the default host is defined by the scheme-specific
   semantics of the URI (e.g., the "file" URI scheme defaults to
   "localhost", whereas the "http" URI scheme does not allow host to be
   omitted).

      server        = [ [ userinfo "@" ] hostport ]

   The user information, if present, is followed by a commercial
   at-sign "@".

      userinfo      = *( unreserved / escaped / ";" /
                         ":" / "&" / "=" / "+" / "$" / "," )

   Some URI schemes use the format "user:password" in the userinfo
   field.  This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text has proven to be a security
   risk in almost every case where it has been used.  Note also that
   userinfo which is crafted to look like a trusted domain name might be
   used to mislead users, as described in Section 7.5.

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  A URI
   normalizer may unescape escaped octets that are represented by
   characters in the unreserved set.  For example, "%7E" is sometimes
   used instead of tilde ("~") in an "http" URI path and can be
   converted to "~" without changing the interpretation of the URI.

   Because the percent ("%") character serves as the escape indicator,
   it must be escaped as "%25" in order for that octet to be used as
   data within a URI.  Implementers should be careful not to escape or
   unescape the same string more than once, since unescaping an already
   unescaped string might lead to misinterpreting a percent data
   character as another escaped character, or vice versa in the case of
   escaping an already escaped string.

2.5 Excluded Characters

   Although they are disallowed within the URI syntax, we include here
   a description of those characters that have been excluded and the
   reasons for their exclusion.

      excluded    = invisible / delims / unwise

   The control characters (CTL) in the US-ASCII coded character set are
   not used within a URI, both because they are non-printable and
   because they are likely to be misinterpreted by some control
   mechanisms.  The space character (SP) is excluded because significant
   spaces may disappear and insignificant spaces may be introduced when
   a URI is transcribed, typeset, or subjected to the treatment of
   word-processing programs.  Whitespace is also used to delimit a URI
   in many contexts.  Characters outside the US-ASCII set are excluded as
   well.

      invisible   = CTL / SP / %x80-FF

   The angle-bracket ("<" and ">") and double-quote (") characters are
   excluded because they are often used as the delimiters around a URI
   in text documents and protocol fields.  The percent character ("%")
   is excluded because it is used for the encoding of escaped (Section
   2.4) characters.

      delims      = "<" / ">" / "%" / DQUOTE

   Other characters are excluded because gateways and other transport
   agents are known to sometimes modify such characters.

      unwise      = "{" / "}" / "|" / "\" / "^" / "`"

   Data octets corresponding to excluded characters must be escaped in
   order to be represented within a URI.
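
   The normalization described for escaped octets (uppercase
   hexadecimal digits, and unescaping only ALPHA, DIGIT, hyphen,
   underscore, and tilde) might be sketched as a single pass over the
   URI string.  The name normalize_escapes is hypothetical and the
   sketch is not a complete normalizer; it only rewrites escape
   triplets.

```python
import re

ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")

# ALPHA, DIGIT, hyphen, underscore, and tilde: the characters named as
# safe to unescape without fear of creating a conflict.
SAFE_TO_UNESCAPE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_~"
)


def normalize_escapes(uri: str) -> str:
    """Unescape safe unreserved octets; uppercase the hex digits of the rest."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in SAFE_TO_UNESCAPE:
            return ch
        return "%" + match.group(1).upper()

    # One non-overlapping pass, so text produced by a replacement is never
    # scanned again and nothing is unescaped twice.
    return ESCAPED.sub(fix, uri)


print(normalize_escapes("http://example.com/%7euser/a%2fb"))
```

   Because the substitution makes a single pass, "%2541" is left as
   "%2541" rather than being decoded a second time into "A".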


   The server is identified by a network host --- as described by an
   IPv6 literal encapsulated within square brackets, an IPv4 address in
   dotted-decimal form, or a domain name --- and an optional port
   number.  The server's port, if any is required by the URI scheme, can
   be specified by a port number in decimal following the host and
   delimited from it by a colon (":") character.  If no explicit port
   number is given, the default port number, as defined by the URI
   scheme, is assumed.  The type of network port identified by the URI
   (e.g., TCP, UDP, SCTP, etc.) is defined by the scheme-specific
   semantics of the URI scheme.

      hostport         = host [ ":" port ]
      host             = IPv6reference / IPv4address / hostname
      port             = *DIGIT

   A hostname takes the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.  The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.

      hostname      = domainlabel qualified
      qualified     = *( "." domainlabel ) [ "." toplabel "." ]
      domainlabel   = alphanum [ 0*61( alphanum / "-" ) alphanum ]
      toplabel      = ALPHA    [ 0*61( alphanum / "-" ) alphanum ]
      alphanum      = ALPHA / DIGIT

   A host identified by an IPv4 literal address is represented in
   dotted-decimal notation (a sequence of four decimal numbers in the
   range 0 to 255, separated by "."), as described in [RFC1123] by
   reference to [RFC0952].  Note that other forms of dotted notation may
   be interpreted on some platforms, as described in Section 7.3, but
   only the dotted-decimal form of four octets is allowed by this
   grammar.

      IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
      dec-octet   = DIGIT /                         ; 0-9
                    ( %x31-39 DIGIT ) /             ; 10-99
                    ( "1" 2DIGIT ) /                ; 100-199
                    ( "2" %x30-34 DIGIT ) /         ; 200-249
                    ( "25" %x30-35 )                ; 250-255

   A host identified by an IPv6 literal address [RFC2373] is
   distinguished by enclosing the IPv6 literal within square-brackets
   ("[" and "]").  This is the only place where square-bracket
   characters are allowed in the hierarchical URI syntax.

      IPv6reference = "[" IPv6address "]"

      IPv6address   = (                          6( h4 ":" ) ls32 )
                    / (                     "::" 5( h4 ":" ) ls32 )
                    / ( [              h4 ] "::" 4( h4 ":" ) ls32 )
                    / ( [ *1( h4 ":" ) h4 ] "::" 3( h4 ":" ) ls32 )
                    / ( [ *2( h4 ":" ) h4 ] "::" 2( h4 ":" ) ls32 )
                    / ( [ *3( h4 ":" ) h4 ] "::"    h4 ":"   ls32 )
                    / ( [ *4( h4 ":" ) h4 ] "::"             ls32 )
                    / ( [ *5( h4 ":" ) h4 ] "::"             h4   )
                    / ( [ *6( h4 ":" ) h4 ] "::"                  )

      ls32          = ( h4 ":" h4 ) / IPv4address
                    ; least-significant 32 bits of address

      h4            = 1*4HEXDIG

3. Syntax Components

   The generic URI syntax consists of a hierarchical sequence of
   components referred to as the scheme, authority, path, query, and
   fragment.

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = net-path / abs-path / rel-path

      net-path    = "//" authority [ abs-path ]
      abs-path    = "/"  path-segments
      rel-path    = path-segments

   The scheme and path components are required, though path may be empty
   (no characters).  An ABNF-driven parser of hier-part will find that
   the three productions in the rule are ambiguous: they are
   disambiguated by the "first-match-wins" (a.k.a. "greedy") algorithm.
   In other words, if the string begins with two slash characters
   ("//"), then it is a net-path; if it begins with only one slash
   character, then it is an abs-path; otherwise, it is a rel-path.  Note
   that rel-path does not necessarily contain any slash ("/")
   characters; a non-hierarchical path will be treated as opaque data by
   a generic URI parser.

   The authority component is only present when a string matches the
   net-path production.  Since the presence of an authority component
   restricts the remaining syntax for path, we have not included a
   specific "path" rule in the syntax.  Instead, what we refer to as the
   URI path is that part of the parsed URI string matching the abs-path
   or rel-path production in the syntax above, since they are mutually
   exclusive for any given URI and can be parsed as a single component.
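
   The "first-match-wins" disambiguation of hier-part into net-path,
   abs-path, or rel-path can be illustrated with a short sketch.  The
   function names and sample URIs are hypothetical, and the sketch
   assumes its input already contains a scheme followed by ":"; it does
   not validate the individual components.

```python
def classify_hier_part(hier_part: str) -> str:
    """Apply the productions in order: net-path, then abs-path, then rel-path."""
    if hier_part.startswith("//"):
        return "net-path"
    if hier_part.startswith("/"):
        return "abs-path"
    return "rel-path"


def split_uri(uri: str) -> dict:
    """Split scheme ":" hier-part [ "?" query ] [ "#" fragment ]."""
    scheme, _, rest = uri.partition(":")
    # The fragment is removed first, so a "?" inside it is not taken as
    # the start of a query.
    rest, has_fragment, fragment = rest.partition("#")
    hier_part, has_query, query = rest.partition("?")
    return {
        "scheme": scheme,
        "hier-part": hier_part,
        "kind": classify_hier_part(hier_part),
        "query": query if has_query else None,
        "fragment": fragment if has_fragment else None,
    }


print(split_uri("http://example.com/a?x=1#top"))
print(split_uri("mailto:user@example.com"))
```

   A rel-path result such as the second example contains no slash and
   would be treated as opaque data by a generic parser.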


3.3 Path Component

   The path component contains data, specific syntax above, since they are mutually
   exclusive for any given URI and can be parsed as a single component.

3.1 Scheme

   Each URI begins with a scheme name that refers to a specification for
   assigning identifiers within that scheme. As such, the authority (or the
   scheme if there URI syntax is no authority component), identifying the resource
   within
   a federated and extensible naming system wherein each scheme's
   specification may further restrict the scope syntax and semantics of
   identifiers using that scheme.

   Scheme names consist of a sequence of characters beginning with a
   letter and followed by any combination of letters, digits, plus
   ("+"), period ("."), or hyphen ("-").  Although scheme is
   case-insensitive, the canonical form is lowercase and authority.

      path          = [ abs-path / opaque-part ]

      path-segments = segment *( "/" segment )
      segment       = *pchar

      pchar         = unreserved / escaped / ";" /
                      ":" / "@" / "&" / "=" / "+" / "$" / ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters "/
   ", ";", "=", and "?" are reserved.  The semicolon (";") and equals
   ("=") characters have the reserved purpose of delimiting parameters
   and parameter values within a path segment.  However, parameters are
   not significant documents that
   specify schemes must do so using lowercase letters.  An
   implementation should accept uppercase letters as equivalent to the parsing of relative references.
   lowercase in scheme names (e.g., allow "HTTP" as well as "http"), for



Berners-Lee, et al.    Expires September 1, November 21, 2003               [Page 18] 15]

Internet-Draft             URI Generic Syntax                 March                   May 2003


   the sake of robustness, but should only generate lowercase scheme
   names, for consistency.

      scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

   Individual schemes are not specified by this document. The process
   for registration of new URI schemes is defined separately by
   [RFC2717].  The scheme registry maintains the mapping between scheme
   names and their specifications.

3.2 Authority

   Many URI schemes include a hierarchical element for a naming
   authority, such that governance of the name space defined by the
   remainder of the URI is delegated to that authority (which may, in
   turn, delegate it further).  The generic syntax provides a common
   means for distinguishing an authority based on a registered domain
   name or server address, along with optional port and user
   information.

   The authority component is preceded by a double slash ("//") and is
   terminated by the next slash ("/"), question-mark ("?"), or
   crosshatch ("#") character, or by the end of the URI.

      authority   = [ userinfo "@" ] host [ ":" port ]

3.4 Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *( pchar / "/" / "?" )

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

4. URI References

   The term "URI-reference" is used here to denote the common usage of
   a resource identifier.  A URI reference may be absolute or relative,
   and may have additional information attached in the form of a
   fragment identifier.  However, "the URI" that results from such a
   reference includes only the absolute URI after the fragment
   identifier (if any) is removed and after any relative URI is resolved
   to its absolute form.  Although it is possible to limit the
   discussion of URI syntax and semantics to that of the absolute
   result, most usage of URI is within general URI references, and it is
   impossible to obtain the URI from such a reference without also
   parsing the fragment and resolving the relative form.

      URI-reference = [ absolute-URI / relative-URI ] [ "#" fragment ]

   Many protocol elements allow only the absolute form of a URI with an
   optional fragment identifier.

      absolute-URI-reference = absolute-URI [ "#" fragment ]

   The syntax for a relative URI is a shortened form of that for an
   absolute URI, where some prefix of the URI is missing and certain
   path components ("." and "..") have a special meaning when, and only
   when, interpreting a relative path.  The relative URI syntax is
   defined in Section 5.

4.1 Fragment Identifier

   When a URI reference is used to perform a retrieval action on the
   identified resource, the optional fragment identifier, separated from
   the URI by a crosshatch ("#") character, consists of additional
   reference information to be interpreted by the user agent after the
   retrieval action has been successfully completed.  As such, it is not
   part of a URI, but is often used in conjunction with a URI.

      fragment    = *( pchar / "/" / "?" )

   The parts "<userinfo>@" and ":<port>" may be omitted.

   Some schemes do not allow the userinfo and/or port sub-components.
   When presented with a URI that violates one or more scheme-specific
   restrictions, the scheme-specific URI resolution process should flag
   the reference as an error rather than ignore the unused parts; doing
   so reduces the number of equivalent URIs and helps detect abuses of
   the generic syntax that might indicate the URI has been constructed
   to mislead the user (Section 7.5).

3.2.1 User Information

   The userinfo sub-component may consist of a user name and,
   optionally, scheme-specific information about how to gain
   authorization to access the server.  The user information, if
   present, is followed by a commercial at-sign ("@") that delimits it
   from the host.

      userinfo    = *( unreserved / escaped / ";" /
                       ":" / "&" / "=" / "+" / "$" / "," )

   Some URI schemes use the format "user:password" in the userinfo
   field. This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text has proven to be a security
   risk in almost every case where it has been used. Note also that
   userinfo might be crafted to look like a trusted domain name in order
   to mislead users, as described in Section 7.5.

3.2.2 Host

   The host sub-component of authority is identified by an IPv6 literal
   encapsulated within square brackets, an IPv4 address in
   dotted-decimal form, or a domain name.

      host        = [ IPv6reference / IPv4address / hostname ]

   If host is omitted, a default may be defined by the scheme-specific
   semantics of the URI.  For example, the "file" URI scheme defaults to
   "localhost", whereas the "http" URI scheme does not allow host to be
   omitted.

   The production for host is ambiguous because it does not completely
   distinguish between an IPv4address and a hostname.  Again, the
   "first-match-wins" algorithm applies: If host matches the production
   for IPv4address, then it should be considered an IPv4 address literal
   and not a hostname.

   A hostname takes the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.  The rightmost
   domain label of a fully qualified domain name may be followed by a
   single "." if it is necessary to distinguish between the complete
   domain name and some local domain.

      hostname    = domainlabel qualified
      qualified   = *( "." domainlabel ) [ "." ]
      domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ]
      alphanum    = ALPHA / DIGIT

   The semantics of a fragment identifier is a property of the data
   resulting from a retrieval action, regardless of the type of URI used
   in the reference.  Therefore, the format and interpretation of
   fragment identifiers is dependent on the media type [RFC2046] of the
   retrieval result.  The character restrictions described in Section 2
   for a URI also apply to the fragment in a URI-reference.  Individual
   media types may define additional restrictions or structure within
   the fragment for specifying different types of "partial views" that
   can be identified within that media type.

   A fragment identifier is only meaningful when a URI reference is
   intended for retrieval and the result of that retrieval is a document
   for which the identified fragment is consistently defined.

4.2 Same-document References

   A URI reference that does not contain a URI is a reference to the
   current document.  In other words, an empty URI reference within a
   document is interpreted as a reference to the start of that document,
   and a reference containing only a fragment identifier is a reference
   to the identified fragment of that document.  Traversal of such a
   reference should not result in an additional retrieval action.
   However, if the URI reference occurs in a context that is always
   intended to result in a new request, as in the case of HTML's FORM
   element [HTML], then an empty URI reference represents the base URI
   of the current document and should be replaced by that URI when
   transformed into a request.

4.3 Parsing a URI Reference

   A URI reference is typically parsed according to the four main
   components and fragment identifier in order to determine what
   components are present and whether the reference is relative or
   absolute.  The individual components are then parsed for their
   subparts and, if not opaque, to verify their validity.

   Although the BNF defines what is allowed in each component, it is
   ambiguous in terms of differentiating between an authority component
   and a path component that begins with two slash characters.  The
   greedy algorithm is used for disambiguation: the left-most matching
   rule soaks up as much of the URI reference string as it is capable of
   matching.  In other words, the authority component wins.

   Readers familiar with regular expressions should see Appendix B for a
   concrete parsing example and test oracle.

5. Relative URI References

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URIs in
   these documents point to resources within the tree rather than
   outside of it.  Similarly, documents located at a particular site are
   much more likely to refer to other resources at that site than to
   resources at remote sites.

   Relative addressing of URIs allows document trees to be partially
   independent of their location and access scheme.  For instance, it is
   possible for a single set of hypertext documents to be simultaneously
   accessible and traversable via each of the "file", "http", and "ftp"
   schemes if the documents refer to each other using relative URIs.
   Furthermore, such document trees can be moved, as a whole, without
   changing any of the relative references.  Experience within the WWW
   has demonstrated that the ability to perform relative referencing is
   necessary for the long-term usability of embedded URIs.

   The relative URI syntax takes advantage of the <hier-part> syntax of
   <absolute-URI> (Section 3) in order to express a reference that is
   relative to the namespace of another hierarchical URI.

      relative-URI  = [ net-path / abs-path / rel-path ] [ "?" query ]

   A relative reference beginning with two slash characters is termed a
   network-path reference, as defined by <net-path> in Section 3.  Such
   references are rarely used.

   A relative reference beginning with a single slash character is
   termed an absolute-path reference, as defined by <abs-path> in
   Section 3.

   A relative reference that does not begin with a scheme name or a
   slash character is termed a relative-path reference.

      rel-path      = rel-segment [ abs-path ]

      rel-segment   = 1*( unreserved / escaped / ";" /
                          "@" / "&" / "=" / "+" / "$" / "," )

   Within a relative-path reference, the complete path segments "." and
   ".." have special meanings: "the current hierarchy level" and "the
   level above this hierarchy level", respectively.  Although this is
   very similar to their use within Unix-based filesystems to indicate
   directory levels, these path components are only considered special
   when resolving a relative-path reference to its absolute form
   (Section 5.2).

   Authors should be aware that a path segment which contains a colon
   character cannot be used as the first segment of a relative URI path
   (e.g., "this:that"), because it would be mistaken for a scheme name.
   It

   A host identified by an IPv4 literal address is represented in
   dotted-decimal notation (a sequence of four decimal numbers in the
   range 0 to 255, separated by "."), as described in [RFC1123] by
   reference to [RFC0952].  Note that other forms of dotted notation may
   be interpreted on some platforms, as described in Section 7.3, but
   only the dotted-decimal form of four octets is allowed by this
   grammar.

      IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

      dec-octet   = DIGIT                 ; 0-9
                  / %x31-39 DIGIT         ; 10-99
                  / "1" 2DIGIT            ; 100-199
                  / "2" %x30-34 DIGIT     ; 200-249
                  / "25" %x30-35          ; 250-255
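
   As an informal illustration, the dec-octet alternatives above can be
   transcribed into a regular expression and combined into a check for
   the four-octet dotted-decimal form.  This is only a sketch: the names
   DEC_OCTET, IPV4ADDRESS, and is_ipv4address are hypothetical and are
   not defined by this document.

```python
import re

# Direct transcription of the dec-octet alternatives:
# 250-255, 200-249, 100-199, 10-99, 0-9 (no leading zeros).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# Four dec-octets separated by "." (hypothetical helper pattern).
IPV4ADDRESS = re.compile(r"(?:%s\.){3}%s" % (DEC_OCTET, DEC_OCTET))


def is_ipv4address(host: str) -> bool:
    """Return True if host is exactly four dec-octets joined by dots."""
    return IPV4ADDRESS.fullmatch(host) is not None
```

   Because the whole string must match, forms such as "256.0.0.1",
   "01.2.3.4", or a three-part "1.2.3" are rejected and would fall
   through to the hostname interpretation.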

   is therefore necessary to precede such segments with other
   segments (e.g., "./this:that") in order for them to be referenced as
   a relative path.

   It is not necessary for all URI

   A host identified by an IPv6 literal address [RFC3513] is
   distinguished by enclosing the IPv6 literal within square-brackets
   ("[" and "]").  This is the only place where square-bracket
   characters are allowed in the URI syntax.

      IPv6reference = "[" IPv6address "]"

      IPv6address =                          6( h4 ":" ) ls32
                  /                     "::" 5( h4 ":" ) ls32
                  / [              h4 ] "::" 4( h4 ":" ) ls32
                  / [ *1( h4 ":" ) h4 ] "::" 3( h4 ":" ) ls32
                  / [ *2( h4 ":" ) h4 ] "::" 2( h4 ":" ) ls32
                  / [ *3( h4 ":" ) h4 ] "::"    h4 ":"   ls32
                  / [ *4( h4 ":" ) h4 ] "::"             ls32
                  / [ *5( h4 ":" ) h4 ] "::"             h4
                  / [ *6( h4 ":" ) h4 ] "::"

      ls32        = ( h4 ":" h4 ) / IPv4address
                  ; least-significant 32 bits of address

      h4          = 1*4HEXDIG
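
   A minimal sketch of handling an IPv6reference, assuming Python's
   standard ipaddress module is an acceptable stand-in for the
   IPv6address grammar above.  The module's parser is not guaranteed to
   accept exactly the same strings as the ABNF, so treat this as an
   approximation; the function name parse_ipv6reference is hypothetical.

```python
import ipaddress


def parse_ipv6reference(host: str) -> ipaddress.IPv6Address:
    """Parse "[" IPv6address "]" and return the address object.

    Raises ValueError if the brackets are missing or the literal is not
    a valid IPv6 address.
    """
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("IPv6 literal must be enclosed in square brackets")
    literal = host[1:-1]
    # Zone identifiers ("%...") are not part of the grammar shown above,
    # although newer Python versions accept them; reject them explicitly.
    if "%" in literal:
        raise ValueError("zone identifiers are not allowed here")
    # AddressValueError raised here is a subclass of ValueError.
    return ipaddress.IPv6Address(literal)
```

   The trailing ls32 alternative (an embedded IPv4address) is also
   understood by the module, so a literal such as "[::ffff:192.0.2.1]"
   parses as well.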

   The presence of host within a URI does not imply that the scheme
   requires access to the given host on the Internet.  In many cases,
   the host syntax is used only for the sake of reusing the existing
   registration process created and deployed for DNS, thus obtaining a
   globally unique name without the cost of deploying another registry.
   However, such use comes with its own costs: domain name ownership may
   change over time for reasons not anticipated by the URI creator.

   within a given scheme to be restricted to the <hier-part> syntax,
   since the hierarchical properties of that syntax are only necessary
   when a relative URI is used within a particular document.  Documents
   can only make use of a relative URI when their base URI fits within
   the <hier-part> syntax.  It is assumed that any document which
   contains a relative reference will also have a base URI that obeys
   the syntax.  In other words, a relative URI cannot be used within a
   document that has an unsuitable base URI.

   Some URI schemes do not allow a hierarchical syntax matching the
   <hier-part> syntax, and thus cannot use relative references.

5.1 Establishing a Base URI

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.  Indeed, the
   base URI is necessary to define the semantics of any relative URI
   reference; without it, a relative reference is meaningless.  In order
   for relative URI to be usable within a document, the base URI of that
   document must be known to the parser.

   The base URI of a document can be established in one of four ways,
   listed below in order of precedence.  The order of precedence can be
   thought of in terms of layers, where the innermost defined base URI
   has the highest precedence.  This can be visualized graphically as:

      .----------------------------------------------------------.
      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |  .----------------------------------.  |  |  |  |
      |  |  |  |  |       <relative-reference>       |  |  |  |  |
      |  |  |  |  `----------------------------------'  |  |  |  |
      |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
      |  |  |         (message, document, or none).        |  |  |
      |  |  `----------------------------------------------'  |  |
      |  | (5.1.3) URI used to retrieve the entity            |  |
      |  `----------------------------------------------------'  |
      | (5.1.4) Default Base URI is application-dependent        |
      `----------------------------------------------------------'


5.1.1 Base URI within Document Content

   Within certain document media types, the base URI of the document can
   be embedded within the content itself such that it can be readily
   obtained by a parser.  This can be useful for descriptive documents,
   such as tables of content, which may be transmitted to others through
   protocols other than their usual retrieval context (e.g., E-Mail or
   USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URI can be embedded.  It

3.2.3 Port

   The port sub-component of authority is designated by an optional
   port number in decimal following the host and delimited from it by a
   single colon (":") character.

      port        = *DIGIT

   If port is omitted, a default may be defined by the scheme-specific
   semantics of the URI.  Likewise, the type of network port designated
   by the port number (e.g., TCP, UDP, SCTP, etc.) is defined by the URI
   scheme. For example, the "http" URI scheme defines a default of TCP
   port 80.

3.3 Path

   The path component contains hierarchical data that, along with data
   in the optional query (Section 3.4) component, serves to identify a
   resource within the scope of that URI's scheme and naming authority
   (if any).  There is no specific "path" syntax production in the
   generic URI syntax.  Instead, what we refer to as the URI path is
   that part of the parsed URI string matching either the abs-path or
   the rel-path production, since they are mutually exclusive for any
   given URI and can be parsed as a single component. The path is
   terminated by the first question-mark ("?") or crosshatch ("#")
   character, or by the end of the URI.

      path-segments = segment *( "/" segment )
      segment       = *pchar

      pchar         = unreserved / escaped / ";" /
                      ":" / "@" / "&" / "=" / "+" / "$" / ","

   The path consists of a sequence of path segments separated by a slash
   ("/") character.  A path is always defined for a URI, though the
   defined path may be empty (zero length) or opaque (not containing any
   "/" delimiters).  For example, the URI <mailto:fred@example.com> has
   a path of "fred@example.com".

   Within a path segment, the semicolon (";") and equals ("=") reserved
   characters are often used for delimiting parameters and parameter
   values applicable to that segment.  The comma (",") reserved
   character is often used for similar purposes.  For example, one URI
   generator might use a segment like "name;v=1.1" to indicate a
   reference to version 1.1 of "name", whereas another might use a
   segment like "name,1.1" to indicate the same. Parameter types may be
   defined by scheme-specific semantics, but in most cases the meaning
   of a parameter is specific to the URI originator. Parameters are not
   significant to the parsing of relative references.

   The path segments "." and ".." are defined for relative reference
   within the path name hierarchy.  They are intended for use at the
   beginning of a relative path reference (Section 4.2) for indicating
   relative position within the hierarchical tree of names, with a
   similar effect to how they are used within some operating systems'
   file directory structure to indicate the current directory and parent
   directory, respectively.  Unlike a file system, however, these
   dot-segments are only interpreted within the URI path hierarchy and
   must be removed as part of the URI normalization or resolution
   process, in accordance with the process described in Section 5.2.

   is assumed that user agents manipulating such media types will be
   able to obtain the appropriate syntax from that media type's
   specification.  An example of how the base URI can be embedded in the
   Hypertext Markup Language (HTML) [HTML] is provided in Appendix D.

   A mechanism for embedding the base URI within MIME container types
   (e.g., the message and multipart types) is defined by MHTML
   [RFC2110].  Protocols that do not use the MIME message header syntax,
   but which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for defining the base
   URI as part of a message.

5.1.2 Base URI from the Encapsulating Entity

   If no base URI is embedded, the base URI of a document is defined by
   the document's retrieval context.  For a document that is enclosed
   within another entity (such as a message or another document), the
   retrieval context is that entity; thus, the default base URI of the



   document is the base URI of the entity in which the document is
   encapsulated.

5.1.3 Base URI from the Retrieval URI

   If no base URI is embedded and the document is not encapsulated
   within some other entity (e.g., the top level of a composite entity),
   then, if a URI was used to retrieve the base document, that URI shall
   be considered the base URI.  Note that if the retrieval was the
   result of a redirected request, the last URI used (i.e., that which
   resulted in the actual retrieval of the document) is the base URI.

5.1.4 Default Base URI

   If none of the conditions described in Sections 5.1.1--5.1.3 apply,
   then the base URI is defined by the context of the application. Since
   this definition is necessarily application-dependent, failing to
   define the base URI using one of the other methods may result in the
   same content being interpreted differently by different types of
   application.

   It is the responsibility of the distributor(s) of a document
   containing a relative URI to ensure that the base URI for that
   document can be established.  It must be emphasized that a relative
   URI cannot be used reliably in situations where the document's base
   URI is not well-defined.

5.2 Resolving Relative References to Absolute Form

   This section describes an example algorithm for resolving URI
   references that might be relative to a given base URI.

3.4 Query

   The query component contains non-hierarchical data that, along with
   data in the path (Section 3.3) component, serves to identify a
   resource within the scope of that URI's scheme and naming authority
   (if any). The query component is indicated by the first question-mark
   ("?") character and terminated by a crosshatch ("#") character or by
   the end of the URI.

      query       = *( pchar / "/" / "?" )

   The characters slash ("/") and question-mark ("?") are allowed to
   represent data within the query component, but such use is
   discouraged; incorrect implementations of relative URI resolution
   often fail to distinguish them from hierarchical separators, thus
   resulting in non-interoperable results while parsing relative
   references.  However, since query components are often used to carry
   identifying information in the form of "key=value" pairs, and one
   frequently used value is a reference to another URI, it is sometimes
   better for usability to include those characters unescaped.

3.5 Fragment

   The fragment identifier component allows indirect identification of
   a secondary resource by reference to a primary resource and
   additional identifying information that is selective within that
   resource. The identified secondary resource may be some portion or
   subset of the primary resource, some view on representations of the
   primary resource, or some other resource that is merely named within
   the primary resource.  A fragment identifier component is indicated
   by the presence of a crosshatch ("#") character and terminated by the
   end of the URI string.

      fragment    = *( pchar / "/" / "?" )

   The semantics of a fragment identifier are defined by the set of
   representations that might result from a retrieval action on the
   primary resource.  Therefore, the format and interpretation of a
   fragment identifier component is dependent on the media type
   [RFC2046] of a potential retrieval result.  Individual media types
   may define their own restrictions on, or structure within, the
   fragment identifier syntax for specifying different types of subsets,
   views, or external references that are identifiable as fragments by
   that media type.  If the primary resource is represented by multiple
   media types, as is often the case for resources whose representation
   is selected based on attributes of the retrieval request, then
   interpretation of the given fragment identifier must be consistent
   across all of those media types in order for it to be viable as an
   identifier.

   As with any URI, use of a fragment identifier component does not
   imply that a retrieval action will take place.  A URI with a fragment
   identifier may be used to refer to the secondary resource without any
   implication that the primary resource is accessible.  However, if
   that URI is used in a context that does call for retrieval and is not
   a same-document reference (Section 4.4), the fragment identifier is
   only valid as a reference if a retrieval action on the primary
   resource succeeds and results in a representation that defines the
   fragment.

   Fragment identifiers have a special role in information systems as
   the primary form of client-side indirect referencing, allowing an
   author to specifically identify those aspects of an existing resource
   that are only indirectly provided by the

   The algorithm is intended to provide a definitive result that can be
   used to test the output of other implementations.  Implementation of
   the algorithm itself is not required, but the result given by an
   implementation must match the result that would be given by this
   algorithm.

   The base URI is established according to the rules of Section 5.1 and
   parsed into the four main components as described in Section 3.  Note
   that only the scheme component is required to be present in the base
   URI; the other components may be empty or undefined.  A component is
   undefined if its preceding separator does not appear in the URI
   reference; the path component is never undefined, though it may be
   empty.  The base URI's query component is not used by the resolution
   algorithm and may be discarded.

   For each URI reference (R), the following pseudocode describes an
   algorithm for transforming R into its target (T), which is either an
   absolute URI or the current document, and R's optional fragment:

      (R.scheme, R.authority, R.path, R.query, fragment) = parse(R);
         -- The URI reference is parsed into the four components and
         -- fragment identifier, as described in Section 4.3.

      if ((not validating) and (R.scheme == Base.scheme)) then
         -- A non-validating parser may ignore a scheme in the
         -- reference if it is identical to the base URI's scheme.
         undefine(R.scheme);
      endif;

      if defined(R.scheme) then
         T.scheme    = R.scheme;
         T.authority = R.authority;
         T.path      = R.path;
         T.query     = R.query;
      else
         if defined(R.authority) then
            T.authority = R.authority;
            T.path      = R.path;
            T.query     = R.query;
         else
            if (R.path == "") then
               if defined(R.query) then
                  T.path  = Base.path;
                  T.query = R.query;
               else
   resource owner. As such, interpretation of the
                  -- An empty reference refers to the current document
                  return (current-document, fragment);
               endif;
            else
               if (R.path starts-with "/") then
                  T.path = R.path;
               else
                  T.path = merge(Base.path, R.path);
               endif;
               T.query = R.query;
            endif;
            T.authority = Base.authority;
         endif;
         T.scheme = Base.scheme;
      endif;

      return (T, fragment);
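
   The transformation above can be sketched in Python as follows.  This
   is an illustration rather than a normative implementation.  It
   assumes the component-splitting regular expression published in
   Appendix B of RFC 2396, treats the parser as validating (the
   same-scheme shortcut is omitted), assumes the base path is empty or
   begins with "/", and retains ".." segments that would climb above the
   root.  The names parse, merge, resolve, and recompose are chosen for
   this example only; merge uses a segment stack in place of string
   replacement.

```python
import re

# Component-splitting expression as published in RFC 2396, Appendix B
# (assumed here; this draft's own Appendix B is not reproduced above).
_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def parse(ref):
    """Return (scheme, authority, path, query, fragment); None = undefined."""
    m = _SPLIT.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def merge(base_path, ref_path):
    """Merge a relative-path reference with the base path."""
    # Keep the base path up to its last "/" (or "/" if nothing is left),
    # then append the reference path.
    buf = (base_path[: base_path.rfind("/") + 1] or "/") + ref_path
    segments = buf.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            # Drop "." segments; a trailing "." leaves a trailing slash.
            if is_last:
                out.append("")
        elif seg == ".." and len(out) > 1 and out[-1] != "..":
            # Remove the preceding segment together with "..".
            out.pop()
            if is_last:
                out.append("")
        else:
            # Ordinary segment, or a ".." above the root (retained).
            out.append(seg)
    return "/".join(out)


def resolve(base, ref):
    """Return (T, fragment); T is None for the current document."""
    b_scheme, b_auth, b_path, _b_query, _ = parse(base)
    r_scheme, r_auth, r_path, r_query, fragment = parse(ref)
    if r_scheme is not None:
        return (r_scheme, r_auth, r_path, r_query), fragment
    if r_auth is not None:
        return (b_scheme, r_auth, r_path, r_query), fragment
    if r_path == "":
        if r_query is None:
            return None, fragment  # empty reference: current document
        return (b_scheme, b_auth, b_path, r_query), fragment
    path = r_path if r_path.startswith("/") else merge(b_path, r_path)
    return (b_scheme, b_auth, path, r_query), fragment


def recompose(target, fragment):
    """Convenience helper: join the components back into one string."""
    scheme, auth, path, query = target
    out = scheme + ":" if scheme is not None else ""
    if auth is not None:
        out += "//" + auth
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out
```

   For example, with a base of "http://a/b/c/d;p?q", calling
   recompose(*resolve(base, "../g")) gives "http://a/b/g", while a
   reference of "#s" resolves to the current document with fragment "s".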

   fragment identifier during a retrieval action is performed solely by
   the user agent; the fragment identifier is not passed to other
   systems during the process of retrieval. Although this is often
   perceived to be a loss of information, particularly in regards to
   accurate redirection of references as content moves over time, it
   also serves to prevent information providers from denying reference
   authors the right to selectively refer to information within a
   resource.

   The characters slash ("/") and question-mark ("?") are allowed to
   represent data within the fragment identifier, but such use is
   discouraged for the same reasons as described above for query.

   The pseudocode above refers to a merge routine for merging a
   relative-path reference with the path of the base URI to obtain the
   target path.  Although there are many ways to do this, we will
   describe a simple method using a separate string buffer:

   1.  All but


4. Usage

   When applications make reference to a URI, they do not always use the last segment
   full form of reference defined by the base URI's path component is
       copied "URI" syntax production. In
   order to save space and take advantage of hierarchical locality, many
   Internet protocol elements and media type formats allow an
   abbreviation of a URI, while others restrict the buffer.  In other words, any characters after syntax to a
   particular form of URI.  We define the
       last (right-most) slash character, if any, are excluded. If most common forms of reference
   syntax in this specification because they impact and depend upon the
       base URI's path component is
   design of the empty string, then generic syntax, requiring a single
       slash character ("/") is copied uniform parsing algorithm
   in order to the buffer.

   2. be interpreted consistently.

4.1 URI Reference

   The reference's path component ABNF rule URI-reference is appended used to denote the buffer string.

   3.  All occurrences most common usage
   of "./", where "." is a complete path segment,
       are removed from the buffer string.

   4.  If resource identifier.

      URI-reference = URI / relative-URI

   A URI-reference may be absolute or relative: if the buffer string ends with "." as a complete path segment,
       that "." is removed.

   5.  All occurrences reference
   string's prefix matches the syntax of "<segment>/../", where <segment> a scheme followed by its colon
   separator, then the reference is a complete
       path segment not equal URI rather than a relative-URI.

   A URI-reference is typically parsed first into the five URI
   components, in order to "..", determine what components are removed from present and
   whether the buffer
       string.  Removal of these path segments reference is performed iteratively,
       removing the leftmost matching pattern on relative or absolute, and then each iteration, until
       no matching pattern remains.

   6.  If the buffer string ends
   component is parsed for its subparts and their validation.  The ABNF
   of URI-reference, along with "<segment>/..", where <segment> the "first-match-wins" disambiguation
   rule, is
       a complete path segment not equal sufficient to "..", that "<segment>/.." is
       removed.

   7.  If define a validating parser for the resulting buffer string still begins generic
   syntax.  Readers familiar with one or more
       complete path segments regular expressions should see
   Appendix B for an example of a non-validating URI-reference parser
   that will take any given string and extract the URI components.

4.2 Relative URI

   A relative URI reference takes advantage of the hier-part syntax
   (Section 3) in order to express a reference that is relative to the
   name space of another hierarchical URI.

      relative-URI  = hier-part [ "?" query ] [ "#" fragment ]

   The URI referred to by a relative URI reference is obtained by
   applying the relative resolution algorithm of Section 5.

   A relative reference that begins with two slash characters is termed
   a network-path reference; such references are rarely used. A relative
   reference that begins with a single slash character is termed an
   absolute-path reference.  A relative reference that does not begin
   with a slash character is termed a relative-path reference.

   A path segment that contains a colon character (e.g., "this:that")
   cannot be used as the first segment of a relative-path reference
   because it might be mistaken for a scheme name.  Such a segment must
   be preceded by a dot-segment (e.g., "./this:that") to make a
   relative-path reference.
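
   The three kinds of relative reference, and the dot-segment guard for
   a colon in the first segment, can be sketched in Python as follows.
   Both function names are hypothetical, and the input is assumed to
   have already been recognized as a relative reference (that is, it
   has no scheme).

```python
def classify_relative(reference):
    """Name the kind of relative reference using the terms of Section 4.2."""
    if reference.startswith("//"):
        return "network-path"
    if reference.startswith("/"):
        return "absolute-path"
    return "relative-path"


def protect_first_segment(path):
    """Prefix "./" when the first segment of a relative path has a colon.

    Without the prefix, a parser would read the text before the colon
    as a scheme name.
    """
    first_segment = path.split("/", 1)[0]
    if ":" in first_segment:
        return "./" + path
    return path
```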

4.3 Absolute URI

   Some protocol elements allow only the absolute form of a URI without
   a fragment identifier.  For example, defining the base URI for later
   use by relative references calls for an absolute-URI production that
   does not allow a fragment.

      absolute-URI  = scheme ":" hier-part [ "?" query ]


4.4 Same-document Reference

   When a URI reference occurring within a document or message refers to
   a URI that is, aside from its fragment component (if any), identical
   to the base URI (Section 5), that reference is called a
   "same-document" reference.  The most frequent examples of
   same-document references are relative references that are empty or
   include only the crosshatch ("#") separator followed by a fragment
   identifier.

   When a same-document reference is dereferenced for the purpose of a
   retrieval action, the target of that reference is defined to be
   within that current document or message; the dereference should not
   result in a new retrieval.

4.5 Suffix Reference

   The URI syntax is designed for unambiguous reference to resources and
   extensibility via the URI scheme.  However, as URI identification and
   usage have become commonplace, traditional media (television, radio,
   newspapers, billboards, etc.) have increasingly used a suffix of the
   URI as a reference, consisting of only the authority and path
   portions of the URI, such as

      www.w3.org/Addressing/

   or simply the DNS hostname on its own.  Such references are primarily
   intended for human interpretation rather than machine, with the
   assumption that context-based heuristics are sufficient to complete
   the URI (e.g., most hostnames beginning with "www" are likely to have
   a URI prefix of "http://").  Although there is no standard set of
   heuristics for disambiguating a URI suffix, many client
   implementations allow them to be entered by the user and
   heuristically resolved. It should be noted that such heuristics may
   change over time, particularly when new URI schemes are introduced.

   Since a URI suffix has the same syntax as a relative path reference,
   a suffix reference cannot be used in contexts where relative URIs are
   expected. This limits use of suffix references to those places where
   there is no defined base URI, such as dialog boxes and off-line
   advertisements.

5. Relative Resolution

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URIs in
   these documents point to resources within the tree rather than
   outside of it.  Similarly, documents located at a particular site are
   much more likely to refer to other resources at that site than to
   resources at remote sites.

   Relative referencing of URIs allows document trees to be partially
   independent of their location and access scheme.  For instance, it is
   possible for a single set of hypertext documents to be simultaneously
   accessible and traversable via each of the "file", "http", and "ftp"
   schemes if the documents refer to each other using relative URIs.
   Furthermore, such document trees can be moved, as a whole, without
   changing any of the relative references.  Experience within the WWW
   has demonstrated that the ability to perform relative referencing is
   necessary for the long-term usability of embedded URIs.

5.1 Establishing a Base URI

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.  Indeed, the
   base URI is necessary to define the semantics of any relative URI
   reference; without it, a relative reference is meaningless.  In order
   for a relative URI to be usable within a document, the base URI of
   that document must be known to the parser.

   A document that contains relative references must have a base URI
   that contains a hierarchical path component.  In other words, a
   relative-URI cannot be used within a document that has an unsuitable
   base URI. Some URI schemes do not allow a hierarchical path component
   and are thus restricted to full URI references.

   An authority component is not required for a URI scheme to make use
   of relative references.  A base URI without an authority component
   implies that any relative reference will also be without an authority
   component.

   The base URI of a document can be established in one of four ways,
   listed below in order of precedence.  The order of precedence can be
   thought of in terms of layers, where the innermost defined base URI
   has the highest precedence.  This can be visualized graphically as:

      .----------------------------------------------------------.
      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |  .----------------------------------.  |  |  |  |
      |  |  |  |  |       <relative-reference>       |  |  |  |  |
      |  |  |  |  `----------------------------------'  |  |  |  |
      |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
      |  |  |         (message, document, or none).        |  |  |
      |  |  `----------------------------------------------'  |  |
      |  | (5.1.3) URI used to retrieve the entity            |  |
      |  `----------------------------------------------------'  |
      | (5.1.4) Default Base URI is application-dependent        |
      `----------------------------------------------------------'


5.1.1 Base URI within Document Content

   Within certain document media types, the base URI of the document can
   be embedded within the content itself such that it can be readily
   obtained by a parser.  This can be useful for descriptive documents,
   such as tables of content, which may be transmitted to others through
   protocols other than their usual retrieval context (e.g., E-Mail or
   USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URI can be embedded.  It is assumed that user
   agents manipulating such media types will be able to obtain the
   appropriate syntax from that media type's specification.  An example
   of how the base URI can be embedded in the Hypertext Markup Language
   (HTML) [HTML] is provided in Appendix D.

   A mechanism for embedding the base URI within MIME container types
   (e.g., the message and multipart types) is defined by MHTML
   [RFC2110].  Protocols that do not use the MIME message header syntax,
   but do allow some form of tagged metadata to be included within
   messages, may define their own syntax for defining the base URI as
   part of a message.

5.1.2 Base URI from the Encapsulating Entity

   If no base URI is embedded, the base URI of a document is defined by
   the document's retrieval context.  For a document that is enclosed
   within another entity (such as a message or another document), the
   retrieval context is that entity; thus, the default base URI of the
   document is the base URI of the entity in which the document is
   encapsulated.


5.1.3 Base URI from the Retrieval URI

   If no base URI is embedded and the document is not encapsulated
   within some other entity (e.g., the top level of a composite entity),
   then, if a URI was used to retrieve the base document, that URI shall
   be considered the base URI.  Note that if the retrieval was the
   result of a redirected request, the last URI used (i.e., that which
   resulted in the actual retrieval of the document) is the base URI.

5.1.4 Default Base URI

   If none of the conditions described above apply, then the base URI
   is defined by the context of the application. Since this definition
   is necessarily application-dependent, failing to define the base URI
   using one of the other methods may result in the same content being
   interpreted differently by different types of application.

   It is the responsibility of the distributor(s) of a document
   containing a relative URI to ensure that the base URI for that
   document can be established.  It must be emphasized that a relative
   URI cannot be used reliably in situations where the document's base
   URI is not well-defined.

5.2 Obtaining the Referenced URI

   This section describes an example algorithm for resolving URI
   references that might be relative to a given base URI.  The algorithm
   is intended to provide a definitive result that can be used to test
   the output of other implementations.  Implementation of the algorithm
   itself is not required, but the result given by an implementation
   must match the result that would be given by this algorithm.

   The base URI (Base) is established according to the rules of Section
   5.1 and parsed into the five main components described in Section 3.
   Note that only the scheme component is required to be present in the
   base URI; the other components may be empty or undefined.  A
   component is undefined if its preceding separator does not appear in
   the URI reference; the path component is never undefined, though it
   may be empty.

   For each URI reference (R), the following pseudocode describes an
   algorithm for transforming R into its target URI (T):

      (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);
         -- The URI reference is parsed into the five URI components

      if ((not validating) and (R.scheme == Base.scheme)) then
         -- A non-validating parser may ignore a scheme in the
         -- reference if it is identical to the base URI's scheme.
         undefine(R.scheme);
      endif;

      if defined(R.scheme) then
         T.scheme    = R.scheme;
         T.authority = R.authority;
         T.path      = R.path;
         T.query     = R.query;
      else
         if defined(R.authority) then
            T.authority = R.authority;
            T.path      = R.path;
            T.query     = R.query;
         else
            if (R.path == "") then
               T.path = Base.path;
               if defined(R.query) then
                  T.query = R.query;
               else
                  T.query = Base.query;
               endif;
            else
               if (R.path starts-with "/") then
                  T.path = R.path;
               else
                  T.path = merge(Base.path, R.path);
               endif;
               T.query = R.query;
            endif;
            T.authority = Base.authority;
         endif;
         T.scheme = Base.scheme;
      endif;

      T.fragment = R.fragment;
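
   The transformation above can be sketched in Python as follows.  This
   is an illustration rather than a normative implementation: the Parts
   class and the transform function are hypothetical names, an undefined
   component is modelled as None, and the merge routine is passed in as
   a parameter so that the sketch does not depend on any particular way
   of merging paths.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Parts:
    """The five URI components; None means the component is undefined."""
    scheme: Optional[str] = None
    authority: Optional[str] = None
    path: str = ""
    query: Optional[str] = None
    fragment: Optional[str] = None


def transform(r: Parts, base: Parts,
              merge: Callable[[str, str], str],
              validating: bool = True) -> Parts:
    """Transform the reference r into its target, relative to base."""
    scheme = r.scheme
    if not validating and scheme == base.scheme:
        # A non-validating parser may ignore a scheme identical to the
        # base URI's scheme.
        scheme = None

    if scheme is not None:
        return Parts(scheme, r.authority, r.path, r.query, r.fragment)
    if r.authority is not None:
        return Parts(base.scheme, r.authority, r.path, r.query, r.fragment)
    if r.path == "":
        # An empty path keeps the base path, and the base query too
        # unless the reference defines its own.
        query = r.query if r.query is not None else base.query
        return Parts(base.scheme, base.authority, base.path, query,
                     r.fragment)
    path = r.path if r.path.startswith("/") else merge(base.path, r.path)
    return Parts(base.scheme, base.authority, path, r.query, r.fragment)
```

   In every branch the fragment of the target is taken from the
   reference, never from the base.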

   The pseudocode above refers to a merge routine for merging a
   relative-path reference with the path of the base URI to obtain the
   target path.  Although there are many ways to do this, we will
   describe a simple method using a separate string buffer:

   1.  All but the last segment of the base URI's path component is
       copied to the buffer.  In other words, any characters after the
       last (right-most) slash character, if any, are excluded. If the
       base URI's path component is the empty string, then a single
       slash character ("/") is copied to the buffer.

   2.  The reference's path component is appended to the buffer string.

   3.  All occurrences of "./", where "." is a complete path segment,
       are removed from the buffer string.

   4.  If the buffer string ends with "." as a complete path segment,
       that "." is removed.

   5.  All occurrences of "<segment>/../", where <segment> is a complete
       path segment not equal to "..", are removed from the buffer
       string.  Removal of these path segments is performed iteratively,
       removing the leftmost matching pattern on each iteration, until
       no matching pattern remains.

   6.  If the buffer string ends with "<segment>/..", where <segment> is
       a complete path segment not equal to "..", that "<segment>/.." is
       removed.

   7.  If the resulting buffer string still begins with one or more
       complete path segments of "..", then the reference is considered
       to be in error.  Implementations may handle this error by
       removing them from the resolved path (i.e., discarding relative
       levels above the root) or by avoiding traversal of the reference.

   8.  The remaining buffer string is the target URI's path component.

   Some systems may find it more efficient to implement the merge
   algorithm as a pair of segment stacks being merged, rather than as a
   series of string pattern replacements.
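
   A stack-based variant of the merge routine might look like the
   following Python sketch.  It is illustrative only.  It assumes that
   the base path is either empty or begins with a slash, and it handles
   the error case of step 7 by discarding the excess ".." segments,
   which is one of the two options the step allows.  Corner cases such
   as empty path segments have not been checked against the string
   replacement method.

```python
def merge(base_path, ref_path):
    """Merge a relative-path reference with a base path.

    Assumes base_path is "" or starts with "/".
    """
    # Step 1: copy all but the last segment of the base path; an empty
    # base path contributes a single slash.
    buffer = base_path[: base_path.rfind("/") + 1] if base_path else "/"
    # Step 2: append the reference's path.
    buffer += ref_path

    segments = buffer.split("/")
    stack = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            pass  # steps 3 and 4: drop the "." segment
        elif segment == "..":
            # Steps 5 and 6: drop the preceding segment.  stack[0] is
            # the empty string before the leading slash, so it is never
            # popped; excess ".." segments are discarded (step 7).
            if len(stack) > 1:
                stack.pop()
        else:
            stack.append(segment)
            continue
        if is_last:
            # A trailing "." or ".." leaves a trailing slash behind.
            stack.append("")
    # Step 8: what remains is the target path.
    return "/".join(stack)
```

   With the base path "/b/c/d;p", merging "../g" gives "/b/g" and
   merging ".." gives "/b/".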

      Note: Some WWW client applications will fail to separate the
      reference's query component from its path component before merging
      the base and reference paths.  This may result in a loss of
      information if the query component contains the strings "/../" or
      "/./".


5.3 Recomposition of a Parsed URI

   Parsed URI components can be recombined to obtain the referenced URI.
   Using pseudocode, this would be:

      result = ""

      if defined(T.scheme) then
         append T.scheme to result;
         append ":" to result;
      endif;

      if defined(T.authority) then
         append "//" to result;
         append T.authority to result;
      endif;

      append T.path to result;

      if defined(T.query) then
         append "?" to result;
         append T.query to result;
      endif;

      if defined(T.fragment) then
         append "#" to result;
         append T.fragment to result;
      endif;

      return result;

   Note that we are careful to preserve the distinction between a
   component that is undefined, meaning that its separator was not
   present in the reference, and a component that is empty, meaning that
   the separator was present and was immediately followed by the next
   component separator or the end of the reference.
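
   The same recomposition can be written in Python, as a sketch, with
   None standing for an undefined component and the empty string for an
   empty one.  The function name recompose is hypothetical.

```python
def recompose(scheme, authority, path, query, fragment):
    """Recombine parsed components into a URI string.

    None means undefined (no separator is emitted); "" means empty
    (the separator is emitted with nothing after it).
    """
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

   Testing against None rather than for truthiness is what keeps a
   reference such as "http://a/b?" distinct from "http://a/b".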

5.4 Examples of Relative Resolution

   Within an object with a well-defined base URI of

      http://a/b/c/d;p?q

   a relative URI reference would be resolved as follows:

5.4.1 Normal Examples

      "g:h"           =  "g:h"
      "g"             =  "http://a/b/c/g"
      "./g"           =  "http://a/b/c/g"
      "g/"            =  "http://a/b/c/g/"
      "/g"            =  "http://a/g"
      "//g"           =  "http://g"
      "?y"            =  "http://a/b/c/d;p?y"
      "g?y"           =  "http://a/b/c/g?y"
      "#s"            =  "http://a/b/c/d;p?q#s"
      "g#s"           =  "http://a/b/c/g#s"
      "g?y#s"         =  "http://a/b/c/g?y#s"
      ";x"            =  "http://a/b/c/;x"
      "g;x"           =  "http://a/b/c/g;x"



      "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
      "."             =  "http://a/b/c/"
      "./"            =  "http://a/b/c/"
      ".."            =  "http://a/b/"
      "../"           =  "http://a/b/"
      "../g"          =  "http://a/b/g"
      "../.."         =  "http://a/"
      "../../"        =  "http://a/"
      "../../g"       =  "http://a/g"


5.4.2 Abnormal Examples

   Although the following abnormal examples are unlikely to occur in
   normal practice, all URI parsers should be capable of resolving them
   consistently.  Each example uses the same base as above.

   An empty reference refers to the current base URI.

      ""              =  "http://a/b/c/d;p?q"

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URI's path.  Note that the ".." syntax cannot be used to change
   the authority component of a URI.

      "../../../g"    =  "http://a/g"
      "../../../../g" =  "http://a/g"

   Similarly, parsers should remove the dot-segments "." and ".." when
   they are complete components of a path, but not when they are only
   part of a segment.

      "/./g"          =  "http://a/g"
      "/../g"         =  "http://a/g"
      "g."            =  "http://a/b/c/g."
      ".g"            =  "http://a/b/c/.g"
      "g.."           =  "http://a/b/c/g.."
      "..g"           =  "http://a/b/c/..g"

   Less likely are cases where the relative URI uses unnecessary or
   nonsensical forms of the "." and ".." complete path segments.

      "./../g"        =  "http://a/b/g"
      "./g/."         =  "http://a/b/c/g/"
      "g/./h"         =  "http://a/b/c/g/h"
      "g/../h"        =  "http://a/b/c/h"
      "g;x=1/./y"     =  "http://a/b/c/g;x=1/y"



      "g;x=1/../y"    =  "http://a/b/c/y"

   Some applications fail to separate the reference's query and/or
   fragment components from a relative path before merging it with the
   base path.  This error is rarely noticed, since typical usage of a
   fragment never includes the hierarchy ("/") character, and the query
   component is not normally used within relative references.

      "g?y/./x"       =  "http://a/b/c/g?y/./x"
      "g?y/../x"      =  "http://a/b/c/g?y/../x"
      "g#s/./x"       =  "http://a/b/c/g#s/./x"
      "g#s/../x"      =  "http://a/b/c/g#s/../x"

   Some parsers allow the scheme name to be present in a relative URI if
   it is the same as the base URI scheme.  This is considered to be a
   loophole in prior specifications of partial URI [RFC1630]. Its use
   should be avoided, but is allowed for backward compatibility.

      "http:g"        =  "http:g"         ; for validating parsers
                      /  "http://a/b/c/g" ; for backward compatibility
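
   The examples in this section can be spot-checked against an existing
   resolver.  The sketch below uses urljoin from Python's standard
   library, which is documented as following the semantics of a later
   revision of this algorithm; it is an illustration, not a reference
   implementation of this draft, and its results may differ in corner
   cases such as "http:g".  The helper name resolve is an assumption.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

def resolve(reference, base=BASE):
    """Resolve a relative reference against the base URI."""
    return urljoin(base, reference)

# Print a few of the normal and abnormal examples in the table format
# used above.
for reference in ("g", "../g", "/g", "//g", "g?y", "#s",
                  "../../../g", "g?y/../x"):
    print(f"{reference!r:16} =  {resolve(reference)!r}")
```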































6. Normalization and Comparison

   One of the most common operations on URIs is simple comparison:
   determining if two URIs are equivalent without using the URIs to
   access their respective resource(s).  A comparison is performed every
   time a response cache is accessed, a browser checks its history to
   color a link, or an XML parser processes tags within a namespace.
   Extensive normalization prior to comparison of URIs is often used by
   spiders and indexing engines to prune a search space or reduce
   duplication of request actions and response storage.

   URI comparison is performed with respect to some particular purpose,
   and software with differing purposes will often be subject to
   differing design trade-offs with regard to how much effort should be
   spent in reducing duplicate identifiers.  This section describes a
   variety of methods that may be used to compare URIs, the trade-offs
   between them, and the types of applications that might use them.

6.1 Equivalence

   Since URIs exist to identify resources, presumably they should be
   considered equivalent when they identify the same resource.  However,
   such a definition of equivalence is not of much practical use, since
   there is no way for software to compare two resources without
   knowledge of their origin.  For this reason, determination of
   equivalence or difference of URIs is based on string comparison,
   perhaps augmented by reference to additional rules provided by URI
   scheme definitions. We use the terms "different" and "equivalent" to
   describe the possible outcomes of such comparisons, but there are
   many application-dependent versions of equivalence.

   Even though it is possible to determine that two URIs are equivalent,
   it is never possible to be sure that two URIs identify different
   resources. Therefore, comparison methods are designed to minimize
   false negatives while strictly avoiding false positives.

   In testing for equivalence, it is generally unwise to directly
   compare relative URI references; they should be converted to their
   absolute forms before comparison.  Furthermore, when URI references
   are being compared for the purpose of selecting (or avoiding) a
   network action, such as retrieval of a representation, it is often
   necessary to remove fragment identifiers from the URIs prior to
   comparison.
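
   As a minimal sketch of the two precautions above, the following
   Python fragment converts each reference to absolute form and strips
   the fragment identifier before comparing.  It relies on urljoin and
   urldefrag from Python's standard library; the function names
   retrieval_key and same_retrieval_target are assumptions made for
   illustration.

```python
from urllib.parse import urldefrag, urljoin

def retrieval_key(base, reference):
    """Absolute form of a reference with any fragment removed."""
    absolute = urljoin(base, reference)
    return urldefrag(absolute).url

def same_retrieval_target(base, ref1, ref2):
    """True if both references would select the same network action."""
    return retrieval_key(base, ref1) == retrieval_key(base, ref2)
```

   The comparison itself is still a simple string comparison; the
   ladder in the next section describes stronger alternatives.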

6.2 Comparison Ladder

   A variety of methods are used in practice to test URI equivalence.
   These methods fall into a range, distinguished by the amount of
   processing required and the degree to which the probability of false
   negatives is reduced.  As noted above, false negatives cannot in
   principle be eliminated.  In practice, their probability can be
   reduced, but this reduction requires more processing and is not
   cost-effective for all applications.

   If this range of comparison practices is considered as a ladder, the
   following discussion will climb the ladder, starting with those that
   are cheap but have a relatively higher chance of producing false
   negatives, and proceeding to those that have higher computational
   cost and lower risk of false negatives.

6.2.1 Simple String Comparison

   If two URIs, considered as character strings, are identical, then it
   is safe to conclude that they are equivalent.  This type of
   equivalence test has very low computational cost and is in wide use
   in a variety of applications, particularly in the domain of parsing.

   Testing strings for equivalence requires some basic precautions. This
   procedure is often referred to as "bit-for-bit" or "byte-for-byte"
   comparison, which is potentially misleading.  Testing of strings for
   equality is normally based on pairwise comparison of the characters
   that make up the strings, starting from the first and proceeding
   until both strings are exhausted and all characters found to be
   equal, a pair of characters compares unequal, or one of the strings
   is exhausted before the other.

   Such character comparisons require that each pair of characters be
   put in comparable form.  For example, should one URI be stored in a
   byte array in EBCDIC encoding, and the second be in a Java String
   object, bit-for-bit comparisons applied naively will produce both
   false-positive and false-negative errors.  Thus, in principle, it is
   better to speak of equality on a character-for-character rather than
   byte-for-byte or bit-for-bit basis.

   Unicode defines a character as being identified by number
   ("codepoint") with an associated bundle of visual and other
   semantics. At the software level, it is not practical to compare
   semantic bundles, so in practical terms, character-by-character
   comparisons are done codepoint-by-codepoint.
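
   The difference between byte-for-byte and character-for-character
   comparison can be sketched in Python as follows.  The example URI and
   the choice of the cp037 codec as a representative EBCDIC code page
   are assumptions made for illustration.

```python
def same_characters(a, b):
    """Compare two decoded strings codepoint by codepoint."""
    return len(a) == len(b) and all(
        ord(x) == ord(y) for x, y in zip(a, b)
    )

uri = "http://example.com/"
ascii_bytes = uri.encode("ascii")
ebcdic_bytes = uri.encode("cp037")  # cp037 is one EBCDIC code page

# Comparing the raw bytes of differently encoded copies is misleading;
# decoding each copy first puts the characters in comparable form.
bytes_equal = ascii_bytes == ebcdic_bytes
chars_equal = same_characters(
    ascii_bytes.decode("ascii"), ebcdic_bytes.decode("cp037")
)
print(bytes_equal, chars_equal)
```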










6.2.2 Syntax-based Normalization

   Software may use logic based on the definitions provided by this
   specification to reduce the probability of false negatives.  Such
   processing is moderately higher in cost than character-for-character
   string comparison.  For example, an application using this approach
   could reasonably consider the following two URIs equivalent:

      example://a/b/c/%7A
      eXAMPLE://a/./b/../b/c/%7a

   Web user agents, such as browsers, typically apply this type of URI
   normalization when determining whether a cached response is
   available. Syntax-based normalization includes such techniques as
   case normalization, escape normalization, and removal of leftover
   relative path segments.

6.2.2.1 Case Normalization

   When a URI scheme uses components of the generic syntax, it will also
   use the common syntax equivalence rules, namely that the scheme and
   hostname are case insensitive and therefore can be normalized to
   lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
   equivalent to <http://www.example.com/>.

6.2.2.2 Escape Normalization

   The percent-escape mechanism described in Section 2.4 is a frequent
   source of variance among otherwise identical URIs. One cause is the
   choice of uppercase or lowercase letters for the hexadecimal digits
   within the escape sequence (e.g., "%3a" versus "%3A"). Such sequences
   are always equivalent; for the sake of uniformity, URI generators and
   normalizers are strongly encouraged to use uppercase letters for the
   hex digits A-F.

   Only characters that are excluded from or reserved within the URI
   syntax must be escaped when used as data.  However, some URI
   generators go beyond that and escape characters that do not require
   escaping, resulting in URIs that are equivalent to their unescaped
   counterparts. Such URIs can be normalized by unescaping sequences
   that represent the unreserved characters, as described in Section
   2.3.

6.2.2.3 Path Segment Normalization

   The complete path segments "." and ".." have a special meaning within
   hierarchical URI schemes.  As such, they should not appear in
   absolute URI paths; if they are found, they can be removed by
   splitting the URI just after the "/" that starts the path, using the
   left half as the base URI and the right as a relative reference, and
   normalizing the URI by merging the two in accordance with the
   relative URI processing algorithm (Section 5).
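
   The three syntax-based techniques can be combined as in the Python
   sketch below.  It is an illustration under stated assumptions, not a
   normative algorithm: the unreserved set is a deliberately
   conservative guess (letters, digits, "-", ".", "_", "~"), dot
   segments are removed directly rather than by re-running relative
   resolution, and all function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a conservative set of unreserved characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def remove_dot_segments(path):
    """Remove complete "." and ".." segments from an absolute path."""
    if not path.startswith("/"):
        return path
    segments = path.split("/")[1:]
    kept = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            if kept:
                kept.pop()
            continue
        kept.append(segment)
    if segments[-1] in (".", ".."):
        kept.append("")  # keep the trailing "/" the dot segment implied
    return "/" + "/".join(kept)

def normalize_escapes(text):
    """Uppercase escape hex digits; unescape unreserved characters."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)

def normalize(uri):
    """Apply case, escape, and path segment normalization."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    userinfo, at, hostport = parts.netloc.rpartition("@")
    return urlunsplit((
        parts.scheme,
        userinfo + at + hostport.lower(),  # userinfo keeps its case
        normalize_escapes(remove_dot_segments(parts.path)),
        normalize_escapes(parts.query),
        normalize_escapes(parts.fragment),
    ))
```

   Dot segments are removed before escapes are normalized so that an
   escaped "." is never treated as a path segment; that ordering is a
   design choice of this sketch.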

6.2.3 Scheme-based Normalization

   The syntax and semantics of URIs vary from scheme to scheme, as
   described by the defining specification for each scheme.  Software
   may use scheme-specific rules, at further processing cost, to reduce
   the probability of false negatives.  For example, Web spiders that
   populate most large search engines would consider the following two
   URIs to be equivalent:

      http://example.com/
      http://example.com:80/

   This behavior is based on the rules provided by the syntax and
   semantics of the "http" URI scheme, which defines an empty port
   component as being equivalent to the default TCP port for HTTP (port
   80).  In general, a URI scheme that uses the generic syntax for
   authority is defined such that a URI with an explicit ":port", where
   the port is the default for the scheme, is equivalent to one where
   the port is elided.
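
   A scheme-based rule of this kind might be sketched in Python as
   follows.  Only the "http" entry in the table comes from the text
   above; the table layout and the function name drop_default_port are
   assumptions, and a real application would populate the table from
   each scheme's defining specification.

```python
from urllib.parse import urlsplit, urlunsplit

# Only the "http" entry is taken from the text; others would be added
# from the relevant scheme definitions.
DEFAULT_PORTS = {"http": "80"}

def drop_default_port(uri):
    """Elide an empty port or the scheme's default port."""
    parts = urlsplit(uri)
    host, colon, port = parts.netloc.rpartition(":")
    default = DEFAULT_PORTS.get(parts.scheme)
    # A bracketed IPv6 literal without a port ends in "]", so its last
    # ":"-separated piece never equals "" or the default port.
    if colon and default is not None and port in ("", default):
        parts = parts._replace(netloc=host)
    return urlunsplit(parts)
```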

6.2.4 Protocol-based Normalization

   Web spiders, for which substantial effort to reduce the incidence of
   false negatives is often cost-effective, are observed to implement
   even more aggressive techniques in URI comparison.  For example, if
   they observe that a URI such as

      http://example.com/data

   redirects to

      http://example.com/data/

   they will likely regard the two as equivalent in the future.
   Obviously, this kind of technique is only appropriate in special
   situations.

6.3 Canonical Form

   It is in the best interests of everyone to avoid false negatives in
   comparing URIs and to minimize the amount of software processing for
   such comparisons.  Those who generate and make reference to URIs can
   reduce the cost of processing and the risk of false negatives by
   consistently providing them in a form that is reasonably canonical
   with respect to their scheme.  Specifically:

      Always provide the URI scheme in lowercase characters.

      Always provide the hostname, if any, in lowercase characters.

      Only perform percent-escaping where it is essential.

      Always use uppercase A-through-F characters when percent-escaping.

      Prevent /./ and /../ from appearing in non-relative URI paths.

   The good practices listed above are motivated by observations that a
   high proportion of deployed software uses these techniques for the
   purposes of normalization.



































7. Security Considerations

   A URI does not in itself pose a security threat.  However, since URIs
   are often used to provide a compact set of instructions for access to
   network resources, care must be taken to properly interpret the data
   within a URI, to prevent that data from causing unintended access,
   and to avoid including data that should not be revealed in plain
   text.

7.1 Reliability and Consistency

   There is no guarantee that, having once used a given URI to retrieve
   some information, the same information will be retrievable by that
   URI in the future. Nor is there any guarantee that the
   information retrievable via that URI in the future will be observably
   similar to that retrieved in the past.  The URI syntax does not
   constrain how a given scheme or authority apportions its name space
   or maintains it over time.  Such a guarantee can only be obtained
   from the person(s) controlling that name space and the resource in
   question.  A specific URI scheme may define additional semantics,
   such as name persistence, if those semantics are required of all
   naming authorities for that scheme.

7.2 Malicious Construction

   It is sometimes possible to construct a URI such that an attempt to
   perform a seemingly harmless, idempotent operation, such as the
   retrieval of a representation, will in fact cause a possibly damaging
   remote operation to occur.  The unsafe URI is typically constructed
   by specifying a port number other than that reserved for the network
   protocol in question.  The client unwittingly contacts a site that is
   running a different protocol service.  The content of the URI
   contains instructions that, when interpreted according to this other
   protocol, cause an unexpected operation.  An example has been the use
   of a gopher URI to cause an unintended or impersonating message to be
   sent via an SMTP server.

   Caution should be used when dereferencing a URI that specifies a TCP
   port number other than the default for the scheme, especially when it
   is a number within the reserved space.

   Care should be taken when a URI contains escaped delimiters for a
   given protocol (for example, CR and LF characters for telnet
   protocols) that these octets are not unescaped before transmission.
   This might violate the protocol, but avoids the potential for such
   characters to be used to simulate an extra operation or parameter in
   that protocol which might lead to an unexpected and possibly harmful
   remote operation being performed.



7.3 Rare IP Address Formats

   Although the URI syntax for IPv4address only allows the common,
   dotted-decimal form of IPv4 address literal, many implementations
   that process URIs make use of platform-dependent system routines,
   such as gethostbyname() and inet_aton(), to translate the string
   literal to an actual IP address.  Unfortunately, such system routines
   often allow and process a much larger set of formats than those
   described in Section 3.2.2.

   For example, many implementations allow dotted forms of three
   numbers, wherein the last part is interpreted as a 16-bit quantity
   and placed in the right-most two bytes of the network address (e.g.,
   a Class B network). Likewise, a dotted form of two numbers means the
   last part is interpreted as a 24-bit quantity and placed in the
   right-most three bytes of the network address (Class A), and a single
   number (without dots) is interpreted as a 32-bit quantity and stored
   directly in the network address.  Adding further to the confusion,
   some implementations allow each dotted part to be interpreted as
   decimal, octal, or hexadecimal, as specified in the C language (i.e.,
   a leading 0x or 0X implies hexadecimal; otherwise, a leading 0
   implies octal; otherwise, the number is interpreted as decimal).

   These additional IP address formats are not allowed in the URI syntax
   due to differences between platform implementations.  However, they
   can become a security concern if an application attempts to filter
   access to resources based on the IP address in string literal format.
   If such filtering is performed, it is recommended that literals be
   converted to numeric form and filtered based on the numeric value,
   rather than a prefix or suffix of the string form.
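
   The recommendation can be sketched in Python as follows.  The parser
   below implements only the formats described in this section (one to
   four parts, each decimal, octal, or hexadecimal) and is not a copy of
   any platform's inet_aton(); the function names and the blocked
   network 10.0.0.0/8 are hypothetical.

```python
import ipaddress
import re

PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")

def legacy_ipv4_value(literal):
    """Return the 32-bit value of a rare-format literal, or None."""
    parts = literal.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    if not all(PART.fullmatch(part) for part in parts):
        return None
    values = []
    for part in parts:
        if part[:2] in ("0x", "0X"):
            values.append(int(part, 16))
        elif part.startswith("0"):
            values.append(int(part, 8))  # "0" itself parses as zero
        else:
            values.append(int(part, 10))
    *leading, last = values
    # The last part fills every byte not claimed by a leading part.
    if any(v > 0xFF for v in leading):
        return None
    if last >= 1 << (8 * (4 - len(leading))):
        return None
    value = last
    for index, v in enumerate(leading):
        value |= v << (8 * (3 - index))
    return value

BLOCKED = ipaddress.ip_network("10.0.0.0/8")  # hypothetical policy

def is_blocked(literal):
    """Filter on the numeric value, not on the string form."""
    value = legacy_ipv4_value(literal)
    if value is None:
        return True  # fail closed on anything that cannot be parsed
    return ipaddress.IPv4Address(value) in BLOCKED
```

   A filter that only tested for the string prefix "10." would miss
   literals such as "167772161" or "0x0A.1", which this sketch maps to
   the same numeric address as "10.0.0.1".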

7.4 Sensitive Information

   It is clearly unwise to use a URI that contains a password which is
   intended to be secret. In particular, the use of a password within
   the userinfo component of a URI is strongly discouraged except in
   those rare cases where the 'password' parameter is intended to be
   public.

7.5 Semantic Attacks

   Because the userinfo component is rarely used and appears before the
   hostname in the authority component, it can be used to construct a
   URI that is intended to mislead a human user by appearing to identify
   one (trusted) naming authority while actually identifying a different
   authority hidden behind the noise.  For example

      http://www.example.com&story=breaking_news@10.0.0.1/top_story.htm

   might lead a human user to assume that the host is 'www.example.com',
   whereas it is actually '10.0.0.1'.  Note that the misleading userinfo
   could be much longer than the example above.

   A misleading URI, such as the one above, is an attack on the user's
   preconceived notions about the meaning of a URI, rather than an
   attack on the software itself.  User agents may be able to reduce the
   impact of such attacks by visually distinguishing the various
   components of the URI when rendered, such as by using a different
   color or tone to render userinfo if any is present, though there is
   no general panacea. More information on URI-based semantic attacks
   can be found in [Siedzik].
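
   A user agent could separate the components before rendering them, as
   in the following Python sketch.  It uses urlsplit from Python's
   standard library; the function name describe_authority and the
   dictionary layout are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def describe_authority(uri):
    """Split the authority so userinfo can be rendered distinctly."""
    parts = urlsplit(uri)
    return {"userinfo": parts.username, "host": parts.hostname}

example = (
    "http://www.example.com&story=breaking_news@10.0.0.1/top_story.htm"
)
info = describe_authority(example)
print("userinfo:", info["userinfo"])
print("host:    ", info["host"])
```

   Displaying the host separately from the userinfo makes it apparent
   which naming authority the URI actually identifies.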







































8. Acknowledgments

   This document is derived from RFC 2396 [RFC2396], RFC 1808 [RFC1808],
   and RFC 1738 [RFC1738]; the acknowledgments in those specifications
   still apply. It also incorporates the update (with corrections) for
   IPv6 literals in the host syntax, as defined by Robert M. Hinden,
   Brian E. Carpenter, and Larry Masinter in [RFC2732]. In addition,
   contributions by Reese Anschultz, Tim Bray, Rob Cameron, Dan
   Connolly, Adam M. Costello, Jason Diamond, Martin Duerst, Stefan
   Eissing, Clive D.W. Feather, Pat Hayes, Henry Holtzman, Graham Klyne,
   Dan Kohn, Bruce Lilly, Andrew Main, Michael Mealling, Julian Reschke,
   Tomas Rokicki, Miles Sabin, Ronald Tschalaer, Marc Warne, Stuart
   Williams, and Henry Zongaro are gratefully acknowledged.






































Normative References

   [ASCII]    American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

   [RFC2234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 2234, November 1997.











































Berners-Lee, et al.    Expires September 1, November 21, 2003               [Page 42]

Internet-Draft             URI Generic Syntax                 March                   May 2003


Appendix B. Parsing a URI Reference with a Regular Expression

   As described in Section 4.3, the generic URI syntax is not sufficient
   to disambiguate the components of some forms of URI.  Since the
   "greedy algorithm" described in that section is identical to the
   disambiguation method used by POSIX regular expressions, it is
   natural


Informative References

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and commonplace to use a regular expression for parsing the
   potential four components
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC1630]  Berners-Lee, T., "Universal Resource Identifiers in WWW: A
              Unifying Syntax for the Expression of Names and fragment identifier Addresses
              of a URI reference.

   The following line is Objects on the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

   The numbers Network as used in the second line above are only to assist readability;
   they indicate the reference points World-Wide Web",
              RFC 1630, June 1994.

   [RFC1738]  Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

   [RFC2396]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
              Resource Identifiers (URI): Generic Syntax", RFC 2396,
              August 1998.

   [RFC1123]  Braden, R., "Requirements for each subexpression (i.e., each
   paired parenthesis).  We refer to the value matched Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC1808]  Fielding, R., "Relative Uniform Resource Locators", RFC
              1808, June 1995.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2518]  Goland, Y., Whitehead, E., Faizi, A., Carter, S. and D.
              Jensen, "HTTP Extensions for subexpression
   <n> as $<n>.  For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related

   results in the following subexpression matches:

      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related

   where <undefined> indicates that the component is not present, as is
   the case Distributed Authoring --
              WEBDAV", RFC 2518, February 1999.

   [RFC0952]  Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet
              host table specification", RFC 952, October 1985.

   [RFC3513]  Hinden, R. and S. Deering, "Internet Protocol Version 6
              (IPv6) Addressing Architecture", RFC 3513, April 2003.

   [RFC2732]  Hinden, R., Carpenter, B. and L. Masinter, "Format for the query component
              Literal IPv6 Addresses in the above example.  Therefore, we
   can determine the value of the four components URL's", RFC 2732, December 1999.

   [RFC1736]  Kunze, J., "Functional Recommendations for Internet
              Resource Locators", RFC 1736, February 1995.

   [RFC1737]  Masinter, L. and fragment K. Sollins, "Functional Requirements for
              Uniform Resource Names", RFC 1737, December 1994.

   [RFC2141]  Moats, R., "URN Syntax", RFC 2141, May 1997.




Berners-Lee, et al.    Expires November 21, 2003               [Page 43]

Internet-Draft             URI Generic Syntax                   May 2003


   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC2110]  Palme, J. and A. Hopmann, "MIME E-mail Encapsulation of
              Aggregate Documents, such as HTML (MHTML)", RFC 2110,
              March 1997.

   [RFC2717]  Petke, R. and I. King, "Registration Procedures for URL
              Scheme Names", BCP 35, RFC 2717, November 1999.

   [HTML]     Raggett, D., Le Hors, A. and I. Jacobs, "Hypertext Markup
              Language (HTML 4.01) Specification", December 1999.

   [Siedzik]  Siedzik, R., "Semantic Attacks: What's in a URL?", April
              2001.

   [UTF-8]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", RFC 2279, January 1998.


Authors' Addresses

   Tim Berners-Lee
   World Wide Web Consortium
   MIT/LCS, Room NE43-356
   200 Technology Square
   Cambridge, MA  02139
   USA

   Phone: +1-617-253-5702
   Fax:   +1-617-258-5999
   EMail: timbl@w3.org
   URI:   http://www.w3.org/People/Berners-Lee/


   Roy T. Fielding
   Day Software
   2 Corporate Plaza, Suite 150
   Newport Beach, CA  92660
   USA

   Phone: +1-949-999-2523
   Fax:   +1-949-644-5064
   EMail: roy.fielding@day.com
   URI:   http://www.apache.org/~fielding/


   Larry Masinter
   Adobe Systems Incorporated
   345 Park Ave
   San Jose, CA  95110
   USA

   Phone: +1-408-536-3024
   EMail: LMM@acm.org
   URI:   http://larry.masinter.net/

















Appendix C. Examples of Resolving Relative URI References

   Within an object with a well-defined base URI of

      http://a/b/c/d;p?q

   the relative URI would be resolved as follows:

C.1 Normal Examples

      g:h           =  g:h
      g             =  http://a/b/c/g
      ./g           =  http://a/b/c/g
      g/            =  http://a/b/c/g/
      /g            =  http://a/g
      //g           =  http://g
      ?y            =  http://a/b/c/d;p?y
      g?y           =  http://a/b/c/g?y
      #s            =  (current document)#s
      g#s           =  http://a/b/c/g#s
      g?y#s         =  http://a/b/c/g?y#s
      ;x            =  http://a/b/c/;x
      g;x           =  http://a/b/c/g;x
      g;x?y#s       =  http://a/b/c/g;x?y#s
      .             =  http://a/b/c/
      ./            =  http://a/b/c/
      ..            =  http://a/b/
      ../           =  http://a/b/
      ../g          =  http://a/b/g
      ../..         =  http://a/
      ../../        =  http://a/
      ../../g       =  http://a/g
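
   As a non-normative illustration, the normal examples above can be
   checked mechanically.  The sketch below assumes Python's standard
   urllib.parse.urljoin as the resolver; that function is not part of
   this specification and follows later resolution rules that are
   expected to agree with the cases listed here.  The names BASE,
   NORMAL_EXAMPLES, and check_examples are illustrative only, and the
   "#s" case is omitted because its result depends on the current
   document.

```python
from urllib.parse import urljoin

# Base URI used by every example in this appendix.
BASE = "http://a/b/c/d;p?q"

# Relative reference -> expected target, copied from the table above.
NORMAL_EXAMPLES = {
    "g:h": "g:h",
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "//g": "http://g",
    "?y": "http://a/b/c/d;p?y",
    "g?y": "http://a/b/c/g?y",
    "g#s": "http://a/b/c/g#s",
    "g?y#s": "http://a/b/c/g?y#s",
    ";x": "http://a/b/c/;x",
    "g;x": "http://a/b/c/g;x",
    "g;x?y#s": "http://a/b/c/g;x?y#s",
    ".": "http://a/b/c/",
    "./": "http://a/b/c/",
    "..": "http://a/b/",
    "../": "http://a/b/",
    "../g": "http://a/b/g",
    "../..": "http://a/",
    "../../": "http://a/",
    "../../g": "http://a/g",
}


def check_examples(base, examples):
    """Return a list of (reference, expected, actual) for each mismatch."""
    mismatches = []
    for reference, expected in examples.items():
        actual = urljoin(base, reference)
        if actual != expected:
            mismatches.append((reference, expected, actual))
    return mismatches
```

   An empty list from check_examples(BASE, NORMAL_EXAMPLES) indicates
   that the resolver under test agrees with every normal example.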


C.2 Abnormal Examples

   Although the following abnormal examples are unlikely to occur in
   normal practice, all URI parsers should be capable of resolving them
   consistently.  Each example uses the same base as above.

   An empty reference refers to the start of the current document.

      <>            =  (current document)

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URI's path.  Note that the ".." syntax cannot be used to change
   the authority component of a URI.

      ../../../g    =  http://a/../g
      ../../../../g =  http://a/../../g

   In practice, some implementations strip leading relative symbolic
   elements (".", "..") after applying a relative URI calculation, based
   on the theory that compensating for obvious author errors is better
   than allowing the request to fail.  Thus, the above two references
   will be interpreted as "http://a/g" by some implementations.

   Similarly, parsers must avoid treating "." and ".." as special when
   they are not complete components of a relative path.

      /./g          =  http://a/./g
      /../g         =  http://a/../g
      g.            =  http://a/b/c/g.
      .g            =  http://a/b/c/.g
      g..           =  http://a/b/c/g..
      ..g           =  http://a/b/c/..g

   Less likely are cases where the relative URI uses unnecessary or
   nonsensical forms of the "." and ".." complete path segments.

      ./../g        =  http://a/b/g
      ./g/.         =  http://a/b/c/g/
      g/./h         =  http://a/b/c/g/h
      g/../h        =  http://a/b/c/h
      g;x=1/./y     =  http://a/b/c/g;x=1/y
      g;x=1/../y    =  http://a/b/c/y

   Some applications fail to separate the reference's query and/or
   fragment components from a relative path before merging it with the
   base path.  This error is rarely noticed, since typical usage of a
   fragment never includes the hierarchy ("/") character, and the query
   component is not normally used within relative references.

      g?y/./x       =  http://a/b/c/g?y/./x
      g?y/../x      =  http://a/b/c/g?y/../x
      g#s/./x       =  http://a/b/c/g#s/./x
      g#s/../x      =  http://a/b/c/g#s/../x

   Some parsers allow the scheme name to be present in a relative URI if
   it is the same as the base URI scheme.  This is considered to be a
   loophole in prior specifications of partial URI [RFC1630]. Its use
   should be avoided, but is allowed for backwards compatibility.

      http:g        =  http:g           ; for validating parsers
                    /  http://a/b/c/g   ; for backwards compatibility


Appendix A. Collected ABNF for URI

   To be filled-in later.


Appendix B. Parsing a URI Reference with a Regular Expression

   Since the "first-match-wins" algorithm is identical to the "greedy"
   disambiguation method used by POSIX regular expressions, it is
   natural and commonplace to use a regular expression for parsing the
   potential five components of a URI reference.

   The following line is the regular expression for breaking-down a
   well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

   The numbers in the second line above are only to assist readability;
   they indicate the reference points for each subexpression (i.e., each
   paired parenthesis).  We refer to the value matched for subexpression
   <n> as $<n>.  For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related

   results in the following subexpression matches:

      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related

   where <undefined> indicates that the component is not present, as is
   the case for the query component in the above example.  Therefore, we
   can determine the value of the four components and fragment as

      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9

   and, going in the opposite direction, we can recreate a URI reference
   from its components using the algorithm of Section 5.3.
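
   The following non-normative sketch applies the regular expression
   of this appendix using Python's re module, in which an unmatched
   optional group is reported as None (playing the role of
   <undefined>).  The function names and the dictionary layout are
   assumptions made for illustration.  The recompose function is only
   a plausible reading of recomposition, in which a delimiter is
   emitted only for a defined component; the normative algorithm is
   the one in the section referenced above.

```python
import re

# The regular expression of this appendix, unchanged.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Split a URI reference into its five potential components.

    Components that are not present are returned as None.
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


def recompose(parts):
    """Rebuild a URI reference from its components (illustrative)."""
    result = ""
    if parts["scheme"] is not None:
        result += parts["scheme"] + ":"
    if parts["authority"] is not None:
        result += "//" + parts["authority"]
    result += parts["path"]
    if parts["query"] is not None:
        result += "?" + parts["query"]
    if parts["fragment"] is not None:
        result += "#" + parts["fragment"]
    return result
```

   Keeping None distinct from the empty string preserves the
   difference between an absent component and one that is present but
   empty, such as the query in "http://a/b?".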









Appendix D. Embedding the Base URI in HTML documents

   It is useful to consider an example of how the base URI of a document
   can be embedded within the document's content.  In this appendix, we
   describe how documents written in the Hypertext Markup Language
   (HTML) [HTML] can include an embedded base URI.  This appendix does
   not form a part of the URI specification and should not be considered
   as anything more than a descriptive example.

   HTML defines a special element "BASE" which, when present in the
   "HEAD" portion of a document, signals that the parser should use the
   BASE element's "HREF" attribute as the base URI for resolving any
   relative URI.  The "HREF" attribute must be an absolute URI.  Note
   that, in HTML, element and attribute names are case-insensitive. For
   example:

      <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN">
      <HTML><HEAD>
      <TITLE>An example HTML document</TITLE>
      <BASE href="http://www.example.com/Test/a/b/c">
      </HEAD><BODY>
      ... <A href="../x">a hypertext anchor</A> ...
      </BODY></HTML>

   A parser reading the example document should interpret the given
   relative URI "../x" as representing the absolute URI

      <http://www.example.com/Test/a/x>

   regardless of the context in which the example document was obtained.
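
   A minimal, non-normative sketch of this behavior follows.  It
   assumes Python's standard html.parser and urllib.parse modules; the
   class name BaseAwareLinkCollector and the helper resolve_links are
   hypothetical.  Only the first BASE element is honored here, which
   is a simplification chosen for the example rather than a
   requirement stated in this appendix.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAwareLinkCollector(HTMLParser):
    """Collect A/href targets, resolved against an embedded BASE if any."""

    def __init__(self, retrieval_uri):
        super().__init__()
        # Until a BASE element is seen, fall back to the retrieval URI.
        self.base_uri = retrieval_uri
        self._base_seen = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, which matches
        # the case-insensitivity of HTML element and attribute names.
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and not self._base_seen:
            self.base_uri = href
            self._base_seen = True
        elif tag == "a":
            self.links.append(urljoin(self.base_uri, href))


def resolve_links(html_text, retrieval_uri):
    """Return the absolute URIs of all anchors in html_text."""
    collector = BaseAwareLinkCollector(retrieval_uri)
    collector.feed(html_text)
    return collector.links
```

   Applied to the example document above, the anchor resolves against
   the embedded base no matter which retrieval URI is supplied.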























Appendix E. Delimiting a URI in Context

   URIs are often transmitted through formats that do not provide a
   clear context for their interpretation.  For example, there are many
   occasions when a URI is included in plain text; examples include text
   sent in electronic mail, USENET news messages, and, most importantly,
   printed on paper.  In such cases, it is important to be able to
   delimit the URI from the rest of the text, and in particular from
   punctuation marks that might be mistaken for part of the URI.

   In practice, URIs are delimited in a variety of ways, but usually
   within double-quotes "http://example.com/", angle brackets
   <http://example.com/>, or just using whitespace

      http://example.com/

   These wrappers do not form part of the URI.

   In the case where a fragment identifier is associated with a URI
   reference, the fragment would be placed within the brackets as well
   (separated from the URI with a "#" character).

   In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may
   need to be added to break a long URI across lines. The whitespace
   should be ignored when extracting the URI.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URI containing a line break immediately after a hyphen should ignore
   all unescaped whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URI.

   Using <> angle brackets around each URI is especially recommended as
   a delimiting style for a URI that contains whitespace.

   The prefix "URL:" (with or without a trailing space) was formerly
   recommended as a way to help distinguish a URI from other bracketed
   designators, though it is not commonly used in practice and is no
   longer recommended.

   For robustness, software that accepts user-typed URI should attempt
   to recognize and strip both delimiters and embedded whitespace.









   For example, the text:

      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
      but you can probably pick it up from <ftp://ds.internic.
      net/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
      ietf/uri/historical.html#WARNING>.

   contains the URI references

      http://www.w3.org/Addressing/
      ftp://ds.internic.net/rfc/
      http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING
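
   The extraction described in this appendix can be sketched as
   follows.  This is an illustration under stated assumptions, not a
   recommended implementation: it recognizes only double-quote and
   angle-bracket wrappers, removes all whitespace inside the wrapper,
   drops a leading "URL:" prefix, and keeps a candidate only if it
   begins with something shaped like a scheme name.  The function name
   extract_delimited_uris is hypothetical, and the hyphen-at-line-end
   ambiguity discussed above is not addressed.

```python
import re

# Text wrapped in <...> or "..."; the wrappers are not part of the URI.
_WRAPPED = re.compile(r'<([^<>]*)>|"([^"]*)"')

# A leading scheme-like name followed by ":" (assumed shape of a scheme).
_HAS_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def extract_delimited_uris(text):
    """Return the URI references found inside delimiters in text."""
    uris = []
    for match in _WRAPPED.finditer(text):
        wrapped = match.group(1)
        if wrapped is None:
            wrapped = match.group(2)
        # Whitespace added to break a long URI across lines is ignored.
        candidate = re.sub(r"\s+", "", wrapped)
        # The formerly recommended "URL:" prefix is stripped if present.
        if candidate[:4].upper() == "URL:":
            candidate = candidate[4:]
        if _HAS_SCHEME.match(candidate):
            uris.append(candidate)
    return uris
```

   Run on the example text above, the function yields the three URI
   references listed.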





Appendix F. Abbreviated URIs

   The URI syntax was designed for unambiguous reference to network
   resources and extensibility via the URI scheme.  However, as URI
   identification and usage have become commonplace, traditional media
   (television, radio, newspapers, billboards, etc.) have increasingly
   used abbreviated URI references.  That is, a reference consisting of
   only the authority and path portions of the identified resource, such
   as

      www.w3.org/Addressing/

   or simply the DNS hostname on its own.  Such references are primarily
   intended for human interpretation rather than machine, with the
   assumption that context-based heuristics are sufficient to complete
   the URI (e.g., most hostnames beginning with "www" are likely to have
   a URI prefix of "http://").  Although there is no standard set of
   heuristics for disambiguating abbreviated URI references, many client
   implementations allow them to be entered by the user and
   heuristically resolved.  It should be noted that such heuristics may
   change over time, particularly when new URI schemes are introduced.
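
   As an illustration only, one such heuristic might look like the
   sketch below.  Since there is no standard set of heuristics, every
   rule here is an assumption: the function name
   complete_abbreviated_uri is hypothetical, only the "www" hint
   mentioned above is implemented, and the function declines to guess
   (returning None) for anything else.

```python
import re

# Something that already starts with "scheme://" is left untouched.
_HAS_SCHEME_AND_AUTHORITY = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def complete_abbreviated_uri(text):
    """Guess a full URI for an abbreviated reference, or return None."""
    candidate = text.strip()
    if not candidate:
        return None
    if _HAS_SCHEME_AND_AUTHORITY.match(candidate):
        return candidate
    # The part before the first "/" is treated as the hostname.
    hostname = candidate.split("/", 1)[0].lower()
    if hostname.startswith("www"):
        # Hostnames beginning with "www" are likely to be HTTP servers.
        return "http://" + candidate
    return None
```

   Returning None instead of a default keeps the guess visible to the
   caller, which matters because such heuristics may change as new URI
   schemes are introduced.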

   Since an abbreviated URI has the same syntax as a relative URI path,
   abbreviated URI references cannot be used in contexts where relative
   URIs are expected.  This limits the use of abbreviated URIs to places
   where there is no defined base URI, such as dialog boxes and off-line
   advertisements.


Appendix G. Summary of Non-editorial Changes

G.1 Additions

   IPv6 literals have been added to the list of possible identifiers for
   the host portion of an authority component, as described by
   [RFC2732], with the addition of "[" and "]" to the reserved and uric
   sets.  Square brackets are now specified as reserved within the
   authority component and not allowed outside their use as delimiters
   for an IPv6reference within host.  In order to make this change
   without changing the technical definition of the path, query, and
   fragment components, those rules were redefined to directly specify
   the characters allowed rather than continuing to be defined in terms
   of uric.

   Since [RFC2732] defers to [RFC3513] for definition of an IPv6 literal
   address, which unfortunately lacks an ABNF description of
   IPv6address, we created a new ABNF rule for IPv6address that matches
   the text representations defined by Section 2.2 of [RFC3513].
   Likewise, the definition of IPv4address has been improved in order to
   limit each decimal octet to the range 0-255, and the definition of
   hostname has been improved to better specify length limitations and
   partially-qualified domain names.
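
   The effect of limiting each decimal octet to the range 0-255 can be
   sketched with a regular expression.  This is an illustration, not
   the ABNF of this document: the pattern names DEC_OCTET and
   IPV4ADDRESS and the function is_ipv4address are hypothetical, and
   the rejection of leading zeros (such as "01") is an assumption of
   this sketch.

```python
import re

# One decimal octet in the range 0-255, without leading zeros (assumed).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# Four such octets separated by single dots.
IPV4ADDRESS = re.compile(r"(?:%s\.){3}%s" % (DEC_OCTET, DEC_OCTET))


def is_ipv4address(text):
    """Return True if text is four dot-separated octets, each 0-255."""
    return IPV4ADDRESS.fullmatch(text) is not None
```

   A looser pattern such as 1*3DIGIT per octet would also accept
   strings like "999.1.1.1", which is the kind of value the improved
   definition excludes.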

   Section 6 on URI normalization and comparison has been
   completely rewritten and extended using input from Tim Bray and
   discussion within the W3C Technical Architecture Group.  Likewise,
   Section 2.1 on the encoding of characters has been replaced.

   An ABNF production for URI has been introduced to correspond to the
   common usage of the term: an absolute URI with optional fragment.

G.2 Modifications from RFC 2396

   The ad-hoc BNF syntax has been replaced with the ABNF of [RFC2234].
   This change required all rule names that formerly included underscore
   characters to be renamed with a dash instead.

   Section 2.2 on reserved characters has been rewritten to clearly
   explain what characters are reserved, when they are reserved, and why
   they are reserved even when not used as delimiters by the generic
   syntax. Likewise, the section on escaped characters has been
   rewritten, and URI normalizers are now given license to unescape any
   octets corresponding to unreserved characters.  The crosshatch ("#")
   character has been moved back from the excluded delims to the
   reserved set.
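
   The license given to normalizers can be illustrated as follows.
   The sketch assumes that the unreserved set is the alphanumerics
   plus the mark characters of RFC 2396 ("-", "_", ".", "!", "~", "*",
   "'", "(", ")"); that set is not restated in this appendix, so it
   should be checked against the section on unreserved characters.
   The names UNRESERVED and unescape_unreserved are hypothetical.

```python
import re
import string

# Assumed unreserved set: alphanumerics plus the RFC 2396 mark characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri):
    """Replace escapes of unreserved characters with the characters."""

    def replace(match):
        character = chr(int(match.group(1), 16))
        # Escapes of reserved or other characters are left untouched.
        return character if character in UNRESERVED else match.group(0)

    return _ESCAPED.sub(replace, uri)
```

   Escaped reserved characters such as "%2F" are deliberately left
   alone, since unescaping them could change how the URI is parsed.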

   The ABNF for URI and URI-reference has been redesigned to make them
   more friendly to LALR parsers and significantly reduce complexity. As
   a result, the layout form of syntax description has been removed,
   along with the uric-no-slash, opaque-part, and rel-segment
   productions. All references to "opaque" URIs have been replaced with
   a better description of how the path component may be opaque to
   hierarchy. The fragment identifier has been moved back into the
   section on generic syntax components and within the URI and
   relative-URI productions, though it remains excluded from
   absolute-URI. The ambiguity regarding the parsing of URI-reference as
   a URI or a relative-URI with a colon in the first segment is now
   explained and disambiguated in the section defining relative-URI.

   As part of the rule renaming, absoluteURI and relativeURI have been
   changed to absolute-URI and relative-URI, respectively, for
   consistency.

   The ABNF of hier-part and relative-URI (Section 3) has been corrected to allow a
   relative URI path to be empty.  This also allows an absolute-URI to
   consist of nothing after the "scheme:", as is present in practice
   with the "DAV:" namespace [RFC2518] and the "about:" URI used by many
   browser implementations. The ambiguity regarding the parsing of
   net-path, abs-path, and rel-path is now explained and disambiguated
   in the same section.

   Registry-based naming authorities that use the hierarchical authority
   syntax component are now limited to DNS hostnames, since those have
   been the only such URIs in deployment.  This change was necessary to
   enable internationalized domain names to be processed in their native
   character encodings at the application layers above URI processing.
   The reg_name, server, and hostport productions have been removed to
   simplify parsing of the URI syntax.

   The ABNF of qualified has been simplified to remove a parsing
   ambiguity without changing the allowed syntax.  The toplabel
   production has been removed because it served no useful purpose. The
   ambiguity regarding the parsing of host as IPv4address or hostname is
   now explained and disambiguated in the same section.

   The algorithm of [RFC2396] for resolving relative references has been
   rewritten using pseudocode for this revision to improve clarity and
   fix the following issues:

   o  [RFC2396] section 5.2, step 6a, failed to account for a base URI
      with no path.

   o  Restored the behavior of [RFC1808] where, if the reference
      contains an empty path and a defined query component, then the
      target URI inherits the base URI's path component.

   o  Removed the special-case treatment of same-document references in
      favor of a section that explains that a new retrieval action
      should not be made if the target URI and base URI, excluding
      fragments, match.





Index

A
   ABNF  9
   abs-path  14  15
   absolute  9
   absolute-path  22
   absolute-URI  14
   absolute-URI-reference  20  23
   access  7
   alphanum  17
   authority  15  15, 16

D
   dec-octet  17
   delims  12  13
   dereference  8
   domainlabel  17
   dot-segments  19

E
   escaped  11  12
   excluded  13

F
   fragment  20

G
   generic syntax  5

H
   h4  18
   hier-part  14  15
   hierarchical  9
   host  16  17
   hostname  17
   hostport  16

I
   identifier  5
   invisible  13
   IPv4  17
   IPv4address  17
   IPv6  18
   IPv6address  18
   IPv6reference  18

L
   locator  6
   ls32  18

M
   mark  11

N
   net-path  14

O
   opaque-part  14

P
   path  18



N
   name  6
   net-path  15
   network-path  22

P
   path  15, 19
   path-segments  18  19
   pchar  18  19
   port  16  18

Q
   qualified  17
   query  19  20

R
   reg-name  16
   rel-path  22
   rel-segment  15
   relative  9
   relative-path  22
   relative-URI  22
   representation  8
   reserved  10
   resolution  8
   resource  4
   retrieval  8

S
   same-document  23
   sameness  8
   scheme  15
   segment  18
   server  16  19
   suffix  23

T
   toplabel  17
   transcription  6

U
   uniform  4
   unreserved  11
   unwise  12  13
   URI grammar
      abs-path  14  15
      absolute-URI  14
      absolute-URI-reference  20  23
      ALPHA  9
      alphanum  17





      authority  15  15, 16
      CR  9
      CTL  9
      dec-octet  17
      delims  12
      DIGIT  9
      domainlabel  17
      DQUOTE  9
      escaped  11  12
      fragment  20  15, 20, 22
      h4  18
      HEXDIG  9
      hier-part  14  15, 22, 23
      host  16, 17
      hostname  17
      hostport  17
      IPv4address  17
      IPv6address  18
      IPv6reference  18
      LF  9
      ls32  18
      mark  11
      net-path  14





      opaque-part  14
      path  18  15
      OCTET  9
      path-segments  18  15, 19
      pchar  18  19, 20, 20
      port  17  16, 18
      qualified  17
      query  19
      reg-name  16  15, 20, 22, 23
      rel-path  22
      rel-segment  22  15
      relative-URI  22, 22
      reserved  10  11
      scheme  15  15, 16, 23
      segment  18
      server  16
      toplabel  17  19
      SP  9
      unreserved  11
      unwise  12
      URI  15, 22
      URI-reference  20  22
      uric  9
      uric-no-slash  14  10
      userinfo  16, 16
   URI  15
   URI-reference  20  22
   uric  9
   uric-no-slash  14  10
   URL  6
   URN  6
   userinfo  16









Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.










































