October 10, 2002
Expires April 2003

Role of the Domain Name System
draft-klensin-dns-role-04.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This document represents an overview of an evolving technology area and is not intended to evolve into a standard of any kind.

Copyright Notice

Copyright (c) The Internet Society (2000, 2001, 2002). All Rights Reserved.

0. Abstract

The original function and purpose of the DNS is reviewed and contrasted with some of the purposes for which it has recently been applied and some of the newer demands being placed upon it or suggested for it. A framework for an alternative to placing these additional stresses on the DNS is then outlined. This document and that framework are not a proposed solution, only a strong suggestion that the time has come to begin thinking more broadly about the problems we are encountering and possible approaches to solving them.

A mailing list has been initiated for discussion of this draft, its successors, and closely-related issues at ietf-irnss@lists.elistx.com.
See http://lists.elistx.com/archives/ for subscription and archival information.

Table of Contents

0. Abstract
1. Introduction and History
1.1 Context for DNS development
1.2 Review of the DNS and its role as designed
1.3 The web and user-visible domain names
1.4 A pessimistic history of the evolution of Internet applications protocols.
2. Signs of DNS overloading
3. Searching, Directories, and the DNS
3.1 Overview
3.2 Some details and comments.
4. Examining internationalization
4.1. ASCII isn't just because of English
4.2. The "ASCII Encoding" approaches
4.3. "Stringprep" and its complexities
4.4 The UCS Stability Problem
4.5. Audiences, end users, and the UI problem
4.6 Business cards and other natural uses of natural languages
4.7 ASCII encodings and the Roman keyboard assumption
4.8 Intra-DNS approaches for "multilingual names": A Pessimistic Summary
5. Search-based Systems: The Key Controversies
5.1. One directory or many
5.2 Why not a proposal?
6. Security Considerations
7. References
7.1. Normative References
7.2. Explanatory and Informative References
8. Acknowledgements
9. Author's address

1. Introduction and History

Several of the comments that follow are somewhat revisionist. Good design and engineering often require a level of intuition by the designers about things that will be necessary in the future; the reasons for some of these design decisions are not made explicit at the time because no one is able to articulate them. The discussion below reconstructs some of the decisions about the Internet's primary namespace (the "Class=IN" DNS) in the light of subsequent development and experience. In addition, the historical reasons for particular decisions about the Internet were often severely underdocumented contemporaneously and, not surprisingly, different participants have different recollections about what happened and what was considered important.
Consequently, the quasi-historical story below is just one story. There may be (indeed, almost certainly are) other stories about how the DNS evolved to its present state, but those variants do not invalidate the inferences and conclusions.

This document presumes a general understanding of the terminology of RFC 1034 [RFC1034] or of any good DNS tutorial (see, e.g., [Albitz]).

1.1 Context for DNS development

During the entire post-startup-period life of the ARPANET and nearly the first decade of operation of the Internet, the list of host names and their mapping to and from addresses was maintained in a frequently-updated "host table" [RFC625, RFC811, RFC952]. The names themselves were restricted to a subset of ASCII chosen to avoid ambiguities in printed form, to permit interoperation with systems using other character codings (notably EBCDIC), and to avoid the "national use" code positions of ISO 646 [IS646].

This table was just a list with a common format that was eventually agreed upon; sites were expected to obtain and install new versions frequently. The host tables themselves were introduced to

* Eliminate the requirement for people to remember host numbers (addresses). Despite apparent experience to the contrary in the conventional telephone system, numeric numbering systems, including the numeric host number strategy, did not (and do not) work well for more than a (large) handful of hosts.

* Provide stability when addresses changed. Since addresses --to some degree in the ARPANET and more importantly in the contemporary Internet-- are a function of network topology and routing, they often had to be changed when connectivity or topology changed. The names could be kept stable even as addresses changed.

* Support hosts with multiple addresses. Some hosts (so-called "multihomed" ones) needed multiple addresses to reflect different types of connectivity and topology.
Again, the names were very useful for avoiding the requirement that would otherwise exist for users and other hosts to track these multiple host numbers and addresses and the topological considerations for selecting one over others.

After many years of using the host table approach, the community concluded that the model did not scale adequately and that it would not adequately support new service variations. A group came together to combine several ideas and incomplete proposals and to design a replacement. The DNS was the result of that effort. The goals for the DNS included preservation of the capabilities of the host table arrangements (especially unique, unambiguous, host names), provision for additional services (e.g., the special record types for electronic mail routing which quickly followed introduction of the DNS), and accomplishing both on the base of a robust, hierarchical, distributed, name lookup system. The DNS design also permitted distribution of name administration, rather than requiring that each host be entered into a single, central, table by a central administration.

1.2 Review of the DNS and its role as designed

The DNS was designed primarily to identify network resources. Although there was speculation about including, e.g., personal names and email addresses, it was not designed primarily to identify people, brands, etc. At the same time, the system was designed with the flexibility to accommodate new data types and structures, both through the addition of new record types to the initial "INternet" class, and, potentially, through the introduction of new classes. Since the appropriate identifiers and content of those future extensions could not be anticipated, the design provided that these fields could contain any (binary) information, not just the restricted text forms of the host table.
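The gap between what the protocol can carry and what host names were allowed to contain can be made concrete. The following is a minimal Python sketch under stated assumptions, not code from any particular resolver: the hypothetical function `encode_wire_name` builds the length-prefixed label encoding of RFC 1035 (each label at most 63 octets, the whole encoded name at most 255), which accepts any octets at all, while the hypothetical `is_ldh_label` applies the letters-digits-hyphen rule inherited from the host table, in its later form that permits a leading digit.

```python
import re

# Letters-digits-hyphen ("LDH") host name label: 1-63 characters, no leading
# or trailing hyphen. This is the later form that permits a leading digit.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

MAX_LABEL_OCTETS = 63   # per-label limit in the DNS wire format
MAX_NAME_OCTETS = 255   # limit for a whole encoded name


def encode_wire_name(labels: list[bytes]) -> bytes:
    """Encode labels as length-prefixed octet strings ending in the root label.

    Only the length of each label is checked; its octets may be anything.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError(f"label of {len(label)} octets is out of range")
        out.append(len(label))   # one length octet ...
        out += label             # ... followed by the raw label octets
    out.append(0)                # zero-length root label terminates the name
    if len(out) > MAX_NAME_OCTETS:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


def is_ldh_label(label: str) -> bool:
    """Return True if the label satisfies the host-name (LDH) rule."""
    return LDH_LABEL.fullmatch(label) is not None
```

For instance, `encode_wire_name([b"under_score", b"\xff"])` succeeds, since the wire format does not care what a label contains, whereas `is_ldh_label("under_score")` returns False under the host name rule.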
However, the DNS, as it is actually used, is intimately tied to the applications and application protocols that utilize it, often at a fairly low level. In particular, despite the ability of the protocols and data structures themselves to accommodate any binary representation, DNS names as used were historically not even unrestricted ASCII, but a very restricted subset of it, a subset that derives primarily from the original host table naming rules. Selection of that subset was driven in part by human factors considerations, including a desire to eliminate possible ambiguities in an international context. Hence character codes that had international variations in interpretation were excluded, the underscore character and case distinctions were eliminated as being confusing (in the underscore's case, with the hyphen character) when written or read by people, and so on. These considerations appear to be very similar to those that resulted in similarly restricted character sets being used as protocol elements in many ITU and ISO protocols (cf. [X29]).

Another assumption was that there would be a high ratio of physical hosts to second level domains and, more generally, that the system would be deeply hierarchical, with most systems (and names) at the third level or below and a very large percentage of the total names representing physical hosts. There are domains that follow this model: many university and corporate domains use fairly deep hierarchies, as do a few country-oriented top level domains ("ccTLDs"). Historically, the "US." domain has been an excellent example of the deeply hierarchical approach. However, by 1998, comparison of several efforts to survey the DNS showed a count of SOA records that approached (and may have passed) the number of distinct hosts.
That is, due to synonyms or aliases of one form or another, the number of delegated domains on the Internet was approaching or exceeding the number of hosts. While experience up to this time has shown that the DNS is robust enough --given contemporary machines as servers and current bandwidth norms-- to be able to continue to operate reasonably well when those historical assumptions are not met (e.g., with a flat structure under ".COM" containing well over ten million delegated subdomains [COMSIZE]), it is still useful to remember that the system could have been designed to work optimally with a flat structure (and very large zones) rather than a deeply hierarchical one, and was not.

Similarly, despite some early speculation about entering people's names and email addresses into the DNS directly, with the sole exception (at least in the "IN" class) of one field of the SOA record, electronic mail addresses in the Internet have preserved the original, pre-DNS, "user (or mailbox) at location" conceptual format rather than a flatter or strictly faceted one. Location, in that instance, is a reference to a host. Both the DNS architecture itself and the two-level (host name and mailbox name) provisions for email and similar functions (e.g., see the finger protocol [FINGER]) also anticipated a relatively high ratio of users to actual hosts. Despite the observation in RFC 1034 that the DNS was expected to grow to be proportional to the number of users (section 2.3), it has never been clear that the DNS was seriously designed for, or could scale to, the order of magnitude of the number of users (or, more recently, products or document objects), rather than that of physical hosts.

Just as was the case for the host table before it, the DNS provided critical uniqueness for names, and universal accessibility to them, as part of overall "single internet" and "end to end" models (cf. [RFC2826]).
However, there are many signs that, as new uses evolved and original assumptions were abused (if not violated outright), the system was being stretched to, or beyond, its practical limits.

The original design effort that led to the DNS included examination of the directory technologies available at the time. The design group concluded that the DNS design, with its simplifying assumptions and restricted capabilities, would be feasible to deploy and make adequately robust, which the more comprehensive directory approaches were not. At the same time, some of the participants feared that the limitations might cause future problems; this document essentially takes the position that they were probably correct. On the other hand, directory technology and implementations have evolved significantly in the ensuing years: it may be time to revisit the assumptions, either in the context of the two- (or more) level mechanism contemplated by the rest of this document or, even more radically, as a path toward a DNS replacement.

1.3 The web and user-visible domain names

From the standpoint of the integrity of the domain name system --and scaling of the Internet, including optimal accessibility to content-- the web design decision to use "A record" domain names directly in URLs, rather than some system of indirection, has proven to be a serious mistake in several respects. Convenience of typing, and the desire to make domain names out of easily-remembered product names, have led to a flattening of the DNS, with many people now perceiving that second-level names under COM (or in some countries, second- or third-level names under the relevant ccTLD) are all that is meaningful. This perception has been reinforced by some domain name registrars [REGISTRAR] who have been anxious to "sell" additional names.
And, of course, the perception that one needed a second-level (or even top-level) domain per product, rather than having names associated with a (usually organizational) collection of network resources, has led to a rapid acceleration in the number of names being registered. That acceleration has, in turn, clearly benefited registrars charging on a per-name basis, "cybersquatters", and others in the business of "selling" names, but has not obviously been beneficial for the Internet as a whole.

This emphasis on second-level domain names has also created a problem for the trademark community. Since the Internet is international, and names are being populated in a flat and unqualified space, similarly-named entities are in conflict even if there would ordinarily be no chance of confusing them in the marketplace. The problem appears to be unsolvable except by a choice between draconian measures. These might include significant changes to the legislation and conventions that govern disputes over "names" and "marks". Or they might result in a situation in which the "rights" to a name are typically settled not by the subtle and traditional product (or industry) type and geopolitical scope rules of the trademark system, but largely by main force; for example, the organization with the greatest resources to invest in defending (or attacking) names will ultimately win out. The latter raises not only important issues of equity, but also the risk of backlash as the numerous small players are forced to relinquish names they find attractive and to adopt less-desirable naming conventions.
Independent of these sociopolitical problems, content distribution issues have made it clear that it should be possible for an organization to have copies of data it wishes to make available distributed around the network, with a user who asks for the information by name getting the topologically-closest copy. This is not possible with simple, as-designed, use of the DNS: DNS names identify target resources or, in the case of email "MX" records, a preferentially-ordered list of resources "closest" to a target (not to the source/user). Several technologies (and, in some cases, corresponding business models) have arisen to work around these problems, including intercepting and altering DNS requests so as to point to other locations. Additional implications are still being discovered and evaluated.

Approaches that involve interception of DNS queries and rewriting of DNS names (or otherwise altering the resolution process based on the topological location of the user) seem, however, to risk disrupting end-to-end applications in the general case and raise many of the issues discussed by the IAB in [IAB-OPES]. These problems occur even if the rewriting machinery is accompanied by additional workarounds for particular applications. For example, security associations and applications that need to identify "the same host" often run into one problem or another if DNS names or other references are changed in the network without participation of the applications trying to invoke the associated services.

1.4 A pessimistic history of the evolution of Internet applications protocols.

At the applications level, few of the protocols in active, widespread use on the Internet reflect either contemporary knowledge in computer science or human factors, or experience accumulated through deployment and use.
Instead, protocols tend to be deployed at a just-past-prototype level, often including the expedient compromises typical of prototypes. If they prove useful, the nature of the network permits very rapid dissemination (i.e., they fill a vacuum, even if a vacuum that no one previously knew existed). But, once the vacuum is filled, the installed base provides its own inertia: unless the design is so seriously faulty as to prevent effective use (or there is a widely-perceived sense of impending disaster unless the protocol is replaced), future developments must maintain backward compatibility and workarounds for problematic characteristics rather than benefiting from redesign in the light of experience. Applications that are "almost good enough" prevent development and deployment of high-quality replacements.

There are many, perhaps obvious, examples of this. Despite many known deficiencies and weaknesses of definition, the "finger" and "whois" [WHOIS] protocols have not been replaced (despite many efforts to update or replace the latter [WHOIS-UPDATE]). The telnet protocol and its many options drove out the SUPDUP [RFC734] one, which was arguably much better designed for a diverse collection of network hosts. A number of efforts to replace the email or file transfer protocols with models which their advocates considered much better have failed. And, more recently and below the applications level, there is some reason to believe that this resistance to change has been one of the factors impeding IPv6 deployment.

2. Signs of DNS overloading

Parts of the historical discussion above identify areas in which the DNS has become overloaded (semantically if not in the mechanical ability to resolve names). At the time this document was written, DNS performance and reliability appeared to be still within the acceptable range.
Recent proposals and mechanisms to better respond to overloading and scaling issues have all focused on patching or working around limitations of the DNS when it is utilized for out-of-design functions, rather than on dramatic rethinking of either its design or those uses. The number of these issues that have arisen at much the same time may argue for just that type of rethinking, rather than simply adding complexity and attempting to alter the design incrementally (see, for example, the discussion of simplicity in [Bush-Arch]). For example:

o While technical approaches such as larger and higher-powered servers and more bandwidth, and legal/political mechanisms such as dispute resolution policies, have arguably kept the problems from becoming critical, the DNS has not proven adequately responsive to business and individual needs to describe or identify things (such as product names and names of individuals) other than strict network resources.

o While stacks have been modified to better handle multiple addresses on a physical interface and some protocols have been extended to include DNS names for determining context, the DNS does not deal especially well with many names associated with a given host (as needed, for example, by web hosting facilities with multiple domains on a server).

o Efforts to add names deriving from languages or character sets based on other than simple ASCII and English-like names (see below), or even to utilize complex company or product names without the use of hierarchy, have created apparent requirements for names (labels) that are over 63 octets long. This requirement will undoubtedly increase over time; while there are workarounds to accommodate longer names, they impose their own restrictions and cause their own problems.
o Increasing commercialization of the Internet, and visibility of domain names that are assumed to match names of companies or products, has turned the DNS and DNS names into a trademark battleground. The traditional trademark system in (at least) most countries makes careful distinctions about fields of applicability. When the space is flattened, without differentiation by either geography or industry sector, not only are there likely conflicts between "Joe's Pizza" (of Boston) and "Joe's Pizza" (of San Francisco) but also between both and "Joe's Auto Repair" (of Los Angeles). All three would like to control "Joes.com" (and would prefer, if it were permitted by DNS naming rules, to spell it as "Joe's.com" and have both resolve the same way) and may claim trademark rights to do so, even though conflict or confusion would not occur with traditional trademark principles.

o Many organizations wish to have different web sites under the same URL and domain name. Sometimes this is to create local variations --the Widget Company might want to present different material to a UK user relative to a US one-- and sometimes it is to provide higher performance by supplying information from the server topologically closest to the user. If the name resolution mechanism is expected to provide this functionality, there are three possible models (which might be combined):

  - supply information about multiple sites (or locations or references). Those sites would, in turn, provide information associated with the name and sufficient site-specific attributes to permit the application to make a sensible choice of destination, or

  - accept client-site attributes and utilize them in the search process, or

  - return different answers based on the location or identity of the requestor.
While there are some tricks that can provide partial simulations of these types of functions, DNS responses cannot be reliably conditioned in this way.

These, and similar, issues of performance or content choices can, of course, be thought of as not involving the DNS at all. For example, the commonly-cited alternate approach of coupling these issues to HTTP content negotiation (cf. [RFC2295]) requires that an HTTP connection first be opened to some "common" or "primary" host so that preferences can be negotiated and then the client redirected or sent alternate data. At least from the standpoint of improving performance by accessing a "closer" location, both initially and thereafter, this approach sacrifices the desired result before the client initiates any action. It could even be argued that some of the characteristics of common content negotiation approaches are workarounds for the non-optimal use of the DNS in web URLs.

o Many existing and proposed systems for "finding things on the Internet" require a true search capability in which near matches can be reported to the user (or to some user agent with an appropriate rule-set) and to which queries may be ambiguous or fuzzy. The DNS, by contrast, can accommodate only one set of (quite rigid) matching rules. Proposals to permit different rules in different localities (e.g., matching rules that are TLD- or zone-specific) help to identify the problem. But they cannot be applied directly to the DNS without either abandoning the desired level of flexibility or isolating different parts of the Internet from each other (or both).
Fuzzy or ambiguous searches are desirable for resolution of names that might have spelling variations and for names that can be resolved into different sets of glyphs depending on context. Especially when internationalization is considered, variant name problems go beyond simple differences in representation of a character or ordering of a string. Instead, avoiding user astonishment and confusion requires consideration of relationships such as languages that can be written with different alphabets, Kanji-Hiragana relationships, Simplified and Traditional Chinese, etc. See [Seng] for a discussion and suggestions for addressing a subset of these issues in the context of characters based on Chinese ones. But that document essentially illustrates the difficulty of providing the type of flexible matching that would be anticipated by users; instead, it tries to protect against the worst types of confusion (and opportunities for fraud).

o The historical DNS, and applications that make assumptions about how it works, impose significant risk (or force technical kludges and consequent odd restrictions) when one considers adding mechanisms for use with various multi-character-set and multilingual "internationalization" systems. See the IAB's discussion of some of these issues [RFC2825] for more information.

o In order to provide proper functionality to the Internet, the DNS must have a single unique root (the IAB provides more discussion of this issue [RFC2826]). There are many desires for local treatment of names or character sets that cannot be accommodated without either multiple roots (e.g., a separate root for multilingual names, proposed at various times by MINC [MINC] and others), or mechanisms that would have similar effects in terms of Internet fragmentation and isolation.
o For some purposes, it is desirable to be able to search on targets (i.e., by value, not just by name (label)). One might, for example, want to locate all of the host (and virtual host) names which cause mail to be directed to a given server via MX records. The DNS does not support this capability (see the discussion in [IQUERY]), and it can be simulated only by extracting all of the relevant records (perhaps by zone transfer, if the source permits doing so, which is becoming less common) and then searching a file built from those records.

o Finally, as additional types of personal or identifying information are added to the DNS, issues arise with protection of that information. There are increasing calls to make different information available based on the credentials and authorization of the source of the inquiry. As with information keyed to site locations or proximity (as discussed above), the DNS protocols make providing these differentiated services quite difficult if not impossible.

In each of these cases, it is, or might be, possible to devise ways to trick the DNS system into supporting mechanisms that were not designed into it. Several ingenious solutions have been proposed in many of these areas already, and some have been deployed into the marketplace with some success. But the price of each of these changes is added complexity and, with it, added risk of unexpected and destabilizing problems.

Several of the above problems are addressed well by a good directory system (supported by the LDAP protocol or some protocol more precisely suited to these specific applications) or searching environment (such as common web search engines), although not by the DNS.
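The by-value search described in the list item on MX targets can only be simulated outside the DNS, by copying records and indexing them locally. The following minimal Python sketch assumes the records have already been obtained (for instance from a permitted zone transfer) and parsed into hypothetical `(owner, type, data)` tuples; the record set and the function names `build_mx_index` and `names_using_exchanger` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records as (owner name, record type, record data). For MX
# records the data is (preference, exchange host).
RECORDS = [
    ("example.com.", "MX", (10, "mail.example.net.")),
    ("shop.example.com.", "MX", (10, "mail.example.net.")),
    ("shop.example.com.", "MX", (20, "backup.example.org.")),
    ("example.org.", "MX", (10, "backup.example.org.")),
    ("www.example.com.", "A", "192.0.2.10"),
]


def build_mx_index(records):
    """Invert MX records: map each mail exchanger to the names that use it."""
    index = defaultdict(set)
    for owner, rtype, data in records:
        if rtype == "MX":
            _preference, exchange = data
            # DNS names compare case-insensitively, so normalize both sides.
            index[exchange.lower()].add(owner.lower())
    return index


def names_using_exchanger(index, exchange):
    """Answer the by-value query that the DNS itself does not support."""
    return sorted(index.get(exchange.lower(), set()))
```

The index is only as current as the copy it was built from, which is one reason this remains a workaround rather than a DNS capability.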
Given the difficulty of deploying new applications discussed above, an important question is whether the tricks and kludges are bad enough, or will become bad enough as usage grows, that new solutions are needed and can be deployed.

3. Searching, Directories, and the DNS

3.1 Overview

The discussion above, and the constraints of the DNS, suggest the introduction of an intermediate protocol mechanism, referred to below as a "search layer" or "searchable system". The terms "directory" and "directory system" are used interchangeably with "searchable system" in this document, although the latter is far more precise. Search layer proposals would use a two-stage (or multi-stage) lookup, not unlike several of the proposals for internationalized names in the DNS (see section 4), but all operations except the final one would involve searching other systems, rather than looking up identifiers in the DNS itself. As explained below, this would permit relaxation of several constraints, leading to a more capable and comprehensive overall system.

Ultimately, many of the issues with domain names arise as the result of efforts to use the DNS as a directory. While, at the time this document was written, sufficient pressure or demand had not occurred to justify a change, it was already quite clear that, as a directory system, the DNS is a good deal less than ideal. This document suggests that there actually is a requirement for a directory system, and that the right solution to a searchable system requirement is a searchable system, not a series of DNS patches, kludges, or workarounds.

The following points illustrate particular aspects of this conclusion.

o A directory system would not require imposition of particular length limits on names.
o A directory system could permit explicit association of attributes of, e.g., language and country, with a name, without having to utilize trick encodings to incorporate that information in DNS labels (or creating artificial hierarchy for doing so).

o There is considerable experience (albeit not much of it very successful) in doing fuzzy and "sonex" (similar-sounding) matching in directory systems. Moreover, it is plausible to think about different matching rules for different areas and sets of names so that these can be adapted to local cultural requirements. Specifically, it might be possible to have a single form of a name in a directory, but to have great flexibility about what queries match that name (and even have different variations in different areas). Of course, the more flexibility that a system provides, the greater the possibility of real or imagined trademark conflicts. But the opportunity would exist to design a directory structure that dealt with those issues in an intelligent way, while DNS constraints almost certainly make a general and equitable DNS-only solution impossible.

o If a directory system is used to translate to DNS names, and then DNS names are looked up in the normal fashion, it may be possible to relax several of the constraints that have been traditional (and perhaps necessary) with the DNS. For example, reverse-mapping of addresses to directory names may not be a requirement even if mapping of addresses to DNS names continues to be, since the DNS name(s) would (continue to) uniquely identify the host.

o Solutions to multilingual transcription problems that are common in "normal life" (e.g., two-sided business cards to be sure that recipients trying to contact a person can access romanized spellings and numbers if the original language is not comprehensible to them) can be easily handled in a directory system by inserting both sets of entries.
o A directory system could be designed that would return, not a single name, but a set of names paired with network-locational information or other context-establishing attributes. This type of information might be of considerable use in resolving the "nearest (or best) server for a particular named resource" problems that are a significant concern for organizations hosting web and other sites that are accessed from a wide range of locations and subnets.

o Names bound to countries and languages might help to manage trademark realities, while, as discussed in section 1.3 above, use of the DNS in trademark-significant contexts tends to require worldwide "flattening" of the trademark system.

Many of these issues are a consequence of another property of the DNS: names must be unique across the Internet. The need to have a system of unique identifiers is fairly obvious (see [RFC2826]). However, if that requirement were to be eliminated in a search or directory system that was visible to users instead of the DNS, many difficult problems, of both an engineering and a policy nature, would be likely to vanish.

3.2 Some details and comments.

Almost any internationalization proposal for names that are in, or map into, the DNS will require changing DNS resolver API calls ("gethostbyname" or equivalent), or adding some pre-resolution preparation mechanism, in almost all Internet applications, whether to cause the API to take a different character set (no matter how it is then mapped into the bits used in the DNS or another system), to accept or return more arguments with qualifying or identifying information, or otherwise. Once applications must be opened to make such changes, it is a relatively small matter to switch from calling into the DNS to calling a directory service and then the DNS (in many situations, both actions could be accomplished in a single API call).
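The two-stage model of sections 3.1 and 3.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposal: the `Directory` and `DirectoryEntry` types, the attribute names "language" and "country", the `prepare` matching key, and the `lookup` function are all hypothetical, and a caller-supplied `resolve` function stands in for a real DNS query. The directory stage matches a prepared form of the user's string, filters on attributes, and may return several candidates (names are not required to be unique); only the final step touches the DNS.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def prepare(name: str) -> str:
    """Hypothetical directory matching key: case-fold, then NFKC-normalize."""
    return unicodedata.normalize("NFKC", name.casefold())


@dataclass
class DirectoryEntry:
    display_name: str                 # user-visible name, in any script
    dns_name: str                     # LDH-form DNS name the entry points to
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. language, country


class Directory:
    """Toy searchable directory; unlike the DNS, names need not be unique."""

    def __init__(self) -> None:
        self._entries: List[DirectoryEntry] = []

    def add(self, entry: DirectoryEntry) -> None:
        self._entries.append(entry)

    def search(self, name: str, **wanted: str) -> List[DirectoryEntry]:
        """Stage 1: match the prepared name, then filter on attributes."""
        key = prepare(name)
        return [
            entry for entry in self._entries
            if prepare(entry.display_name) == key
            and all(entry.attributes.get(k) == v for k, v in wanted.items())
        ]


Result = Tuple[str, str, Dict[str, str]]


def lookup(directory: Directory, name: str,
           resolve: Callable[[str], Optional[str]], **wanted: str) -> List[Result]:
    """One API call: directory search first, DNS resolution only at the end."""
    results: List[Result] = []
    for entry in directory.search(name, **wanted):
        address = resolve(entry.dns_name)  # stage 2: the only DNS-dependent step
        if address is not None:
            results.append((entry.dns_name, address, entry.attributes))
    return results


if __name__ == "__main__":
    d = Directory()
    d.add(DirectoryEntry("Müller GmbH", "mueller-de.example", {"language": "de", "country": "DE"}))
    d.add(DirectoryEntry("Müller GmbH", "mueller-ch.example", {"language": "de", "country": "CH"}))
    stub_dns = {"mueller-de.example": "192.0.2.10", "mueller-ch.example": "192.0.2.20"}
    print(lookup(d, "MÜLLER GMBH", stub_dns.get, country="CH"))
```

In this sketch the same user-visible name leads to different DNS names depending on the attributes supplied, which is the kind of flexibility the points above describe. The hostnames (under the reserved ".example" TLD) and the 192.0.2.0/24 addresses are documentation placeholders, and the stub dictionary would be replaced by a real resolver in practice.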
A directory approach can be consistent both with "flat" models and multi-attribute ones. The DNS requires strict hierarchies, limiting its ability to handle differentiation among names by their properties. By contrast, modern directories can utilize independently-searched attributes and other structured schema to provide flexibilities not present in a strictly hierarchical system.

There is a strong historical argument for a single directory structure (implying a need for mechanisms for registration, delegation, etc.). But, unlike the DNS, a single structure is not a strict requirement, especially if in-depth case analysis and design work leads to the conclusion that reverse-mapping to directory names is not a requirement (see section 5.1). If a single structure is not needed, then, unlike the DNS, there would be no requirement for a global organization to authorize or delegate operation of portions of the structure.

The "no single structure" concept could be taken further by moving away from simple "names" in favor of, e.g., faceted systems in which most of the facets use restricted vocabularies. Such systems could be designed to avoid the need for procedures to ensure uniqueness across, or even within, providers and databases of the faceted entities being searched for. (Cf. [DNS-Search] for further discussion.)

While the discussion above includes very general comments about attributes, it appears that only a very small number of attributes would be needed. The list would almost certainly include country and language for internationalization purposes. It might require "charset" if we cannot agree on a character set and encoding, although there are strong arguments for simply using ISO 10646 coding in interchange. Trademark issues might motivate "commercial" and "non-commercial" (or other) attributes if they would be helpful in bypassing trademark problems.
And applications to resource location might argue for a few other attributes (as outlined above).

4. Examining internationalization

Much of the thinking underlying this document was driven by considerations of internationalizing the DNS or, more specifically, providing access to the functions of the DNS from languages and naming systems that cannot be accurately expressed in the traditional DNS subset of ASCII. Much of the relevant work was done in the IETF's "Internationalized Access to Domain Names" Working Group (IDN-WG), although this document also draws on extensive parallel discussions in other forums. This section contains an evaluation of what was learned as an "internationalized DNS" or "multilingual DNS" was explored and suggests future steps based on that evaluation.

When the IDN-WG was initiated, it was obvious to several of the participants that its first important task was an undocumented one: to increase the understanding of the complexities of the problem sufficiently that naive solutions could be rejected and people could go to work on the harder problems. The IDN-WG clearly accomplished that task. The beliefs that the problems were simple, and in the corresponding simplistic approaches and their promises of quick and painless deployment, effectively disappeared as the WG's efforts matured. But some of the lessons learned should be taken as cautions by the wider community, both generally and in the context of the remarks above:

4.1. ASCII isn't just because of English

The hostname rules chosen in the mid-1970s weren't just "ASCII because English uses ASCII", although that was a starting point. We have discovered that almost every other script (and even ASCII if we permit the rest of the characters specified in the ISO 646 International Reference Version) is more complex than hostname-restricted-ASCII [ASCII] (the "LDH" form, see the next section). And ASCII isn't sufficient to completely represent English: there are several words in the language that are correctly spelled only with characters or diacritical marks that do not appear in ASCII.

With a broader selection of scripts, in some examples, case mapping works from one case to the other, but is not reversible. In others, there are conventions about alternate ways to represent characters (in the language, not [only] in character coding) that work most of the time, but not always. And there are issues in coding, with Unicode/10646 [UNICODE, IS10646] providing different ways to represent the same character ("character", rather than "glyph", is used deliberately here). And, in still others, there are questions as to whether two glyphs "match", which may be a distance-function question, not one with a binary answer. The IETF approach to these problems is to require pre-matching canonicalization (see the "stringprep" discussion below).

The IETF has resisted the temptations to either try to specify an entirely new coded character set, or to pick and choose Unicode/10646 characters on a per-character basis. While it may appear that a character set designed to meet Internet-specific needs would be very attractive, the IETF has never had the expertise, resources, and representation from critically-important communities to actually take on that job.
Perhaps more important, a new effort might have chosen to make some of the many complex tradeoffs differently than the Unicode committee did, producing a code with somewhat different characteristics. But there is no evidence that doing so would produce a code with fewer problems and side-effects. It is much more likely that making tradeoffs differently would simply result in a different set of (equally difficult) problems.

4.2. The "ASCII Encoding" approaches

While the DNS can handle arbitrary binary strings without known internal problems (see [RFC2181]), some restrictions are imposed by the requirement that text be interpreted in a case-independent way ([RFC1034], [RFC1035]). More important, most Internet applications assume the hostname-restricted (so-called "LDH", for "letter-digit-hyphen") syntax specified in the host table RFCs and as "prudent" in RFC 1035. Many conforming implementations of those applications may exhibit unexpected behavior if those assumptions are not met.

To avoid these potential problems, IETF internationalization work has focused on "ASCII-Compatible Encodings" (ACE), which preserve the LDH conventions in the DNS itself (and for implementations of applications that have not been upgraded) while permitting newer implementations to recognize the special codings and map them into non-ASCII characters. These approaches are, however, not problem-free. Among other issues, they rely on what is ultimately a heuristic to determine whether a DNS label is to be considered as an internationalized name (i.e., encoded Unicode) or interpreted as an actual LDH name in its own right.
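The label-classification heuristic described above can be illustrated with Python's built-in "idna" codec, which implements the IDNA 2003 rules later published as RFC 3490, including the "xn--" ACE prefix. Both that prefix and the use of this codec are assumptions for illustration; when this document was written, the ACE prefix had not yet been fixed. The hypothetical `classify_label` function treats a label as internationalized only if it carries the prefix and decodes cleanly, and otherwise takes it to be an ordinary LDH label, which is exactly the kind of guess the text describes.

```python
from typing import Tuple

ACE_PREFIX = "xn--"  # assumption: the ACE prefix later fixed by IDNA (RFC 3490)


def to_ace(label: str) -> str:
    """Return the ASCII-compatible form of a single label.

    Python's "idna" codec (IDNA 2003) applies nameprep and Punycode and adds
    ACE_PREFIX only when the label contains non-ASCII characters.
    """
    return label.encode("idna").decode("ascii")


def classify_label(label: str) -> Tuple[str, str]:
    """Guess whether a DNS label is an encoded internationalized name.

    Returns ("ace", decoded) or ("ldh", label). The decision rests only on the
    prefix and a successful decode, so it is a heuristic: nothing in the DNS
    itself records which interpretation the registrant intended.
    """
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        try:
            return ("ace", lowered.encode("ascii").decode("idna"))
        except UnicodeError:
            pass  # carries the prefix but is not valid ACE: treat as plain LDH
    return ("ldh", label)


if __name__ == "__main__":
    for name in ["münchen", "ไทย", "example"]:
        ace = to_ace(name)
        print(f"{name} -> {ace} -> {classify_label(ace)}")
```

The Thai label in the example also shows the concern raised in section 4.7: an application that has not been upgraded would display, and expect the user to type, the Roman-character string "xn--o3cw4h".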
Moreover, while all determinations of whether a particular query matches a stored object are traditionally made by DNS servers, the ACE systems, when combined with the complexities of international scripts and names, require that much of the matching work be moved into a separate, client-side, canonicalization or "preparation" process before the DNS matching mechanisms are invoked [STRINGPREP].

4.3. "Stringprep" and its complexities

As outlined above, the model for avoiding problems associated with putting non-ASCII names in the DNS and elsewhere evolved into the principle that strings are to be placed into the DNS only after being passed through a string preparation function that eliminates or rejects spurious character codes, maps some characters onto others, performs some sequence canonicalization, and generally creates forms that can be accurately compared. The impact of this process on hostname-restricted ASCII (i.e., "LDH") strings is trivial and essentially adds only overhead. For other scripts, the impact is, of necessity, quite significant.

Although the general notion underlying stringprep is simple, the many details are quite subtle and the associated tradeoffs are complex. A design team worked on it for months, with considerable effort placed into clarifying and fine-tuning the protocol and tables.
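The map, normalize, prohibit, compare sequence can be sketched as follows. This is deliberately simplified and is not nameprep: Python's `str.casefold` and NFKC normalization stand in for the stringprep mapping and normalization tables, the map-to-nothing set contains only two of the characters listed in stringprep table B.1, and the prohibited set (spaces and control characters) is a hypothetical subset chosen for the sketch. It shows that, after preparation, comparison reduces to a binary equality test, and that some case mappings cannot be reversed.

```python
import unicodedata

# Map-to-nothing: two of the entries in stringprep table B.1 (the real
# table is longer): U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE.
MAP_TO_NOTHING = {"\u00ad", "\u200b"}

# Hypothetical prohibition rule for this sketch: space separators and controls.
PROHIBITED_CATEGORIES = {"Zs", "Cc"}


def prepare(label: str) -> str:
    """Simplified stringprep-style preparation; not the real nameprep profile."""
    # 1. Map: drop "invisible" characters, then fold case (ß becomes "ss").
    mapped = "".join(ch for ch in label if ch not in MAP_TO_NOTHING).casefold()
    # 2. Normalize: NFKC, as nameprep does (here with Python's Unicode version).
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. Prohibit: reject the whole string rather than silently repairing it.
    for ch in normalized:
        if unicodedata.category(ch) in PROHIBITED_CATEGORIES:
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


def same_name(a: str, b: str) -> bool:
    """The only answer the DNS can use: match or no match."""
    return prepare(a) == prepare(b)


if __name__ == "__main__":
    print(same_name("MU\u0308nchen", "m\u00fcnchen"))      # combining vs precomposed
    print(same_name("\uff2d\u00fcnchen", "m\u00fcnchen"))  # fullwidth M
    print(same_name("Stra\u00dfe", "STRASSE"))              # ß folds to "ss"
    print(same_name("\u00f8", "\u00f6"))                    # ø vs ö: different letters
    # Case mapping is not reversible: ß -> SS -> ss, never back to ß.
    print("\u00df".upper(), "\u00df".upper().lower())
```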
Despite general agreement that the IETF would avoid getting into the business of defining character sets, character codings, and the associated conventions, the group several times considered and rejected special treatment of code positions to more nearly match the distinctions made by Unicode with user perceptions about similarities and differences between characters. But there were intense temptations (and pressures) to incorporate language-specific or country-specific rules. Those temptations, even when resisted, were indicative of parts of the ongoing controversy or of the basic unsuitability of the DNS for fully internationalized names that are visible, comprehensible, and predictable for end users.

There have also been controversies about how far one should go in these processes of preparation and transformation and, ultimately, about the validity of various analogies. For example, each of the following operations has been claimed to be similar to case-mapping in ASCII:

o stripping of vowels in Arabic or Hebrew

o matching of "look-alike" characters such as upper-case Alpha in Greek and upper-case A in Roman-based alphabets

o matching of Traditional and Simplified Chinese characters that represent the same words

o matching of Serbo-Croatian words whether written in Roman-derived or Cyrillic characters

A decision to support any of these operations would have implications for other scripts or languages and would increase the overall complexity of the process.
For example, unless language-specific information is somehow available, performing matching between Traditional and Simplified Chinese has impacts on Japanese and Korean uses of the same "traditional" characters: e.g., it would not be appropriate to map Kanji into Simplified Chinese.

Even if the IDN-WG's other work were abandoned completely or failed in the marketplace, the stringprep and nameprep work would continue to be extremely useful, both in identifying many of the problem code points and issues and in providing a reasonable set of basic rules. Where problems remain, they are arguably not with nameprep, but with the DNS-imposed requirement that nameprep, as with all other parts of the matching and comparison process, yield a binary "match or no match" answer, rather than, e.g., a value on a similarity scale that can be evaluated by the user or by user-driven heuristic functions.

4.4 The UCS Stability Problem

ISO 10646 basically defines only code points, and not rules for using or comparing the characters. This is part of a long-standing tradition with the work of what is now ISO/IEC JTC1/SC2: they have done code point assignments and have typically treated the ways in which characters are used as beyond their scope. Consequently, they have not dealt effectively with the broader range of internationalization issues. By contrast, the Unicode Technical Committee (UTC) has defined, in technical reports, some rules for canonicalization and comparison. Many of those rules and conventions have been factored into the "stringprep" and "nameprep" work, but it is not straightforward to make or define them in a fashion that is sufficiently precise and permanent to be relied on by the DNS.
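The permanence problem can be observed in Python's standard library, which ships the Unicode 3.2 character database that stringprep is defined against (`unicodedata.ucd_3_2_0`) alongside a much newer one. The check below is illustrative only, not a test of any DNS software: it uses `stringprep.in_table_a1` (stringprep table A.1, "unassigned code points in Unicode 3.2") to show that U+20B9 INDIAN RUPEE SIGN, added in Unicode 6.0, is unassigned as far as a 3.2-based preparation profile is concerned, while the current database classifies it as a currency symbol. Any matching rule tied to one Unicode version inherits this kind of drift.

```python
import stringprep
import unicodedata

RUPEE = "\u20b9"  # INDIAN RUPEE SIGN, added to Unicode long after version 3.2


def version_report(ch: str) -> dict:
    """Compare how two Unicode database versions classify one character."""
    return {
        "current_version": unicodedata.unidata_version,
        "current_category": unicodedata.category(ch),
        # The Unicode 3.2 database kept in the stdlib for stringprep/IDNA use.
        "ucd_3_2_category": unicodedata.ucd_3_2_0.category(ch),
        # stringprep table A.1: code points unassigned in Unicode 3.2.
        "unassigned_in_stringprep_A1": stringprep.in_table_a1(ch),
    }


if __name__ == "__main__":
    print(version_report(RUPEE))
    print(version_report("a"))
```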
Perhaps more important, the discussions of nameprep also identified several areas in which the UTC definitions are inadequate, at least without additional information, to make matching precise and unambiguous, or in which there are still choices to be made by the IETF or other bodies. For example, it is tempting to define some rules on the basis of membership in particular scripts, or for punctuation characters, but there is no precise definition of what characters belong to which script or which ones are, or are not, punctuation. The existence of these areas of vagueness raises two issues: whether trying to do precise matching at the character set level is actually possible (addressed below) and whether driving toward more precision could create issues that cause instability in the implementation and resolution models for the DNS.

The Unicode definition also evolves. At the time this document was written, version 3.2 had recently appeared, with some added characters and functionality and a few minor incompatible code point changes. The IETF has secured an agreement about constraints on future changes, but it remains to be seen how that agreement will work out in practice. However, some members of the community consider some of the changes between Unicode 3.0, 3.1, and 3.2 to be evidence of instability, an instability that is better handled in a system that can be more flexible about handling of characters, scripts, and ancillary information than the DNS.

In addition, because the systems implications of internationalization are considered out of scope in SC2, ISO/IEC JTC1 has assigned some of those issues to its SC22/WG20 (the Internationalization working group within the subcommittee that deals with programming languages, systems, and environments).
WG20 has historically dealt with internationalization issues thoughtfully and in depth, but its status has several times been in doubt in recent years. However, assignment of these matters to WG20 increases the risk of eventual ISO internationalization standards that specify different behavior than the UTC specifications.

4.5. Audiences, end users, and the UI problem

Part of what has "caused" the DNS internationalization problem, as well as the DNS trademark problem and several others, is that we have stopped thinking about "identifiers for objects" (which normal people are not expected to see) and started thinking about "names": strings that are expected not only to be readable, but to have linguistically-sensible and culturally-dependent meaning to non-specialist users.

Within the IETF, the IDN-WG, and sometimes other groups, avoided addressing the implications of that transition by taking "outside our scope, someone else's problem" approaches or by suggesting that people will just become accustomed to whatever conventions are adopted. The realities of user and vendor behavior suggest that these approaches will not serve the Internet community well in the long term:

o If we want to make it a problem in a different part of the UI structure, we need to figure out where it goes in order to have proof of concept of our solution. Unlike those whose sole [business] model is the selling or registering of names, any solution IETF produces actually needs to work, in applications context, as seen by the end user.

o The "they will get used to our conventions and adapt" principle is fine if we are writing rules for programming languages or an API.
But the conventions under discussion are not part of a semi-mathematical system; they are deeply ingrained in culture. No matter how often an English-speaking American is told that the Internet requires that the correct spelling of "colour" be used, he or she isn't going to be convinced. Getting a French-speaker in Lyon to use exactly the same lexical conventions as a French-speaker in Quebec in order to accommodate the decisions of the IETF or of a registrar or registry is just not likely. "Montreal" is either a misspelling or an anglicization (anglicisation?) of a similar word with an acute accent mark over the "e" (i.e., using the Unicode character U+00E9 or one of its equivalents). But global agreement on a rule that will determine whether the two forms should match, and that won't astonish end users and speakers of one language or the other, is as unlikely as agreement on whether "misspelling" or "anglicization" is the greater travesty.

More generally, it is not clear that the outcome of any conceivable nameprep-like process is going to be good enough for practical, user-level use. In the use of human languages by humans, there are many cases in which things that do not match are nonetheless interpreted as matching. The Norwegian/Danish character that appears in U+00F8 (visually, a lower case 'o' overstruck with a forward slash) and the German character that appears in U+00F6 (visually, a lower case 'o' with a diaeresis or umlaut) are clearly different and no matching program should yield an "equal" comparison. But they are more similar to each other than either of them is to, e.g., "e". Humans are able to mentally make the correction in context, and do so easily, and can be surprised if computers cannot do so.
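The alternative raised in section 4.3, a value on a similarity scale rather than a binary match, can be illustrated with the U+00F8 / U+00F6 example. The sketch is hypothetical and deliberately crude: it is a weighted edit distance in which substituting two letters that share a base letter (approximated from the Unicode character name by discarding everything from " WITH " onward) costs 0.5 instead of 1. It ranks "bjørn" closer to "björn" than to "bjern", which a match-or-no-match answer cannot express. It has no knowledge of language, so it cannot capture language-dependent cases such as the Swedish one that follows.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Approximate the base letter of ch from its Unicode name.

    "ö" decomposes to "o" + U+0308; "ø" has no decomposition but is named
    "LATIN SMALL LETTER O WITH STROKE", so both reduce to "LATIN SMALL LETTER O".
    Unnamed characters fall back to the character itself.
    """
    first = unicodedata.normalize("NFD", ch)[0]
    return unicodedata.name(first, first).split(" WITH ")[0]


def substitution_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    if base_letter(a) == base_letter(b):
        return 0.5  # "looks like the same letter" to many readers
    return 1.0


def similarity_distance(s: str, t: str) -> float:
    """Levenshtein distance with graded substitution costs (0.0 = identical)."""
    previous = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        current = [float(i)]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1.0,                              # deletion
                current[j - 1] + 1.0,                           # insertion
                previous[j - 1] + substitution_cost(a, b),      # substitution
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for other in ["bj\u00f6rn", "bjern"]:
        print(other, similarity_distance("bj\u00f8rn", other))
```

A user-facing system could present candidates ranked by such a score and let the user choose, which is possible in a directory but not in a DNS lookup that must either succeed or fail.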
Worse, there is a Swedish character whose appearance is identical to the German o-umlaut, and which shares code point U+00F6, but that, if the languages are known and the sounds of the letters or meanings of words including the character are considered, actually should match the Norwegian/Danish use of U+00F8.

This text uses examples in Roman scripts because it is being written in English and those examples are relatively easy to render. But one of the important lessons of the discussions about domain name internationalization in recent years is that problems similar to those described above exist in almost every language and script. Each one has its idiosyncrasies, and each set of idiosyncrasies is tied to common usage and cultural issues that are very familiar in the relevant group, and often deeply held as cultural values. As long as a schoolchild in the US can get a bad grade on a spelling test for using a perfectly valid British spelling, or one in France or Germany can get a poor grade for leaving off a diacritical mark, there are issues with the relevant language. Similarly, if children in Egypt or Israel are taught that it is acceptable to write a word with or without vowels or stress marks, but that, if those marks are included, they must be the correct ones, or a user in Korea is potentially offended or astonished by out-of-order sequences of Jamo, systems based on character-at-a-time processing and simplistic matching, with no contextual information, are not going to satisfy user needs.

Users are demanding solutions that deal with language and culture. Systems of identifier symbol-strings that serve specialists or computers are, at best, a solution to a rather different (and, at the time this document was written, somewhat ill-defined) problem.
The recent efforts have made it ever more clear that, if we ignore the distinction between the user requirements and narrowly-defined identifiers, we are solving an insufficient problem. And, conversely, the approaches that have been proposed to approximate solutions to the user requirement may be far more complex than simple identifiers require.

4.6 Business cards and other natural uses of natural languages

Over the last few centuries, local conventions have been established in various parts of the world for dealing with multilingual situations. It may be helpful to examine some of these. For example, if one visits a country where the language is different from one's own, business cards are often printed on two sides, one side in each language. The conventions are not completely consistent and the technique assumes that recipients will be tolerant. Translations of names or places are attempted in some situations and transliterations in others. Since it is widely understood that exact translations or transliterations are often not possible, people typically smile at errors, appreciate the effort, and move on.

The DNS situation differs from these practices in at least two ways. Since a global solution is required, the business card would need a number of sides approximating the number of languages in the world, which is probably impossible without violating laws of physics. More important, the opportunities for tolerance don't exist: the DNS requires an exact match or the lookup fails.

4.7 ASCII encodings and the Roman keyboard assumption

Part of the argument for ACE-based solutions is that they provide an escape for multilingual environments when applications have not been upgraded. When an older application encounters an ACE-based name, the assumption is that the (admittedly ugly) ASCII-coded string will be displayed and can be typed in.
This argument is reasonable from the standpoint of mixtures of Latin-based alphabets, but may not be relevant if user-level systems and devices are involved that do not support the entry of Roman-based characters or that cannot conveniently render such characters. Such systems are few in the world today, but the number can reasonably be expected to rise as the Internet is increasingly used by populations whose primary concern is with local issues, local information, and local languages. It is, for example, fairly easy to imagine populations who use Arabic or Thai scripts and who do not have routine access to scripts or input devices based on Roman-derived alphabets.

4.8 A pessimistic summary of intra-DNS approaches for "multilingual names"

It appears, from the cases above and others, that none of the intra-DNS-based solutions for "multilingual names" are workable. They rest on too many assumptions that do not appear to be feasible: that people will adapt deeply-entrenched language habits to conventions laid down to make the lives of computers easy; that we can make "freeze it now, no need for changes in these areas" decisions about Unicode and nameprep; that ACE will smooth over applications problems, even in environments without the ability to key or render Roman-based glyphs (or where user experience is such that such glyphs cannot easily be distinguished from each other); that the Unicode Consortium will never decide to repair an error in a way that creates a risk of DNS incompatibility; that we can either deploy EDNS [RFC2671] or that long names are not really important; that Japanese and Chinese computer users (and others) will either give up their local or IS 2022-based character coding solutions (for which addition of a large fraction of a million new code points to Unicode is almost certainly a necessary, but probably not sufficient, condition) or build leakproof and completely accurate boundary
conversion mechanisms; that out-of-band or contextual information will always be sufficient for the "map glyph onto script" problem; and so on. In each case, it is likely that about 80% or 90% of cases will work satisfactorily, but it is unlikely that such partial solutions will be good enough. For example, suppose someone can spell her name 90% correctly, or a company name is matched correctly 80% of the time but the other 20% of attempts identify a competitor: is either likely to be considered adequate?

5. Search-based Systems: The Key Controversies

For many years, a common response to requirements to locate people or resources on the Internet has been to invoke the term "directory". While an in-depth analysis of the reasons would require a separate document, the history of failure of these invocations has given "directory" efforts a bad reputation. The effort proposed here is different from those predecessors for several reasons, perhaps the most important of which is that it focuses on a fairly-well-understood set of problems and needs, rather than on finding uses for a particular technology.

5.1. One directory or many

As suggested in some of the text above, it is an open question whether the needs of the community would be best served by a single (even if functionally, and perhaps administratively, distributed) directory with universal applicability, a single directory that supports locally-tailored search (and, most important, matching) functions, or multiple, locally-determined, directories. Each has its attractions. Any but the first would essentially prevent reverse-mapping (determination of the user-visible name of the host or resource from target information such as an address or DNS name).
But reverse mapping has become less useful over the years, at least to users, as more and more names have been associated with each host address and as CIDR [CIDR] has proven problematic for mapping smaller address blocks to meaningful names.

Locally-tailored search and mappings would permit national variations on interpretation of which strings matched which other ones, an arrangement that is especially important when different localities apply different rules to, e.g., matching of characters with and without diacritical marks. But, of course, this implies that a URL may evaluate properly or not depending on either settings on a client machine or the network connectivity of the user. That is not, in general, a desirable situation, since it implies that users could not, in the general case, share URLs (or other host references) and that a particular user might not be able to carry references from one host or location to another.

And, of course, completely separate directories would permit translation and transliteration functions to be embedded in the directory, giving much of the Internet a different appearance depending on which directory was chosen. The attractions of this are obvious, but, unless things were very carefully designed to preserve uniqueness and precise identities at the right points (which may or may not be possible), such a system would have many of the difficulties associated with multiple DNS roots.

Finally, a system of separate directories and databases, if coupled with removal of the DNS-imposed requirement for unique names, would largely eliminate the need for a single worldwide authority to manage the top of the naming hierarchy.
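The trade-off described in this section can be made concrete with a pair of locally-tailored matching profiles. Assume, purely for illustration, a Swedish profile that treats U+00F8 and U+00F6 as the same letter (following the example in section 4.5) and a German profile that does not; the profile names, tables, and functions below are hypothetical. The same query against the same directory then succeeds or fails depending on which profile the client is configured with, which is why a reference shared between users in different localities might not evaluate the same way.

```python
import unicodedata
from typing import Dict, List

# Hypothetical per-locale equivalence tables (not from any standard): each
# character is replaced by a representative before names are compared.
PROFILES: Dict[str, Dict[str, str]] = {
    "sv": {"\u00f8": "\u00f6"},  # ø treated as the same letter as ö
    "de": {},                    # ø and ö are simply different letters
}


def local_key(name: str, profile: str) -> str:
    """Matching key under one locale's rules: case-fold, NFC, local mappings."""
    text = unicodedata.normalize("NFC", name.casefold())
    table = PROFILES[profile]
    return "".join(table.get(ch, ch) for ch in text)


def local_search(directory: List[str], query: str, profile: str) -> List[str]:
    """Return the directory names that match query under the given profile."""
    key = local_key(query, profile)
    return [name for name in directory if local_key(name, profile) == key]


if __name__ == "__main__":
    directory = ["Bj\u00f6rk Design", "Bjork Design"]
    query = "BJ\u00d8RK design"  # e.g. typed on a Norwegian keyboard
    for profile in PROFILES:
        print(profile, local_search(directory, query, profile))
```

Under the "sv" profile the query finds "Björk Design"; under "de" it finds nothing, even though the directory contents are identical.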
5.2 Why not a proposal?

The question was repeatedly raised with early drafts of this document as to whether it should contain a specific proposal: a specific directory mechanism, schema, and so on. It deliberately does not take that step. It has been difficult to get directory systems deployed in significant ways in the Internet infrastructure, partially because there have been too many options. There are also some approaches that could be used to implement the general concepts described here, such as the Common Name Resolution Protocol [RFC2972], that some would not consider directory protocols at all. Consequently, it appeared better to present the general requirements, concepts, and arguments here and leave the specifics to other sources, documents, and proposals.

6. Security Considerations

The set of proposals implied by this document suggests an interesting set of security issues (i.e., nothing important is ever easy). A directory system used for locating network resources would presumably need to be as carefully protected against unauthorized changes as the DNS itself. There also might be new opportunities for problems in an arrangement involving two or more (sub)layers, but those problems are not more severe than those of a DNS lookup sequence that involved looking up one name, getting back information, and then doing additional lookups (as will often be the case with, e.g., NAPTR records [NAPTR]).

7. References

7.1. Normative References

None

7.2. Explanatory and Informative References

[Albitz] Any of the editions of Albitz, P. and C. Liu, DNS and BIND, O'Reilly and Associates, 1992, 1997, 1998, 2001.

[ASCII] American National Standards Institute (formerly United States of America Standards Institute), X3.4, 1968, "USA Code for Information Interchange".
ANSI X3.4-1968 has been replaced by newer versions with slight modifications, but the 1968 version remains definitive for the Internet.[IS646]Some time after ASCII was first forumulated as a standard, ISO adopted international standard 646, which uses ASCII as a base. IS 646 actually contained two code tables: an "International Reference Version" (often referenced as ISO 646-IRV) which was essentially identical to the ASCII of the time, and a "Basic Version" (ISO 646-BV), which designates a number of character positions for national use. [Bush-Arch] Bush, R., T. Griffin, D. Meyer, "Some Internet Architectural Guidelines and Philosophy", work in progress (draft-ymbk-arch-guidelines-05.txt). [CIDR] See Fuller, V., T. Li, J. Yu, K. Varadhan "Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy" , RFC 1519, September 1993 and Eidnes, H., G. de Groot, P. Vixie, "Classless IN-ADDR.ARPA delegation", RFC 2317, March 1998. [COM-SIZE] Size information supplied by Verisign Global Registry Services (the zone administrator, or "registry operator", for COM, see [REGISTRAR], below) to ICANN, third quarter 2002. [DNS-Search] Klensin, J., "A Search-based access model for the DNS", work in progress (draft-klensin-dns-search-04.txt) [FINGER] Zimmerman, D., RFC 1288 "The Finger User Information Protocol". December 1991. The original version of this protocol was outlined in Harrenstien, K., RFC 742 "NAME/FINGER Protocol, Dec-30-1977. [IAB-OPES] Floyd, S, and L. Daigle, Eds, IAB, RFC 3238 "IAB Architectural and Policy Considerations for Open Pluggable Edge Services", January 2002. [IQUERY] Lawrence, D., "Obsoleting IQUERY", work in progress (draft-ietf-dnsext-obsolete-iquery-04.txt). 
[IS646] ISO/IEC 646:1991 Information technology -- ISO 7-bit coded character set for information interchange.

[IS10646] ISO/IEC 10646-1:2000 Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, and ISO/IEC 10646-2:2001 Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2: Supplementary Planes.

[MINC] The Multilingual Internet Names Consortium, http://www.minc.org/, has been an early advocate for the importance of expanding DNS names to accommodate non-ASCII characters. Some of their specific proposals, while helping people to understand the problems better, were not compatible with the design of the DNS.

[NAPTR] Mealling, M. and R. Daniel, "The Naming Authority Pointer (NAPTR) DNS Resource Record", RFC 2915, September 2000.

[REGISTRAR] In an early stage of the process that created the Internet Corporation for Assigned Names and Numbers (ICANN), a "Green Paper" was released by the US Government. That paper introduced new terminology and some concepts not needed by traditional DNS operations. The term "registry" was applied to the actual operator and database holder of a domain (typically at the top level, since the Green Paper was little concerned with anything else), while organizations that marketed names and made them available to "registrants" were known as "registrars". In the classic DNS model, the function of "zone administrator" encompassed both registry and registrar roles, although that model did not anticipate a commercial market in names.

[RFC625] RFC 625 On-line hostnames service. M.D. Kudlick, E.J. Feinler. Mar-07-1974.

[RFC734] RFC 734 SUPDUP Protocol. M.R. Crispin. Oct-07-1977.

[RFC811] RFC 811 Hostnames Server. K. Harrenstien, V. White, E.J. Feinler. Mar-01-1982.
[RFC882] RFC 882 Domain names: Concepts and facilities. P.V. Mockapetris. Nov-01-1983. (This document was superseded by RFC 1034, cited below.)

[RFC883] RFC 883 Domain names: Implementation specification. P.V. Mockapetris. Nov-01-1983. (This document was superseded by RFC 1035, cited below.)

[RFC952] RFC 952 DoD Internet host table specification. K. Harrenstien, M.K. Stahl, E.J. Feinler. Oct-01-1985.

[RFC1034] RFC 1034 Domain names, Concepts and facilities. P.V. Mockapetris. Nov-01-1987.

[RFC1035] RFC 1035 Domain names - implementation and specification. P.V. Mockapetris. Nov-01-1987.

[RFC1591] RFC 1591 Domain Name System Structure and Delegation. J. Postel. March 1994.

[RFC2181] RFC 2181 Clarifications to the DNS Specification. R. Elz, R. Bush. July 1997.

[RFC2295] Holtman, K. and A. Mutz, "Transparent Content Negotiation in HTTP", RFC 2295, March 1998.

[RFC2671] RFC 2671 Extension Mechanisms for DNS (EDNS0). P. Vixie. August 1999.

[RFC2825] RFC 2825 A Tangled Web: Issues of I18N, Domain Names, and the Other Internet protocols. IAB, L. Daigle, ed. May 2000.

[RFC2826] RFC 2826 IAB Technical Comment on the Unique DNS Root. IAB. May 2000.

[RFC2972] RFC 2972 Context and Goals for Common Name Resolution. N. Popp, M. Mealling, L. Masinter, K. Sollins. October 2000.

[Seng] Seng, J., et al., Eds., "Internationalized Domain Names: Registration and Administration Guideline for Chinese, Japanese, and Korean", work in progress (draft-jseng-idn-admin-01.txt, coming soon).

[STRINGPREP] The canonicalization processes described are profiles of a set of tables and processing steps known collectively as "stringprep" and described in Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ('stringprep')", work in progress (draft-hoffman-stringprep-06.txt). The particular profile used for placing internationalized strings in the DNS is called "nameprep", described in Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names", work in progress (draft-ietf-idn-nameprep-11.txt).

[TELNET] See Postel, J. and J.K. Reynolds, RFC 854 "Telnet Protocol Specification" and RFC 855 "Telnet Option Specifications", May-01-1983, and many RFCs describing specific options.

[UNICODE] The Unicode Consortium, The Unicode Standard, Version 3.0, Addison-Wesley: Reading, MA, 2000. Update to version 3.1, 2001. Update to version 3.2, 2002.

[WHOIS] Harrenstien, K., M.K. Stahl, E.J. Feinler, RFC 954 "NICNAME/WHOIS", Oct-01-1985.

[WHOIS-UPDATE] See, for example, Gargano, J. and K. Weiss, RFC 1834 "Whois and Network Information Lookup Service, Whois++", August 1995; Weider, C., J. Fullton, S. Spero, RFC 1913 "Architecture of the Whois++ Index Service", February 1996; Williamson, S., M. Kosters, D. Blacka, J. Singh, K. Zeilstra, RFC 2167 "Referral Whois (RWhois) Protocol V1.5", June 1997; and Daigle, L. and P. Faltstrom, RFC 2957 "The application/whoispp-query Content-Type" and RFC 2958 "The application/whoispp-response Content-type", October 2000.

[X29] International Telecommunications Union, "Recommendation X.29: Procedures for the exchange of control information and user data between a Packet Assembly/Disassembly (PAD) facility and a packet mode DTE or another PAD", December 1997.

8. Acknowledgements

Many people have contributed to versions of this document or the thinking that went into it. The author would particularly like to thank Harald Alvestrand, Rob Austein, Bob Braden, Matt Crawford, Leslie Daigle, Patrik Faltstrom, Eric A. Hall, Ted Hardie, and Paul Hoffman for making specific suggestions and/or challenging the assumptions and presentation of earlier versions and suggesting ways to improve them.

9. Author's address

John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140
klensin+srch@jck.com