draft-kunze-dc-00.txt  -->   draft-kunze-dc-01.txt


          Dublin Core Metadata for Simple Resource Discovery


1. Status of this Document

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.  Please send comments
to weibel@oclc.org, or to the discussion list meta2@mrrl.lut.ac.uk.


2. Introduction

Finding relevant information on the World Wide Web has become
increasingly problematic in proportion to the explosive growth of available
networked resources.  Current Web indexing evolved rapidly to fill the
demand for resource discovery tools, but that indexing, while enormously useful,
is a poor substitute for richer varieties of resource description.

An invitational workshop held in March of 1995 brought together
librarians, digital library researchers, and text-markup specialists
to address the problem of resource discovery for networked resources.
This activity evolved into a series of related workshops and ancillary
activities that have become known collectively as the Dublin Core Metadata
Workshop Series.  This report summarizes the state of this effort.

The initial motivation for the first workshop was simply to do
something that would improve the prospects for resource discovery on
the Web.  Specifically, the goal was to identify a simple set of common
description elements that authors (or content managers) could embed in
their documents to promote their discovery -- something like a catalog
card for a network resource.  The term "Dublin Core" applies to
this simple core of descriptive elements.

The goals that motivate the Dublin Core effort are:

    - Simplicity of creation and maintenance
    - Commonly understood semantics
    - International scope and applicability
    - Extensibility
    - Interoperability among collections and indexing systems

These requirements work at cross purposes to some degree, but all are
desirable goals.  Much of the effort of the Workshop Series has been
directed at minimizing the tensions among these goals.

One of the primary deliverables of this effort is a set of elements
that are judged by the collective participants of these workshops
to be the core elements for cross-disciplinary resource discovery.
The term ``Dublin Core'' applies to this core of descriptive elements.

Early experience with Dublin Core deployment has made clear the need
to support additional qualification of elements for some applications.
Thus, Dublin Core elements may be expressed in simple unqualified ways
that minimal discovery and retrieval tools can use, or they may be
expressed with additional structure to support semantics-sharpening
qualifiers that minimal tools can safely ignore but that more complex
tools can employ to increase discovery precision.

The broad agreements about syntax and semantics that have emerged from
the workshop series will be expressed in a series of five Informational
RFCs, of which this document is the first.  These RFCs (currently they
are Internet-Drafts) will comprise the following documents.

2.1. Dublin Core Metadata for Simple Resource Discovery

An introduction to the Dublin Core and a description of the intended
semantics of the 15-element Dublin Core element set without qualifiers.
This is the present document.

2.2. Encoding Dublin Core Metadata in HTML  

A formal description of the convention for embedding unqualified Dublin
Core metadata in HTML.

2.3. Qualified Dublin Core Metadata for Simple Resource Discovery

The principles of element qualification and the semantics of Dublin Core
metadata when expressed with a recommended qualifier set known as the
Canberra Qualifiers.

2.4. Encoding Qualified Dublin Core Metadata in HTML 

A formal description of the convention for embedding qualified Dublin
Core metadata in HTML.

2.5. Dublin Core on the Web:  RDF Compliance and DC Extensions

A formal description for encoding Dublin Core metadata with qualifiers
in RDF-compliant metadata, and how to extend the core element set.


3. Description of Dublin Core Elements  

The following is the reference definition of the Dublin Core Metadata
Element Set.  It is expected that practice will evolve to include
qualifiers for certain of the elements.  The reference description of
the elements resides at [1]:

	http://purl.org/metadata/dublin_core_elements

Note that elements have a descriptive name intended to convey a common
semantic understanding of the element.  To promote global interoperability,
a number of the element descriptions suggest a controlled vocabulary for
the respective element values.  It is assumed that other controlled
vocabularies will be developed for interoperability within certain local
domains.

In the element descriptions below, a formal single-word label is
specified to make the syntactic specification of elements simpler
for encoding schemes.  Each element is optional and repeatable.


3.1. Title				Label: TITLE

     The name given to the resource by the CREATOR or PUBLISHER.

3.2. Author or Creator			Label: CREATOR

     The person or organization primarily responsible for creating
     the intellectual content of the resource.  For example, authors
     in the case of written documents, artists, photographers,
     or illustrators in the case of visual resources.

3.3. Subject and Keywords		Label: SUBJECT

     The topic of the resource.  Typically, subject will be expressed
     as keywords or phrases that describe the subject or content of the
     resource.  The use of controlled vocabularies and formal
     classification schemas is encouraged.
   
3.4. Description			Label: DESCRIPTION

     A textual description of the content of the resource, including
     abstracts in the case of document-like objects or content
     descriptions in the case of visual resources.  Future metadata
     collections might well include computational content description
     (spectral analysis of a visual resource, for example) that may not
     be embeddable in current network systems.  In such a case this
     field might contain a link to such a description rather than the
     description itself.

3.5. Publisher				Label: PUBLISHER

     The entity responsible for making the resource available in its
     present form, such as a publishing house, a university department,
     or a corporate entity.   The intent of specifying this field is to
     identify the entity that provides access to the resource.
     
3.6. Other Contributor 			Label: CONTRIBUTOR

     A person or organization not specified in a CREATOR element who
     has made significant intellectual contributions to the resource
     but whose contribution is secondary to any person or organization
     specified in a CREATOR element (for example, editor, transcriber,
     and illustrator).
     
3.7. Date				Label: DATE

     The date the resource was made available in its present form.
     Recommended best practice is an 8-digit number in the form
     YYYY-MM-DD as defined in [2], a profile of ISO 8601.  In this
     scheme, the date element 1994-11-05 corresponds to November 5,
     1994.  Many other schemes are possible, but if used, they should
     be identified in an unambiguous manner.
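
     As an illustration only, the YYYY-MM-DD form can be checked with
     a short Python sketch.  The function name is_dc_date is
     hypothetical and is not defined by the Dublin Core; the sketch
     assumes that only the full year-month-day form is of interest.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def is_dc_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if _DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        # Rejects impossible dates such as February 30.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```

     A value in another scheme, such as the unpunctuated 19961203,
     fails this check and would need to be identified by other means.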
   
3.8. Resource Type 			Label: TYPE

     The category of the resource, such as home page, novel, poem,
     working paper, technical report, essay, dictionary.  For the sake
     of interoperability, TYPE should be selected from an enumerated
     list that is under development in the workshop series at the time
     of publication of this draft.
 
3.9. Format  				Label: FORMAT

     The data format of the resource, used to identify the software
     and possibly hardware that might be needed to display or operate
     the resource.  For the sake of interoperability, FORMAT should
     be selected from an enumerated list that is under development
     in the workshop series at the time of publication of this draft.

3.10. Resource Identifier 		Label: IDENTIFIER

     String or number used to uniquely identify the resource.  Examples
     for networked resources include URLs and URNs (when implemented).
     Other globally-unique identifiers, such as International Standard
     Book Numbers (ISBN) or other formal names are also candidates
     for this element.

3.11. Source				Label: SOURCE

     A string or number used to uniquely identify the work from which
     this resource was derived, if applicable.  For example, a PDF
     version of a novel might have a SOURCE element containing an
     ISBN for the physical book from which the PDF version was
     derived.

3.12. Language 				Label: LANGUAGE

     Language(s) of the intellectual content of the resource.  Where
     practical, the content of this field should coincide with the
     NISO Z39.53 three character codes for written languages. 

3.13. Relation (experimental)		Label: RELATION 

     The relationship of this resource to other resources.  The intent
     of this element is to provide a means to express relationships
     among resources that have formal relationships to others, but
     exist as discrete resources themselves.  For example, images in a
     document, chapters in a book, or items in a collection.  Formal
     specification of RELATION is currently under development.  Users
     and developers should understand that use of this element is
     currently considered to be experimental.

3.14. Coverage (experimental)		Label: COVERAGE

     The spatial and/or temporal characteristics of the resource.
     Formal specification of COVERAGE is currently under development.
     Users and developers should understand that use of this element
     is currently considered to be experimental.

3.15. Rights Management (experimental)	Label: RIGHTS
   
     A link to a copyright notice, to a rights-management statement,
     or to a service that would provide information about terms of
     access to the resource.  Formal specification of RIGHTS is
     currently under development.  Users and developers should
     understand that use of this element is currently considered to
     be experimental.
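
The fifteen labels defined in this section, together with the rule
that each element is optional and repeatable, can be modeled with a
small Python sketch.  The names DC_LABELS and make_record are
hypothetical, and treating labels as case-insensitive is an assumption
of the sketch rather than a requirement stated here.

```python
# The fifteen single-word labels listed in this section.
DC_LABELS = (
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "CONTRIBUTOR", "DATE", "TYPE", "FORMAT", "IDENTIFIER",
    "SOURCE", "LANGUAGE", "RELATION", "COVERAGE", "RIGHTS",
)


def make_record(pairs):
    """Group (label, value) pairs into a dict mapping label -> list of values."""
    record = {}
    for label, value in pairs:
        key = label.upper()  # assumption: labels compare case-insensitively
        if key not in DC_LABELS:
            raise ValueError("not a Dublin Core label: %r" % (label,))
        # Repeatable: each occurrence is appended in order.
        # Optional: labels that never occur are simply absent.
        record.setdefault(key, []).append(value)
    return record
```

A record with two CREATOR values and no other elements is therefore
valid, as is an empty record.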


4. Security Considerations

The Dublin Core element set poses no risk to computers and networks.
It poses minimal risk to searchers who obtain incorrect or private
information due to careless mapping from rich data descriptions to
the simple Dublin Core scheme.  No other security concerns are likely
to be raised by the element description consensus documented here.


5. References

   [1] Weibel, S., Miller, E., Dublin Core Metadata Element Set: Reference Description,
       http://purl.org/metadata/dublin_core_elements
       
   [2] ISO 8601 Profile for the Dublin Core,
       http://purl.org/metadata/dublin_core_date_formats


6. Authors' Addresses

Stuart L. Weibel
OCLC Online Computer Library Center, Inc.
Office of Research
6565 Frantz Rd.
Dublin, Ohio, 43017, USA
Email: weibel@oclc.org
Voice: +1 614-764-6081
Fax:   +1 614-764-2344

John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA  94143-0840, USA
Email: jak@ckm.ucsf.edu
Voice: +1 415-502-6660
Fax:   +1 415-476-4653

Carl Lagoze
Digital Library Research Group
Department of Computer Science
Cornell University
Ithaca, NY  14853, USA
Email: lagoze@cs.cornell.edu
Voice: +1-607-255-6046
Fax:   +1-607-255-4428


APPENDIX:  A Proposed Convention for Embedding Metadata in HTML.

The following proposed convention reflects the consensus of a break-out
group at the W3C Distributed Indexing and Searching Workshop, May 28-29,
1996, concerning tagging of meta information in HTML.  This break-out
group included representatives of the Dublin Core/Warwick Framework
Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort,
Verity Software, and the W3C.

                        Attendees (alphabetically):

 Nick Arnett        narnett@verity.com       Mic Bowman      bowman@transarc.com
 Eliot Christian    echristi@usgs.gov        Dan Connolly    conolly@w3.org
 Martijn Koster     m.koster@webcrawler.com  John Kunze      jak@ckm.ucsf.edu
 Carl Lagoze        lagoze@cs.cornell.edu    Michael Mauldin fuzzy@lycos.com
 Christian Mogensen christian@vivid.com      Wick Nichols    wickn@microsoft.com
 Timothy Niesen     tmn@swl.msd.ray.com      Stuart Weibel   weibel@oclc.org
 Andrew Wood        woody@dstc.edu.au


1. The Problem

The problem is to identify a simple means of embedding metadata within HTML
documents without requiring additional tags or changes to browser software,
and without unnecessarily compromising current practices for robot
collection of data.

While metadata is intended for display in some situations, it is judged
undesirable for such embedded metadata to display on browser screens as
a side effect of displaying a document. Therefore, any solution requires
encoding information in attribute tags rather than as container element
content.

The goal was to agree on a simple convention for encoding structured
metadata information of a variety of types (which may or may not be
registered with a central registry analogous to the MIME type registry).
It was judged that a registry may be a necessary feature of the metadata
infrastructure as alternative schemas are elaborated, but that deployment
in the short-term could go forward without such a registry, especially
in light of the proposed use of the LINK tag to link descriptions to a
standard schema description as described below.


2. A Proposed Convention

The solution agreed upon is to encode schema elements in META tags, one
element per META tag, and as many META tags as are necessary.  Grouping of
schema elements is achieved by a prefix schema identifier associated with
each schema element.  The convention agreed upon is as follows:

     <META NAME    = "schema_identifier.element_name"
           CONTENT = "string data">

Thus, a partial Dublin Core citation might be encoded as follows:

     <META NAME    = "DC.title"
           CONTENT = "HTML 2.0 Specification">

     <META NAME    = "DC.creator"
           CONTENT = "Berners-Lee, Tim">

     <META NAME    = "DC.creator"
           CONTENT = "Connolly, Dan">

     <META NAME    = "DC.date"
           CONTENT = "19951126">

     <META NAME    = "DC.identifier"
           CONTENT = "ftp://ds.internic.net/rfc/rfc1866.txt">

And a collection of Microsoft Word metadata might be encoded as follows:

     <META NAME    = "MSW.title"
           CONTENT = "W3C Indexing Work Shop Report">

     <META NAME    = "MSW.creator"
           CONTENT = "Nichols, Wick">

     <META NAME    = "MSW.date"
           CONTENT = "19960630">
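
As a sketch of this convention, the following Python function renders
one META tag per element.  The name meta_tags is hypothetical, and
escaping attribute values with the standard html.escape function is an
assumption of the sketch, not part of the agreed convention.

```python
from html import escape


def meta_tags(schema, elements):
    """Render (element_name, value) pairs as one META tag per pair."""
    tags = []
    for name, value in elements:
        # The schema identifier prefixes the element name, joined by a dot.
        full_name = escape("%s.%s" % (schema, name), quote=True)
        content = escape(value, quote=True)
        # A repeated element name simply yields another META tag.
        tags.append('<META NAME="%s" CONTENT="%s">' % (full_name, content))
    return tags
```

Calling meta_tags("DC", [("creator", "Berners-Lee, Tim"), ("creator",
"Connolly, Dan")]) yields two DC.creator tags, one per author.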


3. Linkage to the Reference Description of a Metadata Schema

It is judged useful to provide a means for linking to the reference
definition of the metadata schema (or schemata) used in a document.  Doing
so serves as a primitive registration mechanism for metadata schemata, and
lays the foundation for a more formal, machine-readable linkage mechanism
in the future. The proposed convention for doing so is as follows:

     <LINK REL = SCHEMA.schema_identifier HREF="URL">

Thus, the reference description of one metadata scheme, the Dublin Core
Metadata Element Set, would be referenced in the LINK HREF as follows:

     <LINK REL = SCHEMA.dc HREF = "http://purl.org/metadata/dublin_core">

The description of an element could be accessed by constructing a URL
that uses the # token to identify a named anchor. Thus, the derived URL below
actually links to the title element in the reference description of the
Dublin Core Metadata Element Set.

     http://purl.org/metadata/dublin_core_elements#title

This URL would correspond to the human-readable description of the title
element within the document by a NAME anchor such as:

     <A NAME = "title"> Title </A>

         The name of the work provided by the author or publisher.

While use of the LINK tag is not required for a given schema, when used,
it will make possible retrieval of the reference definition of a given
schema element, and will therefore reduce the need for a formal metadata
scheme registry. Multiple LINK tags can be used so that elements derived
from multiple schemas can be referenced within a single document.
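
A reader of such a document might collect the META and LINK tags and
derive element URLs along the following lines.  This is a minimal
Python sketch: MetaCollector and element_url are hypothetical names,
and matching schema identifiers without regard to case (``DC'' in META,
``dc'' in LINK) is an assumption of the sketch.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect schema-prefixed META elements and SCHEMA.* LINK references."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # schema identifier -> list of (element, content)
        self.schemas = {}   # schema identifier -> reference description URL

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        a = dict(attrs)
        name = a.get("name") or ""
        rel = a.get("rel") or ""
        if tag == "meta" and "." in name:
            schema, element = name.split(".", 1)
            pair = (element, a.get("content") or "")
            self.elements.setdefault(schema.lower(), []).append(pair)
        elif tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):].lower()] = a.get("href") or ""


def element_url(schemas, schema, element):
    """Append '#element' to the reference URL registered for a schema."""
    return schemas[schema.lower()] + "#" + element
```

Because each schema identifier maps to its own LINK reference, elements
from several schemas in one document resolve independently.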


4. Consistency of Description Schemas

To promote consistency among resource description schemas, it is suggested
that the semantics for metadata elements be related to existing well-known
schemas whenever feasible.


----