draft-ietf-822ext-iso2022jp-02.txt  -->   rfc1468.txt

view Side-By-Side changes

Date: Tue, 09 Apr 2002 00:36:13 GMT
Server: Apache/1.3.20 (Unix)
Last-Modified: Mon, 18 Jan 1993 20:05:10 GMT
ETag: "3de031-2b9c-2b5b0d76"
Accept-Ranges: bytes
Content-Length: 11164
Connection: close
Content-Type: text/plain





Network Working Group                                          Jun                                           J. Murai
Internet Draft                                              Mark
Request for Comments: 1468                               Keio University
                                                              M. Crispin
                                                       Erik
                                                       Panda Programming
                                                         E. van der Poel
                                                       1st December 1992
                                                               June 1993


           Japanese Character Encoding for Internet Message Bodies Messages

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid memo provides information for a maximum of six
   months. the Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. community.  It is does
   not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   This draft document will be submitted to the RFC editor as specify an
   informational document.  This document will expire before 1st June
   1993. Internet standard.  Distribution of this memo is
   unlimited.  Please send comments
   to ietf-822@dimacs.rutgers.edu.

Introduction

   This document describes the encoding used in electronic mail [RFC822]
   and network news [RFC1036] message bodies messages in several Japanese networks. It
   was first specified by and used in JUNET [JUNET]. The encoding is now
   also widely used in Japanese IP communities.

   This document names the

   The name given to this encoding is "ISO-2022-JP", which is intended
   to be used in the "charset" parameter field of MIME [MIME] messages. The
   use of ISO-2022-JP in RFC 1342 [RFC1342] headers is expected to be
   the subject of a separate document.

   This document only describes the encoding of plain text. The encoding
   of other subtypes of text, such as richtext, is not discussed here.





Murai et al              Expires 1st June 1993                  [Page 1]

Internet Draft                                 Updated 1st December 1992 (see
   [MIME1] and [MIME2]).

Description

   The message body text starts in ASCII [ASCII], and switches to Japanese characters
   through an escape sequence. For example, the escape sequence ESC $ B
   (three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
   following this escape sequence are Japanese characters, which are
   encoded in two bytes each.  To switch back to ASCII, the escape
   sequence ESC ( B is used.

   The following table gives the escape sequences and the character sets
   used in ISO-2022-JP messages. The ISOREG number is the registration
   number in ISO's registry [ISOREG].

       Esc Seq    Character Set                  ISOREG

       ESC ( B    ASCII                             6
       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
       ESC $ @    JIS X 0208-1978                  42
       ESC $ B    JIS X 0208-1983                  87

   Note that JIS X 0208-1983 0208 was called JIS C 6226-1983 6226 until the name was changed in



Murai, Crispin & van der Poel                                   [Page 1]

RFC 1468   Japanese Character Encoding for Internet Messages   June 1993


   on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.

   The "Roman" character set of JIS X 0201 [JISX0201] is identical to
   ASCII except for backslash (\) () and tilde (~). The backslash is
   replaced by the Yen sign, and the tilde is replaced by macron
   (overline). overline. This
   set is Japan's national variant of ISO 646 [ISO646].

   The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
   Katakana and some other symbols and characters. Each character takes
   up two bytes.

   For further details about the JIS Japanese national character set
   standards, refer to [JISX0201] and [JISX0208].  For further
   information about the escape sequences, see [ISO2022] and [ISOREG].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the "Roman" set of JIS X 0201 before the end of the
   line (i.e. (i.e., before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the previous
   line.

   Also, the message body text must end with CRLF, and there must be a switch
   to ASCII before the last CRLF (if there are any non-ASCII characters in the message body). ASCII.

   Other restrictions are given in the Formal Syntax below.



Murai et al              Expires 1st June 1993                  [Page 2]

Internet Draft                                 Updated 1st December 1992

Formal Syntax

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

       l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.


   line


   message             = *text *1( headers 1*( CRLF *single-byte-char *segment
                         single-byte-seq *text *single-byte-char ) CRLF
                                           ; see also [MIME1] "body-part"
                                           ; note: must end in ASCII

   headers             = <see [RFC822] "fields" and [MIME1] "body-part">

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text 1*single-byte-char



Murai, Crispin & van der Poel                                   [Page 2]

RFC 1468   Japanese Character Encoding for Internet Messages   June 1993


   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

   CRLF                = CR LF

                                                    ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)

   SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)

   SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)

   CR                  = <ASCII CR, carriage return>; (    15,      13.)

   LF                  = <ASCII LF, linefeed>       ; (    12,      10.)

   one-of-94           = <any char in 94-char set> one of 94 values>     ; (41-176, 33.-126.)

   CHAR

   7BIT                = <any ASCII character> 7-bit value>          ; ( 0-177,  0.-127.)

   text

   single-byte-char    = <any CHAR, 7BIT, including bare CR & bare LF, but NOT
                          including CRLF, and not including ESC, SI, SO>

MIME Considerations

   The name given to the JUNET character encoding is "ISO-2022-JP". This
   name is intended to be used in MIME messages as follows:

       Content-Type: text/plain; charset=iso-2022-jp




Murai et al              Expires 1st June 1993                  [Page 3]

Internet Draft                                 Updated 1st December 1992

   The ISO-2022-JP encoding is already in 7-bit form, so it is not
   necessary to use a Content-Transfer-Encoding header. It should be
   noted that applying the Base64 or Quoted-Printable encoding will
   render the message unreadable in current JUNET software.

   ISO-2022-JP may also be used in MIME Part 2 headers.  The "B"
   encoding should be used with ISO-2022-JP text.

Background Information

   The JUNET encoding was described in the JUNET User's Guide [JUNET]
   (JUNET Riyou No Tebiki Dai Ippan).

   The encoding is based on the particular usage of ISO 2022 announced



Murai, Crispin & van der Poel                                   [Page 3]

RFC 1468   Japanese Character Encoding for Internet Messages   June 1993


   by 4/1 (see [ISO2022] for details). However, the escape sequence
   normally used for this announcement is not included in ISO-2022-JP
   messages.

   The so-called half-width (hankaku) Katakana, that is, the Kana set of JIS X 0201, are 0201 is not used in ISO-2022-JP messages.

   In the past, some systems erroneously used the escape sequence ESC (
   H in JUNET messages. This escape sequence is officially registered
   for a Swedish character set [ISOREG], and should not be used in ISO-
   2022-JP messages.

   Some systems do not distinguish between ESC ( B and ESC ( J or
   between ESC $ @ and ESC $ B for display. However, when relaying a
   message to another system, the escape sequences must not be altered
   in any way.

   The human user (not implementor) should try to keep lines within 80
   display columns, or, preferably, within 75 (or so) columns, to allow
   insertion of ">" at the beginning of each line in excerpts. Each JIS
   X 0208 character takes up two columns, and the escape sequences do
   not take up any columns. The implementor is reminded that JIS X 0208
   characters take up two bytes and should not be split in the middle to
   break lines for displaying, etc.

   The JIS X 0208 standard was revised in 1990, to add two characters at
   the end of the table. Although ISO 2022 specifies special additional
   escape sequences to indicate the use of revised character sets, it is
   suggested here not to make use of this special escape sequence in
   ISO-2022-JP text, even if the two characters added to JIS X 0208 in
   1990 are used.








Murai et al              Expires 1st June 1993                  [Page 4]

Internet Draft                                 Updated 1st December 1992

   For further information about Japanese character encodings such as PC
   codes, FTP locations of implementations, etc, see "Electronic
   Handling of Japanese Text" [JPN.INF].

References

   [ASCII] American National Standards Institute, "Coded character set
   -- 7-bit American national standard code for information
   interchange", ANSI X3.4-1968 X3.4-1986.

   [ISO646] International Organization for Standardization (ISO),
   "Information processing technology -- ISO 7-bit coded character set for
   information interchange", International Standard, Ref. No. ISO 646-
   1983 (E) ISO/IEC
   646:1991.

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets



Murai, Crispin & van der Poel                                   [Page 4]

RFC 1468   Japanese Character Encoding for Internet Messages   June 1993


   -- Code extension techniques", International Standard, Ref. No. ISO
   2022-1986 (E) (E).

   [ISOREG] International Organization for Standardization (ISO),
   "International Register of Coded Character Sets To Be Used With
   Escape Sequences" Sequences".

   [JISX0201] Japanese Standards Association, "Code for Information
   Interchange", JIS X 0201-1976 0201-1976.

   [JISX0208] Japanese Standards Association, "Code of the Japanese
   graphic character set for information interchange", JIS X 0208-1978,
   -1983 and -1990 -1990.

   [JPN.INF] Ken R. Lunde <lunde@adobe.com>, "Electronic Handling of
   Japanese Text", March 1992,
   msi.umn.edu(128.101.24.1):pub/lunde/japan[123].inf

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988

   [MIME] Nathaniel 1988.

   [MIME1] Borenstein N., and Ned N. Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and
   Describing the Format of Internet Message Bodies", Proposed (Internet) standard, RFC 1341,
   Bellcore, Innosoft, June 1992, rfc1341 1992.

   [MIME2] Moore, K., "Representation of Non-ASCII Text in Internet
   Message Headers", RFC 1342, University of Tennessee, June 1992.

   [RFC822] David H. Crocker, D., "Standard for the Format of ARPA Internet
   Text Messages", Internet standard, STD 11, RFC 822, UDEL, August 1982, rfc822 1982.

   [RFC1036] M. Horton M., and R. Adams, "Standard for Interchange of USENET
   Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
   Studies, December 1987, rfc1036

   [RFC1342] Keith Moore, "Representation of Non-ASCII Text in Internet
   Message Headers", Proposed (Internet) standard, June 1992, rfc1342


Security Considerations




Murai et al              Expires 1st June 1993                  [Page 5]

Internet Draft                                 Updated 1st December 1992


   Security considerations are not discussed in this memo. 1987.

Acknowledgements

   Many people assisted in drafting this document. The authors wish to
   thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
   Handa.

Security Considerations

   Security issues are not discussed in this memo.





Murai, Crispin & van der Poel                                   [Page 5]

RFC 1468   Japanese Character Encoding for Internet Messages   June 1993


Authors' Addresses

   Jun Murai
   Keio University
   5322 Endo, Fujisawa
   Kanagawa 252 Japan

   Fax: +81 (466) 49-1101 466 49 1101
   EMail: jun@wide.ad.jp


   Mark Crispin
   Panda Programming
   6158 Lariat Loop NE
   Bainbridge Island, WA 98110-2098
   USA

   Phone: +1 (206) 842-2385 206 842 2385
   EMail: MRC@PANDA.COM


   Erik M. van der Poel
   A-105 Park Avenue
   4-4-10 Ohta, Kisarazu
   Chiba 292 Japan

   Phone: +81 (438) 22-5836 438 22 5836
   Fax:   +81 (438) 22-5837 438 22 5837
   EMail: erik@poel.juice.or.jp








Murai et al              Expires 1st June 1993






















Murai, Crispin & van der Poel                                   [Page 6]

----