MIME

Multipurpose Internet Mail Extensions (MIME) is an Internet Standard for the format of e-mail. Virtually all human-written Internet e-mail and a fairly large proportion of automated e-mail are transmitted via SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME standards that it is sometimes called SMTP/MIME e-mail.

Introduction

The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters (see also 8BITMIME). This effectively limits Internet e-mail to messages which, when transmitted, include only the characters sufficient for writing a small number of languages, primarily English. Other languages based on the Latin alphabet typically include diacritics not supported in 7-bit ASCII, meaning text in these languages cannot be correctly represented in basic e-mail.

MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English using character encodings other than ASCII as well as 8-bit binary content such as files containing images, sounds, movies, and computer programs. MIME is also a fundamental component of communication protocols such as HTTP, which requires that data be transmitted in the context of e-mail-like messages, even though the data may not actually be e-mail.

Mapping messages into and out of MIME format is typically done automatically by an e-mail client or by mail servers when sending or receiving Internet (SMTP/MIME) e-mail.

The basic format of Internet e-mail is defined in RFC 2822, which is an updated version of RFC 822. These standards specify the familiar formats for text e-mail headers and body and rules pertaining to commonly used header fields such as "To:", "Subject:", "From:", and "Date:". MIME defines a collection of e-mail headers for specifying additional attributes of a message including content type, and defines a set of transfer encodings which can be used to represent 8-bit binary data using characters from the 7-bit ASCII character set. MIME also specifies rules for encoding non-ASCII characters in e-mail message headers, such as "Subject:", allowing these header fields to contain non-English characters.

MIME is extensible. Its definition includes a method to register new content types and other MIME attribute values.

One of the explicit goals of the MIME definition was to require no changes to pre-existing e-mail servers, and to allow plain text e-mail to function in both directions with pre-existing clients. This goal is achieved by making all MIME message attributes optional, with default values that make a non-MIME message likely to be interpreted correctly by a MIME-capable client. In addition, a simple MIME text message is likely to be interpreted correctly by a non-MIME client, although it has e-mail headers the non-MIME client won't know how to interpret. Similarly, if the quoted-printable transfer encoding (see below) is used, the ASCII parts of the message will be intelligible to users with non-MIME clients.

MIME headers

MIME-Version

The presence of this header indicates the message is MIME-formatted. The value is typically "1.0" so this header appears as

  MIME-Version: 1.0

Content-Type

This header indicates the type and subtype of the message content, for example

  Content-type: text/plain

The combination of type and subtype is generally called a MIME type, although in modern applications, Internet media type is the favored term, indicating its applicability outside of MIME messages. A large number of file formats have registered MIME types. Any text type has an additional charset parameter that can be included to indicate the character encoding. A very large number of character encodings have registered MIME charset names.
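
As a minimal sketch of how these two headers look in practice, the following uses Python's standard email package (chosen here purely for illustration; it is not part of the MIME standard) to build a one-part text message and read back the fields described above. The subject and body strings are placeholder values.

```python
from email.message import EmailMessage

# Build a one-part text message; the strings are placeholder values.
msg = EmailMessage()
msg["Subject"] = "Greetings"
msg.set_content("¡Hola, señor!", charset="utf-8")

mime_version = msg["MIME-Version"]       # added automatically by set_content()
content_type = msg.get_content_type()    # type/subtype, without parameters
charset = msg.get_content_charset()      # value of the charset parameter

# A message with no Content-Type header at all falls back to the default type.
default_type = EmailMessage().get_content_type()

print(mime_version, content_type, charset, default_type)
```

The last line illustrates why a non-MIME message can still be read by a MIME-capable client: a missing Content-Type is treated as text/plain.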

Although originally defined for MIME e-mail, the content-type header and the MIME type registry are reused in other Internet protocols such as HTTP and SIP. The MIME type registry is managed by IANA.

Through the use of the multipart type, MIME allows messages to have parts arranged in a tree structure where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of multipart types. This mechanism supports:

  • simple text messages using text/plain (the default value for "Content-type:")
  • text plus attachments (multipart/mixed with a text/plain part and other non-text parts). A MIME message including an attached file generally indicates the file's original name with the "Content-disposition:" header, so the type of file is indicated both by the MIME content-type and the (usually OS-specific) filename extension.
  • reply with original attached (multipart/mixed with a text/plain part and the original message as a message/rfc822 part)
  • alternative content, such as a message sent in both plain text and another format such as HTML (multipart/alternative with the same content in text/plain and text/html forms)
  • many other message constructs
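
The tree structure described above can be sketched with Python's standard email package. The example assumes a "text plus attachment" message; the file name data.bin, the three payload bytes, and the helper name outline are hypothetical and exist only for illustration.

```python
from email.message import EmailMessage

# A text body plus one binary attachment (placeholder content and file name).
msg = EmailMessage()
msg.set_content("See the attached file.")
msg.add_attachment(
    b"\x00\x01\x02",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

def outline(part, depth=0):
    """Return the content types of a message tree, indented by nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(outline(child, depth + 1))
    return lines

tree = outline(msg)
attachment = list(msg.iter_parts())[1]
print("\n".join(tree))
print(attachment.get_filename())   # name carried by the Content-Disposition header
```

Adding the attachment converts the message into a multipart/mixed container, so the non-leaf node is the multipart type and the two leaves are the text and the file.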

Content-Transfer-Encoding

MIME (RFC 2045) defines a set of methods for representing binary data in ASCII text format. The content-transfer-encoding: MIME header indicates the method that has been used. The RFC and the IANA's list of transfer encodings define the following values, which are not case sensitive:

  • Suitable for use with normal SMTP:
    • 7bit - up to 998 octets per line of the code range [1..127] with CR and LF (codes 13 and 10) only allowed to appear as part of a CRLF line ending. This is the default value.
    • quoted-printable - used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient and mostly human readable when used for text data consisting primarily of US-ASCII characters but also containing byte values outside that range.
    • base64 - used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Has a fixed overhead and is intended for non text data and text that is not ASCII heavy.
  • Suitable for use with SMTP servers that support the 8BITMIME transport SMTP extension:
    • 8bit - up to 998 octets per line, with CR and LF (codes 13 and 10) only allowed to appear as part of a CRLF line ending.
  • Not suitable for use with SMTP:
    • binary - any sequence of octets. Not usable with SMTP mail.

There is no encoding defined which is explicitly designed for sending arbitrary binary data through 8BITMIME transports, thus base64 or quoted-printable (with their associated inefficiency) must sometimes still be used.
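
To make the trade-off between the two 7bit-safe encodings concrete, here is a small sketch using Python's standard quopri and base64 modules. The sample text and the 300-octet binary buffer are arbitrary values chosen for this illustration.

```python
import base64
import quopri

text = "¡Hola, señor!".encode("utf-8")   # mostly ASCII, with a few bytes above 127

qp = quopri.encodestring(text)    # ASCII stays readable, other bytes become =XX
b64 = base64.b64encode(text)      # every 3 input octets become 4 ASCII characters

# Both encodings are reversible and yield pure 7-bit output.
qp_roundtrip = quopri.decodestring(qp)
b64_roundtrip = base64.b64decode(b64)

# base64 overhead is fixed at 4/3 regardless of what the input looks like.
binary = bytes(range(256)) + bytes(44)    # 300 arbitrary octets
b64_len = len(base64.b64encode(binary))   # 400

print(qp.decode("ascii"))
print(b64.decode("ascii"))
print(b64_len)
```

In the quoted-printable output the word "Hola" is still legible, whereas the base64 output is opaque but never grows by more than a third (plus line breaks when the data is wrapped).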

Encoded-Word

Since RFC 2822 message header names and values are always ASCII characters, values that contain non-ASCII data must use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding (the "charset") and the content-transfer-encoding used to map the bytes of the charset into ASCII characters.

The form is: "=?charset?encoding?encoded text?=".

  • charset is often utf-8, but may be any character set registered with IANA. iso-2022-jp is common in Japan. iso-8859-1 and more recently iso-8859-15 are common in Western Europe.
  • encoding can be either "Q" denoting quoted-printable encoding, or "B" denoting base64 encoding.
  • encoded text is the quoted-printable or base64-encoded text.

For example,

Subject: =?utf-8?Q?=C2=A1Hola,=20se=C3=B1or!?=

is interpreted as "Subject: ¡Hola, señor!".
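
The decoding step can be sketched with Python's standard email.header module, using the encoded-word from the example above. The variable names are illustrative only.

```python
from email.header import Header, decode_header, make_header

raw = "=?utf-8?Q?=C2=A1Hola,=20se=C3=B1or!?="

# decode_header() splits the value into (bytes, charset) pairs;
# make_header() reassembles them into a Unicode-aware Header object.
decoded = str(make_header(decode_header(raw)))
print(decoded)

# Going the other way: the library picks "Q" or "B" itself, so only the
# round trip is checked here, not the exact encoded form.
encoded = Header("¡Hola, señor!", "utf-8").encode()
roundtrip = str(make_header(decode_header(encoded)))
```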

The encoded-word format is not used for the names of the headers (for example Subject). These header names are always in English in the raw message. When viewing a message with a non-English e-mail client, the header names are translated by the client.

Multipart Messages

A MIME multipart message contains a boundary in the "Content-type:" header; this boundary, which must not occur in any of the parts, is placed between the parts, and at the beginning and end of the body of the message, as follows:

Content-type: multipart/mixed; boundary="frontier"
MIME-version: 1.0

This is a multi-part message in MIME format.
--frontier
Content-type: text/plain

This is the body of the message.
--frontier
Content-type: application/octet-stream
Content-transfer-encoding: base64
  
gajwO4+n2Fy4FV3V7zD9awd7uG8/TITP/vIocxXnnf/5mjgQjcipBUL1b3uyLwAVtBLOP4nV
LdIAhSzlZnyLAF8na0n7g6OSeej7aqIl3NIXCfxDsPsY6NQjSvV77j4hWEjlF/aglS6ghfju
FgRr+OX8QZMI1OmR4rUJUS7xgoknalqj3HJvaOpeb3CFlNI9VGZYz6H6zuQBOWZzNB8glwpC
--frontier--

Each part consists of its own content header (zero or more Content- header fields) and a body. Multipart content can be nested. A multipart never has a global charset or Content-Transfer-Encoding; these details are determined by the Content- header fields of the individual parts. There are several different types of multipart messages, described in the sections that follow.

Notes:

  • Before the first boundary is an area that is ignored by MIME-compliant clients. This area is generally used to put a message to users of old non-MIME clients.
  • It is up to the sending mail client to choose a boundary string that doesn't clash with the body text. Typically this is done by inserting a large random string.
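
As a rough sketch of how a MIME-aware client takes such a message apart, the following parses a shortened version of the "frontier" example with Python's standard email package. The base64 payload here is a stand-in generated from the placeholder bytes "Hello, MIME!", not the data shown in the example above.

```python
import base64
from email import message_from_string
from email.policy import default

# Stand-in payload for the attachment (placeholder bytes, not the article's data).
payload = base64.b64encode(b"Hello, MIME!").decode("ascii")

raw = (
    'Content-type: multipart/mixed; boundary="frontier"\n'
    "MIME-version: 1.0\n"
    "\n"
    "This is a multi-part message in MIME format.\n"
    "--frontier\n"
    "Content-type: text/plain\n"
    "\n"
    "This is the body of the message.\n"
    "--frontier\n"
    "Content-type: application/octet-stream\n"
    "Content-transfer-encoding: base64\n"
    "\n"
    f"{payload}\n"
    "--frontier--\n"
)

msg = message_from_string(raw, policy=default)
parts = list(msg.iter_parts())

boundary = msg.get_boundary()        # taken from the Content-type parameter
preamble = msg.preamble.strip()      # text before the first boundary
body_text = parts[0].get_content().strip()
attachment = parts[1].get_content()  # base64 is decoded back to raw bytes

print(boundary, len(parts))
print(preamble)
print(body_text)
print(attachment)
```

The preamble is exposed separately from the parts, which matches the note above: a MIME-compliant client does not treat it as message content.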

Mixed

Multipart/mixed is used for sending files with different "Content-type" headers inline (or as attachments). When the files are pictures or other easily readable content, most mail clients will display them inline (unless otherwise specified with the "Content-disposition" header). Otherwise the clients will offer them as attachments.

Digest

Multipart/mixed and multipart/digest must be supported for minimal MIME conformance as specified in RFC 2049. The default content type for a part of a multipart/mixed message is text/plain; for a part of a multipart/digest message it is message/rfc822. Multipart/digest is a simple way to forward one or more messages.

Alternative

In a multipart/alternative message, each part is supposed to carry the same (or similar) content, and each part has a different "Content-type" header. The formats are ordered from worst representation to best representation (this order was chosen so that plain text would end up first, making life easier for users of non-MIME clients). Mail clients are supposed to choose the last part that they are capable of displaying, though some may not follow this and instead give a particular content type priority. Typically this type is used with a text/plain part first to support older clients, followed by a text/html part to provide a formatted message for modern clients.
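
The "choose the last part you can display" rule can be sketched as follows, again with Python's standard email package. The helper name pick_part and the sets of displayable types are hypothetical and serve only to illustrate the selection rule.

```python
from email.message import EmailMessage

# Same content twice: plain text first (worst), HTML last (best).
msg = EmailMessage()
msg.set_content("Hello in plain text.")
msg.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")

def pick_part(message, displayable):
    """Return the last alternative whose content type the client can display."""
    chosen = None
    for part in message.iter_parts():
        if part.get_content_type() in displayable:
            chosen = part          # later parts override earlier ones
    return chosen

modern = pick_part(msg, {"text/plain", "text/html"})
text_only = pick_part(msg, {"text/plain"})

print(msg.get_content_type())
print(modern.get_content_type(), text_only.get_content_type())
```

A client that understands both formats ends up with the HTML part, while a text-only client falls back to the first part.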

Spammers took advantage of this and filled a text/plain part of a multipart/alternative message with words that make it sound like a legitimate e-mail. This was particularly good at fooling Bayesian filtering spam filters.

Related

A multipart/related message has its main content as its first part. All following parts should carry the MIME header "Content-ID:" followed by a unique identifier. Often a URL is used as the Content-ID. The main part can then reference those parts inline. A common technique is to make the main part an HTML document and use image tags to reference images stored in the later parts.
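
A minimal sketch of the HTML-plus-image technique, using Python's standard email package. It assumes the "cid:" URL scheme (RFC 2392) for the reference from the HTML to the image; the domain example.com and the image bytes are placeholders, and the bytes are not a real PNG file.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Unique identifier in angle brackets, e.g. "<...@example.com>" (placeholder domain).
cid = make_msgid(domain="example.com")

# Main part: an HTML document that refers to the image by its Content-ID.
msg = EmailMessage()
msg.set_content(f'<p>Logo: <img src="cid:{cid[1:-1]}"></p>', subtype="html")

# Related part: placeholder bytes standing in for the image data.
msg.add_related(b"not-a-real-png", maintype="image", subtype="png", cid=cid)

main, image = list(msg.iter_parts())
print(msg.get_content_type())
print(main.get_content_type(), image.get_content_type())
print(image["Content-ID"])
```

Note that the angle brackets belong to the header value but are dropped inside the cid: reference.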

Report

Multipart/report is a message type that contains data formatted for a mail server to read. It is split between a text/plain part (or a part of some other easily readable content type) and a message/delivery-status part, which contains the data formatted for the mail server to read.

Signed

A multipart/signed message has two parts, a body part and a signature part. The whole of the body part, including MIME headers, is used to create the signature part. Many signature types are possible, such as application/pgp-signature and application/x-pkcs7-signature.

Encrypted

A multipart/encrypted message has two parts. The first part has control information that is needed to decrypt the application/octet-stream second part.

References

RFC 1847 
Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
RFC 2045 
MIME Part One: Format of Internet Message Bodies.
RFC 2046 
MIME Part Two: Media Types. N. Freed, Nathaniel Borenstein. November 1996.
RFC 2047 
MIME Part Three: Message Header Extensions for Non-ASCII Text. Keith Moore. November 1996.
RFC 4288 
Media Type Specifications and Registration Procedures.
RFC 4289 
MIME Part Four: Registration Procedures. N. Freed, J. Klensin. December 2005.
RFC 2049 
MIME Part Five: Conformance Criteria and Examples. N. Freed, N. Borenstein. November 1996.
RFC 2231 
MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations. N. Freed, K. Moore. November 1997.
RFC 2387 
The MIME Multipart/Related Content-type
