MIME is an abbreviation for Multipurpose Internet Mail
Extensions and describes how message content is labeled and encoded
when it is sent on the Internet. Whether it's e-Mail or the World Wide Web, MIME
is used to keep things in order. Like its human counterpart, MIME is
silent, conveys information, and often provides nothing more than
entertainment value.
Today's computer technology thinks in 8-bit bytes. When information is
transmitted, it's usually sent 8 bits at a time. There are, however,
instances when a transport medium will only handle 7 bits.
Furthermore, when it comes to e-Mail, there must be some consideration
for systems that are based upon IBM's EBCDIC (Extended Binary
Coded Decimal Interchange Code), rather than the ASCII (American Standard
Code for Information Interchange) code that we are most familiar with.
MIME makes sure that messages meet these criteria.
MIME is best thought of as nothing more than a simple message that
describes the contents that follow. In the World Wide Web, the first
thing a server will do is send out a MIME header. Using the WWW as an
example, the MIME header will look exactly like this: "Content-Type:
text/html". The server is telling the client that what follows is a
text message written in HTML. The browser then knows to
display the message in accordance with HTML. The server might have sent a
MIME header of: "Content-Type: text/plain", in which case the
browser would render a fixed-font display of the message or document. If
the server sent a "Content-Type: image/jpeg", the browser would
expect to render a JPEG (Joint Photographic Experts Group) image.
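As a rough illustration of how a client might read such a header, the sketch below uses Python's standard email package to split a Content-Type value into its type and subtype. The helper name media_type is made up for this example, and a real browser does considerably more than this.

```python
from email.message import Message

def media_type(header_value):
    """Split a Content-Type value into (type, subtype), ignoring parameters.

    media_type is a hypothetical helper written for this illustration.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_maintype(), msg.get_content_subtype()

for value in ("text/html", "text/plain; charset=us-ascii", "image/jpeg"):
    print(value, "->", media_type(value))
```

A client could then branch on the first element (text, image, and so on) to decide how to render what follows.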
In the case of e-Mail, the same MIME header as discussed above is sent,
but usually another one describing the type of encoding used is also sent.
This is the "Content-Transfer-Encoding" header. Often, you see
headers of: "Content-Transfer-Encoding: 7BIT",
"Content-Transfer-Encoding: 8BIT",
"Content-Transfer-Encoding: quoted-printable", or
"Content-Transfer-Encoding: base64". Another common MIME header
that is seen in e-Mail (in fact, it is mandatory) is the
"MIME-Version" header, normally: "MIME-Version: 1.0".
There are many ways to encode a document for transmission, but MIME
standardizes two such mechanisms in RFC 2045. These are known as
"Quoted-Printable" and "Base64". Other possible
encoding types are UUENCODE, the Macintosh BinHex 4.0 (RFC 1741), and the
Base85 encoding specified in Level 2 PostScript. These schemes, however,
may have compatibility problems with 7-bit gateways and EBCDIC systems, so
their use is not recommended.
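To see the e-Mail headers and the two standard encodings side by side, here is a minimal sketch using Python's standard email package. The addresses, subject, and the build helper are placeholders invented for this example; the point is only that the library adds "MIME-Version" and "Content-Transfer-Encoding" when a body is attached.

```python
from email.message import EmailMessage

def build(cte):
    """Build a small text message using the requested transfer encoding.

    build and the example.com addresses are hypothetical placeholders.
    """
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "MIME header demo"
    # The accented character cannot travel as plain 7-bit US-ASCII.
    msg.set_content("Caf\u00e9 au lait\n", cte=cte)
    return msg

for cte in ("quoted-printable", "base64"):
    msg = build(cte)
    print("MIME-Version:", msg["MIME-Version"])
    print("Content-Type:", msg["Content-Type"])
    print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])
    print(msg.get_payload())
```

The Quoted-Printable body stays mostly readable, while the Base64 body does not; both decode back to the same text.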
As you are probably aware, an 8-bit byte yields 256 possible values
(called characters). Of these 256 characters, the first 128 make up the
US-ASCII character set, and 95 of those are "printable". The
Quoted-Printable scheme sends most printable US-ASCII characters as
themselves, so data that is mostly text is unlikely to be modified by the
transport facility.
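Encoded lines are transmitted in lengths of no more than 76
characters. A line that would run longer is split with a "soft line-break":
an = sign as the last character of the line, which the decoder removes
along with the line ending that follows it. Real line breaks in the text
are sent as CR LF. Byte values 128-255, control characters, and the = sign
itself must be "Quoted"; that is, they are represented by an = sign
followed by two hexadecimal digits. For
example: the code "=FF" would represent character number 255.
Generally speaking, any character may be quoted.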
The Quoted-Printable scheme is efficient for text and remains highly
readable even if the encoded version is viewed. Unfortunately, this
type of encoding may have trouble passing through EBCDIC-based e-Mail
gateways. A higher degree of EBCDIC compatibility can be
achieved by also quoting the following characters: !"#$@[\]^`{|}~
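The Quoted-Printable rules above can be tried out with Python's standard quopri module. This is only a sketch: the sample text is invented, and it is encoded as Latin-1 so that each accented letter is a single byte above 127.

```python
import quopri

# Invented sample text; Latin-1 makes each accented letter one byte > 127.
raw = "na\u00efve caf\u00e9 = 100%".encode("latin-1")
encoded = quopri.encodestring(raw)
print(encoded)  # high bytes and the = sign appear as =XX hex pairs

# A 100-character line is longer than the 76-character limit, so the
# encoder splits it with a soft line-break (an = at the end of the line).
long_raw = b"x" * 100
long_encoded = quopri.encodestring(long_raw)
print(long_encoded)

# Decoding removes the soft line-breaks and restores the original bytes.
assert quopri.decodestring(encoded) == raw
assert quopri.decodestring(long_encoded) == long_raw
```

Plain letters, digits, and spaces pass through unchanged, which is why the encoded form remains readable.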
The most common encoding scheme used is known as Base64, named for its
use of 64 printable characters in its alphabet. Actually, there are 65
characters because the = sign is used for "padding". Here's the
character set used:
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
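Basically, groups of three 8-bit bytes (24 bits) are split into four
6-bit values (24 bits), and each 6-bit value is replaced by its character
from the table above. Again, no more than 76 characters are allowed
per line. When the input does not divide evenly into groups of three
bytes, the final group is padded with the = character.
The . CR LF and - characters are never used to represent data. This is
particularly useful for SMTP Mail transport, where a line holding a single
. ends the message, and for MIME itself, where multipart boundaries begin
with --.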
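The three-bytes-to-four-characters step can be written out by hand. The sketch below is illustrative only: encode_group and ALPHABET are names chosen for this example, and the results are compared against Python's standard base64 module, which is what one would normally use.

```python
import base64

# The 64-character alphabet from the table: A-Z, a-z, 0-9, +, /
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes):
    """Encode exactly three bytes into four Base64 characters.

    encode_group is a hypothetical helper for illustration.
    """
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")  # treat the group as one 24-bit number
    # Peel off four 6-bit values, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))               # TWFu
print(base64.b64encode(b"Man").decode())  # TWFu
print(base64.b64encode(b"Ma").decode())   # TWE=  (one pad character)
print(base64.b64encode(b"M").decode())    # TQ==  (two pad characters)
```

The last two lines show the padding: two leftover bytes produce one = sign, and one leftover byte produces two.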
Base64 is fully compatible with EBCDIC systems, as well as 7-bit transport
mediums. The downside is that Base64 encoded files consistently occupy
about 33% more space than the original binary source, plus a little extra
for the line breaks. For example, a source
file that is 300 KBytes in size would be roughly 400 KBytes after Base64 coding.
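The size arithmetic can be checked directly. In this sketch, encoded_size is a helper made up for the example; it counts four output characters for every three input bytes (rounded up) and ignores line breaks. A buffer of zero bytes stands in for a real 300 KByte file.

```python
import base64

def encoded_size(n_bytes):
    """Base64 length in characters, not counting line breaks.

    encoded_size is a hypothetical helper for illustration.
    """
    return 4 * ((n_bytes + 2) // 3)

source = bytes(300 * 1024)            # 300 KBytes of zero bytes as a stand-in
flat = base64.b64encode(source)       # one long line, no line breaks
wrapped = base64.encodebytes(source)  # a newline after every 76 characters

print(len(source))   # 307200
print(len(flat))     # 409600, exactly 400 KBytes
print(len(wrapped) > len(flat))  # True: the line breaks add a little more
```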
The "Content-Type" header indicates the type/subtype of the data
to follow. Early on, anything went. For example, the MIME type of
"application/msword" was issued to define Microsoft Word
documents. Now, formally registered vendor types have the prefix
vnd. prepended to the MIME subtype; hence MIME types of
"application/vnd.ms-excel" or
"application/vnd.lotus-1-2-3".
Some of the MIME subtypes may start with a prefix of x-, like
"audio/x-pn-realaudio" or "image/x-MS-bmp". The x-
means that the subtype is experimental or otherwise unregistered.
MIME types are indicated by the Originator of an e-Mail. Unfortunately,
most PC packages have no facility for determining what the proper MIME
type should be for any particular file attachment. For that reason, most
PC e-Mail packages apply the MIME type of
"application/octet-stream" to most e-Mail attachments.
The Recipient of the e-Mail receives the message and will decode the
attachment per the "Content-Transfer-Encoding" instruction,
usually Base64. Many e-Mail packages, such as Thunderbird, MS Internet
Mail, or Eudora, will then hyperlink the attachment. As such, it is up to the
Recipient's Operating System to interpret the file type (usually
by extension) and properly open the file with the appropriate
application. In theory, the e-Mail program should know what application
to run based upon the MIME type, not filename extension.
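The sender-side fallback described above, where an attachment is labeled "application/octet-stream" when nothing better is known, can be sketched with Python's standard mimetypes module. The function name attachment_type and the file names are invented for this example, and the guesses depend partly on the type tables installed on the machine running it.

```python
import mimetypes

def attachment_type(filename):
    """Guess a MIME type from the file name, else fall back to a generic one.

    attachment_type is a hypothetical helper for illustration.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

for name in ("report.html", "photo.jpg", "mystery.zzzunknown"):
    print(name, "->", attachment_type(name))
```

A mail program that did this before sending would give the Recipient's software a useful MIME type to act on, rather than leaving the decision to the filename extension.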
In the UNIX world, there is a file in each user's home directory, called
".mailcap", that is used by UNIX multimedia e-Mail programs to
determine the correct application for each MIME type.
A WWW Server is remarkably dumb. For the most part, all it does is send
out files all day long, and occasionally run a Common Gateway
Interface (CGI) program or some such.
It doesn't have the "smarts" to look at a file and figure out
what it is. Instead, it relies on a file called mime.types
to determine a MIME type based upon the filename's extension.
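The extension lookup a server performs can be approximated in a few lines. This sketch assumes the common mime.types layout of one MIME type per line followed by its extensions, with # starting a comment; the sample table, parse_mime_types, and lookup are all invented for the example and do not come from any particular server.

```python
import os

# Invented sample in the usual "type ext ext ..." layout.
SAMPLE_MIME_TYPES = """\
# MIME type        extensions
text/html          html htm
text/plain         txt
image/jpeg         jpeg jpg jpe
"""

def parse_mime_types(text):
    """Build an extension -> MIME type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table

def lookup(table, filename, default="application/octet-stream"):
    """Return the MIME type for a file name based only on its extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(ext, default)

table = parse_mime_types(SAMPLE_MIME_TYPES)
for name in ("index.HTM", "photo.jpg", "archive.xyz"):
    print(name, "->", lookup(table, name))
```

Note that the file's contents are never inspected; renaming a file changes the type the server announces.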
Keeping track of MIME types can be a fascinating thing, often indicating
future things to come. But if you're really into MIME, you will want to
begin reading the following materials:
-
RFC 822
This RFC defines the standard format of e-Mail messages on the Internet.
-
RFC 2045
This RFC is part 1 of the multi-part MIME specification (RFC 2045
through RFC 2049). (One might assign a MIME type of "multipart/mixed"
to it). This document mainly describes the MIME headers and the
Quoted-Printable and Base64 encoding schemes.
-
RFC 2046
This RFC is part 2 of the MIME specification. RFC 2046 describes
the MIME types (Media Types) in great detail.