
Filetypes: Text and Binary

There are two fundamentally different types of files in the computer world. They differ in both the kind of content they hold, and the way they are internally organized. (And, since life is never simple, and there are ALWAYS exceptions to every rule, you will also find certain files that mix the two types.)

Because of these differences, it is important during file transfers to choose the appropriate type of transfer for the file. If you aren't sure of the filetype, extensions commonly used on filenames will often give you a clue.

Here's a summary of the content and internal organization of TEXT and BINARY files, with notes about transferring them and the common filename extensions that indicate each.

Contents

  1. Text Files
  2. Binary Files
  3. Mixed (text and binary) Files
  4. How the number of bits in each byte affects them

1. Text Files

Content Plain old printable characters -- the 7-bit ASCII character set. Among the common types are:
  • Readable documents -- the "normal" kind of text file.
  • Web pages -- written in HTML, which is plain text
  • Encoded files -- binary files (see below) which have been turned into plain text to protect them from destruction when they are being sent through e-mail or file transfers that can't handle the special needs of binary files.
NOT text files -- Word processor files, except when saved as some form of "text" or "ascii" file, are binary, not text files, because they contain special non-ascii codes for formatting.
Organization Text files are organized into lines. By definition, a line must have an end. It's important to know that these line ends, which consist of the ASCII characters carriage-return (CR, a control-M) and/or line-feed (LF, a control-J), are different on PCs and Macs and Unix:

  PC        Mac   Unix   IBM mainframe
  CR + LF   CR    LF

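
The line-end conventions above can be checked on raw file contents. The following is a minimal sketch, not a complete tool; the function names `count_line_ends` and `guess_origin` are made up for this illustration, and the sample byte strings are invented.

```python
def count_line_ends(data: bytes) -> dict:
    """Count each line-end convention found in raw file bytes."""
    crlf = data.count(b"\r\n")
    # A CR or LF that belongs to a CR+LF pair must not be counted twice.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CR+LF": crlf, "CR": cr, "LF": lf}


def guess_origin(data: bytes) -> str:
    """Name the most frequent convention, or 'none' if no line ends exist."""
    counts = count_line_ends(data)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


print(guess_origin(b"one\r\ntwo\r\n"))  # PC-style sample
print(guess_origin(b"one\rtwo\r"))      # Mac-style sample
print(guess_origin(b"one\ntwo\n"))      # Unix-style sample
```

A file with mixed counts has probably passed through more than one kind of system.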
Transfers Choosing an ascii or text type of transfer for text files is important because of the two main functions performed by such a transfer:
  1. handling the differences in line-ends on different systems, and
  2. performing character-set translations when necessary, such as on
    1. IBM mainframes, which use the "EBCDIC" character set instead of the ASCII set used by all other computers, or
    2. Apple II's, whose "ASCII" characters have their 8th bit turned on.
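
The two functions of a text-type transfer listed above can be sketched roughly as follows. This is only an illustration of the idea, not how any particular transfer program works. The function names are invented here, and the sketch assumes Python's `cp037` codec as one representative EBCDIC code page; a real mainframe may use a different one.

```python
def convert_line_ends(data: bytes, target: bytes) -> bytes:
    """Rewrite every CR+LF, CR, or LF line end as `target`."""
    # Collapse everything to LF first so CR+LF is not treated as two line ends.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", target)


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC bytes to ASCII, assuming code page 037."""
    return data.decode("cp037").encode("ascii")


pc_text = b"first line\r\nsecond line\r\n"
print(convert_line_ends(pc_text, b"\n"))   # PC line ends rewritten for Unix

mainframe_text = "HELLO".encode("cp037")   # sample EBCDIC bytes
print(mainframe_text)                      # not readable as ASCII
print(ebcdic_to_ascii(mainframe_text))     # readable again after translation
```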

Binary style transfers leave everything untouched, which is disastrous when character translations are necessary -- you get gobbledygook. And if the line-ends on the two systems are different, the results of a binary transfer range from funny to a royal pain. Looking at the table of line-ends (under "Organization", above), you can see why binary transfers would result in

  • PC files sent to Macs having a control-J (linefeed) in front of every line
  • PC files on Unix having a control-M (carriage return) at the end of every line
  • Unix files on PC's or Macs looking like one large mass of text sprinkled with control-J's where line-ends should be
  • any other files on the IBM mainframe having an extra blank line between each line.
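
The first three effects in the list above can be reproduced by splitting untouched bytes with the wrong system's line-end rule. This is a small sketch using invented sample data; it imitates what a program on the receiving system would see, nothing more.

```python
pc_file = b"alpha\r\nbeta\r\n"
unix_file = b"alpha\nbeta\n"

# A Unix program splits on LF only, so each PC line keeps a trailing CR.
lines_on_unix = pc_file.split(b"\n")

# A Mac program splits on CR only, so each later PC line starts with an LF.
lines_on_mac = pc_file.split(b"\r")

# Splitting a Unix file on CR finds no line ends: one mass with embedded LFs.
unix_on_mac = unix_file.split(b"\r")

print(lines_on_unix)
print(lines_on_mac)
print(unix_on_mac)
```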

A common example of this can be observed in the HTML source for a web page downloaded by a web browser from a different kind of system from your own, and saved with a "Save As" or related command. HTML files are text files, but Web browsers do binary downloads. Browsers can ignore CR's and LF's, and display the file just fine, but anything else looking at the file (especially text editors) generally can't.

The only time a binary transfer is safe for text files is when the sending and receiving machines use the same system -- and even that isn't always true (e.g., the IBM mainframe).

Filename Extensions

  Readable text                 .txt, .asc
  WWW home pages                .htm, .html
  PC "uuencoded" files          .uu
  Mac "binhex" encoded files    .hqx
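
The idea behind encoded files such as uuencoded and binhex files can be illustrated with any binary-to-text encoding. The sketch below uses Base64 from Python's standard library as a stand-in; it is a different encoding from uuencode or BinHex, chosen only because it shows the same principle. The sample bytes are arbitrary.

```python
import base64

# Arbitrary sample bytes, some with the 8th bit set, plus a CR and an LF.
raw = bytes([0x00, 0x7F, 0x80, 0xFF, 0x0D, 0x0A])

encoded = base64.b64encode(raw)        # plain ASCII characters
decoded = base64.b64decode(encoded)    # the original bytes again

print(encoded)
print(all(b < 0x80 for b in encoded))  # every encoded byte fits in 7 bits
print(decoded == raw)                  # decoding restores the original
```

Because the encoded form contains only 7-bit characters and no line-end characters of its own, it survives a transfer that would damage the raw bytes.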

2. Binary Files

Content Binary files include
  • graphics, sound and video files
  • computer programs
  • archives (files that contain one or more files that have been grouped together for convenience and compressed to save space)
  • formatted documents produced by word processors, spread-sheets, databases, etc. (As noted above, word processor documents not saved explicitly as "text" or "ascii" are binary files).
Unlike text files, whose characters all tend to be made up of 7 bits, binary files are made up of bytes which depend on having all 8 bits intact. Sending such files through e-mail or text-style transfers, which tend to respect only 7 bits, usually destroys them.
Organization Because binary files don't consist of readable text, they tend not to have a conceptual need for "lines", and thus tend to have neither lines nor line-ends. They are just one long stream of 8-bit bytes -- hence the "octet stream" term frequently encountered in Web browsing.
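
When a filename extension gives no clue, the distinction drawn above suggests a rough test: text uses only 7-bit bytes and few control characters. The sketch below is a heuristic under this page's definition of text as 7-bit ASCII, not a reliable detector; the function name `looks_like_text` is invented, and the choice of allowed control characters is an assumption.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: 7-bit bytes only, and no control characters other than
    tab, LF, form feed, and CR."""
    allowed_controls = {0x09, 0x0A, 0x0C, 0x0D}
    for b in data:
        if b >= 0x80:
            return False  # 8th bit is set, so not 7-bit ASCII
        if b < 0x20 and b not in allowed_controls:
            return False  # unexpected control character
    return True


print(looks_like_text(b"Dear reader,\r\nhello.\r\n"))
print(looks_like_text(b"GIF89a\x00\x80\xff"))
```

Text in character sets that use the 8th bit would fail this check, so a negative result only means "not plain 7-bit ASCII".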

Mac note -- Macs have a special kind of binary file called MacBinary. Such a file includes not only the plain "binary" (a.k.a. "raw binary") information of the file itself, but also any associated "resource" information (e.g., fonts). Computer programs and formatted documents (mentioned above) tend to be MacBinary. Most other files (graphics, sound, video), though they may include information such as the creator-application for the file (which helps define the icon used for the file), are essentially just plain "binary" files.

Transfers Binary transfers leave the file untouched. As noted in the "Text" section above, they do no character translation or line-end adjustment. The only adaptation they will do is handling the exchange of physical file information (size, creation/modification dates, etc.).

MacBinary -- most Mac file transfer programs have a variant type of transfer for MacBinary files. Use this when transferring files between Macs, or between a Mac and an intermediary machine in what will ultimately be a Mac-to-Mac transfer. Transferring Mac graphic/sound/video files should be done as "raw binary", as should formatted documents intended for use with the same application on a PC.

Filename Extensions

  PC    Executable programs        .exe, .com
        Compressed archives        .zip
        Graphics, sound, etc.      .gif, .jpg, .tif, .au
        Formatted documents        .doc, .wpd, .pdf, .xls, .xlw, .mdb . . .
  Mac   Self-extracting archives   .sea
        Compressed archives        .sit, .cpt
        Graphics, sound, etc.      .gif, .jpeg, .tiff, .au
        Formatted documents        .doc, .pdf, .xls, .xlw . . .

3. Mixed (Text and Binary) Files

Content The only type of file we have encountered which is a mixture of binary bytes and text characters is the special output of the SAS and SPSS statistical packages, known as the SAS dataset and the SPSS system file. Variable names and labels are kept in text form, while numbers tend to be stored as binary values.
Organization Since these files consist of binary numbers and text words or phrases with no line structure, they have the same "endless stream" quality as binary files, though on the UMDD IBM mainframe, as with other binary files, they are stored in arbitrary fixed-length records.
Transfers The binary nature of much of the content of these files requires that a binary file transfer be used. Since there is no line structure, the failure to use a text-type transfer doesn't create problems as long as no character-translation is needed. So transfer between PCs and Macs and Unix can be done in simple binary fashion.

The only problem in transferring these files occurs with transfers between the IBM mainframe and other systems. The IBM mainframe's EBCDIC character set is incompatible with the ASCII character set used for text on all other computer systems, and thus requires translation. So,

  • a binary transfer, which leaves the text portion of the file untranslated, will make a mess out of it
  • a text transfer will translate not only the text part of the file, but also the binary part (since it cannot distinguish between the two parts), making a mess of the binary part.
This catch-22 can only be resolved by using the statistical package involved to generate its own portable file -- SAS's export file or SPSS's transport file -- and using a binary transfer to send it. These use ASCII for the text portion, and can be reconverted to the original system file once they are on the desired system.
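
The catch-22 can be seen in a toy example. The record layout below is hypothetical (an 8-character label followed by an 8-byte floating-point number); real SAS and SPSS files are far more complex, and mainframe number formats differ from the one used here. The sketch also assumes Python's `cp037` codec as the EBCDIC code page.

```python
import struct

# Hypothetical mixed record: EBCDIC label, then a big-endian 8-byte float.
label = "HEIGHT  ".encode("cp037")
record = label + struct.pack(">d", 172.5)

# Binary transfer: bytes are unchanged, so the label is still EBCDIC.
binary_copy = record
print(binary_copy[:8])  # unreadable as ASCII

# Text transfer: every byte is translated, including those of the number.
text_copy = record.decode("cp037").encode("latin-1")
print(text_copy[:8])                           # label is now readable
print(struct.unpack(">d", text_copy[8:])[0])   # number is no longer 172.5
```

Either way one half of the record is damaged, which is why a portable file written by the package itself is needed.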
Filename Extensions
There are no standard extensions for these files.

4. Bits and Bytes -- text and binary

Characters (in text files) and bytes (in binary files) refer to the same thing -- a set of eight binary bits:

  binary byte       * * * * * * * *
  ASCII character   0 * * * * * * *

where 0 is a zero bit and * can be either a one or a zero.

Since only 7 bits are required to create all 128 characters in the ASCII set, programs that deal with transferring characters (such as e-mail or text-type file transfers) very often don't pay attention to the left-hand 8th bit, or, worse, may use it for other purposes, such as the error-checking technique known as parity, or deliberately clear it.
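
As an illustration of the 8th bit being used for parity, the sketch below sets that bit so that each byte carries an even number of one bits. Even parity is just one possible scheme, chosen here as an assumption; the function name `with_even_parity` is invented.

```python
def with_even_parity(ch: int) -> int:
    """Return the 7-bit character with the 8th bit set so that the
    total number of 1 bits in the byte is even."""
    ones = bin(ch & 0x7F).count("1")
    return (ch & 0x7F) | (0x80 if ones % 2 else 0x00)


for ch in b"AC":
    print(format(ch, "08b"), "->", format(with_even_parity(ch), "08b"))
# 'A' (01000001) already has an even count and is unchanged;
# 'C' (01000011) has three one bits, so its 8th bit is turned on.
```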

Since a binary file absolutely requires that the eighth bit be trusted -- after all, it might be part of an instruction code in a program, or represent a pixel in a picture, or signal the start of a bold-face passage in a word processor file -- using a text-type transfer for a binary file virtually guarantees the file's destruction.
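
The damage is easy to reproduce. The sketch below imitates a transfer that clears the 8th bit of every byte; the function name `strip_eighth_bit` and the sample bytes are invented for this illustration.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the left-hand (8th) bit of every byte, as a 7-bit channel might."""
    return bytes(b & 0x7F for b in data)


text = b"plain ASCII text\n"
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x80])  # arbitrary sample bytes

print(strip_eighth_bit(text) == text)      # True: 7-bit text is unaffected
print(strip_eighth_bit(binary) == binary)  # False: the binary data is altered
print(strip_eighth_bit(binary))
```

Once the bit is cleared there is no way to tell which bytes originally had it set, so the loss cannot be undone at the receiving end.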

Office of Information Technology