Abstract: The UUDeview library is a highly portable set of functions that provide facilities for decoding uuencoded, xxencoded, Base64- and BinHex-encoded files, as well as for encoding binary files into all of these representations except BinHex. This document describes how the encoding and decoding features can be integrated into your own applications.
The information is intended for developers only, and is not required reading material for end users. It is assumed that the reader is familiar with the general issue of encoding and decoding and has some experience with the ``C'' programming language.
This document describes version 0.5, patchlevel 18 of the library.
Figure 1 displays how the library can be integrated into an application. The library does not assume any capabilities of the operating system or application language, and can thus be used in almost any environment. The few necessary interfaces must be provided by the application, which usually knows a great deal more about the target system.

[Figure 1 diagram: layers from top to bottom are Application, Application Language Interface, UUDeview Decoding Library, Application OS Services Interface, Operating System]
Figure 1: Integration of the Library
int UUEncodeMulti    (FILE *outfile, FILE *infile,
                      char *infname, int encoding,
                      char *outfname, char *mimetype,
                      int filemode)

int UUEncodePartial  (FILE *outfile, FILE *infile,
                      char *infname, int encoding,
                      char *outfname, char *mimetype,
                      int filemode, int partno,
                      long linperfile)

int UUEncodeToStream (FILE *outfile, FILE *infile,
                      char *infname, int encoding,
                      char *outfname, int filemode)

int UUEncodeToFile   (FILE *infile, char *infname,
                      int encoding, char *outfname,
                      char *diskname, long linperfile)

int UUE_PrepSingle   (FILE *outfile, FILE *infile,
                      char *infname, int encoding,
                      char *outfname, int filemode,
                      char *destination, char *from,
                      char *subject, int isemail)

int UUE_PrepPartial  (FILE *outfile, FILE *infile,
                      char *infname, int encoding,
                      char *outfname, int filemode,
                      int partno, long linperfile,
                      long filesize,
                      char *destination, char *from,
                      char *subject, int isemail)
The minimal decoding program is displayed in Figure 2. Only four lines of code are needed for the implementation. <stdlib.h> defines NULL, <uudeview.h> declares the decoding library functions, and <config.h>, the library's configuration file, is needed for some configuration details.
The second version, printed in Figure 3, addresses all of the above problems. The code size more than tripled, but that is largely due to the error messages.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <config.h>
#include <uudeview.h>

int
main (int argc, char *argv[])
{
  uulist *item;
  int i, res;

  UUInitialize ();

  for (i=1; i<argc; i++)
    if ((res = UULoadFile (argv[i], NULL, 0)) != UURET_OK)
      fprintf (stderr, "could not load %s: %s\n",
               argv[i],
               (res==UURET_IOERR) ?
               strerror (UUGetOption (UUOPT_ERRNO, NULL, NULL, 0)) :
               UUstrerror(res));

  for (i=0; (item=UUGetFileListItem(i)) != NULL; i++) {
    if ((item->state & UUFILE_OK) == 0)
      continue;
    if ((res = UUDecodeFile (item, NULL)) != UURET_OK) {
      fprintf (stderr, "error decoding %s: %s\n",
               (item->filename==NULL)?"oops":item->filename,
               (res==UURET_IOERR) ?
               strerror (UUGetOption (UUOPT_ERRNO, NULL, NULL, 0)) :
               UUstrerror(res));
    }
    else {
      printf ("successfully decoded '%s'\n", item->filename);
    }
  }

  UUCleanUp ();
  return 0;
}
Figure 3: The ``Trivial Decoder'', Version 2
This last section adds a simple filename filter (targeting DOS systems with 8.3 filenames) and a simple message callback, which just writes messages to the console. Figure 4 lists the changes with respect to version 2 (for the full listing, refer to the source file on disk).
... right after the #includes
#include <fptools.h>

void
MsgCallBack (void *opaque, char *msg, int level)
{
  fprintf (stderr, "%s\n", msg);
}

char *
FNameFilter (void *opaque, char *fname)
{
  static char dname[13];
  char *p1, *p2;
  int i;

  /* strip directory components, skipping the separator itself */
  if ((p1 = _FP_strrchr (fname, '/')) == NULL)
    p1 = fname;
  else
    p1++;
  if ((p2 = _FP_strrchr (p1, '\\')) == NULL)
    p2 = p1;
  else
    p2++;

  /* up to 8 characters of the base name; spaces become '_' */
  for (i=0, p1=dname; *p2 && *p2!='.' && i<8; i++)
    *p1++ = (*p2==' ')?(p2++,'_'):*p2++;
  while (*p2 && *p2 != '.')
    p2++;

  /* copy the '.' (or the final NUL), then up to 3 extension characters */
  if ((*p1++ = *p2++) == '.')
    for (i=0; *p2 && *p2!='.' && i<3; i++)
      *p1++ = (*p2==' ')?(p2++,'_'):*p2++;
  *p1 = '\0';

  return dname;
}

... within main() after UUInitialize
  UUSetMsgCallback (NULL, MsgCallBack);
  UUSetFNameFilter (NULL, FNameFilter);

... replacing the main loop's else
    else {
      printf ("successfully decoded '%s' as '%s'\n",
              item->filename,
              UUFNameFilter (item->filename));
    }
Figure 4: Changes for Version 3
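The filter in Figure 4 relies on _FP_strrchr from fptools.h, so it cannot easily be tried out on its own. As a minimal sketch, the same 8.3 mapping can be written against the standard strrchr; dos83_name is a hypothetical name, and the caller-supplied buffer (at least 13 bytes) replaces the static one so that the function can be tested in isolation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, library-free variant of an 8.3 filename filter:
 * strips any directory part, keeps at most 8 characters of the base
 * name and 3 of the extension, and turns spaces into underscores.
 * 'out' must hold at least 13 bytes. */
static void dos83_name(const char *fname, char *out)
{
    const char *p = fname;
    const char *s;
    int i;

    assert(fname != NULL && out != NULL);

    if ((s = strrchr(p, '/')) != NULL)    /* skip Unix-style directories */
        p = s + 1;
    if ((s = strrchr(p, '\\')) != NULL)   /* skip DOS-style directories */
        p = s + 1;

    /* base name: up to 8 characters, stopping at the first '.' */
    for (i = 0; *p && *p != '.' && i < 8; i++, p++)
        *out++ = (*p == ' ') ? '_' : *p;
    while (*p && *p != '.')               /* drop the rest of the base */
        p++;

    if (*p == '.') {                      /* extension: up to 3 characters */
        *out++ = *p++;
        for (i = 0; *p && *p != '.' && i < 3; i++, p++)
            *out++ = (*p == ' ') ? '_' : *p;
    }
    *out = '\0';
}
```

For example, "My Long File.txt" becomes "My_Long_.txt", and anything after a second dot is dropped, as in the original filter.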
Three bytes are 24 bits, which are divided into four groups of 6 bits each. Table 1 shows in detail how the input bits are copied into the output values. Six bits can represent values from 0 to 63; each of the ``three-in-four'' encodings therefore uses a character table with 64 entries that maps each possible value to a specific character.
Input octet | Input bits  | Output value | Output bits
1           | 7 6 5 4 3 2 | #1           | 5 4 3 2 1 0
1           | 1 0         | #2           | 5 4
2           | 7 6 5 4     | #2           | 3 2 1 0
2           | 3 2 1 0     | #3           | 5 4 3 2
3           | 7 6         | #3           | 1 0
3           | 5 4 3 2 1 0 | #4           | 5 4 3 2 1 0
Table 1: Bit mapping for Three-in-Four encoding
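Table 1 boils down to a few shifts and masks. The sketch below shows one way to write it in both directions; split_3in4 and join_4in3 are hypothetical helper names, not UUDeview functions. The three-in-four encodings in this document share this step and differ mainly in how the resulting values are turned into characters.

```c
#include <assert.h>

/* Split three octets into four 6-bit values, following Table 1. */
static void split_3in4(const unsigned char in[3], unsigned char out[4])
{
    out[0] = in[0] >> 2;                                          /* 7..2 of #1 */
    out[1] = (unsigned char)(((in[0] & 0x03) << 4) | (in[1] >> 4));
    out[2] = (unsigned char)(((in[1] & 0x0f) << 2) | (in[2] >> 6));
    out[3] = in[2] & 0x3f;                                        /* 5..0 of #3 */
}

/* Inverse operation used when decoding: four 6-bit values back into
 * three octets. The casts keep only the low 8 bits of each result. */
static void join_4in3(const unsigned char in[4], unsigned char out[3])
{
    int i;
    for (i = 0; i < 4; i++)
        assert(in[i] < 64);            /* only 6-bit values are valid */
    out[0] = (unsigned char)((in[0] << 2) | (in[1] >> 4));
    out[1] = (unsigned char)((in[1] << 4) | (in[2] >> 2));
    out[2] = (unsigned char)((in[2] << 6) | in[3]);
}
```

For the input "Man", split_3in4 yields the values 19, 22, 5 and 46, and join_4in3 restores the original three octets.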
This is a test file for illustrating the various
encoding methods. Let's make this text longer than
57 bytes to wrap lines with Base64 data, too.
Greetings, Frank Pilhofer
begin mode filename
... encoded data ...
``empty'' line
end
Each line of uuencoded data is prefixed, in the first column, with the encoded number of data octets on that line. The most common prefix you will see is `M'. Looking up `M' in Table 2 shows that it represents the number 45, so this prefix means that the line contains 45 octets, which are encoded into 60 (45/3*4) plain-text characters.
begin 600 test.txt
M5&AI<R!I<R!A('1E<W0@9FEL92!F;W(@:6QL=7-T<F%T:6YG('1H92!V87)I
M;W5S"F5N8V]D:6YG(&UE=&AO9',N($QE="=S(&UA:V4@=&AI<R!T97AT(&QO
M;F=E<B!T:&%N"C4W(&)Y=&5S('1O('=R87`@;&EN97,@=VET:"!"87-E-C0@
E9&%T82P@=&]O+@I'<F5E=&EN9W,L($9R86YK(%!I;&AO9F5R"@``
`
end
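A single uuencoded line combines the length prefix with the bit split from Table 1. This is a sketch under one stated assumption: since Table 2 is not reproduced here, the code assumes the classic mapping of value v to the character with code 32+v, except that 0 is written as a backquote, which is what the sample above uses (see its line consisting of a single backquote). uu_char and uu_encode_line are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Assumed uuencoding table: value v -> character 32+v, but 0 -> '`'. */
static char uu_char(unsigned v)
{
    return v == 0 ? '`' : (char)(v + 32);
}

/* Encode up to 45 octets into one uuencoded line (no newline).
 * 'out' needs room for 1 + 60 + 1 bytes. A short final group is padded
 * with zero octets; the length prefix tells the decoder how many of the
 * decoded octets are real. */
static void uu_encode_line(const unsigned char *in, size_t n, char *out)
{
    size_t i;

    assert(n <= 45);
    *out++ = uu_char((unsigned)n);                 /* length prefix */
    for (i = 0; i < n; i += 3) {
        unsigned char g[3] = {0, 0, 0};
        memcpy(g, in + i, (n - i < 3) ? n - i : 3);
        *out++ = uu_char(g[0] >> 2);
        *out++ = uu_char(((g[0] & 0x03) << 4) | (g[1] >> 4));
        *out++ = uu_char(((g[1] & 0x0f) << 2) | (g[2] >> 6));
        *out++ = uu_char(g[2] & 0x3f);
    }
    *out = '\0';
}
```

Encoding the first 45 octets of the sample file this way reproduces its first `M' line, and encoding zero octets yields the single-backquote line that precedes "end".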
Xxencoding is identical to uuencoding except that it uses a different mapping of data values into printable characters (Table 3). Instead of `M', a normal-sized xxencoded line is prefixed by `h' (note that `h' encodes 45, just as `M' does in uuencoding). The empty data line at the end consists of a single `+' character. Our sample file looks like the following:
begin 600 test.txt
hJ4VdQm-dQm-V65FZQrEUNaZgNG-aPr6UOKlgRLBoQa3oOKtb65FcNG-qML7d
hPrJn0aJiMqxYOKtb64pZR4VjN5Ai62lZR0Rn64pVOqIUR4VdQm-oNLVo64lj
hPaRZQW-oO43i0XIr647tR4Jn65Fj65RmML+UP4ZiNLAURqZoO0-0MLBZBXEU
ZN43oMGkUR4xj9Ud5QaJZR4ZiNrAg62NmMKtf63-dP4VjNaJm0U++
+
end
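Because only the character table changes, the line encoder can be made table-driven. The sketch below assumes the xxencoding table runs `+', `-', the digits, the upper-case and then the lower-case letters, which agrees with `h' encoding 45 and `+' encoding 0 in the text above; XX_TABLE and encode_line are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Assumed xxencoding table: value v is written as XX_TABLE[v]. */
static const char XX_TABLE[] =
    "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Table-driven line encoder: the length prefix and the data values are
 * both mapped through 'table', exactly as in uuencoding. 'out' needs
 * room for 1 + 60 + 1 bytes. */
static void encode_line(const char *table, const unsigned char *in,
                        size_t n, char *out)
{
    size_t i;

    assert(n <= 45 && strlen(table) == 64);
    *out++ = table[n];
    for (i = 0; i < n; i += 3) {
        unsigned char g[3] = {0, 0, 0};          /* zero-pad a short group */
        memcpy(g, in + i, (n - i < 3) ? n - i : 3);
        *out++ = table[g[0] >> 2];
        *out++ = table[((g[0] & 0x03) << 4) | (g[1] >> 4)];
        *out++ = table[((g[1] & 0x0f) << 2) | (g[2] >> 6)];
        *out++ = table[g[2] & 0x3f];
    }
    *out = '\0';
}
```

With XX_TABLE, the first 45 octets of the sample file produce the first `h' line shown above; the same function would produce uuencoded output when given a uuencoding table instead.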
The general concept of three-in-four encoding is the same as with the previous two types; only another character table needs to be introduced to represent the values (Table 4). Note that this table uses the same set of characters as the xxencoding table except for a single one (`/' versus `-'), although the values are assigned to the characters in a different order. If a line of encoded data contains neither of these two characters, it may be difficult to tell which encoding is used on the line.
VGhpcyBpcyBhIHRlc3QgZmlsZSBmb3IgaWxsdXN0cmF0aW5nIHRoZSB2YXJpb3VzCmVuY29kaW5n
IG1ldGhvZHMuIExldCdzIG1ha2UgdGhpcyB0ZXh0IGxvbmdlciB0aGFuCjU3IGJ5dGVzIHRvIHdy
YXAgbGluZXMgd2l0aCBCYXNlNjQgZGF0YSwgdG9vLgpHcmVldGluZ3MsIEZyYW5rIFBpbGhvZmVy
Cg==

For more elaborate documentation of Base64 encoding and details of the MIME framework, I suggest reading [RFC1521].
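Base64 uses the same bit split as Table 1, with two differences that the sketch below makes explicit: there is no per-line length prefix, and a final group of one or two octets is completed with `=' padding characters, as at the end of the Base64 sample. b64_encode is a hypothetical name; the table is the standard Base64 alphabet, and breaking the output into lines is left to the caller.

```c
#include <assert.h>
#include <string.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n octets as Base64 into 'out' (needs 4*((n+2)/3)+1 bytes). */
static void b64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i;

    assert(in != NULL || n == 0);
    for (i = 0; i + 2 < n; i += 3) {                 /* full groups */
        *out++ = B64_TABLE[in[i] >> 2];
        *out++ = B64_TABLE[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        *out++ = B64_TABLE[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
        *out++ = B64_TABLE[in[i + 2] & 0x3f];
    }
    if (n - i == 1) {                                /* one octet left */
        *out++ = B64_TABLE[in[i] >> 2];
        *out++ = B64_TABLE[(in[i] & 0x03) << 4];
        *out++ = '=';
        *out++ = '=';
    } else if (n - i == 2) {                         /* two octets left */
        *out++ = B64_TABLE[in[i] >> 2];
        *out++ = B64_TABLE[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        *out++ = B64_TABLE[(in[i + 1] & 0x0f) << 2];
        *out++ = '=';
    }
    *out = '\0';
}
```

A lone trailing line feed encodes as "Cg==", which is how the Base64 sample ends.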
A BinHex file is a stream of characters, beginning and ending with a colon `:'; intermediate line breaks are to be ignored by the decoder. Each line but the last should be exactly 64 characters long. The last line may be shorter, and in a special case can also be 65 characters long: the trailing colon must not stand alone, so if the input data ends on an output line boundary, the colon is appended to that line as its 65th character. Thus a BinHex file begins with a colon in the first column and ends with a colon that is not in the first column.
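The line layout rules above can be sketched as follows. binhex_layout is a hypothetical helper that takes characters which are already encoded and only adds the two colons and the line breaks; using `\n' as the line separator is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Lay out encoded BinHex characters: a ':' in the first column, 64
 * characters per line (the opening colon counts), and a closing ':'
 * that never starts a line of its own, so a full last line receives it
 * as its 65th character. 'out' needs strlen(enc) + strlen(enc)/64 + 4
 * bytes. */
static void binhex_layout(const char *enc, char *out)
{
    size_t len, i, col = 1;           /* the opening colon uses column 1 */

    assert(enc != NULL && out != NULL);
    len = strlen(enc);
    *out++ = ':';
    for (i = 0; i < len; i++) {
        if (col == 64) {              /* line is full: break before this char */
            *out++ = '\n';
            col = 0;
        }
        *out++ = enc[i];
        col++;
    }
    *out++ = ':';                     /* appended even when col == 64 */
    *out = '\0';
}
```

With 63 encoded characters the result is a single 65-character line; with 64, the last data character moves to a second line, followed by the closing colon.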
BinHex is another three-in-four encoding, and not surprisingly, yet another character table is used (Table 5). The documentation does not explicitly mention what is supposed to happen if the length of the original input data is not a multiple of three octets. But from reading between the lines, it looks like ``unnecessary'' characters (those that would result in equal signs in Base64 encoding) are not printed.
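Decoding BinHex characters back into octets works like the other three-in-four schemes, only with its own table. The sketch below assumes the standard BinHex 4.0 character table (Table 5 is not reproduced here) and the reading above that a short final group simply has fewer characters; HQX_TABLE and hqx_decode are hypothetical names. The output is still RLE-compressed.

```c
#include <assert.h>
#include <string.h>

/* Assumed BinHex 4.0 character table: value v is written as HQX_TABLE[v]. */
static const char HQX_TABLE[] =
    "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr";

/* Decode BinHex characters into (still RLE-compressed) octets.
 * 'in' should point just past the opening colon; line breaks are
 * skipped and decoding stops at the closing colon or the end of the
 * string. A final group of two or three characters yields one or two
 * octets. Returns the number of octets written, or -1 on a character
 * that is not in the table. */
static long hqx_decode(const char *in, unsigned char *out)
{
    unsigned long bits = 0;           /* pending bits, newest at the low end */
    int nbits = 0;                    /* number of pending bits */
    long n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0' && *in != ':'; in++) {
        const char *p;
        if (*in == '\n' || *in == '\r')
            continue;
        if ((p = strchr(HQX_TABLE, *in)) == NULL)
            return -1;
        bits = ((bits << 6) | (unsigned long)(p - HQX_TABLE)) & 0xffffUL;
        nbits += 6;
        if (nbits >= 8) {             /* a full octet is available */
            nbits -= 8;
            out[n++] = (unsigned char)(bits >> nbits);
        }
    }
    return n;
}
```

As a check, the first twelve characters after the opening colon of the BinHex sample further down decode to the octet 8 followed by the eight characters of the stored file name.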
The encoded characters decode into an RLE-compressed byte stream, which must be handled in the next step (of course, decoding and decompressing are usually done at the same time). Run Length Encoding simply replaces multiple consecutive occurrences of one octet by the octet itself, a special marker, and the repetition count. BinHex uses the marker 0x90 (octal 0220, decimal 144). The octet sequence 0xff 0x90 0x04 would decompress into four 0xff octets; the count includes the octet in front of the marker. If the marker itself occurs, it must be ``escaped'' by the special sequence 0x90 0x00 (the marker with a repetition count of 0). Table 6 shows four more examples. Note the last example, where the marker itself is repeated.
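The decompression rules above can be sketched as a small loop. rle_expand is a hypothetical name; treating a marker at the very end of the input, or a repeat with no preceding octet, as an error is a choice of this sketch rather than something the text prescribes.

```c
#include <assert.h>
#include <stddef.h>

#define RLE_MARKER 0x90

/* Expand BinHex run-length encoding.
 *   X 0x90 k  (k >= 1)  ->  X repeated k times in total
 *   0x90 0x00           ->  one literal 0x90 octet
 * A count always repeats the octet produced last, which may itself be
 * an escaped 0x90. At most 'cap' octets are written to 'out'.
 * Returns the number of octets produced, or -1 if the input is
 * malformed or 'out' is too small. */
static long rle_expand(const unsigned char *in, size_t n,
                       unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0, k;
    unsigned char last = 0;
    int have_last = 0;

    assert(in != NULL || n == 0);
    while (i < n) {
        unsigned char c = in[i++];
        if (c != RLE_MARKER) {                /* ordinary octet */
            if (o >= cap) return -1;
            out[o++] = last = c;
            have_last = 1;
            continue;
        }
        if (i >= n) return -1;                /* marker without a count */
        c = in[i++];
        if (c == 0) {                         /* escaped marker */
            if (o >= cap) return -1;
            out[o++] = last = RLE_MARKER;
            have_last = 1;
        } else {                              /* run: one copy is already out */
            if (!have_last) return -1;
            for (k = 1; k < (size_t)c; k++) {
                if (o >= cap) return -1;
                out[o++] = last;
            }
        }
    }
    return (long)o;
}
```

For instance, 0x2b 0x90 0x00 0x90 0x05 expands to 0x2b followed by five 0x90 octets, since the second count repeats the escaped marker.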
The decompression results in a data stream which consists of three parts: the header section, the data fork and the resource fork. Figure 5 shows how the sections are composed. The numbers above each item indicate its size in octets. The header has the following items:

[Figure 5 diagram: header section = n (1) | Name (n) | 0 (1) | Type (4) | Auth (4) | Flag (2) | Dlen (4) | Rlen (4) | HC (2); data section = Data Fork (Dlen) | DC (2); resource section = Resource Fork (Rlen) | RC (2)]
Figure 5: BinHex file structure
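Once decompressed, the header section can be read field by field using the sizes from Figure 5. The sketch below assumes that multi-octet fields are stored most significant octet first; this is consistent with the BinHex sample further down, whose Dlen field reads 00 00 00 AC, i.e. the 172 octets of the sample file. The struct and function names are hypothetical, and the header CRC is only extracted here, not verified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of the header section. */
struct hqx_header {
    char          name[256];  /* Name: n octets, NUL-terminated here */
    unsigned char type[4];    /* Type */
    unsigned char auth[4];    /* Auth */
    unsigned int  flags;      /* Flag (2 octets) */
    unsigned long dlen;       /* Dlen: data fork length */
    unsigned long rlen;       /* Rlen: resource fork length */
    unsigned int  hc;         /* HC: header CRC as stored */
};

/* Read a 'len'-octet number, most significant octet first (assumed). */
static unsigned long get_be(const unsigned char *p, int len)
{
    unsigned long v = 0;
    while (len-- > 0)
        v = (v << 8) | *p++;
    return v;
}

/* Parse the header at the start of the decompressed stream 'p' of
 * 'avail' octets. Returns the header size in octets, or -1 if the
 * stream is too short. */
static long hqx_parse_header(const unsigned char *p, size_t avail,
                             struct hqx_header *h)
{
    size_t n, size;

    assert(p != NULL && h != NULL);
    if (avail < 1)
        return -1;
    n = p[0];
    size = 1 + n + 1 + 4 + 4 + 2 + 4 + 4 + 2;
    if (avail < size)
        return -1;
    memcpy(h->name, p + 1, n);
    h->name[n] = '\0';
    p += 1 + n + 1;                          /* n, Name and the 0 octet */
    memcpy(h->type, p, 4);                   p += 4;
    memcpy(h->auth, p, 4);                   p += 4;
    h->flags = (unsigned int)get_be(p, 2);   p += 2;
    h->dlen  = get_be(p, 4);                 p += 4;
    h->rlen  = get_be(p, 4);                 p += 4;
    h->hc    = (unsigned int)get_be(p, 2);
    return (long)size;
}
```

The data fork then starts right after the returned size, followed by its 2-octet DC; the resource fork and RC follow in the same way.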
BinHex 4.0 uses a 16-bit CRC with 0x1021 as the generator polynomial (often loosely called the ``seed''). The general algorithm is to take the data one bit at a time and process it as follows:

1. Take the old CRC (use 0x0000 if there is no previous CRC) and shift it to the left by 1.
2. Put the new data bit in the least significant (rightmost) position.
3. If the bit shifted out in step 1 was a 1, XOR the CRC with 0x1021.
4. Loop back to step 1 until all the data has been processed.

This is the sample file in BinHex. However, the encoder I used replaced the LF characters from the original file with CR characters. It probably noticed that the input file was plain text and reformatted it to Mac-style text, but I consider this a software bug. The assigned filename is ``test.txt''.

(This file must be converted with BinHex 4.0)
:#&4&8e3Z9&K8!&4&@&4dG(Kd!!!!!!#X!!!!!+3j9'KTFb"TFb"K)(4PFh3JCQP
XC5"QEh)JD@aXGA0dFQ&dD@jR)(4SC5"fBA*TEh9c$@9ZBfpND@jR)'ePG'K[C(-
Z)%aPG#Gc)'eKDf8JG'KTFb"dCAKd)'a[EQGPFL"dD'&Z$68h)'*jG'9c)(4[)(G
bBA!JE'PZCA-JGfPdD#"#BA0P0M3JC'&dB5`JG'p[,Je(FQ9PG'PZCh-X)%CbB@j
V)&"TE'K[CQ9b$B0A!!!!:
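The four CRC steps listed above translate almost literally into C; the sketch below feeds each octet most significant bit first. One assumption is flagged in the code: to obtain the value that is stored after a section, it also feeds two zero octets (standing in for the CRC field itself), which is the usual reading of the BinHex 4.0 description; verify this against a reference decoder before relying on it. hqx_crc_octet and hqx_crc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Feed one octet, most significant bit first, through the four steps. */
static unsigned int hqx_crc_octet(unsigned int crc, unsigned char octet)
{
    int bit;
    for (bit = 7; bit >= 0; bit--) {
        unsigned int out = crc & 0x8000u;                 /* step 1: bit shifted out */
        crc = ((crc << 1) | ((octet >> bit) & 1u)) & 0xffffu; /* steps 1 and 2 */
        if (out)
            crc ^= 0x1021u;                               /* step 3 */
    }
    return crc;
}

/* CRC over a whole section, starting from 0x0000.
 * ASSUMPTION: the stored value is obtained by also feeding two zero
 * octets after the data. */
static unsigned int hqx_crc(const unsigned char *p, size_t n)
{
    unsigned int crc = 0;

    assert(p != NULL || n == 0);
    while (n--)
        crc = hqx_crc_octet(crc, *p++);                   /* step 4: loop */
    crc = hqx_crc_octet(crc, 0);
    crc = hqx_crc_octet(crc, 0);
    return crc;
}
```

With that convention, the nine ASCII digits "123456789" give 0x31C3, the usual check value of the CRC-16 variant with polynomial 0x1021 and initial value 0 (commonly called CRC-16/XMODEM).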
This is a test file for = illustrating the various encoding methods=2e=20= Let=27s make this text= longer than =357 bytes to wrap lines = with Base64 data=2c too=2e Greetings=2c Frank Pilhofer
This document was translated from LaTeX by HEVEA.