Base64
Base 64 literally means a positional numeral system using a base of 64. It is the largest power-of-two base that can be represented using only printable ASCII characters, which has led to its use as a transfer encoding for e-mail, among other things. All well-known variants that go by the name Base64 use the characters A–Z, a–z, and 0–9, in that order, for the first 62 digits, but the symbols chosen for the last two digits vary considerably between systems. Several other encoding methods, such as uuencode and later versions of BinHex, use a different set of 64 characters to represent 6 binary digits, but these are never called base64.
MIME
In the MIME e-mail format, base64 is a binary to text encoding scheme whereby an arbitrary sequence of bytes is converted to a sequence of printable ASCII characters. It is defined as a MIME content transfer encoding for use in internet e-mail. The only characters used are the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.
Full specifications for this form of base64 are contained in RFC 1421 and RFC 2045. The scheme is defined to encode a sequence of octets (bytes), which matches the definition of files on almost all systems. The resultant base64-encoded data exceeds the original in length by the ratio 4:3, and typically appears to consist of seemingly random characters. As newlines are inserted in the encoded data every 76 characters, the actual length of the encoded data is approximately 135.1% of the original when each line break is a single character, or about 136.8% with the two-character CRLF line breaks used in e-mail.
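The size figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of any specification; the function name encoded_length and its parameters are illustrative assumptions, and it assumes a line break after every line of output, including the last one.

```python
import base64
import math

def encoded_length(n_bytes, line_len=76, newline_len=1):
    """Length of line-wrapped base64 output for n_bytes of input (hypothetical helper)."""
    chars = 4 * math.ceil(n_bytes / 3)     # every 3 input bytes become 4 characters
    lines = math.ceil(chars / line_len)    # one line break per (possibly partial) line
    return chars + lines * newline_len

n = 57_000
print(encoded_length(n) / n)                 # ratio with a single-character newline
print(encoded_length(n, newline_len=2) / n)  # ratio with CRLF line breaks

# Cross-check against the standard library, which wraps at 76 characters with "\n".
assert encoded_length(n) == len(base64.encodebytes(bytes(n)))
```

For large inputs the two ratios approach 4/3 × 77/76 and 4/3 × 78/76 respectively.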
To convert data to base64, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes to encode, the corresponding buffer bits are set to zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output. If there were only one or two input bytes, only the first two or three characters of the output are used, and they are padded with two or one "=" characters respectively. This prevents extra bits from being added to the reconstructed data. The process then repeats on the remaining input data.
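The procedure just described can be sketched in a few lines of Python. This is an illustrative implementation of the steps in the paragraph above, not the reference code of any RFC; the name b64encode_sketch is an assumption, and line wrapping is deliberately left out.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_sketch(data: bytes) -> str:
    """Encode bytes three at a time through a 24-bit buffer (hypothetical helper)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # First byte goes in the most significant 8 bits; missing bytes become zero.
        buf = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Read the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buf >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte keeps 2 characters, 2 keep 3, 3 keep all 4; pad the rest with "=".
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(b64encode_sketch(b"Man"))  # TWFu
print(b64encode_sketch(b"Ma"))   # TWE=
print(b64encode_sketch(b"M"))    # TQ==
```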
For example, a quote from Thomas Hobbes's Leviathan,
- Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
encoded in base64 is as follows:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
An example
In the above example the encoded value of Man is TWFu. Here is a method that illustrates how one gets the value.
ASCII value of 'M' = 77 = 01001101
ASCII value of 'a' = 97 = 01100001
ASCII value of 'n' = 110 = 01101110
From these three bytes we get the 24-bit buffer, which is
010011010110000101101110 (As described in the method, the first byte, M, is placed in the most significant 8 bits of the 24-bit buffer, followed by the second and third bytes.) Taking 6 bits at a time from the buffer gives four numbers, each of which is then converted to its corresponding Base64 character.
010011 = 19 = T
010110 = 22 = W
000101 = 5 = F
101110 = 46 = u
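The same walk-through can be reproduced step by step in Python. This is only a sketch for checking the arithmetic by hand; the variable names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Pack 'M', 'a', 'n' into a 24-bit buffer, first byte most significant.
buffer = (ord("M") << 16) | (ord("a") << 8) | ord("n")
bits = format(buffer, "024b")

# Split the 24 bits into four 6-bit groups and look each one up.
groups = [bits[i:i + 6] for i in range(0, 24, 6)]
indices = [int(group, 2) for group in groups]
encoded = "".join(ALPHABET[index] for index in indices)

print(bits)     # 010011010110000101101110
print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu
```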
UTF-7
UTF-7 introduced a system called Modified Base64. This data encoding scheme is used to encode the UTF-16 used as an intermediate format in UTF-7 into printable ASCII characters. It is a variant of the base64 used in MIME. UTF-7 was intended to allow the use of Unicode in e-mail without using a separate content transfer encoding. Its main difference from the MIME variant of base64 is that it does not use the "=" symbol for padding, as that character tends to require a fair amount of escaping. Instead, it pads the octet bits with zeros.
Modified Base64 is standardized as RFC 2152, A Mail-Safe Transformation Format of Unicode.
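As a rough illustration of the idea, the sketch below encodes text as big-endian UTF-16, applies ordinary base64, and drops the "=" padding. The function name modified_base64 is an assumption, and the "+" and "-" delimiters used in the usage lines come from UTF-7 itself rather than from the description above.

```python
import base64

def modified_base64(text: str) -> str:
    """Base64 of the UTF-16 (big-endian) form of text, without '=' padding (hypothetical helper)."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")

run = modified_base64("£")
print(run)  # AKM

# Wrapped in UTF-7's shift characters, Python's own utf-7 codec decodes it back.
print(("+" + run + "-").encode("ascii").decode("utf-7"))
```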
IRCu
In the P10 server-server protocol used by the IRCu IRC daemon and compatible software, a version of base64 is used to encode client/server numerics and binary IP addresses. Client and server numerics have fixed sizes which match up with an exact number of base64 digits, so no padding is needed. Binary IP addresses have leading zero bits added to make them fit. The symbol set is slightly different from the MIME one, using "[" and "]" instead of "+" and "/".
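A fixed-width encoding of this kind can be sketched as follows. The helper name to_fixed_base64 is hypothetical, and the choice of six digits for a 32-bit IPv4 address (36 bits, so four leading zero bits) is an assumption made for illustration rather than something stated above.

```python
import ipaddress

# Same ordering as the MIME alphabet, but with "[" and "]" as the last two digits.
P10_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789[]"

def to_fixed_base64(value: int, digits: int) -> str:
    """Render a non-negative integer as exactly `digits` base64 digits (hypothetical helper)."""
    if value < 0 or value >= 64 ** digits:
        raise ValueError("value does not fit in the requested number of digits")
    chars = []
    for _ in range(digits):
        chars.append(P10_ALPHABET[value & 0x3F])  # take the lowest 6 bits
        value >>= 6
    return "".join(reversed(chars))               # most significant digit first

ip = int(ipaddress.IPv4Address("127.0.0.1"))
print(to_fixed_base64(ip, 6))  # B]AAAB
```

Because the width is fixed, small values simply start with "A" digits (zero), which is the leading-zero behaviour described for IP addresses.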
URL Applications
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. Hibernate, a database persistence framework for Java objects, uses Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Many applications also need to encode binary data in a way that is convenient for inclusion in URLs or in hidden web form fields. Base64 renders such data compactly and, where the aim is to obscure the nature of the data from a casual human observer, in a relatively unreadable form.
Using a URL-encoder on standard Base64, however, is inconvenient, as it translates the '/' and '+' characters into special '%XX' hexadecimal sequences. When the result is later used with database storage or passed across heterogeneous systems, those systems may in turn choke on the '%' character generated by the URL-encoder (because the '%' character is also used in ANSI SQL as a wildcard).
For this reason, a modified Base64 for URL variant exists, in which no '=' padding is used and the '+' and '/' characters of standard Base64 are replaced by '*' and '-' respectively. Using URL encoders/decoders is then no longer necessary, so they have no impact on the length of the encoded value, and the same encoded form can be used unchanged in relational databases, web forms, and object identifiers in general.
Another variant called modified Base64 for regexps uses '!-' instead of '*-' to replace the standard Base64 '+/', because both '+' and '*' may be reserved for regular expressions (note that '[]' used in the IRCu variant above would not work in that context).
There are other variants that use '_-' or '._' when the Base64 variant string must be used within valid identifiers for programs, or '.-' for use in XML name tokens (Nmtoken), or even '_:' for use in more restricted XML identifiers (Name).
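Since these variants differ from standard Base64 only in the last two symbols and in the padding, they can be sketched with the altchars parameter of Python's base64 module. The helper names below are illustrative assumptions; the default '*-' pair follows the URL variant described above.

```python
import base64

def b64_variant(data: bytes, last_two: bytes = b"*-", pad: bool = False) -> str:
    """Base64 with a custom pair of final symbols and optional '=' padding (hypothetical helper)."""
    encoded = base64.b64encode(data, altchars=last_two).decode("ascii")
    return encoded if pad else encoded.rstrip("=")

def b64_variant_decode(text: str, last_two: bytes = b"*-") -> bytes:
    """Inverse of b64_variant: restore the padding, then decode (hypothetical helper)."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded, altchars=last_two)

data = b"\xfb\xff"
print(b64_variant(data, b"+/", pad=True))  # +/8=  (standard form)
print(b64_variant(data))                   # *-8   (URL variant)
print(b64_variant(data, b"!-"))            # !-8   (regexp variant)
```

Dropping the padding is safe here because the decoder can recompute it from the length of the text.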
Other applications
Base64 can be used in a variety of contexts. For example, Thunderbird uses Base64 to obscure e-mail POP3 passwords. Base64 is often used as a security shortcut to obscure secrets without incurring the overhead of cryptographic key management, although it provides no real confidentiality, since anyone can decode it without a key.
Spammers use Base64 to evade basic anti-spam tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
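The point about keyword detection can be illustrated with a short sketch: a filter that searches only the raw encoded text misses a word that is found once the body is decoded first. The function name contains_keyword and the sample message are made up for illustration and do not describe any particular anti-spam tool.

```python
import base64

def contains_keyword(encoded_body: str, keyword: str) -> bool:
    """Decode a base64 body before searching it for a keyword (hypothetical helper)."""
    decoded = base64.b64decode(encoded_body).decode("utf-8", errors="replace")
    return keyword.lower() in decoded.lower()

body = base64.b64encode(b"Buy cheap pills now").decode("ascii")

print("cheap" in body)                  # naive scan of the encoded text
print(contains_keyword(body, "cheap"))  # scan after decoding
```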
External links
- RFC 1421 (Privacy Enhancement for Electronic Internet Mail)
- RFC 2045 (MIME)
- RFC 3548 (The Base16, Base32, and Base64 Data Encodings)
- Base64 source code in C
- Base64 source code in Java / Another Java source code
- MIME::Base64 Perl module
- Firefox extension that supports ASCII/Base64 conversions
- emacs functions for Base64 conversions
- Base64 article at TenMinuteTutor.com
Resources
- Base64 Encryptor / Decryptor by Chemicalware (Windows and Linux)
- Online Base64 Decoder/Encoder, SourceForge
- Online JavaScript implementation
- Base64 Decoder with graphical user interface (Windows)
- Multi-platform Base64 Encoder/Decoder
- Online Base64, HEX, Binary, etc. Encoder/Decoder