What is Base64?
Base64 is an encoding algorithm that transforms any data into a string built from an alphabet of Latin letters, digits, plus, and slash. Thanks to it, you can convert Chinese characters, emoji, and even images into a “readable” string, which can be saved or transferred anywhere.
To understand figuratively why Base64 was invented, imagine that during a phone call Alice wants to send an image to Bob. The first problem is that she cannot simply describe how the image looks, because Bob needs an exact copy. In this case, Alice may convert the image into binary and dictate the binary digits (bits) to Bob, after which he will be able to convert them back into the original image. The second problem is that phone calls are expensive, and dictating each byte as 8 binary digits would take too long. To reduce costs, Alice and Bob agree on a more efficient way to transfer the data: a special alphabet that replaces every six binary digits with one “letter”.
To see the difference, check out a 1x1 GIF image converted to binary digits:
010001 110100 100101 000110 001110 000011 011101 100001 000000 010000 000000 000001 000000 001111 000000 000000 000000 001111 111100 000000 000000 000000 000000 000000 000000 000010 110000 000000 000000 000000 000000 000000 000000 010000 000000 000001 000000 000000 000000 000010 000000 100100 010000 000001 000000 000011 101100
The same image converted to Base64 looks like this:
R0lGODdhAQABAPAAAP8AAAAAACwAAAAAAQABAAACAkQBADs
I think the difference is obvious. Even if you remove the spaces or padding zeros from the binary digits, the Base64 string will still be much shorter. I grouped the bits only to show that each group of six corresponds to one character of the Base64 string.
Of course, the story about Alice and Bob is just a made-up example to show what kind of problem the Base64 algorithm solves. In fact, it is a binary-to-text encoding whose task is to encode binary data into printable characters when the data transmission channel or the storage medium cannot reliably handle 8-bit data.
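The six-bit grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not a production encoder: the helper names to_sextets and encode_unpadded are made up for this example, and the sample bytes are obtained by decoding the Base64 string of the image shown earlier (with the "=" padding character restored, which Python's decoder expects).

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, plus, slash (64 characters).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def to_sextets(data: bytes) -> list:
    """Split the bits of `data` into groups of six, zero-padding the last group."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)  # pad on the right to a multiple of six bits
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


def encode_unpadded(data: bytes) -> str:
    """Map every six-bit group to one character of the alphabet (no '=' padding)."""
    return "".join(ALPHABET[int(group, 2)] for group in to_sextets(data))


# Sample data: the image from the article, recovered from its Base64 form.
image = base64.b64decode("R0lGODdhAQABAPAAAP8AAAAAACwAAAAAAQABAAACAkQBADs=")

groups = to_sextets(image)
print(" ".join(groups))        # the bit groups, one per Base64 character
print(encode_unpadded(image))  # the same data written with the Base64 alphabet
```

Each printed group is a number from 0 to 63, and that number is simply an index into the 64-character alphabet. For real work, the standard library's base64 module is the better choice.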
History
The history of Base64 started long ago, in the days when engineers still argued about how many bits should be in a byte. Today we use eight-bit bytes, but seven-bit, six-bit, and even three-bit bytes were used before that. By the time the eight-bit byte was approved as a standard, many systems still used the old encodings and did not support the “new standard”. As a result, some data was simply lost during transfer between new and old systems. For example, a mail server might discard the eighth bit when sending emails. Moreover, there was another problem with mail servers: they could only send text, not binary data (such as images, video, or archives). So clever minds developed an algorithm to solve these problems. Of course, over time, other binary-to-text encodings were developed, but thanks to its simplicity, efficiency, and portability, Base64 became the most popular and is now used almost everywhere.
The algorithm was first described in 1987, in a document specifying the PEM protocol (if you are interested in the details, check RFC 989 § 4.3). Since then, the algorithm has evolved, giving rise to new standards that are actively used throughout the world of IT.
Naming
Initially, the algorithm was called “printable encoding”, and only a few years later, in June 1992, did RFC 1341 define it as “Base64”. Since the algorithm uses 64 basic characters, it was not difficult to give it a name (especially since Base85 already existed). Therefore, I think you will have no trouble guessing what the names of algorithms such as Base16, Base32, Base36, Base58, Base91, or Base122 mean.
Size
During encoding, the Base64 algorithm replaces every three bytes with four characters and, if necessary, adds padding characters, so the length of the result is always a multiple of four. Simply put, the result will always be about 33% (more exactly, 1⁄3) larger than the original data. The formula for calculating the length of the result string without padding is n * 4 / 3 rounded up to a whole number, where n is the length of the original data in bytes.
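As a rough check of the arithmetic above, the following Python sketch computes the expected length with and without padding and compares it against the standard library's base64.b64encode. The function name encoded_length and its padded parameter are assumptions made for this example.

```python
import base64


def encoded_length(n: int, padded: bool = True) -> int:
    """Length of the Base64 text produced from n bytes of input."""
    if padded:
        # Every started group of three bytes yields exactly four characters.
        return 4 * ((n + 2) // 3)
    # Without padding: n * 4 / 3, rounded up to a whole number.
    return (4 * n + 2) // 3


for n in (1, 2, 3, 4, 35, 1000):
    actual = base64.b64encode(bytes(n))  # bytes(n) is n zero bytes
    print(n, encoded_length(n), encoded_length(n, padded=False), len(actual))
```

The padded and unpadded lengths differ by at most two characters, because at most two "=" characters are ever appended.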
Usage
Base64 is most commonly used to encode binary data (for example, images or sound files) for embedding into HTML, CSS, EML, and other text documents. In addition, Base64 is used to encode data that might otherwise be unsupported or damaged during transfer, storage, or output. Here are some of the applications of the algorithm:
- Attach files when sending emails
- Embed images in HTML or CSS via data URI
- Preserve the raw bytes produced by cryptographic functions
- Output binary data as XML or JSON in API responses
- Save binary files to a database when BLOB is unavailable
- Hide secrets from prying eyes (really a very bad idea)
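To make one of these applications concrete, here is a minimal Python sketch of building a data URI for an image. The helper name to_data_uri is hypothetical, and the image/gif MIME type is an assumption based on the GIF sample used earlier in this article.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI that embeds `data` directly in an HTML or CSS document."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The sample image from the article, recovered from its Base64 form.
image = base64.b64decode("R0lGODdhAQABAPAAAP8AAAAAACwAAAAAAQABAAACAkQBADs=")

uri = to_data_uri(image, "image/gif")
print(f'<img src="{uri}" alt="sample image">')
```

This works well for small images. For large files the one-third size overhead adds up, so a regular file reference is usually the better option.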
Security
Base64 is not an encryption algorithm, and under no circumstances should it be used to “hash” passwords or “encrypt” sensitive data, because it is reversible and the encoded data can be easily decoded. In a security context, Base64 should only be used to represent the raw output of a cryptographic function as text.
Roughly speaking, in terms of information security, Base64 is just a foreign language that some people do not understand. Nevertheless, even they can understand the meaning of the encoded message simply by using an online translator, which instantly returns the original message.
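A short Python sketch shows how little protection the encoding offers. The "secret" value here is a made-up sample, and the only calls used are the standard library's b64encode and b64decode.

```python
import base64

secret = b"hunter2"  # hypothetical sample password, for illustration only

# "Hiding" the secret: no key is involved, only a change of alphabet.
hidden = base64.b64encode(secret)

# Anyone who sees the encoded text can reverse it with a single call.
recovered = base64.b64decode(hidden)

print(hidden.decode("ascii"))
print(recovered == secret)
```

Since decoding needs no key or password, a Base64 string should be treated as if it were plain text.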
Comments (31)
Thank you for your comment. I'm glad you like this article.
As for using Base64 to sanitize strings, this is a known practice, but since it has several drawbacks it should be used wisely.
+, / and = as well. These characters cannot exist in URLs or filenames, so do be careful when "sanitising" strings with this!
I realised when reading your site that the idea of Base64 encoding has similarities to UUencoding, which old people (like me) remember from the early days of email in the 1990s. Then it was considered very poor form to include binary attachments, hence the need to turn them into printable ASCII, which is what UUencoding did. UUencoding gave a predictable increase in file size of one third: each three binary characters were transformed into four printable ASCII ones. There is quite a good Wikipedia article on Uuencode, explaining its relation to Base64 and why Base64 is better.
This was the best explanation of Base64 I have seen anywhere, thanks to you. Actually, I saw an image encoded to Base64 in the Android IDE but wanted more info about it. One thing I did not know is whether we get a String when we "encode image to Base64" or when we decode it, and here I got it!
Thanks
If we have the output of an encoding algorithm, such as "3AqxxqQkWV", how can we know which encoding algorithm was used?
“Simply put, the size of the result will always be 33% (more exactly, 4⁄3) larger than the original data.”
It might be better to replace 4/3 with 1/3, as you are adding that to the original size (because you used the word “larger”).
I.e.
x+x/3=4x/3
But your statement shows:
x+4x/3=7x/3
Required to send this message
4*6 = 24 bits are required to send the message in Base64
So there is no saving on the phone tariff that you mention above in your post
I swear I recall a lexical obfuscation trick where all you need is an https:// prefix (or another protocol prefix) followed by the obfuscated hex/b64/ascii text, and it would be decoded by the browser somehow?