A virtual teacher who reveals to you the great secrets of Base64

What is Base64?

Base64 is an encoding algorithm that transforms any data into an alphabet consisting of Latin letters, digits, plus, and slash. Thanks to it, you can convert Chinese characters, emoji, and even images into a “readable” string that can be saved or transferred anywhere.
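
As a small illustration, here is a Python sketch that encodes a string containing Chinese characters and an emoji. The sample text and the `ALLOWED` set are made up for this example, and the text is assumed to be UTF-8 encoded before it is passed to the standard `base64` module. Note that the output may also contain the padding character "=".

```python
import base64
import string

# Characters that may appear in standard Base64 output ("=" is padding)
ALLOWED = set(string.ascii_letters + string.digits + "+/=")

text = "你好, Base64 😀"  # sample text, chosen only for illustration

# Base64 works on bytes, so the text is first encoded as UTF-8 (an assumption)
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(encoded)

# Every character of the result belongs to the Base64 alphabet
print(set(encoded) <= ALLOWED)

# Decoding restores the original text exactly
print(base64.b64decode(encoded).decode("utf-8") == text)
```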

To understand figuratively why Base64 was invented, imagine that during a phone call Alice wants to send an image to Bob. The first problem is that she cannot simply describe how the image looks, because Bob needs an exact copy. In this case, Alice may convert the image into binary and dictate the binary digits (bits) to Bob, after which he will be able to convert them back into the original image. The second problem is that phone calls are expensive, and dictating each byte as 8 binary digits would take too long. To reduce costs, Alice and Bob agree on a more efficient data transfer method: a special alphabet that replaces every six digits with one “letter”.

To see the difference, check out a 1x1 image converted to binary digits:

010001 110100 100101 000110 001110 000011 011101 100001 000000 010000 000000 000001 000000 001111 000000 000000 000000 001111 111100 000000 000000 000000 000000 000000 000000 000010 110000 000000 000000 000000 000000 000000 000000 010000 000000 000001 000000 000000 000000 000010 000000 100100 010000 000001 000000 000011 101100

By contrast, the same image converted to Base64 looks like this:

R0lGODdhAQABAPAAAP8AAAAAACwAAAAAAQABAAACAkQBADs

I think the difference is obvious. Even if you remove the spaces or padding zeros from the binary digits, the Base64 string will still be shorter. I grouped the bits only to show that each group corresponds to one character of the Base64 string.
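
The correspondence between 6-bit groups and characters can be checked with a short Python sketch. The helper name `to_sextets` is made up for illustration, and the alphabet is assumed to be the standard one (A-Z, a-z, 0-9, "+", "/"). It is applied to the first four characters of the string above.

```python
import string

# Standard Base64 alphabet: index 0 is "A", index 63 is "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def to_sextets(encoded: str) -> list[str]:
    """Return the 6-bit group for each Base64 character (padding "=" is skipped)."""
    return [format(ALPHABET.index(ch), "06b") for ch in encoded if ch != "="]

# The first four characters of the sample image
print(to_sextets("R0lG"))  # ['010001', '110100', '100101', '000110']
```

These four groups are the first four groups of the binary listing, and together they hold the three bytes "GIF".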

Well, the story about Alice and Bob is just a made-up example to show what kind of problem the Base64 algorithm solves. In fact, it is a binary-to-text encoding whose task is to encode binary data into printable characters when the data transmission channel or the storage medium cannot handle 8-bit character encodings.

History

The history of Base64 started long ago, back when engineers argued over how many bits a byte should have. Today we use eight-bit bytes, but seven-bit, six-bit, and even three-bit bytes were used before. By the time the eight-bit encoding was approved as a standard, many systems used old encodings and did not support the “new standard”. As a result, some data was simply lost during transfer between new and old systems. For example, a mail server might discard the eighth bit when sending emails. Moreover, there was another problem with mail servers: they could send only text, not binary data (such as images, video, or archives). And so, as if by magic, clever minds developed an algorithm to solve these problems. Of course, other binary-to-text encodings were developed over time, but thanks to its simplicity, efficiency, and portability, Base64 became the most popular and is used almost everywhere.

The algorithm was first described in 1987 in a document defining the PEM protocol (if you are interested in the details, check RFC 989 § 4.3). Since then, the algorithm has evolved, giving rise to new standards that are actively used throughout the world of IT.

Naming

Initially, the algorithm was called “printable encoding”, and only a few years later, in June 1992, did RFC 1341 define it as “Base64”. Since the algorithm uses 64 basic characters, it was not difficult to give it a name (especially since Base85 already existed). Therefore, I think you will have no trouble guessing what the names of algorithms such as Base16, Base32, Base36, Base58, Base91, or Base122 mean.

Size

During encoding, the Base64 algorithm replaces every three bytes with four characters and, if necessary, adds padding characters, so the length of the result is always a multiple of four. Simply put, the result is always about 33% larger than the original data (more exactly, 4⁄3 of its size). The formula for calculating the length of the result string without padding is as follows: n * 4 / 3, rounded up to the nearest integer, where n is the length of the original data in bytes.
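
The size arithmetic can be checked with a minimal Python sketch. The function names `padded_length` and `unpadded_length` are hypothetical, and integer arithmetic is used for the rounding up that is needed when n is not a multiple of three.

```python
import base64

def padded_length(n: int) -> int:
    """Length of the Base64 result for n input bytes, including "=" padding."""
    return 4 * ((n + 2) // 3)

def unpadded_length(n: int) -> int:
    """Length without padding: n * 4 / 3, rounded up."""
    return (4 * n + 2) // 3

# Compare the formulas with what the standard library produces
for n in (1, 2, 3, 35):
    encoded = base64.b64encode(bytes(n))  # n zero bytes as sample data
    assert len(encoded) == padded_length(n)
    assert len(encoded.rstrip(b"=")) == unpadded_length(n)
    print(n, padded_length(n), unpadded_length(n))
```

For example, 35 bytes of input give 48 characters with padding and 47 characters without it.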

Usage

Base64 is most commonly used to encode binary data (for example, images or sound files) for embedding into HTML, CSS, EML, and other text documents. In addition, Base64 is used to encode data that may be unsupported or damaged during transfer, storage, or output. Here are some of the applications of the algorithm:

  • Attach files when sending emails
  • Embed images in HTML or CSS via data URI
  • Preserve raw bytes of cryptographic functions
  • Output binary data as XML or JSON in API responses
  • Save binary files to a database when BLOB is unavailable
  • Hide secrets from prying eyes (really a very bad idea)
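
As a rough sketch of the data URI item in the list above, the following Python snippet builds a `data:` URI from raw bytes. The helper name `to_data_uri` is hypothetical. The input is the sample GIF string shown earlier, with one "=" appended because Python's decoder expects padding.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI of the form data:<mime type>;base64,<encoded data>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The sample GIF from this article, padded with "=" so that it can be decoded
gif = base64.b64decode("R0lGODdhAQABAPAAAP8AAAAAACwAAAAAAQABAAACAkQBADs=")

uri = to_data_uri(gif, "image/gif")
print(uri)
```

The resulting string can be used, for instance, as the value of the `src` attribute of an `img` element.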

Security

Base64 is not an encryption algorithm, and under no circumstances should it be used to “hash” passwords or “encrypt” sensitive data, because it is reversible and the encoded data can be easily decoded. In a security context, Base64 should only be used to encode the raw result of a cryptographic function.

Roughly speaking, in terms of information security, Base64 is just a foreign language that some people do not understand. Nevertheless, even they can understand the meaning of the encoded message simply by using an online translator, which instantly returns the original message.
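
To show how little protection the encoding gives, here is a minimal Python sketch. The value of `secret` is a made-up placeholder. No key or password is involved at any step, so anyone who sees the encoded string can recover the original.

```python
import base64

secret = "my-password"  # hypothetical value, used only for illustration

# "Hiding" the secret: encoding needs no key
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
print(encoded)

# Recovering it: decoding needs no key either
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```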

Comments (29)

I hope you enjoy this discussion. In any case, I ask you to join it.

  • Alan,
    Thanks for a great explanatory article. This is something I've used by feel more than understanding, and it's nice to fill in the blanks in my knowledge. The only thing I'd add is under usage. Your API responses example touches on this at a high level, but I often find it useful for sanitizing string values that can include special characters ({}, <>, ', ;, newline, etc.) without using language specific methods to qualify strings.
    • Administrator,
      Hello Alan,
      Thank you for your comment. I'm glad you like this article.

      As for using Base64 to sanitize strings, this is a known practice, but since it has several drawbacks it should be used wisely.
    • Duratcho,
      Keep in mind that base64 uses +, / and = as well. These characters cannot exist in URLs or filenames, so do be careful when "sanitising" strings with this!
  • John,
    Great site, well done for setting this up.
    I realised when reading your site that the idea of base64 encoding has similarities to UUencoding that old people (like me) remember from the early days of email in the 1990s. Then it was considered very poor form to include binary attachments, hence the need to turn them into printable ASCII - which is what UUencoding did. UUencoding gave a predictable increase in file size of one third - each three binary characters transformed into four printable ASCII ones. Quite a good Wikipedia article on Uuencode, explaining its relation to Base64, and why Base64 is better.
  • Ahmad,
    Hi
    It was the best explanation of Base64 I have seen anywhere, thanks to you. Actually, I saw an image encoded to Base64
    in the Android IDE but wanted more info about it, and one thing that I did not know is whether we get a String when we "encode image to Base64" or when we decode it!?
    and here I got it!
    Thanks
  • Apps,
    How to identify the encoding algorithms uses on a string?
    If we have got the output of the encoding algorithm "3AqxxqQkWV" how to know what encoding algorithm was used?
  • john,
    Your statement:
    Simply put, the size of the result will always be 33% (more exactly, 4⁄3) larger than the original data.

    Might be better if you replace 4/3 with 1/3 as you are adding that to the original size (because you used the word larger)
    I.e. x+x/3=4x/3
    But your statement shows:
    x+4x/3=7x/3
  • Hakan_Ozay,
    Excellent! Thanks for your effort.
  • Bruno,
    Thank you for this website dude. Cheers.
  • Suyash,
    Love this site and thanks for sharing your passion for Base64, today I learned that I can encode audio to Base 64, truly genius!
  • Chandra,
    Suppose I have a message- Hii = 24 bit
    Required to send this message
    4*6 = 24 bit required to send message in Base64

    In this way there is no saving on the phone tariff that is mentioned above in your post
    • Administrator,
      Hello! The thing is that in this case there is no reason to encode your textual message: you can just dictate it letter by letter. But you will need a binary-to-text encoding algorithm if you need to send something that simply cannot be "described" by words (for example, a picture or a video file).
  • Nick,
    So say i have url https://site.domain.com/landingpage.html, and I base64 encode it, how do i parse the base64 encoded url into a browser address line (or clickable link) such that it is decoded into the intended/original url? i am guessing it needs to form part of a query telling the browser to decode it first? eg. how does it look? https://something?=xxxBase64xxx ??
    • Administrator,
      Hello Nick! You can pass it as query string as follows: `https://something?Base64Page=xxxBase64xxx`, then on your page fetch and decode the `Base64Page` parameter from the URL. However, there may be problems with large pages.
      • Nick,
        Does it matter what page url 'something.com' I choose? Ie. Will it redirect/reflect to my Base64 url I pass in query as u explain, regardless of 'something.com'decoy page?

        I swear I recall a lexical obfuscation trick where all you need is a https:// prefix (or other protocol prefix instructor) and then the obfuscated hex/b64/ascii text and it would be decoded by browser somehow?
  • Alok,
    Thanks for the good explanation. I love to use this site for encoding and decoding of base64. Easy to use and reliable for its accuracy. Thanks.