A virtual teacher who reveals to you the great secrets of Base64

Base64 Characters

The Base64 alphabet contains 64 basic ASCII characters that are used to encode data. Yeah, that’s right, 64 characters are enough to encode any data of any length. The only drawback is that the result is about 33% larger than the original data, because every 3 bytes of input become 4 characters of output. However, the benefits are much more important, not least because all of these characters are available in both 7-bit and 8-bit character sets.
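
The size growth can be checked with a few lines of Python. This is only a minimal sketch that assumes the standard `base64` module and padded output; the helper name `encoded_length` is made up for illustration.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Length of the padded Base64 output for n_bytes of input."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 2, 3, 300):
    actual = len(base64.b64encode(bytes(n)))
    print(n, actual, encoded_length(n))
```

For large inputs the ratio approaches 4/3, which is where the figure of about 33% comes from; very short inputs grow more because of the padding.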

Characters of the Base64 alphabet can be grouped into four groups:

  • Uppercase letters (indices 0-25): ABCDEFGHIJKLMNOPQRSTUVWXYZ
  • Lowercase letters (indices 26-51): abcdefghijklmnopqrstuvwxyz
  • Digits (indices 52-61): 0123456789
  • Special symbols (indices 62-63): +/
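
Because the four groups follow one another in strict order, the whole alphabet can be rebuilt from them. The following Python sketch shows one way to do it; the names `ALPHABET` and `INDEX_OF` are illustrative choices, not part of any standard.

```python
import string

# The four groups, concatenated in index order.
ALPHABET = (
    string.ascii_uppercase    # indices 0-25
    + string.ascii_lowercase  # indices 26-51
    + string.digits           # indices 52-61
    + "+/"                    # indices 62-63
)

# Reverse lookup: character -> index.
INDEX_OF = {ch: i for i, ch in enumerate(ALPHABET)}

print(len(ALPHABET))
print(INDEX_OF["Q"], INDEX_OF["q"])
```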

It is very important to note that the Base64 letters are case sensitive. This means that, for example, when decoding the values “QQ==”, “Qq==”, “qq==”, and “qQ==” four different results are obtained.
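
The effect of letter case is easy to observe. This sketch assumes Python's standard `base64.b64decode` with its default (lenient) settings and simply decodes the four values mentioned above.

```python
import base64

values = ["QQ==", "Qq==", "qq==", "qQ=="]

# Decode each value; every one yields a single, different byte.
decoded = {value: base64.b64decode(value) for value in values}

for value, raw in decoded.items():
    print(value, raw.hex())
```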

For a better understanding, I grouped all characters into the Base64 table:

Uppercase Letters
Index  Character
0      A
1      B
2      C
3      D
4      E
5      F
6      G
7      H
8      I
9      J
10     K
11     L
12     M
13     N
14     O
15     P
16     Q
17     R
18     S
19     T
20     U
21     V
22     W
23     X
24     Y
25     Z

Lowercase Letters
Index  Character
26     a
27     b
28     c
29     d
30     e
31     f
32     g
33     h
34     i
35     j
36     k
37     l
38     m
39     n
40     o
41     p
42     q
43     r
44     s
45     t
46     u
47     v
48     w
49     x
50     y
51     z

Digits
Index  Character
52     0
53     1
54     2
55     3
56     4
57     5
58     6
59     7
60     8
61     9

Symbols
Index  Character
62     +
63     /

In addition to these characters, the equal sign (=) is used for padding. The equal sign has no index and is not involved in encoding the data. The padding character only ensures that the length of a Base64 value is a multiple of 4 characters, and it is always appended at the end of the output. Nevertheless, the heart of the algorithm contains only 64 characters, and each of them has a unique index. The indices alone determine which characters are used to encode the data, and only thanks to them can you “recover” the original data. All indices are listed in the Base64 table above.
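
To make the role of the indices and of the padding more concrete, here is a small Python sketch. It assumes the standard `base64` module; `ALPHABET` and the helper `indices_of` are illustrative names, and the helper handles only a complete 3-byte group.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def indices_of(group: bytes) -> list:
    """Split exactly 3 bytes (24 bits) into four 6-bit indices."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(group, "big")
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]


idx = indices_of(b"ABC")
text = "".join(ALPHABET[i] for i in idx)
print(idx, text)

# Padding appears only when the input length is not a multiple of 3.
for data in (b"A", b"AB", b"ABC"):
    print(data, base64.b64encode(data).decode("ascii"))
```

The indices select characters from the table, while the equal signs merely fill the last group up to 4 characters.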

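Given all of the above, a Base64 value can be described using the following regular expression (note that it checks only the allowed characters and the position of the padding, not that the length is a multiple of 4):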

^[A-Za-z0-9+/]+={0,2}$
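
A possible way to apply this expression in Python is sketched below. The function name `looks_like_base64` is hypothetical, and the extra length check is an addition of this sketch, since the expression by itself does not enforce a length that is a multiple of 4.

```python
import re

BASE64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def looks_like_base64(value: str) -> bool:
    """Check the characters with the regex, then the length separately."""
    # fullmatch is used so that a trailing newline is not silently accepted.
    if BASE64_RE.fullmatch(value) is None:
        return False
    return len(value) % 4 == 0


for candidate in ("QUJD", "QQ==", "QQ=", "QU JD"):
    print(repr(candidate), looks_like_base64(candidate))
```

Passing this check does not guarantee that the value was produced by a Base64 encoder; it only means the value has the right shape.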

However, some standards allow and even require the use of multi-line values. In such cases, we need to extend the list of characters by allowing “Line Feed” and “Carriage Return”.

^[A-Za-z0-9+/\r\n]+={0,2}$
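
As an illustration of a multi-line value, the sketch below uses Python's `base64.encodebytes`, which wraps its output into lines of at most 76 characters. Stripping the final line break before matching is an assumption of this sketch, because the expression does not allow line breaks after the padding.

```python
import base64
import re

MULTILINE_RE = re.compile(r"^[A-Za-z0-9+/\r\n]+={0,2}$")

data = bytes(range(100))

# encodebytes inserts a line feed after every 76 output characters.
wrapped = base64.encodebytes(data).decode("ascii")
print(wrapped)

# Drop the trailing line break, then check the whole value.
is_valid = MULTILINE_RE.fullmatch(wrapped.rstrip("\r\n")) is not None
print(is_valid)

# The default decoder skips line breaks, so the data round-trips.
print(base64.b64decode(wrapped) == data)
```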

Comments (5)

I hope you enjoy this discussion. In any case, I ask you to join it.

  • Brice,
    Hey, thanks for your dedication to the subject and for making this helpful site. Where I currently am, Wikipedia is filtered and I needed to look up the 'alphabet' since I didn't know it by heart. I learned a few things too while I was reading.

    Btw, I can't put a space in the Name field of this comment form. I was going to leave my first and last name but validation doesn't allow space.
    • Administrator,
      Hello dear Brice,
      Thank you for your comment. I’m glad you found this site useful.

      I apologize for the inconvenience with the “name” field. At the moment, I can’t fix it because it may break some related things, but I’ll look into it as soon as possible.

      By the way, if you want to remember the Base64 characters, you just need to remember the order of these four groups: Uppercase, Lowercase, Digits, and Symbols. Once you remember this, you can easily reconstruct the Base64 alphabet, since all indices, as well as the characters within each group, go in strict order.

      If you are looking for further reading, I recommend What is Base64? as well as the explanations of the Encode Algorithm and the Decode Algorithm.
  • Omar,
    Just wanted to say thanks for putting together this website and all the useful info it contains.

    Keep it up :)
  • James,
    Hello,

    Could you please specify how "the size of the result will increase to 33%" when in the introductory article you wrote that with Base64 encoding the source (binary) code would be reduced in size, not increased?
    • Administrator,
      Hello dear James,
      I deeply apologize for any misleading information.

      If I understand you correctly, by “introductory article” you mean What is Base64? If so, please note that there I compared the Base64 length with the binary numeral system (where each byte is represented as 8 binary digits).

      Anyway, for example, if you encode the string “ABC” (Length = 3) to Base64, the result is “QUJD” (Length = 4). That is, the result is approximately 33% larger than the original data (more exactly, it is 4/3 of the original size).