The Base64 alphabet contains 64 basic ASCII characters, which are used to encode data. Yes, that's right: 64 characters are enough to encode any data of any length. The only drawback is that the result is about 33% larger than the original. However, the benefits are much more important, not least because all of these symbols are available in both 7-bit and 8-bit character sets.
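The size increase follows from the fact that every group of 3 input bytes becomes 4 output characters. Here is a minimal sketch of that arithmetic, checked against Python's standard `base64` module; the helper name `encoded_length` is made up for this illustration.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length for n_bytes of input: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

for n in (1, 2, 3, 300, 3000):
    actual = len(base64.b64encode(bytes(n)))  # bytes(n) is n zero bytes
    assert actual == encoded_length(n)
    print(n, actual, f"{actual / n:.2f}x")
```

For very short inputs the relative overhead is larger because of padding. As the input grows, the ratio approaches 4/3, which is where the figure of roughly 33% comes from.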
Characters of the Base64 alphabet fall into four groups:
- Uppercase letters (indices 0-25): A to Z
- Lowercase letters (indices 26-51): a to z
- Digits (indices 52-61): 0 to 9
- Special symbols (indices 62-63): + and /
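The four groups can be joined into one string so that each character's position is its index. This is a small sketch that assumes the standard alphabet from RFC 4648, where indices 62 and 63 are `+` and `/`. The names `ALPHABET` and `INDEX_OF` are chosen here for illustration.

```python
import string

# Concatenate the four groups in index order: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse lookup: character -> index, as a decoder would need.
INDEX_OF = {ch: i for i, ch in enumerate(ALPHABET)}

# Print the first and last character of each group with its index.
for ch in ("A", "Z", "a", "z", "0", "9", "+", "/"):
    print(ch, INDEX_OF[ch])
```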
It is very important to note that Base64 letters are case sensitive. For example, decoding the values “QQ==”, “Qq==”, “qq==”, and “qQ==” yields four different results.
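A quick way to see this is to decode the four values and compare the resulting bytes. The sketch below uses Python's `base64.b64decode` in its default, lenient mode. Stricter decoders may reject “Qq==” and “qq==”, because their unused trailing bits are not zero.

```python
import base64

values = ("QQ==", "Qq==", "qq==", "qQ==")

# Decode each value and keep the single resulting byte as hex.
decoded = {v: base64.b64decode(v).hex() for v in values}

for value, hex_byte in decoded.items():
    print(value, "->", hex_byte)

# Changing only the case of a letter changes the decoded byte.
assert len(set(decoded.values())) == 4
```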
For a better understanding, I grouped all characters into the Base64 table:
In addition to these characters, the equal sign (=) is used for padding. The equal sign has no index and is not involved in encoding the data. The padding character simply ensures that the length of a Base64 value is a multiple of 4 characters, and it is always appended at the end of the output. Nevertheless, the heart of the algorithm contains only 64 characters, each with a unique index. The indices alone determine which characters are used to encode the data, and only through them can the original data be recovered. All indices are listed in the Base64 table above.
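To make the roles of indices and padding concrete, here is a minimal hand-written encoder. It is only a sketch for illustration, assuming the standard `+` and `/` alphabet, and not a replacement for a library routine. The function name `encode` is hypothetical.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling missing bytes.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 real characters; "=" fills the rest.
        keep = len(chunk) + 1
        out.append("".join(ALPHABET[j] for j in indices[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"M", b"Ma", b"Man"):
    result = encode(sample)
    # Cross-check against the standard library.
    assert result == base64.b64encode(sample).decode("ascii")
    print(sample, "->", result)
```

The “=” characters are appended only after the indices have been mapped to characters, so they carry no data. They only bring the final group up to four characters.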
Given all of the above, a Base64 value can be defined using the following regular expression:
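One pattern consistent with this description is sketched below in Python. It is an assumption rather than a canonical definition. It requires complete groups of four characters and allows one or two “=” signs only in the final group. The names `BASE64_RE` and `is_base64` are chosen for this example.

```python
import re

# Zero or more full 4-character groups, optionally followed by a padded
# final group: 2 characters + "==" or 3 characters + "=".
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def is_base64(value: str) -> bool:
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return BASE64_RE.fullmatch(value) is not None

for candidate in ("QQ==", "TWE=", "TWFu", "TQ=", "Q===", "TW Fu"):
    print(repr(candidate), is_base64(candidate))
```

This pattern also accepts the empty string, which is the encoding of empty input. A looser pattern that only checks the character set would accept more values, so the choice depends on how strict the validation needs to be.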
However, some standards allow or even require multi-line values. In such cases, the list of allowed characters must be extended with “Line Feed” and “Carriage Return”.
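As a rough sketch of that extension, the character class can simply include `\r` and `\n`. The pattern below is an assumption made for illustration and is deliberately loose. It checks only which characters appear, not that the data forms complete groups of four. The names `MULTILINE_RE` and `looks_like_multiline_base64` are hypothetical.

```python
import base64
import re

# Alphabet characters plus CR and LF, then up to two "=" signs,
# then optional trailing line breaks.
MULTILINE_RE = re.compile(r"[A-Za-z0-9+/\r\n]+={0,2}[\r\n]*")

def looks_like_multiline_base64(value: str) -> bool:
    return MULTILINE_RE.fullmatch(value) is not None

wrapped = "TWFu\r\nTWE=\r\n"
print(looks_like_multiline_base64(wrapped))
print(looks_like_multiline_base64("TWFu TWE="))  # a space is still not allowed

# By default, Python's decoder discards characters outside the alphabet,
# including line breaks, before decoding.
print(base64.b64decode(wrapped))
```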