What is Unicode?
Unicode is a universal character encoding standard that assigns a unique number, called a code point, to every character, including those in non-ASCII scripts. The Internet was originally built on ASCII, which is based on the English alphabet and consists of only 128 characters.
Unicode allows for support of all the languages around the world and their unique character sets. It has room for over 1 million characters (1,114,112 code points, to be exact)! Unicode works by allowing more bits per character. A bit, short for binary digit, is the smallest unit of information on a machine. An ASCII character needs only 7 bits, which is why ASCII stops at 128 characters. A Unicode code point can need up to 21 bits, and the number of bits actually stored depends on the encoding used. The extra room is necessary because languages such as Chinese, Arabic, and Russian use characters that do not fit within ASCII's 128 slots.
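As a rough illustration of the point about bits, the short Python sketch below prints the code point of a few sample characters and the number of bits needed to write that number in binary. The helper name describe and the sample characters are chosen here for illustration only; ord and int.bit_length are standard Python built-ins.

```python
def describe(ch: str) -> str:
    """Return the code point of one character and the bits needed to hold it."""
    cp = ord(ch)  # the character's Unicode code point as an integer
    return f"U+{cp:04X} needs {cp.bit_length()} bits"

# Hypothetical sample: Latin, Cyrillic, Arabic, Chinese, and an emoji.
samples = ["A", "я", "ع", "中", "😀"]
for ch in samples:
    print(ch, describe(ch))
```

Only the first character fits in ASCII's 7 bits. The others need more, and the emoji needs more than 16.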
There are different ways to encode Unicode, including UTF-8 and UTF-16, the two most common. UTF-16 uses two or four bytes per character. UTF-8 has become the standard on the web because it adjusts the number of bytes used depending on the character, from one to four. This means that ASCII characters in UTF-8 take up only one byte each, so plain ASCII text is already valid UTF-8.
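To make the difference between UTF-8 and UTF-16 concrete, here is a minimal Python sketch that counts how many bytes the same text occupies in each. The function name encoded_sizes and the sample strings are illustrative assumptions; str.encode is the standard Python method, and "utf-16-le" is used so that no byte order mark is added to the count.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Little-endian variant, chosen so the count excludes a byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
    }

# Hypothetical samples: ASCII text, then Cyrillic, Chinese, and an emoji.
for sample in ["hello", "я", "中", "😀"]:
    print(sample, encoded_sizes(sample))
```

ASCII text such as "hello" is half the size in UTF-8 that it is in UTF-16, which is one reason UTF-8 suits web pages that are mostly ASCII markup.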