UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that can encode all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The "8" means that UTF-8 uses 8-bit blocks (bytes) as its basic unit for representing characters.
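The count of valid code points and the one-to-four-byte range can both be checked with a few lines of Python. This minimal sketch subtracts the 2,048 surrogate code points (U+D800 to U+DFFF, which are reserved for UTF-16 and cannot be encoded in UTF-8) from the full Unicode code space, then encodes four sample characters with the built-in `str.encode`. The constant names are illustrative only.

```python
# Unicode code points run from U+0000 to U+10FFFF.
TOTAL_CODE_POINTS = 0x10FFFF + 1
# Surrogates (U+D800 to U+DFFF) are not valid scalar values, so exclude them.
SURROGATES = 0xDFFF - 0xD800 + 1
VALID_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATES

print(f"{VALID_CODE_POINTS:,}")

# Variable width: each sample character needs a different number of bytes.
samples = [
    "A",           # U+0041, Basic Latin
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, grinning face emoji
]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Running it should print `1,112,064`, followed by byte counts of 1, 2, 3 and 4 for the four samples (for example, `e2 82 ac` for the euro sign).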
Since 2009, UTF-8 has been the leading encoding for the World Wide Web.
Characters with code points from 0 to 127 (hex 0x7F) are encoded in a single byte, which is identical to their ASCII value.
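Because the single-byte range matches ASCII exactly, plain ASCII text is already valid UTF-8. A short Python sketch can confirm this for all 128 code points using only the built-in codecs:

```python
# Every code point from 0 to 127 encodes to one UTF-8 byte whose value
# equals the code point, which is exactly the ASCII byte.
for code_point in range(128):
    ch = chr(code_point)
    assert ch.encode("utf-8") == ch.encode("ascii") == bytes([code_point])

# As a result, pure-ASCII text yields identical bytes in both encodings.
text = "Hello, HTML5!"
print(text.encode("utf-8") == text.encode("ascii"))  # True
```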
Characters from 128 to 2047 (hex 0x0080 to 0x07FF) are encoded in two bytes.
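Characters from 2048 to 65535 (hex 0x0800 to 0xFFFF) are encoded in three bytes. The surrogate range 0xD800 to 0xDFFF is excluded, because those values are not valid characters in UTF-8.
Characters from 65536 to 1114111 (hex 0x10000 to 0x10FFFF) are encoded in four bytes.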
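The byte-length rules follow from how UTF-8 splits a code point's bits across bytes: the first byte starts with a prefix that signals the length (`0`, `110`, `1110` or `11110`), and each continuation byte starts with `10` and carries 6 payload bits. The hypothetical `utf8_encode` function below is a teaching sketch of that layout, checked against Python's built-in encoder at each range boundary; in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value by hand, following UTF-8's bit layout."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:
        # 0xxxxxxx: 7 payload bits, identical to ASCII
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: 11 payload bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's built-in encoder at the edges of each range.
for cp in [0x41, 0x7F, 0x80, 0x7FF, 0x800, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF]:
    assert utf8_encode(cp) == chr(cp).encode("utf-8"), hex(cp)

print(utf8_encode(0x20AC).hex(" "))  # e2 82 ac
```

Rejecting surrogates up front mirrors Python's own behavior, where encoding a lone surrogate such as `"\ud800"` with the default error handler raises an error.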
The table below lists some of the Unicode character ranges (blocks) supported by HTML5, all of which can be encoded in UTF-8:
Unicode Block | Decimal Range | Hexadecimal Range |
---|---|---|
C0 Controls and Basic Latin | 0-127 | 0000-007F |
C1 Controls and Latin-1 Supplement | 128-255 | 0080-00FF |
Latin Extended-A | 256-383 | 0100-017F |
Latin Extended-B | 384-591 | 0180-024F |
Spacing Modifiers | 688-767 | 02B0-02FF |
Diacritical Marks | 768-879 | 0300-036F |
Greek and Coptic | 880-1023 | 0370-03FF |
Cyrillic Basic | 1024-1279 | 0400-04FF |
Cyrillic Supplement | 1280-1327 | 0500-052F |
General Punctuation | 8192-8303 | 2000-206F |
Currency Symbols | 8352-8399 | 20A0-20CF |
Letterlike Symbols | 8448-8527 | 2100-214F |
Arrows | 8592-8703 | 2190-21FF |
Mathematical Operators | 8704-8959 | 2200-22FF |
Box Drawings | 9472-9599 | 2500-257F |
Block Elements | 9600-9631 | 2580-259F |
Geometric Shapes | 9632-9727 | 25A0-25FF |
Miscellaneous Symbols | 9728-9983 | 2600-26FF |
Dingbats | 9984-10175 | 2700-27BF |
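In HTML, any character from these ranges can be written directly in a UTF-8 page or as a numeric character reference in decimal (`&#8364;`) or hexadecimal (`&#x20AC;`) form. The Python sketch below copies a few rows of the table into a hypothetical `BLOCKS` list and uses an illustrative `describe` helper to report a character's block, both reference forms, and its UTF-8 byte count.

```python
# A subset of the table: (block name, first code point, last code point).
BLOCKS = [
    ("C0 Controls and Basic Latin", 0x0000, 0x007F),
    ("C1 Controls and Latin-1 Supplement", 0x0080, 0x00FF),
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Currency Symbols", 0x20A0, 0x20CF),
    ("Arrows", 0x2190, 0x21FF),
    ("Miscellaneous Symbols", 0x2600, 0x26FF),
]


def describe(ch: str) -> str:
    """Return the block, HTML numeric references, and UTF-8 size of one character."""
    cp = ord(ch)
    block = next(
        (name for name, first, last in BLOCKS if first <= cp <= last),
        "not in this list",
    )
    size = len(ch.encode("utf-8"))
    # HTML accepts decimal (&#N;) and hexadecimal (&#xH;) character references.
    return f"{block}: &#{cp}; or &#x{cp:04X}; ({size} UTF-8 bytes)"


for ch in ["\u20ac", "\u2192", "\u03a9"]:  # euro sign, rightwards arrow, omega
    print(describe(ch))
```

For the euro sign this prints `Currency Symbols: &#8364; or &#x20AC; (3 UTF-8 bytes)`, matching the decimal and hexadecimal columns of the table.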