Indiana University
University Information Technology Services

ARCHIVED: What is Unicode?

Developed in cooperation between the Unicode Consortium and the International Organization for Standardization (ISO), Unicode is an attempt to consolidate the alphabets and ideographs of the world's languages into a single, international character set. It focuses on the characters themselves rather than on languages. Thus, a letter shared between English and French (or, for that matter, an ideograph shared between Japanese kanji and Chinese Han script) has the same Unicode character code. Letters from different scripts are encoded separately even when they look alike, so the Latin "A" used in English and the Cyrillic "А" used in Russian are distinct characters. As a multilingual standard, Unicode lets developers build a single application that handles text in many languages, rather than resorting to the costly, time-consuming task of maintaining a separate version for each language's character set.

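Most Western character sets are 7-bit (e.g., US-ASCII) or 8-bit (e.g., Latin-1), limiting them, respectively, to 128 or 256 characters. This limitation has resulted in a slew of sets customized for individual languages. For languages like Chinese and Japanese, whose writing systems rely heavily on ideographs (i.e., characters that represent the meaning of a word rather than its component sounds) and consist of thousands of characters, traditional 7- and 8-bit character sets are not adequate; the same is true of Korean, whose hangul syllables also number in the thousands. Therefore, to include the world's principal writing systems, Unicode was originally designed as a 16-bit set, allowing up to 65,536 characters. The standard has since been extended beyond 16 bits: its code space now runs from U+0000 to U+10FFFF (1,114,112 code points), with the original 65,536 positions forming the Basic Multilingual Plane. How much disk space Unicode text takes up depends on the encoding form. In UTF-16, most commonly used characters occupy two bytes, so Western text takes up twice as much space as it would in an 8-bit character set; in UTF-8, ASCII characters occupy one byte each and other characters occupy two to four bytes.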
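
The effect of encoding on storage size can be checked directly. The following is a minimal sketch in Python 3, using only the built-in str.encode method; the function name encoded_sizes and the sample strings are illustrative choices, not part of any standard.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in several encodings."""
    sizes = {}
    for encoding in ("latin-1", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            # The text contains characters this encoding cannot represent.
            sizes[encoding] = None
    return sizes


english = "Unicode"               # seven ASCII letters
japanese = "\u65e5\u672c\u8a9e"   # three kanji

print(encoded_sizes(english))
print(encoded_sizes(japanese))
```

For the seven-letter English sample, the 8-bit encoding needs 7 bytes and UTF-16 needs 14. The three kanji cannot be stored in Latin-1 at all, and take 9 bytes in UTF-8 but only 6 in UTF-16.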

As a character set, Unicode does not concern itself with the specific appearance, or glyph, of a character. Instead, it includes only a code and name for each character. Individual fonts are assigned the task of rendering characters as glyphs, with the exact appearance of glyphs varying between fonts. Similarly, Unicode does not, for the most part, distinguish between plain and rich text, instead allowing applications to apply their own text processing and formatting.
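
The code and name assigned to a character can be looked up programmatically. As a small illustration, the sketch below uses the unicodedata module from Python 3's standard library; the helper name describe is a hypothetical choice made for this example.

```python
import unicodedata


def describe(char):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(char), unicodedata.name(char))


# A Latin letter, an accented Latin letter, and a Han ideograph.
for ch in ("A", "\u00e9", "\u4e2d"):
    print(describe(ch))
```

Nothing in this output says how the character should look on screen; that is left to whichever font renders it.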

For more information about Unicode, visit the Unicode Consortium's web page.

This is document aems in domain all.
Last modified on April 28, 2009.