DBCS


DBCS stands for Double Byte Character Set. This term has two basic meanings:

  • In CJK computing, the term "DBCS" traditionally means a character set in which every graphic character not representable by an accompanying SBCS (single-byte character set) is encoded in two bytes; Han characters generally make up most of these two-byte characters.
  • The term "DBCS" can also mean a character set in which all characters (including all control characters) are encoded in two bytes.

DBCS in CJK computing

In CJK computing, the term DBCS traditionally refers to a character set in which each graphic character is encoded in two bytes. The lead byte of every two-byte character has its most significant bit set (i.e., equal to 1), and the DBCS is always paired with a single-byte character set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with halfwidth characters and the DBCS with fullwidth characters.

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022; i.e., "DBCS" can sometimes mean a double-byte encoding that is specifically not EUC.

Note that this meaning of DBCS differs from what some consider correct usage today: some insist that these character sets be properly called either MBCSs (multibyte character sets) or variable-width encodings. Nevertheless, "MBCS" is not a traditional term and one should not expect it to be understood; "DBCS" is the correct traditional term to describe these character sets.

Controversy

Some people use DBCS to mean Unicode, specifically UTF-16, while others use it to mean older (pre-Unicode) codepages that use more than one byte per character. Shift-JIS, GB2312 and Big5 are a few codepages in which a character can occupy more than one byte, but using the term DBCS for them is, strictly speaking, incorrect, because they mix one-byte and two-byte characters and are therefore really MBCS (multibyte character sets). Some IBM mainframes do have true DBCS codepages, which contain only the double-byte portion of a multibyte codepage.
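The difference between these usages can be seen by counting the bytes each character occupies under several encodings. The following is a minimal sketch using Python's standard codecs; the helper name bytes_per_char is hypothetical, and "utf-16-le" is chosen only to keep the byte-order mark out of the counts.

```python
from typing import List


def bytes_per_char(text: str, codec: str) -> List[int]:
    """Return the encoded size in bytes of each character of text."""
    return [len(ch.encode(codec)) for ch in text]


if __name__ == "__main__":
    sample = "A中"  # one ASCII letter and one Han character
    for codec in ("shift_jis", "gb2312", "big5", "utf-16-le"):
        print(codec, bytes_per_char(sample, codec))
```

In the three legacy codepages the ASCII letter takes one byte and the Han character two, which is why they are more accurately described as multibyte. In UTF-16 both characters take two bytes, although characters outside the Basic Multilingual Plane take four, so UTF-16 is not strictly a fixed double-byte encoding either.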

A business that uses the term "DBCS enablement" for software internationalization is using ambiguous terminology. It means either that it wants to write software for East Asian markets using older codepage-based technology, or that it plans to use Unicode. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing it with the mutually incompatible codepages used in the various East Asian countries. Since Unicode, unlike many of these codepages, supports all the major languages of East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually desired only when much older operating systems or applications do not support Unicode.

See also