ISO/IEC 8859-2

From Free net encyclopedia

Revision as of 12:42, 8 March 2006
Plugwash (Talk | contribs)

ISO 8859-2, more formally cited as ISO/IEC 8859-2 or less formally as Latin-2, is part 2 of ISO/IEC 8859, a standard character encoding defined by ISO. It encodes what it refers to as Latin alphabet no. 2, consisting of 191 characters from the Latin script, each encoded as a single 8-bit code value.

ISO_8859-2:1987, more commonly known by its preferred MIME name of ISO-8859-2 (note the extra hyphen), is the IANA charset name for this standard when it is used together with the control codes from ISO/IEC 6429 for the C0 (0x00-0x1F) and C1 (0x80-0x9F) parts. Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted. This character set also has the aliases ISO_8859-2, latin2, l2 and csISOLatin2.
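
As a rough illustration of how these names behave in practice, the sketch below asks Python's standard `codecs` registry to resolve each of them. Python is used here only as a convenient example: its alias table is an independent implementation, not the IANA registry, and the way it normalizes names (ignoring case, treating punctuation as separators) is a Python detail rather than something the standard defines.

```python
import codecs

# Names taken from the paragraph above. Python's alias table is assumed
# here to recognise all of them; it is not the IANA registry itself.
NAMES = ["ISO_8859-2:1987", "ISO-8859-2", "ISO_8859-2",
         "latin2", "l2", "csISOLatin2"]

def canonical_name(label: str) -> str:
    """Return the name of the codec that Python resolves `label` to."""
    return codecs.lookup(label).name

resolved = {label: canonical_name(label) for label in NAMES}
for label, name in resolved.items():
    print(f"{label:16} -> {name}")
```

If every label resolves to the same codec name, the aliases are interchangeable as far as this particular library is concerned.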

This encoding shares a lot of assignments with windows-1250 but is not a strict subset of it (unlike the case with windows-1252 and ISO 8859-1).
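
One way to see the overlap is to decode every byte value with both encodings and list the positions where the results differ. The sketch below does this with Python's built-in codecs, where `cp1250` is Python's name for windows-1250. It looks only at 0xA0-0xFF, since 0x80-0x9F holds control codes in one encoding and printable characters in the other; the function name is illustrative.

```python
def differing_bytes(enc_a: str = "iso8859_2", enc_b: str = "cp1250") -> list[int]:
    """Byte values in 0xA0-0xFF that the two codecs decode differently."""
    result = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        # errors="replace" guards against byte values a codec leaves undefined.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            result.append(value)
    return result

diff = differing_bytes()
print(len(diff), "of 96 byte values differ:",
      " ".join(f"{v:02X}" for v in diff))
```

For example, A9 is Š in ISO 8859-2 but © in windows-1250, while E1 is á in both.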

These code values can be used in almost any data interchange system to communicate in the following European languages: Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbo-Croatian, Slovak, Slovenian, Upper Sorbian and Lower Sorbian. It is also suitable for several western European languages such as German or English. On their own, texts in these latter languages are nominally encoded in ISO 8859-1, but the code points they need are the same in ISO 8859-2, which matters for multilingual documents.
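
The shared code points can be checked directly: a German sample should produce the same bytes under both encodings, while a Polish word needs letters that only ISO 8859-2 provides. This is a minimal sketch using Python's built-in codecs; the sample strings and the helper name are illustrative and not taken from the standard.

```python
def same_bytes(text: str) -> bool:
    """True if `text` encodes to identical bytes in ISO 8859-1 and ISO 8859-2."""
    return text.encode("iso8859_1") == text.encode("iso8859_2")

german = "Größe, Äpfel, Übung"   # illustrative sample using ä ö ü ß Ä Ü
polish = "Łódź"                  # Ł and ź exist only in ISO 8859-2

print(same_bytes(german))
print(polish.encode("iso8859_2").hex(" "))
```

Note that `same_bytes(polish)` would raise `UnicodeEncodeError`, because ISO 8859-1 has no code points for Ł or ź.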

Unlike ISO 8859-1, it also covers the Finnish letters ž and š, although it lacks the (Swedish) å. Apart from these very rare letters, Finnish text is encoded identically in ISO 8859-1 and ISO 8859-2. Historically, Finnish texts were said to be in ISO 8859-1, and ž and š were not used in computer text processing (å belongs to ISO 8859-1).
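
The coverage of the letters mentioned above can be probed by trying to encode each one in both encodings. The helper below is a hypothetical convenience wrapper around Python's built-in codecs, shown only as a sketch.

```python
def encodable(ch: str, encoding: str) -> bool:
    """True if the character `ch` has a code point in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Columns: letter, in ISO 8859-1?, in ISO 8859-2?
for ch in "äöšžå":
    print(ch, encodable(ch, "iso8859_1"), encodable(ch, "iso8859_2"))
```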

It may be argued that ISO 8859-2 is not really suitable for Romanian, because it lacks the letters s and t with comma below and contains s and t with cedilla instead. These letters were unified in the first versions of the Unicode standard, meaning that the appearance with cedilla or with comma was treated as a glyph choice rather than as separate characters; fonts intended for use with Romanian should, therefore, have characters with comma below at those code points.
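
In software that works with Unicode text, the separate comma-below letters that Unicode later added cannot be encoded in ISO 8859-2 directly. One possible workaround, sketched below, is to fold them onto the cedilla code points before encoding. The mapping and the function name are assumptions made for illustration, not something the standard prescribes.

```python
# Hypothetical helper: fold the comma-below letters onto the cedilla
# code points that ISO 8859-2 actually contains.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015E",  # S with comma below -> S with cedilla
    "\u0219": "\u015F",  # s with comma below -> s with cedilla
    "\u021A": "\u0162",  # T with comma below -> T with cedilla
    "\u021B": "\u0163",  # t with comma below -> t with cedilla
})

def encode_romanian(text: str) -> bytes:
    """Encode Romanian text as ISO 8859-2 after folding comma to cedilla."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso8859_2")

sample = "\u0219i \u021bar\u0103"  # "și țară", written with comma-below letters
print(encode_romanian(sample).hex(" "))
```

Whether the result is displayed with commas or cedillas then depends on the font, as described above.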

ISO/IEC 8859-2
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	unused
1x	unused
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~
8x	unused
9x	unused
Ax	NBSP	Ą	˘	Ł	¤	Ľ	Ś	§	¨	Š	Ş	Ť	Ź	SHY	Ž	Ż
Bx	°	ą	˛	ł	´	ľ	ś	ˇ	¸	š	ş	ť	ź	˝	ž	ż
Cx	Ŕ	Á	Â	Ă	Ä	Ĺ	Ć	Ç	Č	É	Ę	Ë	Ě	Í	Î	Ď
Dx	Đ	Ń	Ň	Ó	Ô	Ő	Ö	×	Ř	Ů	Ú	Ű	Ü	Ý	Ţ	ß
Ex	ŕ	á	â	ă	ä	ĺ	ć	ç	č	é	ę	ë	ě	í	î	ď
Fx	đ	ń	ň	ó	ô	ő	ö	÷	ř	ů	ú	ű	ü	ý	ţ	˙

In the table above, 20 is the regular SPACE character, and A0 is the NO-BREAK SPACE. AD is a SOFT HYPHEN, which compliant web browsers display only when a line is broken at that position.

Code values 00-1F, 7F, and 80-9F are not assigned to characters by ISO/IEC 8859-2.
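
The unassigned ranges account for the figure of 191 characters: 256 code values minus 32 (00-1F), 1 (7F) and 32 (80-9F). The sketch below recomputes this and looks up the special code values mentioned above, using Python's built-in `iso8859_2` codec. That codec maps the unassigned values to control characters, which is a choice of that implementation rather than of ISO/IEC 8859-2 itself.

```python
CODEC = "iso8859_2"  # Python's name for the encoding; an implementation detail

def graphic_code_values() -> list[int]:
    """Code values the text above treats as assigned: 20-7E and A0-FF."""
    return [v for v in range(0x100)
            if not (v <= 0x1F or v == 0x7F or 0x80 <= v <= 0x9F)]

values = graphic_code_values()
decoded = {v: bytes([v]).decode(CODEC) for v in values}

print(len(values))                      # number of assigned code values
print(f"U+{ord(decoded[0xA0]):04X}")    # code point for NO-BREAK SPACE
print(f"U+{ord(decoded[0xAD]):04X}")    # code point for SOFT HYPHEN
```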

External links

ISO 8859-2 (Latin 2) Resources
