XHTML

The Extensible HyperText Markup Language, or XHTML, is a markup language that has the same expressive possibilities as HTML, but a stricter syntax. Whereas HTML is an application of SGML, a very flexible markup language, XHTML is an application of XML, a more restrictive subset of SGML. Because XHTML documents must be well-formed (syntactically correct), they can be processed automatically with a standard XML library. HTML, by contrast, requires a relatively complex, lenient, and generally custom parser (though an SGML parser library could possibly be used). XHTML can be thought of as the intersection of HTML and XML in many respects, since it is a reformulation of HTML in XML. XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000.
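
The point about standard XML libraries can be illustrated with a minimal sketch. It assumes Python's built-in xml.etree.ElementTree module as the generic XML parser; the helper name is_well_formed and the two fragments are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The same paragraph written with XHTML syntax and with HTML syntax.
xhtml_fragment = "<p>Line one<br />Line two</p>"
html_fragment = "<p>Line one<br>Line two</p>"

print(is_well_formed(xhtml_fragment))  # True
print(is_well_formed(html_fragment))   # False: <br> is never closed
```

A check like this only establishes well-formedness; it says nothing about whether the document is valid against an XHTML DTD.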

Overview

XHTML is the successor to HTML. As such, many consider XHTML to be the “current version” of HTML, but it is a separate, parallel recommendation; the W3C continues to recommend the use of XHTML 1.1, XHTML 1.0, and HTML 4.01 for web publishing.

The need for a stricter version of HTML arose primarily because World Wide Web content must now be delivered to many devices besides traditional computers, such as mobile devices, which cannot devote extra resources to handling the additional complexity of HTML syntax.

Most recent versions of popular web browsers render XHTML properly, but many older browsers can only render XHTML as HTML. Similarly, almost all web browsers that are compatible with XHTML also render HTML properly. Some argue this compatibility is slowing the switch from HTML to XHTML. In October 2005, approximately 10% of web surfers were using browsers capable of rendering XHTML properly [1]. Microsoft's Internet Explorer is incompatible with some XHTML recommendations, despite Microsoft's full membership in the W3C [2]. As a result, most web content authors are forced to choose between writing valid, standards-compliant documents and providing content that renders properly on the browsers of most visitors.

An especially useful feature of XHTML is that elements from different XML namespaces (such as MathML and Scalable Vector Graphics) can be incorporated within it. However, this feature is only available when serving XHTML as actual XML with the application/xhtml+xml MIME-type.
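
As a rough illustration of mixed namespaces, the sketch below embeds a small MathML fragment in an XHTML document and lists the namespaces a generic XML parser reports. It assumes Python's xml.etree.ElementTree; the sample document and the helper namespace_of are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical document: XHTML with one embedded MathML identifier.
document = f"""<html xmlns="{XHTML_NS}">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>An identifier:
      <math xmlns="{MATHML_NS}"><mi>x</mi></math>
    </p>
  </body>
</html>"""

root = ET.fromstring(document)


def namespace_of(element):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


namespaces = sorted({namespace_of(el) for el in root.iter()})
mathml_elements = [
    el.tag.split("}", 1)[1] for el in root.iter() if namespace_of(el) == MATHML_NS
]

print(namespaces)
print(mathml_elements)  # ['math', 'mi']
```

An XML-aware consumer can tell the two vocabularies apart by namespace alone, which is what an HTML parser working on text/html content cannot do.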

The changes from HTML to first-generation XHTML (i.e. XHTML 1.x) are minor, and are mainly to achieve conformance with XML. The most important change is the requirement that the document must be well formed and that all elements must be explicitly closed, as required in XML. Since XML's tags are case-sensitive, the XHTML recommendation has defined all tag names to be lowercase. This is in direct contrast to established traditions which began around the time of HTML 2.0, when most people preferred uppercase tags, generally to make the contrast between markup and content easier for the human editor to see. In XHTML, all attribute values must be enclosed by quotes (either 'single' or "double" quotes may be used). In contrast, this was optional in SGML, and hence in HTML, where quotes may be omitted in some circumstances. The closing requirement extends to empty elements such as img and br, which are closed by adding a closing slash to the start tag: <img … /> and <br />. Attribute minimization (e.g., <option selected>) is also prohibited, because the attribute “selected” contains no explicit value; instead, use <option selected="selected">. More differences are detailed in the W3C XHTML 1.0 recommendation [3].
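
Several of these rules are plain XML well-formedness rules, so a generic XML parser can demonstrate them. The following is a minimal sketch assuming Python's xml.etree.ElementTree; the sample fragments and the names samples and results are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, each paired with a short description.
samples = {
    "explicit attribute value": '<option selected="selected">A</option>',
    "minimized attribute": "<option selected>A</option>",
    "single-quoted value": "<p class='note'>text</p>",
    "unquoted value": "<p class=note>text</p>",
    "closed empty element": '<img src="a.png" alt="" />',
    "unclosed empty element": '<img src="a.png" alt="">',
    "mismatched tag case": "<P>text</p>",
}

results = {label: parses_as_xml(markup) for label, markup in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Note that a consistently uppercase pair such as <P>text</P> is well-formed XML and would pass this check; rejecting it takes validation against an XHTML DTD, which this sketch does not attempt.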

Versions of XHTML

XHTML 1.0

The original XHTML W3C Recommendation, XHTML 1.0, was simply a reformulation of HTML 4.01 in XML. There are three different "flavors" of XHTML 1.0, each equal in scope to their respective HTML 4.01 versions.

  • XHTML 1.0 Strict is the same as HTML 4.01 Strict, but follows XML syntax rules.
  • XHTML 1.0 Transitional allows some common deprecated elements and attributes not found in XHTML 1.0 Strict, such as <center>, <u>, <strike>, and <applet>. It supports everything found in XHTML 1.0 Strict and is also useful for compatibility with older browsers that cannot render style sheets, because it permits presentational attributes such as bgcolor on the body element. [4]
  • XHTML 1.0 Frameset allows the use of HTML framesets.

XHTML 1.1

The most recent XHTML W3C Recommendation is XHTML 1.1: Module-based XHTML. This is based on XHTML 1.0 Strict using the DTDs of the Modularization of XHTML. All deprecated features of HTML, e.g. presentational elements and framesets, have been removed from this version. Presentation is controlled purely by Cascading Style Sheets. This version also allows for ruby markup support, needed for East-Asian languages (especially CJK).

The modularization of XHTML allows small chunks of XHTML to be re-used by other XML applications in a well-defined manner. It also allows XHTML to be extended for specialist purposes. Note that such extended documents are not XHTML 1.1 conforming documents. For example, if you extend a document with the frameset module you can no longer claim the document is XHTML 1.1. Instead it might be described as an XHTML Host Language Conforming Document if the relevant criteria are satisfied.

The XHTML 2.0 draft specification

Work on XHTML 2.0 is, as of 2006, still underway. The XHTML 2.0 draft is controversial because it breaks backward compatibility with all previous versions, and is therefore, in effect, a new markup language created to circumvent (X)HTML's limitations rather than being simply a new version. Many issues with compatibility are easily addressed, however, by parsing XHTML 2.0 the same way a user agent would parse XHTML 1.1: via an XML parser and a default CSS document conforming to the XHTML 2.0 recommendation.

New features brought into the HTML family of markup languages by XHTML 2.0:

  • HTML forms will be replaced by XForms, an XML-based user input specification allowing forms to be displayed appropriately for different rendering devices.
  • HTML frames will be replaced by XFrames.
  • The DOM Events will be replaced by XML Events, which uses the XML Document Object Model.
  • A new list element type, the nl element type, will be included to specifically designate a list as a navigation list. This will be useful in creating nested menus, which are currently created by a wide variety of means like nested unordered lists or nested definition lists.
  • Any element will be able to act as a hyperlink, e.g., <li href="articles.html">Articles</li>, similar to XLink.
  • Any element will be able to reference alternative media with the src attribute, e.g., <p src="lbridge.jpg" type="image/jpeg">London Bridge</p> is the same as <object src="lbridge.jpg" type="image/jpeg"><p>London Bridge</p></object>.
  • The alt attribute of the img element has been removed: alternative text will be given in the content of the img element, much like the object element, e.g., <img src="hms_audacious.jpg">HMS <em>Audacious</em></img>.
  • A single heading element (h) will be added. The level of these headings will be indicated by the nested section elements, each with their own h heading.
  • The remaining presentational elements i, b and tt, still allowed in XHTML 1.x (even Strict), will be absent from XHTML 2.0. The only somewhat presentational elements remaining will be sup and sub for superscript and subscript respectively. All other tags are meant to be semantic instead (e.g. <strong> for strongly emphasized text), leaving the user agent to style the semantics via CSS.
  • RDF triples can be expressed with the property and about attributes, to facilitate the conversion from XHTML to RDF/XML.
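
The single heading element described in the list above implies that a heading's level follows from how many section elements enclose it. The sketch below shows one way a processor might compute that, assuming Python's xml.etree.ElementTree; the sample markup is hypothetical, omits the XHTML 2.0 namespace for brevity, and is not taken from the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML 2.0-style fragment (namespace omitted for brevity).
draft_markup = """<body>
  <section>
    <h>Events</h>
    <section>
      <h>Introduction</h>
    </section>
    <section>
      <h>Specifying events</h>
      <section>
        <h>Attributes</h>
      </section>
    </section>
  </section>
</body>"""


def heading_levels(element, depth=0):
    """Yield (level, text) pairs; level counts the enclosing section elements."""
    for child in element:
        if child.tag == "h":
            yield depth, child.text
        elif child.tag == "section":
            yield from heading_levels(child, depth + 1)
        else:
            yield from heading_levels(child, depth)


outline = list(heading_levels(ET.fromstring(draft_markup)))
for level, text in outline:
    print("  " * (level - 1) + f"{level}. {text}")
```

In XHTML 1.x the same outline would need h1, h2 and h3 chosen by hand, so moving a section to a different depth means renumbering its headings.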

Validating XHTML documents

An XHTML document that conforms to the XHTML specification is said to be a valid document. In a perfect world, all browsers would follow the web standards and valid documents would predictably render on every browser and platform. Although validating XHTML does not ensure cross-browser compatibility, it is a recommended first step. A document can be checked for validity with the W3C Markup Validation Service.

DOCTYPEs / XML Namespaces / XML Schemas

For a document to validate, it must contain a Document Type Declaration, or DOCTYPE. A DOCTYPE declares to the browser what Document Type Definition (DTD) the document conforms to. A Document Type Declaration should be placed at the very beginning of an XHTML document, even before the <html> tag. These are the most common XHTML Document Type Declarations:

XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
XHTML 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
XHTML 2.0

XHTML 2.0 currently (Oct 2005) defines its DOCTYPE as

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml2.dtd">

In addition to the DOCTYPE, all XHTML elements must be in the appropriate XML namespace for the version being used. This is usually done using an xmlns declaration in the root element.

For XHTML 1.x this is

<html xmlns="http://www.w3.org/1999/xhtml">

XHTML 2.0 requires a namespace and an XML Schema instance declaration, which might be declared as

<html xmlns="http://www.w3.org/2002/06/xhtml2/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/ http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd">

The system identifier part of the DOCTYPE, which in these examples is the URL that begins with "http", need only point to a copy of the DTD to use if the validator cannot locate one based on the public identifier (the other quoted string). It does not need to be the specific URL that is in these examples; in fact, authors are encouraged to use local copies of the DTD files when possible. The public identifier, however, must be character-for-character the same as in the examples. Similarly the actual URL to the XML Schema file can be changed, as long as the URL before it, e.g., the XHTML 2.0 namespace, remains the same.
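
The split between a fixed public identifier and a replaceable system identifier can be sketched as a small lookup. This is an illustrative helper only: the names DOCTYPES and doctype and the local path dtd/xhtml1-strict.dtd are assumptions, while the identifiers themselves are the ones listed above.

```python
# Public identifier (must match exactly) and default system identifier
# for each XHTML 1.x document type listed above.
DOCTYPES = {
    "XHTML 1.0 Strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "XHTML 1.0 Transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "XHTML 1.0 Frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
    "XHTML 1.1": (
        "-//W3C//DTD XHTML 1.1//EN",
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
    ),
}


def doctype(version, system_id=None):
    """Build a DOCTYPE; only the system identifier may be overridden."""
    public_id, default_system_id = DOCTYPES[version]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id or default_system_id}">'


print(doctype("XHTML 1.1"))
# Hypothetical local copy of the DTD; the public identifier is unchanged.
print(doctype("XHTML 1.0 Strict", system_id="dtd/xhtml1-strict.dtd"))
```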

Character encoding may be specified at the beginning of an XHTML document in the XML declaration and within a meta http-equiv element. (If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.)
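
The effect of the XML declaration on decoding can be seen with a short sketch, assuming Python's xml.etree.ElementTree and no higher-level protocol supplying the encoding. The byte strings are made-up samples of the word "café".

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 bytes with a matching encoding declaration.
latin1_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\xe9</p>'
# UTF-8 bytes with no declaration: UTF-8 is what the parser assumes.
utf8_undeclared = b"<p>caf\xc3\xa9</p>"
# ISO-8859-1 bytes with no declaration: not valid UTF-8.
latin1_undeclared = b"<p>caf\xe9</p>"

declared_text = ET.fromstring(latin1_declared).text
default_text = ET.fromstring(utf8_undeclared).text

try:
    ET.fromstring(latin1_undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(declared_text)   # café
print(default_text)    # café
print(undeclared_ok)   # False
```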

Common errors

Some of the most common errors in XHTML are:

  • Not closing empty elements (elements without closing tags in HTML4)
    • Incorrect: <br>
    • Correct: <br />
      Note that any of these is acceptable in XHTML: <br></br>, <br/> and <br />. Older HTML-only browsers will generally accept <br> and <br />. Using <br /> gives some degree of backward and forward compatibility.
  • Not closing non-empty elements
    • Incorrect: <p>This is a paragraph.<p>This is another paragraph.
    • Correct: <p>This is a paragraph.</p><p>This is another paragraph.</p>
  • Improperly nesting elements (elements must be closed in reverse order)
    • Incorrect: <em><strong>This is some text.</em></strong>
    • Correct: <em><strong>This is some text.</strong></em>
  • Not specifying alternate text for images (using the alt attribute, which helps make pages accessible to users of devices that do not load images and to the visually impaired)
    • Incorrect: <img src="/skins/common/images/poweredby_mediawiki_88x31.png" />
    • Correct: <img src="/skins/common/images/poweredby_mediawiki_88x31.png" alt="Powered By MediaWiki" />
    • Correct (XHTML 2.0): <img src="/skins/common/images/poweredby_mediawiki_88x31.png">Powered By MediaWiki</img>
  • Putting text directly in the body of the document (this is not an error in XHTML 1.0 Transitional)
    • Incorrect: <body>Welcome to my page.</body>
    • Correct: <body><p>Welcome to my page.</p></body>
  • Nesting block-level elements within inline elements
    • Incorrect: <em><h2>Introduction</h2></em>
    • Correct: <h2><em>Introduction</em></h2>
  • Not putting quotation marks around attribute values
    • Incorrect: <td rowspan=3>
    • Correct: <td rowspan="3">
  • Using the ampersand outside of entities (use &amp; to display the ampersand character)
    • Incorrect: <title>Cars & Trucks</title>
    • Correct: <title>Cars &amp; Trucks</title>
  • Using the ampersand outside of entities in URLs (use &amp; instead of & in links also)
    • Incorrect: <a href="index.php?page=news&style=5">News</a>
    • Correct: <a href="index.php?page=news&amp;style=5">News</a>
  • Using uppercase element or attribute names
    • Incorrect: <BODY><P>The Best Page Ever</P></BODY>
    • Correct: <body><p>The Best Page Ever</p></body>
  • Attribute minimization
    • Incorrect: <textarea readonly>READ-ONLY</textarea>
    • Correct: <textarea readonly="readonly">READ-ONLY</textarea>
  • Using document.write() in scripts instead of node creation methods, e.g. document.createElementNS() and document.getElementById().appendChild(). (document.write() does not work when the document is parsed as XML.)
  • Using comments in embedded scripts and stylesheets. In XHTML the contents of the comment block are invisible to the browser. Consider
<style type="text/css">
 <!--
  p { color: green; }
 -->
</style>
In XHTML this is equivalent to the following because the content of the comment is removed from the parse tree.
<style type="text/css">
</style>
Whilst in HTML it is equivalent to the following, which is probably what was intended.
<style type="text/css">
  p { color: green; }
</style>
Instead, the above should be written using the <![CDATA[ ]]> syntax.
<style type="text/css">
<![CDATA[
  p { color: green; }
]]>
</style>

This is not an exhaustive list, but gives a general sense of errors that XHTML coders often make.
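
The difference between the comment and CDATA forms of the style element above can be observed with a generic XML parser. This is a minimal sketch assuming Python's xml.etree.ElementTree, whose default parser discards comments; the helper style_text is hypothetical.

```python
import xml.etree.ElementTree as ET

commented = """<style type="text/css">
 <!--
  p { color: green; }
 -->
</style>"""

cdata = """<style type="text/css">
<![CDATA[
  p { color: green; }
]]>
</style>"""


def style_text(markup):
    """Return the character data an XML parser sees inside the element."""
    return "".join(ET.fromstring(markup).itertext()).strip()


print(repr(style_text(commented)))  # ''
print(repr(style_text(cdata)))      # 'p { color: green; }'
```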

Backward compatibility

XHTML 1.0 documents are mostly backward compatible with HTML — that is, processable as HTML by a web browser that does not know how to properly handle XHTML — when authored according to certain guidelines given in the specification and served as text/html. Authors who follow the compatibility guidelines essentially create HTML that, while technically invalid due to the use of malformed empty-element tags, happens to be processable by all modern web browsers, which are very lenient when reading "tag soup".

If XHTML is served with the appropriate application/xhtml+xml MIME type, many older web browsers will exhibit various bugs, which makes the deployment of XHTML difficult, even if the use of application/xhtml+xml is restricted to browsers that declare that they prefer it.
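
Restricting application/xhtml+xml to browsers that declare a preference for it is usually done by inspecting the HTTP Accept request header. The following is a simplified sketch of that idea, not a complete implementation of HTTP content negotiation; the function names and the sample header values are assumptions made for illustration.

```python
def parse_accept(header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    preferences = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q-value as "not wanted"
        preferences[media_type] = quality
    return preferences


def choose_media_type(accept_header):
    """Serve XHTML as XML only when it is listed and not ranked below text/html."""
    prefs = parse_accept(accept_header)
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


# Illustrative header values, not captured from any particular browser.
print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_media_type("image/gif, image/jpeg, */*"))
print(choose_media_type("text/html,application/xhtml+xml;q=0.5"))
```

A wildcard such as */* is deliberately not treated as support for XHTML here, since a browser that sends only a wildcard has not declared any preference for it.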

XHTML 1.1's modularity features prevent it from being backward compatible with XHTML 1.0 and HTML. XHTML 2.0, likewise, is not backward compatible with its predecessors.

Difficulties with backward compatibility have resulted in some criticism of the use of XHTML and the lack of progress towards its deployment as intended by the W3C.

Example

The following is an example of XHTML 1.0 Strict.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>XHTML 1.0 Example</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <p>This is a tiny example of an <abbr title="Extensible HyperText Markup 
Language">XHTML</abbr> 1.0 Strict document.</p>
  </body>
</html>
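
Because the example is well-formed XML, a standard XML library can read it directly. The sketch below assumes Python's xml.etree.ElementTree, which does not fetch or apply the external DTD; the document is reproduced with the abbr title attribute on one line, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang

document = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>XHTML 1.0 Example</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <p>This is a tiny example of an <abbr title="Extensible HyperText Markup Language">XHTML</abbr> 1.0 Strict document.</p>
  </body>
</html>"""

root = ET.fromstring(document)
ns = {"x": XHTML_NS}  # prefix used only in the queries below

title = root.find("x:head/x:title", ns).text
language = root.get(f"{{{XML_NS}}}lang")
abbreviation = root.find(".//x:abbr", ns)

print(title)              # XHTML 1.0 Example
print(language)           # en
print(abbreviation.text)  # XHTML
```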
