HTML 5: What’s new?
Some of the new features in HTML 5 are functions for embedding audio, video and graphics, client-side data storage, and interactive documents. Other features are new page elements like <header>, <section>, <footer>, and <figure>.
HTML 5 improves interoperability and reduces development costs by defining precise rules for how to handle all HTML elements and how to recover from errors.
HTML 5 is planned to be the fifth major revision of the core language of the World Wide Web, HTML. When HTML 5 is expressed in XML, it is called XHTML 5. The ideas behind HTML 5 were pioneered in 2004 by the Web Hypertext Application Technology Working Group (WHATWG). HTML 5 was adopted as the starting point of the work of the new HTML working group of the W3C in 2007. The HTML working group published the First Public Working Draft of the specification on 22 January 2008. The specification is ongoing work and is expected to remain so for many years.
HTML 5 differences
The HTML 5 language has a “custom” HTML syntax that is compatible with HTML 4 and XHTML1 documents published on the Web, but is not compatible with the more esoteric SGML features of HTML 4, such as <em/content/. Documents using this “custom” syntax must be served with the text/html MIME type.
HTML 5 also defines detailed parsing rules (including “error handling”) for this syntax which are largely compatible with popular implementations. User agents will follow these rules for resources that have the text/html MIME type. Here is an example document that conforms to the HTML syntax:
<!doctype html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Example document</title>
  </head>
  <body>
    <p>Example paragraph</p>
  </body>
</html>
The other syntax that can be used for HTML 5 is XML. This syntax is compatible with XHTML1 documents and implementations. Documents using this syntax need to be served with an XML MIME type, and elements need to be put in the http://www.w3.org/1999/xhtml namespace following the rules set forth by the XML specifications. [XML]
Below is an example document that conforms to the XML syntax of HTML 5. Note that XML documents must have an XML MIME type.
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example document</title>
  </head>
  <body>
    <p>Example paragraph</p>
  </body>
</html>
For the HTML syntax of HTML 5, authors have three means of setting the character encoding:
- At the transport level, for instance by using the HTTP Content-Type header.
- Using a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used.
- Using a meta element with a charset attribute that specifies the encoding as the first element child of the head element. <meta charset="UTF-8"> could be used to specify the UTF-8 encoding. This replaces the need for <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
For the XML syntax, authors have to use the rules as set forth in the XML specifications to set the character encoding.
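The three mechanisms for the HTML syntax can be inspected side by side with a short script. The following is a minimal sketch, not a user agent's actual algorithm: it only reports what each mechanism declares and makes no claim about which one takes precedence. The names `declared_encodings` and `MetaCharsetFinder` are hypothetical, chosen for this illustration.

```python
import codecs
from email.message import Message
from html.parser import HTMLParser

# Byte order marks mapped to the encoding they signal.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


class MetaCharsetFinder(HTMLParser):
    """Records the charset attribute of the first <meta charset=...> seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.charset is None:
            value = dict(attrs).get("charset")
            if value:
                self.charset = value.lower()


def declared_encodings(content_type, body):
    """Return what each of the three mechanisms declares (None if absent)."""
    # 1. Transport level: charset parameter of the HTTP Content-Type header.
    transport = None
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        transport = header.get_content_charset()

    # 2. Byte order mark at the very start of the file.
    bom = next((name for mark, name in BOMS if body.startswith(mark)), None)

    # 3. <meta charset> in the markup; decode leniently just to scan the tags.
    finder = MetaCharsetFinder()
    finder.feed(body.decode("ascii", errors="replace"))
    finder.close()
    return {"transport": transport, "bom": bom, "meta": finder.charset}


if __name__ == "__main__":
    page = b'<!doctype html><html><head><meta charset="UTF-8"></head></html>'
    print(declared_encodings("text/html; charset=UTF-8", page))
```

Reporting the three declarations separately makes it easy to spot documents where they disagree, which is where encoding problems usually start.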
The HTML syntax of HTML 5 requires a DOCTYPE to be specified to ensure that the browser renders the page in standards mode. The DOCTYPE has no other purpose and is therefore optional for XML. Documents with an XML MIME type are always handled in standards mode. [DOCTYPE]
The DOCTYPE declaration is <!DOCTYPE html> and is case-insensitive in the HTML syntax.
DOCTYPEs from earlier versions of HTML were longer because the HTML language was SGML based and therefore required a reference to a DTD. With HTML 5 this is no longer the case and the DOCTYPE is only needed to enable standards mode for documents written using the HTML syntax. Browsers already do this for <!DOCTYPE html>.
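The case-insensitivity of the short DOCTYPE can be checked with Python's standard `html.parser` module. This is a rough sketch under the assumption that a simple string comparison is enough for illustration; it does not reproduce a browser's mode-switching logic, and `has_html5_doctype` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Collects the text of every <!DOCTYPE ...> declaration encountered."""

    def __init__(self):
        super().__init__()
        self.declarations = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        self.declarations.append(decl)


def has_html5_doctype(markup):
    """True if the first declaration is `doctype html`, in any letter case."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    if not finder.declarations:
        return False
    normalized = " ".join(finder.declarations[0].split()).lower()
    return normalized == "doctype html"


if __name__ == "__main__":
    for sample in ("<!DOCTYPE html><p>x</p>", "<!doctype HTML><p>x</p>", "<p>x</p>"):
        print(repr(sample), has_html5_doctype(sample))
```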
The following elements have been introduced for better structure:
- section represents a generic document or application section. It can be used together with […] h6 to indicate the document structure.
- article represents an independent piece of content of a document, such as a blog entry or newspaper article.
- aside represents a piece of content that is only slightly related to the rest of the page.
- header represents the header of a section.
- footer represents a footer for a section and can contain information about the author, copyright information, et cetera.
- nav represents a section of the document intended for navigation.
- dialog can be used to mark up a conversation like this:
  <dialog>
    <dt> Costello
    <dd> Look, you gotta first baseman?
    <dt> Abbott
    <dd> Certainly.
    <dt> Costello
    <dd> Who's playing first?
    <dt> Abbott
    <dd> That's right.
    <dt> Costello
    <dd> When you pay off the first baseman every month, who gets the money?
    <dt> Abbott
    <dd> Every dollar of it.
  </dialog>
- figure can be used to associate a caption together with some embedded content, such as a graphic or video:
  <figure>
    <video src=ogg>…</video>
    <legend>Example</legend>
  </figure>
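To see how the structural elements above combine with headings, the nesting of a page can be listed with a small script. This is only an illustrative sketch: it is not the HTML 5 outline algorithm, the set of elements treated as nesting containers (`STRUCTURAL`) is an assumption made for this example, and the sample markup is invented.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these elements open a new nesting level.
STRUCTURAL = {"section", "article", "aside", "nav"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureLister(HTMLParser):
    """Builds an indented list of structural elements and heading text."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.in_heading = False
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.lines.append("  " * self.depth + "<" + tag + ">")
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in STRUCTURAL:
            self.depth -= 1
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        # Only heading text is recorded; other text is ignored.
        if self.in_heading and data.strip():
            self.lines.append("  " * self.depth + data.strip())


def list_structure(markup):
    lister = StructureLister()
    lister.feed(markup)
    lister.close()
    return lister.lines


SAMPLE = """<article>
  <header><h1>Blog entry</h1></header>
  <section><h2>Part one</h2><p>Text</p></section>
  <aside><h2>Related</h2></aside>
  <footer>Author, copyright</footer>
</article>"""

if __name__ == "__main__":
    print("\n".join(list_structure(SAMPLE)))
```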
Then there are several other new elements:
- […] video for multimedia content. Both provide an API so application authors can script their own user interface, but there is also a way to trigger a user interface provided by the user agent. source elements are used together with these elements if there are multiple streams available of different types.
- embed is used for plugin content.
- mark represents a run of marked text.
- meter represents a measurement, such as disk usage.
- time represents a date and/or time.
- canvas is used for rendering dynamic bitmap graphics on the fly, such as graphs, games, et cetera.
- command represents a command the user can invoke.
- datagrid represents an interactive representation of a tree list or tabular data.
- details represents additional information or controls which the user can obtain on demand.
- datalist, together with a new list attribute for input, is used to make comboboxes:
  <input list=browsers>
  <datalist id=browsers>
    <option value="Safari">
    <option value="Internet Explorer">
    <option value="Opera">
    <option value="Firefox">
  </datalist>
- […] nest elements provide a templating mechanism for HTML.
- event-source is used to “catch” server sent events.
- output represents some type of output, such as from a calculation done through scripting.
- progress represents the completion of a task, such as downloading or performing a series of expensive operations.
- […] rb elements allow for marking up ruby annotations.
The input element’s type attribute now has the following new values:
The idea of these new types is that the user agent can provide the user interface, such as a calendar date picker or integration with the user’s address book, and submit a defined format to the server. This gives the user a better experience because input is checked before it is sent to the server, so there is less time to wait for feedback.
HTML 5 has introduced several new attributes to various elements that were already part of HTML 4:
- area elements now have a media attribute for consistency with the link element. It is purely advisory.
- area elements have a new attribute called ping that specifies a space-separated list of URIs which have to be pinged when the hyperlink is followed. Currently user tracking is mostly done through redirects. This attribute allows the user agent to inform users which URIs are going to be pinged as well as giving privacy-conscious users a way to turn it off.
- The area element, for consistency, now has the […]
- The base element can now have a target attribute as well, mainly for consistency with the a element and because it was already widely supported. Also, the target attribute for the area elements is no longer deprecated, as it is useful in Web applications.
- The value attribute for the li element is no longer deprecated as it is not presentational. The same goes for the start attribute.
- The meta element now has a charset attribute, as this was already supported and provides a nicer way to specify the character encoding for the document.
- A new autofocus attribute can be specified on the input (except when the […] button elements. It provides a declarative way to focus a form control during page load. Using this feature should enhance the user experience as the user can turn it off if they do not like it, for instance.
- The new […] fieldset elements allows for controls to be associated with more than a single form.
- form elements have a new replace attribute which affects what will be done with the document after a form has been submitted.
- select elements (as well as the datalist element) have a data attribute that allows for automatic prefilling of form controls, in case of form, or the form control, in case of datalist, with data from the server.
- The new required attribute applies to input (except when the […] image or some button type such as […] textarea. It indicates that the user has to fill in a value in order to submit the form.
- textarea elements have a new attribute called inputmode which gives a hint to the user interface as to what kind of input is expected.
- You can now disable an entire fieldset by using the disabled attribute on it. This was not possible before.
- The input element has several new attributes to specify constraints: […] step. As mentioned before, it also has a new list attribute which can be used together with the datalist element.
- […] button also have a new template attribute which can be used for repetition templates.
- The menu element has three new attributes: […] autosubmit. They allow the element to transform into a menu as found in typical user interfaces as well as providing for context menus in conjunction with the global contextmenu attribute.
- The style element has a new scoped attribute which can be used to enable scoped style sheets. Style rules within such a style element only apply to the local tree.
- The script element has a new attribute called async that influences script loading and execution.
- The html element has a new attribute called manifest that points to an application cache manifest used in conjunction with the API for offline Web applications.
- The link element has a new attribute called sizes. It can be used in conjunction with the icon relationship (set through the rel attribute) to indicate the size of the referenced icon.
- The ol element has a new attribute called reversed to indicate that the list order is descending when present.
- The iframe element has two new attributes called […] sandbox which allow for sandboxing content, e.g. blog comments.
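The ping attribute described in the list above holds a space-separated list of URIs, and the user agent can let the user turn pinging off. The sketch below shows one way a tool might read that attribute. It assumes the attribute is read from both a and area hyperlinks, and the function name, the `user_allows_pings` switch, and the `tracker.example` URIs are all hypothetical.

```python
from html.parser import HTMLParser


def uris_to_ping(ping_value, user_allows_pings=True):
    """Split a ping attribute value into URIs, honoring the user's choice."""
    if not user_allows_pings or not ping_value:
        return []
    # The attribute value is a space-separated list of URIs.
    return ping_value.split()


class PingCollector(HTMLParser):
    """Maps each hyperlink's href to the URIs that would be pinged."""

    def __init__(self, user_allows_pings=True):
        super().__init__()
        self.user_allows_pings = user_allows_pings
        self.pings = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            attributes = dict(attrs)
            href = attributes.get("href")
            if href is not None:
                self.pings[href] = uris_to_ping(
                    attributes.get("ping"), self.user_allows_pings
                )


def collect_pings(markup, user_allows_pings=True):
    collector = PingCollector(user_allows_pings)
    collector.feed(markup)
    collector.close()
    return collector.pings


LINKS = (
    '<a href="/out" ping="https://tracker.example/one https://tracker.example/two">Out</a>'
    '<a href="/plain">Plain</a>'
)

if __name__ == "__main__":
    print(collect_pings(LINKS))
    print(collect_pings(LINKS, user_allows_pings=False))
```

Because the list of URIs is visible in the markup, a tool like this can show users in advance what would be contacted, which is not possible with redirect-based tracking.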
Several attributes from HTML 4 now apply to all elements. These are called global attributes:
There are also several new global attributes:
- The contenteditable attribute indicates that the element is an editable area. The user can change the contents of the element and manipulate the markup.
- The contextmenu attribute can be used to point to a context menu provided by the author.
- The draggable attribute can be used together with the new drag & drop API.
- The irrelevant attribute indicates that an element is not yet, or is no longer, relevant.
- […] template global attributes complement the data template feature.
- The data-* collection of author-defined attributes. Authors can define any attribute they want as long as they prefix it with data- to avoid clashes with future versions of HTML. The only requirement on these attributes is that they are not used for user agent extensions.
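As an illustration of the data- prefix convention, author-defined attributes can be picked out of markup by prefix alone. This is a minimal sketch using Python's standard `html.parser`; the attribute names `data-length` and `data-artist` and the helper names are invented for the example and are not defined by HTML 5.

```python
from html.parser import HTMLParser

PREFIX = "data-"


class DataAttributeCollector(HTMLParser):
    """Collects author-defined data-* attributes, element by element."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {name without prefix: value})

    def handle_starttag(self, tag, attrs):
        data = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX) and len(name) > len(PREFIX)
        }
        if data:
            self.found.append((tag, data))


def collect_data_attributes(markup):
    collector = DataAttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


if __name__ == "__main__":
    track = '<li class="track" data-length="2m11s" data-artist="Example Band">Song</li>'
    print(collect_data_attributes(track))
```

Attributes without the prefix, such as class in the sample, are left alone, which mirrors the idea that only the data- namespace is reserved for authors.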
The following are the attributes for the repetition model. These are global attributes and as such may be used on all HTML elements, or on any element in any other namespace, with the attributes being in the
HTML 5 also makes all event handler attributes from HTML 4 that take the form onevent-name global attributes and adds several new event handler attributes for new events it defines, such as the onmessage attribute which can be used together with the new event-source element and the cross-document messaging API.
These elements have slightly modified meanings in HTML 5 to better reflect how they are used on the Web or to make them more useful:
- The a element without an href attribute now represents a “placeholder link”.
- The address element is now scoped by the new concept of sectioning.
- The b element now represents a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is emboldened.
- The hr element now represents a paragraph-level thematic break.
- The i element now represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized. Usage varies widely by language.
- For the label element the browser should no longer move focus from the label to the control unless such behaviour is standard for the underlying platform user interface.
- The menu element is redefined to be useful for actual menus.
- The small element now represents small print (for side comments and legal print).
- The strong element now represents importance rather than strong emphasis.
- Quotation marks for the q element are now to be provided by the author rather than the user agent.
The elements in this section are not to be used by authors. User agents will still have to support them and HTML 5 will get a rendering section in due course that says exactly how. (The isindex element, for instance, is already supported by the parser.)
The following elements are not in HTML 5 because their effect is purely presentational and therefore better handled by CSS:
The following elements are not in HTML 5 because their usage affected usability and accessibility for the end user in a negative way:
The following elements are not included because they have not been used often, created confusion or can be handled by other elements:
- acronym is not included because it has created lots of confusion. Authors are to use […]
- applet has been obsoleted in favor of […]
- isindex usage can be replaced by usage of form controls.
- dir has been obsoleted in favor of […]
noscript is only conforming in the HTML syntax. It is not included in the XML syntax as its usage relies on an HTML parser.
Some attributes from HTML 4 are no longer allowed in HTML 5. If they need to have any impact on user agents for compatibility reasons it is defined how they should work in those scenarios.
In addition, HTML 5 has none of the presentational attributes that were in HTML 4 as they are better handled by CSS: