This is the way I use HTML. I have stripped out all the legalese and the dangerous APPLET element for Java. Later versions of HTML only added more undesirable, complex features.
The Structure of HTML documents

HTML 3.2 documents start with a <!DOCTYPE> declaration followed by an HTML element containing a HEAD and then a BODY element:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    <HEAD>
    <TITLE>A study of population dynamics</TITLE>
    ... other head elements
    </HEAD>
    <BODY>
    ... document body
    </BODY>
    </HTML>

In practice, the HTML, HEAD and BODY start and end tags can be omitted from the markup. Every conforming HTML 3.2 document must start with the <!DOCTYPE> declaration that is needed to distinguish HTML 3.2 documents from other versions of HTML. Every HTML 3.2 document must also include the descriptive title element. A minimal HTML 3.2 document thus looks like:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <TITLE>A study of population dynamics</TITLE>
The HEAD element

This contains the document head, but you can always omit both the start and end tags for HEAD. The content of the document head is an unordered collection of the following elements:
TITLE

TITLE is a container and requires both start and end tags. Every HTML 3.2 document must have exactly one TITLE element in the document's HEAD. It provides an advisory title which can be displayed in a user agent's window caption etc. Character entities can be used for accented characters and to escape special characters such as & and <. Markup is not permitted in the content of a TITLE element. Example TITLE element:

    <TITLE>A study of population dynamics</TITLE>

BASE

The BASE element gives the base URL for dereferencing relative URLs, using the rules given by the URL specification, e.g.

    <BASE href="http://www.acme.com/intro.html">
    ...
    <IMG SRC="icons/logo.gif">

The image is dereferenced to

    http://www.acme.com/icons/logo.gif

In the absence of a BASE element the document URL should be used. Note that this is not necessarily the same as the URL used to request the document, as the base URL may be overridden by an HTTP header accompanying the document.

META

The META element can be used to include name/value pairs describing properties of the document, such as author, expiry date, a list of key words etc. The NAME attribute specifies the property name while the CONTENT attribute specifies the property value, e.g.

    <META NAME="Author" CONTENT="Dave Raggett">
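The BASE rule described above can be tried out with Python's standard `urllib.parse.urljoin`, which resolves a relative URL against a base URL. This is only a sketch: the helper names `effective_base` and `dereference` are made up for illustration, and the HTTP header override mentioned above is not modelled.

```python
from urllib.parse import urljoin


def effective_base(document_url, base_href=None):
    """Return the BASE href if one was given, otherwise the document URL."""
    return base_href if base_href else document_url


def dereference(base_url, relative_url):
    """Resolve a relative URL against the base URL."""
    return urljoin(base_url, relative_url)


if __name__ == "__main__":
    # Values taken from the BASE example in the text.
    base = effective_base("http://www.acme.com/intro.html")
    print(dereference(base, "icons/logo.gif"))
```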
The BODY element

This contains the document body. Both start and end tags for BODY may be omitted. The body can contain a wide range of elements. The key attributes are: BACKGROUND, BGCOLOR, TEXT, LINK, VLINK and ALINK. These can be used to set a repeating background image, plus background and foreground colors for normal text and hypertext links. Example:

    <body bgcolor=white text=black link=red vlink=maroon alink=fuchsia>
Colors are given in the sRGB color space as hexadecimal numbers (e.g. COLOR="#C0FFC0"), or as one of 16 widely understood color names. These colors were originally picked as being the standard 16 colors supported with the Windows VGA palette.
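As a small illustration of the hexadecimal notation, the following sketch splits a value such as "#C0FFC0" into its red, green and blue components. The function name `parse_hex_color` is hypothetical, and the sketch handles only the six-digit hexadecimal form, not the color names.

```python
def parse_hex_color(value):
    """Convert a '#RRGGBB' string into a (red, green, blue) tuple of ints."""
    if len(value) != 7 or not value.startswith("#"):
        raise ValueError("expected a color of the form #RRGGBB: %r" % value)
    # Each pair of hexadecimal digits encodes one component in the range 0-255.
    return tuple(int(value[i:i + 2], 16) for i in (1, 3, 5))


if __name__ == "__main__":
    print(parse_hex_color("#C0FFC0"))
```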
Block and Text level elements

Most elements that can appear in the document body fall into one of two groups: block level elements which cause paragraph breaks, and text level elements which don't. Common block level elements include H1 to H6 (headers), P (paragraphs), LI (list items), and HR (horizontal rules). Common text level elements include EM, I, B and FONT (character emphasis), A (hypertext links), IMG and APPLET (embedded objects) and BR (line breaks). Note that block elements generally act as containers for text level and other block level elements (excluding headings and address elements), while text level elements can only contain other text level elements. The exact model depends on the element.

Headings

H1, H2, H3, H4, H5 and H6 are used for document headings. You always need the start and end tags. H1 elements are more important than H2 elements and so on, so that H6 elements define the least important level of headings. More important headings are generally rendered in a larger font than less important ones. Use the optional ALIGN attribute to set the text alignment within a heading, e.g.

    <H1 ALIGN=CENTER> ... centered heading ... </H1>

The default is left alignment, but this can be overridden by an enclosing DIV or CENTER element.

ADDRESS

The ADDRESS element requires start and end tags and specifies information such as authorship and contact details for the current document. User agents should render the content with paragraph-breaks before and after. Note that the content is restricted to paragraphs, plain text and text-like elements as defined by the %text entity. Example:

    <ADDRESS>
    Newsletter editor<BR>
    J.R. Brown<BR>
    8723 Buena Vista, Smallville, CT 01234<BR>
    Tel: +1 (123) 456 7890
    </ADDRESS>

Block elements
Paragraphs

The P element is used to mark up paragraphs. It is a container and requires a start tag. The end tag is optional as it can always be inferred by the parser. User agents should place paragraph breaks before and after P elements. The rendering is user agent dependent, but text is generally wrapped to fit the space available. Example:

    <P>This is the first paragraph.
    <P>This is the second paragraph.

Paragraphs are usually rendered flush left with a ragged right margin. The ALIGN attribute can be used to explicitly specify the horizontal alignment as LEFT, CENTER or RIGHT.
For example:

    <p align=center>This is a centered paragraph.
    <p align=right>and this is a flush right paragraph.

The default is left alignment, but this can be overridden by an enclosing DIV or CENTER element.

Lists

List items can contain block and text level items, including nested lists, although headings and address elements are excluded. This limitation is defined via the %flow entity.

Unordered Lists

Unordered lists take the form:

    <UL>
    <LI> ... first list item
    <LI> ... second list item
    ...
    </UL>

The UL element is used for unordered lists. Both start and end tags are always needed. The LI element is used for individual list items. The end tag for LI elements can always be omitted. Note that LI elements can contain nested lists. The COMPACT attribute can be used as a hint to the user agent to render lists in a more compact style. The TYPE attribute can be used to set the bullet style on UL and LI elements. The permitted values are "disc", "square" or "circle". The default generally depends on the level of nesting for lists.
This set of bullet styles was chosen to cater for the original bullet shapes used by Mosaic in 1993.

Ordered (i.e. numbered) Lists

Ordered (i.e. numbered) lists take the form:

    <OL>
    <LI> ... first list item
    <LI> ... second list item
    ...
    </OL>

The OL START attribute can be used to initialize the sequence number (by default it is initialized to 1). You can set it later on with the VALUE attribute on LI elements. Both of these attributes expect integer values. You can't indicate that numbering should continue from a previous list, or skip missing values, without giving an explicit number. The COMPACT attribute can be used as a hint to the user agent to render lists in a more compact style. The OL TYPE attribute allows you to set the numbering style for list items:
Definition Lists

Definition lists take the form:

    <DL>
    <DT> term name
    <DD> term definition
    ...
    </DL>

DT elements can only act as containers for text level elements, while DD elements can hold block level elements as well, excluding headings and address elements. For example:

    <DL>
    <DT>Term 1<dd>This is the definition of the first term.
    <DT>Term 2<dd>This is the definition of the second term.
    </DL>

which could be rendered as:

    Term 1
        This is the definition of the first term.
    Term 2
        This is the definition of the second term.
The COMPACT attribute can be used with the DL element as a hint to the user agent to render lists in a more compact style.

DIR and MENU

These elements have been part of HTML from the early days. They are intended for unordered lists similar to UL elements. User agents are recommended to render DIR elements as multicolumn directory lists, and MENU elements as single column menu lists. In practice, Mosaic and most other user agents have ignored this advice and instead render DIR and MENU in an identical way to UL elements.

Preformatted Text

The PRE element can be used to include preformatted text. User agents render this in a fixed-pitch font, preserving spacing associated with white space characters such as space and newline characters. Automatic word-wrap should be disabled within PRE elements. Note that the SGML standard requires that the parser remove a newline immediately following the start tag or immediately preceding the end tag. PRE has the same content model as paragraphs, excluding images and elements that produce changes in font size, e.g. IMG, BIG, SMALL, SUB, SUP and FONT. A few user agents support the WIDTH attribute. It provides a hint to the user agent of the required width in characters. The user agent can use this to select an appropriate font size or to indent the content appropriately. Here is an example of a PRE element; a verse from Shelley (To a Skylark):

    <PRE>
    Higher still and higher
    From the earth thou springest
    Like a cloud of fire;
    The blue deep thou wingest,
    And singing still dost soar, and soaring ever singest.
    </PRE>

which is rendered as:

    Higher still and higher
    From the earth thou springest
    Like a cloud of fire;
    The blue deep thou wingest,
    And singing still dost soar, and soaring ever singest.

The horizontal tab character (encoded in Unicode, US ASCII and ISO 8859-1 as decimal 9) should be interpreted as the smallest non-zero number of spaces which will leave the number of characters so far on the line as a multiple of 8.
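The tab rule above can be sketched in a few lines of Python. The function name `expand_tabs` is made up for this illustration; Python's built-in `str.expandtabs(8)` gives the same result for single-line input and is used below only as a cross-check.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with spaces up to the next multiple of tab_width."""
    out = []
    column = 0  # number of characters emitted so far on this line
    for ch in line:
        if ch == "\t":
            # Smallest non-zero number of spaces reaching the next tab stop.
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)


if __name__ == "__main__":
    sample = "ab\tc"
    print(repr(expand_tabs(sample)))
    print(expand_tabs(sample) == sample.expandtabs(8))
```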
Use of the tab character is strongly discouraged since it is common practice when editing to set the tab-spacing to other values, leading to misaligned documents.

XMP, LISTING and PLAINTEXT

These are obsolete tags for preformatted text that predate the introduction of PRE. User agents may support these for backward compatibility. Authors should avoid using them in new documents!

DIV and CENTER

DIV elements can be used to structure HTML documents as a hierarchy of divisions. The ALIGN attribute can be used to set the default horizontal alignment for elements within the content of the DIV element. Its value is restricted to LEFT, CENTER or RIGHT, and is defined in the same way as for the paragraph element <P>. Note that because DIV is a block-like element it will terminate an open P element. Other than this, user agents are not expected to render paragraph breaks before and after DIV elements. CENTER is directly equivalent to DIV with ALIGN=CENTER. Both DIV and CENTER require start and end tags. CENTER was introduced by Netscape before they added support for the HTML 3.0 DIV element. It is retained in HTML 3.2 on account of its widespread deployment.

BLOCKQUOTE

This is used to enclose block quotations from other works. Both the start and end tags are required. It is often rendered indented, e.g.

    They went in single file, running like hounds on a strong scent, and an eager light was in their eyes. Nearly due west the broad swath of the marching Orcs tramped its ugly slot; the sweet grass of Rohan had been bruised and blackened as they passed.

from "The Two Towers" by J.R.R. Tolkien.

FORM

This is used to define an HTML form, and you can have more than one form in the same document. Both the start and end tags are required. Forms can contain a wide range of HTML markup including several kinds of form fields such as single and multi-line text fields, radio button groups, checkboxes, and menus.
Further details on handling forms are given in RFC 1867.

HR - horizontal rules

Horizontal rules may be used to indicate a change in topic. In a speech-based user agent, the rule could be rendered as a pause. HR elements are not containers so the end tag is forbidden. The attributes are: ALIGN, NOSHADE, SIZE and WIDTH.
Tables

HTML 3.2 includes a widely deployed subset of the specification given in RFC 1942 and can be used to mark up tabular material or for layout purposes. Note that the latter role typically causes problems when rendering to speech or to text-only user agents. Tables take the general form:

    <TABLE BORDER=3 CELLSPACING=2 CELLPADDING=2 WIDTH="80%">
    <CAPTION> ... table caption ... </CAPTION>
    <TR><TD> first cell <TD> second cell
    <TR> ...
    ...
    </TABLE>

The attributes on TABLE are all optional. By default, the table is rendered without a surrounding border. The table is generally sized automatically to fit the contents, but you can also set the table width using the WIDTH attribute. BORDER, CELLSPACING and CELLPADDING provide further control over the table's appearance. Captions are rendered at the top or bottom of the table depending on the ALIGN attribute.

Each table row is contained in a TR element, although the end tag can always be omitted. Table cells are defined by TD elements for data and TH elements for headers. Like TR, these are containers and can be given without trailing end tags. TH and TD support several attributes: ALIGN and VALIGN for aligning cell content, ROWSPAN and COLSPAN for cells which span more than one row or column. A cell can contain a wide variety of other block and text level elements including form fields and other tables.

The TABLE element always requires both start and end tags. It supports the following attributes:
The CAPTION element has one attribute ALIGN which can be either ALIGN=TOP or ALIGN=BOTTOM. This can be used to force the caption to be placed above the top or below the bottom of the table respectively. Most user agents default to placing the caption above the table. CAPTION always requires both start and end tags. Captions are limited to plain text and text-level elements as defined by the %text entity. Block-level elements are not permitted. The TR or table row element requires a start tag, but the end tag can always be left out. TR acts as a container for table cells. It has two attributes:
There are two elements for defining table cells. TH is used for header cells and TD for data cells. This distinction allows user agents to render header and data cells in different fonts and enables speech-based browsers to do a better job. The start tags for TH and TD are always needed but the end tags can be left out. Table cells can have the following attributes:
Tables are commonly rendered in bas-relief, raised up with the outer border as a bevel and individual cells inset into this raised surface. Borders around individual cells are only drawn if the cell has explicit content. White space doesn't count for this purpose except &nbsp;.

The algorithms used to automatically size tables should take into account the minimum and maximum width requirements for each cell. This is used to determine the minimum and maximum width requirements for each column and hence for the table itself.

Cells spanning more than one column contribute to the widths of each of the columns spanned. One approach is to evenly apportion the cell's minimum and maximum width between these columns, another is to weight the apportioning according to the contributions from cells that don't span multiple columns.

For some user agents it may be necessary or desirable to break text lines within words. In such cases, a visual indication that this has occurred is advised.

The minimum and maximum width of nested tables contribute to the minimum and maximum width of the cell in which they occur. Once the width requirements are known for the top-level table, the column widths for that table can be assigned. This allows the widths of nested tables to be assigned and hence, in turn, the column widths of such tables. If practical, all columns should be assigned at least their minimum widths. It is suggested that any surplus space is then shared out proportional to the difference between the minimum and maximum width requirements of each column.

Note that pixel values for width and height refer to screen pixels, and should be multiplied by an appropriate factor when rendering to very high resolution devices such as laser printers. For instance, if a user agent has a display with 75 pixels per inch and is rendering to a laser printer with 600 dots per inch, then the pixel values given in HTML attributes should be multiplied by a factor of 8.
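The surplus-sharing suggestion and the pixel scaling rule above can be sketched as follows. This is one possible reading, not a definitive layout algorithm: the function names are hypothetical, spanning cells and nested tables are ignored, and capping each column at its maximum width is an assumption that the text does not state.

```python
def assign_column_widths(min_widths, max_widths, available):
    """Give each column its minimum, then share the surplus in proportion
    to (maximum - minimum) for that column."""
    slack = [mx - mn for mn, mx in zip(min_widths, max_widths)]
    total_slack = sum(slack)
    surplus = max(0, available - sum(min_widths))
    # Assumption: never grow a column beyond its maximum width.
    surplus = min(surplus, total_slack)
    if total_slack == 0:
        return [float(mn) for mn in min_widths]
    return [mn + surplus * s / total_slack for mn, s in zip(min_widths, slack)]


def scale_for_device(pixels, screen_ppi=75, device_dpi=600):
    """Scale a screen pixel value for a higher resolution output device."""
    return pixels * device_dpi / screen_ppi


if __name__ == "__main__":
    print(assign_column_widths([50, 100], [150, 300], 300))
    print(scale_for_device(10))
```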
Text level elements

These don't cause paragraph breaks. Text level elements that define character styles can generally be nested. They can contain other text-level elements but not block-level elements.
Font style elements

These all require start and end tags, e.g.

    This has some <B>bold text</B>.

Text level elements must be properly nested - the following is in error:

    This has some <B>bold and <I></B>italic text</I>.

User agents should do their best to respect nested emphasis, e.g.

    This has some <B>bold and <I>italic text</I></B>.

Where the available fonts are restricted or for speech output, alternative means should be used for rendering differences in emphasis.
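The nesting rule can be checked mechanically with a stack. The sketch below uses Python's standard `html.parser.HTMLParser`; the class name `NestingChecker`, the helper `is_properly_nested` and the default set of tracked tags are made up for illustration and cover only a few font style elements.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track selected container tags and record end tags that close out of order."""

    def __init__(self, tracked):
        super().__init__()
        self.tracked = set(tracked)  # tag names in lower case
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tracked:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.tracked:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # The end tag does not match the most recently opened element.
            self.errors.append(tag)


def is_properly_nested(markup, tracked=("b", "i", "tt", "u")):
    checker = NestingChecker(tracked)
    checker.feed(markup)
    checker.close()
    return not checker.errors and not checker.stack


if __name__ == "__main__":
    print(is_properly_nested("This has some <B>bold and <I>italic text</I></B>."))
    print(is_properly_nested("This has some <B>bold and <I></B>italic text</I>."))
```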
Note: future revisions to HTML may phase out STRIKE in favor of the more concise "S" tag from HTML 3.0.

Phrase Elements

These all require start and end tags, e.g.

    This has some <EM>emphasized text</EM>.
Form fields

INPUT, SELECT and TEXTAREA are only allowed within FORM elements. INPUT can be used for a variety of form fields including single line text fields, password fields, checkboxes, radio buttons, submit and reset buttons, hidden fields, file upload, and image buttons. SELECT elements are used for single or multiple choice menus. TEXTAREA elements are used to define multi-line text fields. The content of the element is used to initialize the field.

INPUT text fields, radio buttons, check boxes, ...

INPUT elements are not containers and so the end tag is forbidden.
SELECT menus

SELECT is used to define one-from-many or many-from-many menus. SELECT elements require start and end tags and contain one or more OPTION elements that define menu items. One-from-many menus are generally rendered as drop-down menus while many-from-many menus are generally shown as list boxes. Example:

    <SELECT NAME="flavor">
    <OPTION VALUE=a>Vanilla
    <OPTION VALUE=b>Strawberry
    <OPTION VALUE=c>Rum and Raisin
    <OPTION VALUE=d>Peach and Orange
    </SELECT>

SELECT attributes:
OPTION attributes:
TEXTAREA multi-line text fields

TEXTAREA elements require start and end tags. The content of the element is restricted to text and character entities. It is used to initialize the text that is shown when the document is first loaded. Example:

    <TEXTAREA NAME=address ROWS=4 COLS=40>
    Your address here ...
    </TEXTAREA>

It is recommended that user agents canonicalize line endings to CR, LF (ASCII decimal 13, 10) when submitting the field's contents. The character set for submitted data should be ISO Latin-1, unless the server has previously indicated that it can support alternative character sets.
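The recommended line-ending canonicalization can be sketched as below. The function name `canonicalize_line_endings` is hypothetical, and the sketch assumes the field content may contain bare CR, bare LF or CR LF line endings.

```python
import re


def canonicalize_line_endings(text):
    """Convert CR LF, bare CR and bare LF line endings to CR LF (13, 10)."""
    # Match CR LF first so that an existing pair is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


if __name__ == "__main__":
    field = "8723 Buena Vista\nSmallville, CT 01234\r"
    print(repr(canonicalize_line_endings(field)))
```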
Special Text level Elements

A (Anchor), IMG, FONT, BASEFONT, BR and MAP.

The A (anchor) element

Anchors can't be nested and always require start and end tags. They are used to define hypertext links, e.g.

    The way to <a href="hands-on.html">happiness</a>.

and also to define named locations for use as targets for hypertext links, e.g.

    <h2><a name=mit>545 Tech Square - Hacker's Paradise</a></h2>
IMG - inline images

Used to insert images. IMG is an empty element and so the end tag is forbidden. Images can be positioned vertically relative to the current textline or floated to the left or right. See BR with the CLEAR attribute for control over textflow. e.g.

    <IMG SRC="canyon.gif" ALT="Grand Canyon">

IMG elements support the following attributes:
Here is an example of how you use ISMAP:

    <a href="/cgibin/navbar.map"><img src=navbar.gif ismap border=0></a>

The location clicked is passed to the server as follows. The user agent derives a new URL from the URL specified by the HREF attribute by appending `?', the x coordinate, `,' and the y coordinate of the location in pixels. The link is then followed using the new URL. For instance, if the user clicked at the location x=10, y=27 then the derived URL will be: "/cgibin/navbar.map?10,27". It is generally a good idea to suppress the border and use graphical idioms to indicate that the image is clickable.

Note that pixel values refer to screen pixels, and should be multiplied by an appropriate factor when rendering to very high resolution devices such as laser printers. For instance, if the user agent has a display with 75 pixels per inch and is rendering to a laser printer with 600 dots per inch, then the pixel values given in HTML attributes should be multiplied by a factor of 8.

FONT

Requires start and end tags. This allows you to change the font size and/or color for the enclosed text. The attributes are: SIZE and COLOR. Font sizes are given in terms of a scalar range defined by the user agent with no direct mapping to point sizes etc. The FONT element may be phased out in future revisions to HTML.
Some user agents also support a FACE attribute which accepts a comma separated list of font names in order of preference. This is used to search for an installed font with the corresponding name. FACE is not part of HTML 3.2. The following shows the effects of setting font to absolute sizes: size=1 size=2 size=3 size=4 size=5 size=6 size=7

BASEFONT

Used to set the base font size. BASEFONT is an empty element so the end tag is forbidden. The SIZE attribute is an integer value ranging from 1 to 7. The base font size applies to the normal and preformatted text but not to headings, except where these are modified using the FONT element with a relative font size.

BR

Used to force a line break. This is an empty element so the end tag is forbidden. The CLEAR attribute can be used to move down past floating images on either margin. <BR CLEAR=LEFT> moves down past floating images on the left margin, <BR CLEAR=RIGHT> does the same for floating images on the right margin, while <BR CLEAR=ALL> does the same for such images on both left and right margins.

MAP

The MAP element provides a mechanism for client-side image maps. These can be placed in the same document or grouped in a separate document although this isn't yet widely supported. The MAP element requires start and end tags. It contains one or more AREA elements that specify hotzones on the associated image and bind these hotzones to URLs.
Here is a simple example for a graphical navigational toolbar:

    <img src="navbar.gif" border=0 usemap="#map1">
    <map name="map1">
    <area href=guide.html alt="Access Guide" shape=rect coords="0,0,118,28">
    <area href=search.html alt="Search" shape=rect coords="184,0,276,28">
    <area href=shortcut.html alt="Go" shape=rect coords="118,0,184,28">
    <area href=top10.html alt="Top Ten" shape=rect coords="276,0,373,28">
    </map>

The MAP element has one attribute NAME which is used to associate a name with a map. This is then used by the USEMAP attribute on the IMG element to reference the map via a URL fragment identifier. Note that the value of the NAME attribute is case sensitive.

The AREA element is an empty element and so the end tag is forbidden. It takes the following attributes: SHAPE, COORDS, HREF, NOHREF and ALT. The SHAPE and COORDS attributes define a region on the image. If the SHAPE attribute is omitted, SHAPE="RECT" is assumed.
The x and y values in COORDS are measured in pixels from the left/top of the associated image. If x and y values are given with a percent sign as a suffix, the values should be interpreted as percentages of the image's width and height, respectively. For example:

    SHAPE=RECT COORDS="0, 0, 50%, 100%"

The HREF attribute gives a URL for the target of the hypertext link. The NOHREF attribute is used when you want to define a region that doesn't act as a hotzone. This is useful when you want to cut a hole in an underlying region acting as a hotzone.

If two or more regions overlap, the region defined first in the map definition takes precedence over subsequent regions. This means that AREA elements with NOHREF should generally be placed before ones with the HREF attribute.

The ALT attribute is used to provide text labels which can be displayed in the status line as the mouse or other pointing device is moved over hotzones, or for constructing a textual menu for non-graphical user agents. Authors are strongly recommended to provide meaningful ALT attributes to support interoperability with speech-based or text-only user agents.
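The precedence rule for overlapping regions can be sketched as a first-match search. Everything here is illustrative: the function names and the dictionary layout are made up, only rectangular regions are handled, and the COORDS order is assumed to be left, top, right, bottom, as the navigation toolbar example suggests.

```python
def parse_coord(value, extent):
    """Parse one COORDS value; a trailing % means a percentage of extent."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) * extent / 100
    return float(value)


def hit_test(areas, x, y, width, height):
    """Return the HREF of the first rectangular area containing (x, y).

    Each area is a dict with 'coords' and an optional 'href'; an area with
    no 'href' plays the role of NOHREF. Returns None when nothing is hit
    or when the first matching area has no HREF.
    """
    for area in areas:
        left, top, right, bottom = area["coords"].split(",")
        x1, x2 = parse_coord(left, width), parse_coord(right, width)
        y1, y2 = parse_coord(top, height), parse_coord(bottom, height)
        if x1 <= x <= x2 and y1 <= y <= y2:
            return area.get("href")  # earlier areas take precedence
    return None


if __name__ == "__main__":
    navbar = [
        {"href": "guide.html", "coords": "0,0,118,28"},
        {"href": "search.html", "coords": "184,0,276,28"},
    ]
    print(hit_test(navbar, 200, 10, 373, 28))
```

Listing the NOHREF region before a larger HREF region, as the text recommends, makes the hole win for points that fall inside both.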
Character Entities for ISO Latin-1

Name     Char  Description
nbsp     " "   no-break space
iexcl    "¡"   inverted exclamation mark
cent     "¢"   cent sign
pound    "£"   pound sterling sign
curren   "¤"   general currency sign
yen      "¥"   yen sign
brvbar   "¦"   broken (vertical) bar
sect     "§"   section sign
uml      "¨"   umlaut (dieresis)
copy     "©"   copyright sign
ordf     "ª"   ordinal indicator, feminine
laquo    "«"   angle quotation mark, left
not      "¬"   not sign
shy      "­"   soft hyphen
reg      "®"   registered sign
macr     "¯"   macron
deg      "°"   degree sign
plusmn   "±"   plus-or-minus sign
sup2     "²"   superscript two
sup3     "³"   superscript three
acute    "´"   acute accent
micro    "µ"   micro sign
para     "¶"   pilcrow (paragraph sign)
middot   "·"   middle dot
cedil    "¸"   cedilla
sup1     "¹"   superscript one
ordm     "º"   ordinal indicator, masculine
raquo    "»"   angle quotation mark, right
frac14   "¼"   fraction one-quarter
frac12   "½"   fraction one-half
frac34   "¾"   fraction three-quarters
iquest   "¿"   inverted question mark
Agrave   "À"   capital A, grave accent
Aacute   "Á"   capital A, acute accent
Acirc    "Â"   capital A, circumflex accent
Atilde   "Ã"   capital A, tilde
Auml     "Ä"   capital A, dieresis or umlaut mark
Aring    "Å"   capital A, ring
AElig    "Æ"   capital AE diphthong (ligature)
Ccedil   "Ç"   capital C, cedilla
Egrave   "È"   capital E, grave accent
Eacute   "É"   capital E, acute accent
Ecirc    "Ê"   capital E, circumflex accent
Euml     "Ë"   capital E, dieresis or umlaut mark
Igrave   "Ì"   capital I, grave accent
Iacute   "Í"   capital I, acute accent
Icirc    "Î"   capital I, circumflex accent
Iuml     "Ï"   capital I, dieresis or umlaut mark
ETH      "Ð"   capital Eth, Icelandic
Ntilde   "Ñ"   capital N, tilde
Ograve   "Ò"   capital O, grave accent
Oacute   "Ó"   capital O, acute accent
Ocirc    "Ô"   capital O, circumflex accent
Otilde   "Õ"   capital O, tilde
Ouml     "Ö"   capital O, dieresis or umlaut mark
times    "×"   multiply sign
Oslash   "Ø"   capital O, slash
Ugrave   "Ù"   capital U, grave accent
Uacute   "Ú"   capital U, acute accent
Ucirc    "Û"   capital U, circumflex accent
Uuml     "Ü"   capital U, dieresis or umlaut mark
Yacute   "Ý"   capital Y, acute accent
THORN    "Þ"   capital THORN, Icelandic
szlig    "ß"   small sharp s, German (sz ligature)
agrave   "à"   small a, grave accent
aacute   "á"   small a, acute accent
acirc    "â"   small a, circumflex accent
atilde   "ã"   small a, tilde
auml     "ä"   small a, dieresis or umlaut mark
aring    "å"   small a, ring
aelig    "æ"   small ae diphthong (ligature)
ccedil   "ç"   small c, cedilla
egrave   "è"   small e, grave accent
eacute   "é"   small e, acute accent
ecirc    "ê"   small e, circumflex accent
euml     "ë"   small e, dieresis or umlaut mark
igrave   "ì"   small i, grave accent
iacute   "í"   small i, acute accent
icirc    "î"   small i, circumflex accent
iuml     "ï"   small i, dieresis or umlaut mark
eth      "ð"   small eth, Icelandic
ntilde   "ñ"   small n, tilde
ograve   "ò"   small o, grave accent
oacute   "ó"   small o, acute accent
ocirc    "ô"   small o, circumflex accent
otilde   "õ"   small o, tilde
ouml     "ö"   small o, dieresis or umlaut mark
divide   "÷"   divide sign
oslash   "ø"   small o, slash
ugrave   "ù"   small u, grave accent
uacute   "ú"   small u, acute accent
ucirc    "û"   small u, circumflex accent
uuml     "ü"   small u, dieresis or umlaut mark
yacute   "ý"   small y, acute accent
thorn    "þ"   small thorn, Icelandic
yuml     "ÿ"   small y, dieresis or umlaut mark