This document is Copyright (c) Information Technology Group, www.itgroup.ro, Alin Avasilcutei, Cornel Paslariu, Virgil Mager.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 with no Invariant Sections, no Front-Cover Texts, no Back-Cover Texts.
This document is an introductory overview of the essential HTML
concepts and elements required for building HTML documents, and is not
intended to rigorously follow
the specifications of any current
HTML standard.
As with any presentation which is not intended to be an exact
reproduction, this document occasionally contains original
interpretations of some of the notions it presents. The reader is
urged to only use this document as a "Getting Started" tool, and not to
rely upon its content for reference information on the HTML language.
The reader of this document is assumed literate in programming, and
familiar with the C programming language. For an introduction to the C
programming language see The C programming language.
HTML (HyperText Markup Language) is a document format originally
designed for embedding hyperlinks in text documents; however,
later standards came to include advanced features such as embedding
various types of multi-media objects (images, sound, movies, etc),
enabling page interactivity via specialized objects (check-boxes,
drop-down lists, etc), embedding scripting facilities, etc. Given the
complexity reached by the current HTML standards, this tutorial will
start with an introductory overview of the basic
organization principles of an HTML document (presented in a list
format below), while a detailed presentation of the various HTML
features will follow later throughout this document and in the other
related documents contained in this package.
an 'HTML document' is a string of characters that
complies with a specific syntax called the 'HTML syntax'. An HTML
document may be contained in a text file called an 'HTML file',
or it may be generated
"live" by a running program, or it may be received over an internet
connection, etc.
when a web browser loads an HTML
document (from a file, or via an internet connection, etc), what will
eventually be displayed by the browser is a collection
of
'display objects' arranged in a certain way inside the browser window. For example, a three-object
document may contain an image centered in the
browser window, followed underneath by a button marked 'See next
image', followed by a line of text at the bottom of the page containing a short
description of the image, with each of these
three visual items representing a distinct 'display object'.
an HTML document is organized as a collection of 'HTML elements',
where each such HTML element is a string of characters
that complies with a specific syntax, and which is used to describe a
display object.
There is a one-to-one correspondence between each HTML element
contained in an HTML document and each of the display objects that are
displayed by a browser when the HTML document is loaded.
For example, the string of characters '<img
src="http://images.com/my_image.jpg" />' is a real-life HTML
element as it may appear in an HTML document, and its corresponding display
object is an image that displays the file 'my_image.jpg'
located on the web server 'images.com'.
the HTML elements are always
arranged in a containment hierarchy
inside an HTML document, i.e. each
HTML element is always contained inside another HTML element.
Furthermore, when an HTML document is rendered in a web browser, the
element containment hierarchy directly translates into a graphical containment hierarchy,
i.e. each display
object (which corresponds to a given HTML element) is always contained
inside another, bigger display object.
note: as with any containment hierarchy, there is one obvious
exception to the above rule, namely the 'root element' of the hierarchy,
which is the "top-level" HTML element of the document and which is
itself not contained in any other element.
The following example is a valid, real-life HTML document which, when
rendered in a web browser, places a
small icon image in the center of the browser window. Both the HTML
document and its corresponding element containment hierarchy are listed
below, followed by an introductory explanation for each line of HTML
code contained in the document:
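The listing itself is not reproduced in this copy of the text. The sketch below is a reconstruction assembled only from the line-by-line explanation that follows (the tag names, the attribute values, and the image address are the ones quoted there); the exact layout of the original listing is an assumption.

```html
<html>
<body>
<table border='1' width='100%' height='100%'>
<tr>
<td align='center' valign='middle'>
<img src='http://images.com/my_icon.jpg'/>
</td>
</tr>
</table>
</body>
</html>
```

Read from the outside in, the element containment hierarchy of this sketch is: html contains body, body contains table, table contains tr, tr contains td, and td contains img.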
The '<html>' and '</html>' strings in the first
and last line of the HTML document are called 'tags', and the
'html' string itself is called the tag name. A tag of the form
'<tagName>' (e.g. <html>) is called a 'start tag',
while a tag of the form '</tagName>' (e.g. </html>) is
called an 'end tag'.
A start tag together with its corresponding end tag are
collectively called
an 'HTML element',
or simply an 'element', and
everything that resides between the start tag and the end
tag of an element is called 'the element's content'. An
element's content is not part
of the element.
An element is always of a certain type, with the type
being determined by the element's tag name, e.g. the element starting
in line 3 with the start tag '<table>' (and ending with the end
tag
'</table>') is an
'element of type table', or simply a 'table element'
A typical HTML document directly
contains a single element
which is of type 'html', i.e. it starts with an '<html>'
start tag, it contains a number of other elements inside the 'html'
element, and it ends with
an '</html>' end tag. The 'html' element contained in an HTML
document is
called the root element of the HTML document, and all the other
elements contained in the document are directly or indirectly contained
inside the 'html' root element.
The HTML language syntax specifies that the elements
contained in the 'html' root
element consist of an optional 'head' element (this optional element
is not
included in the example above), followed by a 'body' element, and it is
the 'body' element that further contains the elements that represent
the display objects that are effectively rendered by a web browser (the
allowed contents of the 'head' and 'body'
elements are detailed later throughout this tutorial).
The third line in the example above specifies that the
document's body (i.e. the document's 'body' element) contains a 'table'
element. Moreover, as it can be seen from the structure of the HTML
document above, the 'body' of this document directly contains only
a
table
element and nothing else, with the rest of the document elements being
contained inside the table
element (and thus they are only "indirectly contained" in the
document's 'body' element).
The 'border', 'width', and 'height' that are "attached to" the start
tag of the 'table' element (<table border='1'
width='100%' height='100%'>) are
called 'element attributes' and
they are used to describe various particularities of the corresponding
display object: in the example above, width='100%' and height='100%'
mean that the table will occupy the entire space of its container
display object (which in this case means the display area of
the entire browser
window because the table is contained directly in the document 'body'),
and border='1' means that the table's borders will be drawn in the
browser window. As a result, the process of rendering the table element
will result in a large "box" (with visible borders) that will occupy
the entire browser window.
The fourth line in the example above specifies that the table
contains a 'table_row' ('<tr>') element. The very fact
that a 'table' element can contain 'table_row' elements is part of the
HTML language specification. As can be seen from the structure of
the document exemplified above, the table contains a single 'table_row' element,
and by default the row will thus occupy the entire width and height of
the table (which also means the entire width and height of the browser
window in the example above).
The fifth line in the example above specifies that the table
row contains a 'table_cell' ('<td>') element. The very
fact that a 'table_row' element can contain 'table_cell' elements is
part of the HTML language specification. As can be seen from the
structure of the document exemplified above, the table row contains a single
table cell element, and by default the cell occupies the entire
width and height of the table row that contains it (which further means
that the cell will occupy the entire width and
height of the browser window
in the example above).
The attributes align='center' and valign='middle' attached to the table
cell (<td align='center' valign='middle'>) specify that
whatever
content will be displayed inside the table cell, it will be
horizontally and vertically centered.
The sixth line in the example above describes an 'image'
element to be displayed inside the containing table cell. The 'src'
attribute of the image (<img src='http://images.com/my_icon.jpg'/>)
specifies the internet address of the image file that is
to be displayed. Because the container 'table_cell' element specifies
that its content
must be centered (both horizontally and vertically) inside its area,
and because the container cell occupies the entire browser window area,
the image will thus be displayed in the center of the browser window.
Note
that unlike all the other elements in the example above, which have both
a <startTag> and an </endTag>, the 'image' element uses a
simplified single-tag syntax <img [attributes]
/> (i.e. it does not have separate start and end tags);
this simplified syntax is used for HTML elements that cannot contain other
elements, and the HTML syntax specifies that elements of type
'image'
cannot contain other elements.
While the above introductory example is a very simplistic one,
it does however show how the HTML elements are arranged "inside
one-another", and
how attaching attributes to an element can change various properties
related to the associated display element. The HTML language provides
many complex options for describing the visual
properties of the display objects by means of two dedicated attributes
'style' and 'class', and the bulk of these options is
detailed in the 'Introduction
to CSS' document.
the element containment hierarchy inside an HTML document is not
"strict", in the sense that a container element can contain
more
than one other element. In the browser window, this translates into
the fact that a container display object can contain more than one
other display object.
For example, let us change the example above by adding a second image
inside the table cell:
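The modified listing is not reproduced in this copy of the text. A minimal sketch of it, assuming the same table as in the first example, is shown below; the address of the second image ('my_other_icon.jpg') is a hypothetical placeholder introduced only for illustration.

```html
<html>
<body>
<table border='1' width='100%' height='100%'>
<tr>
<td align='center' valign='middle'>
<!-- two 'image' elements inside the same 'table_cell' container -->
<img src='http://images.com/my_icon.jpg'/>
<img src='http://images.com/my_other_icon.jpg'/>
</td>
</tr>
</table>
</body>
</html>
```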
The HTML document above now contains two 'image' elements inside
the same container (i.e. inside the 'table_cell' element), and this
will translate into having two images placed next to each other at
the center
of the browser window.
the various types of elements that may be contained in an HTML
document (e.g. images, tables, etc) cannot
be arbitrarily arranged inside one another; instead, the HTML
syntax specifies a precise set of rules that restricts their possible
order.
For example, a 'table_cell' element cannot be included inside any other
elements except 'table_row' elements, and 'table_row' elements cannot be
included inside any other elements except 'table' elements.
an important feature of HTML is that it allows embedding
executable programs (known as 'scripts') inside an HTML document.
Specifically, the
'head' element of an HTML document can contain the source code of one
or more
scripts, and these scripts may contain functions that can change the
content of the browser window when they are invoked (e.g.
executing a script function may cause the browser to load a new page,
or to
change the background color of a table, etc). Typically these
script functions are invoked by events that happen in
conjunction with a display object (e.g. a script function may be
executed when a button is pressed on a
web page, or when an image is clicked, etc), but other ways of
triggering a script program also exist (e.g. a script might
be started at a certain time, or it may be repeatedly executed with a
certain
frequency, etc)
The most common scripting language that can be embedded in HTML
documents is JavaScript; this
programming language is
detailed in the 'Introduction
to JavaScript' document.
For example, let us change the previous example by adding an 'event
handler' to the image such that when the image is "clicked" the
browser will load the Google home page:
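The listing for this step is not reproduced in this copy of the text. The sketch below is a reconstruction based on the list of additions described next: the function name 'loadGoogle()', the 'location' variable, and the 'onclick' attribute come from that description, while the Google address and the use of a single image are assumptions made to keep the sketch short.

```html
<html>
<head>
<script type="text/javascript">
// Assigning a new address to 'location' asks the browser to load that page.
// The address below is an assumed value for the Google home page.
function loadGoogle() {
  location = "http://www.google.com/";
}
</script>
</head>
<body>
<table border='1' width='100%' height='100%'>
<tr>
<td align='center' valign='middle'>
<!-- 'onclick' names the script function to run when the image is clicked -->
<img src='http://images.com/my_icon.jpg' onclick='loadGoogle()'/>
</td>
</tr>
</table>
</body>
</html>
```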
Starting from the previous example, the following additions have been
made:
The HTML element 'head' has been added to the document: script
functions are conventionally placed inside the header of an HTML
document, which ensures that they are defined before any element in the
document's body can invoke them (the HTML syntax also permits 'script'
elements inside the document's body)
A 'script' element was introduced inside the 'head' element:
according to the HTML syntax, the script functions embedded in a
document must always be defined inside a 'script' element.
A JavaScript function has been placed inside the 'script'
element, with the code of the function replacing the current
document with the Google home page (specifically,
'location' is a JavaScript variable that represents the browser
window's "address bar", and by
changing the value in the "address bar" the function causes the
browser to load a new web page). The details about how a script
function can interact with the contents of a browser window are
presented in the 'Introduction
to the HTML
DOM' document.
Finally, an 'onclick' event handler attribute has
been added to the 'image' element, specifying that each time the
'image' display object is "clicked", the script function
'loadGoogle()' should be executed. The 'onclick' attribute is one of
many others in a large collection of predefined attributes that
together are dedicated
to "catching events" that can occur in connection with a
specific display object (e.g. to "catch" when a display object is
clicked, or when the
mouse pointer hovers over an object, etc).
The diagram below presents the basic terminology that will be
used throughout this document together with the way the terms relate to
each other, and it is followed by a more detailed
description for each of the included terms:
[Diagram: relationships between the terms, listed as text]
HTML document: complies with the HTML syntax; contains HTML elements
HTML element types: their syntax is specified by the HTML syntax
HTML elements: are instances of HTML element types; may contain other HTML elements
HTML browser: interprets the HTML elements; implements (i.e. "contains") the renderer; creates/accesses/destroys browser objects
Browser objects: communicate with each other and with the browser; are rendered by the renderer
Renderer: generates the display objects
Display objects: are the user interface for the browser objects
User: interacts with the display objects
Document view: contains the display objects
HTML Element Types:
The HTML element types, or HTML types, are syntactical
patterns
specified by the HTML syntax. Each such syntactical pattern has
an associated name, called the HTML element type name, or HTML
type name.
for example, the HTML element type named 'table'
consists of a specific syntactical pattern that always starts with
'<table', the element type named 'image' consists of another syntactical
pattern that always starts with '<img', etc
HTML Elements:
An HTML element is an actual string of characters that matches
an HTML element type (i.e. its syntactical structure matches a
syntactical pattern). Any HTML element that matches the syntax of an
HTML element type is also called an instance of that element type.
for example, 'image' is an HTML element type, and it is
described by a
specific syntactical pattern in the HTML standards; in this context,
any specific
string of characters that matches the syntax of the 'image' element
type is an element of type image (or simply an image element,
or an image), e.g. the string '<img
src="http://www.WebCollection.com/images/a_nice_image.jpg"/>'
contained in an HTML document is an element of type 'image'.
HTML Document:
An HTML document is a string of
characters that complies with the HTML
syntax, and which represents an ordered collection of HTML
elements.
The
rules governing the order in which the HTML elements can be arranged
inside an HTML document are specified by the HTML DOM (see below).
HTML DOM - the Document Object Model:
As previously mentioned, in order to
allow for complex document structures to be built, each HTML
element contained in an HTML document is allowed to "contain"
other elements; specifically, an HTML
element may contain an ordered
list of other HTML elements, where each contained
element may itself contain other HTML elements, and so on. In this context,
the HTML DOM exhaustively specifies what types of elements, how
many of them, and in what order, can be contained by an element of
a given type.
for example, the HTML DOM specifies that a
'table' element may contain one optional element of type
'table_head', which must be followed by one or more successive elements of type
'table_row' (additionally, the DOM also specifies that an element of
type
'table_row' must in turn contain exclusively elements of type
'table_cell', and also specifies what elements can be contained inside
the 'table_cell' elements). In other words, the structural diagram below which
represents the mandatory structure
of a 'table' element is dictated
by the HTML DOM rules:
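The structural diagram is not reproduced in this copy of the text. As a stand-in, the sketch below shows one table that follows the structure just described. It assumes that the 'table_head' type corresponds to the 'thead' tag (the text does not name the tag), and the cell texts are placeholders. Browsers typically wrap the loose rows in an implied table body, a detail this introduction leaves out.

```html
<table border='1'>
  <!-- optional 'table_head' element (assumed here to be the 'thead' tag) -->
  <thead>
    <tr><td>File</td><td>Kind</td></tr>
  </thead>
  <!-- one or more 'table_row' elements, each holding only 'table_cell' elements -->
  <tr><td>img1.gif</td><td>image</td></tr>
  <tr><td>img2.gif</td><td>image</td></tr>
</table>
```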
Syntax#1 is used for describing a specific instance of an element that contains
other elements
Syntax#2 is used for describing a specific instance of an element that does
not contain any other elements
Example: HTML
description of a paragraph that contains an
image: in this example the 'paragraph' element contains other elements
(specifically, a single 'image' element), while the 'image'
element does not (and can not) contain other elements:
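The two syntax forms and the example listing are not reproduced in this copy of the text. The sketch below restates them using the field names explained next ('HTML_tag', 'attributes', 'content'); the exact spelling of the two general forms is an assumption, and the image address is borrowed from a later example in this document.

```html
<!-- Syntax#1 (element with content):    <HTML_tag attributes> content </HTML_tag> -->
<!-- Syntax#2 (element without content): <HTML_tag attributes /> -->

<!-- A 'paragraph' element (Syntax#1) whose content is one 'image' element (Syntax#2) -->
<p>
  <img src="http://someServer.com/imageFolder/IMG.gif" />
</p>
```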
the 'HTML_tag' field is an ASCII string
naming one of the built-in HTML types; for example, 'p' is the tag
for the 'paragraph' type,
'img' is the
HTML tag for the 'image' type, etc
note: the HTML tag names inside an HTML document
are not case sensitive
the 'attributes' field contains various
information regarding a specific instance of an element, and the list
of possible attributes that can be attached to an HTML element depends
on the element type. For example, an
attribute of the element type 'image' is the image source ('src'), which specifies the
location of the image file; specifically, the associated attribute
syntax is 'src="imageFileLocation"',
such that the complete syntax for a valid instance of type 'image' is:
<img src="imageFileLocation" />.
note: the value of any attribute attached to an
HTML element can always be represented as a string of characters,
including the case of numeric values. For example, an attribute of the
HTML element 'image' that specifies the image's display size is 'width'
which is an integer (e.g. 123), but the HTML attribute can also be
represented as a single-quoted or double-quoted string (e.g. "123" or
'123').
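As a small illustration of the note above, the three 'image' elements below request the same display width; 'imageFileLocation' is the same placeholder used earlier, not a real address. Quoting every attribute value is the safer habit, since an unquoted value must not contain spaces.

```html
<!-- numeric value written without quotes -->
<img src="imageFileLocation" width=123 />
<!-- the same value as a double-quoted string -->
<img src="imageFileLocation" width="123" />
<!-- the same value as a single-quoted string -->
<img src='imageFileLocation' width='123' />
```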
the 'content' field forms the syntactical
basis that allows the use of HTML elements as containers for
other elements; it consists of a list of HTML elements, with the
number, order, and type of these elements being subject to the HTML DOM
restrictions. For example, if 'HTML_tag' is 'table' (designating a
'table' element), then 'content' can only be a list of one or
more 'table_row' elements, preceded by an optional 'table_head'
element; if 'HTML_tag' is 'img' (designating an 'image' element) then
an element of this type can no longer contain any other elements, etc
The representation of Text Strings inside
an HTML document:
There are two important characteristics that distinguish the
representation of Text Strings
from the rest of HTML elements inside an HTML document:
a string of text can represent one or more HTML elements, depending on
the whitespace characters that are part of the string;
specifically, each succession of whitespace characters splits the
string into separate "words",
and it is the "words" themselves that each represent a unique HTML
element. This distinction is important because the "words" that are
part of a text string behave as inline
elements, i.e. a long multi-word string whose length exceeds
the length of the line where it is displayed will wrap at the end of
the line by being broken strictly at
the "word" boundaries inside the string; this is illustrated in
the diagram below where a string of text (composed of inline "word
elements") placed after an image element (which is also an inline
element) wraps at the end of a display line in a web browser:
HTML document
=============
<html>
<body>
<img src="http://someServer.com/imageFolder/IMG.gif"/>
This is a long Text String which has to wrap at the end of the display lines in a web browser window.
</body>
</html>

View in a browser window
========================
+------------------------------------------+
|/-----\                                   |
|| IMG |                                   |
|\-----/This is a Text String which has to |
|wrap at the end of the display lines in a |
|web browser window.                       |
|                                          |
+------------------------------------------+
text strings are represented inside an HTML document using a
simplified syntax, namely they do
not have any associated, text-specific start tags and end tags;
instead, a text
string is simply inserted directly between
other HTML elements;
this is illustrated in the example below
where a text string is placed between two 'image' elements (note that
because
both the 'image' elements and the "words" inside the text are inline
elements, they will all be displayed on the same line in a web browser):
HTML document
=============
<html>
<body>
<img src="http://someServer.com/imageFolder/img1.gif"/>
This is a Text String
<img src="http://someServer.com/imageFolder/img2.gif"/>
</body>
</html>

View in a browser window
========================
+------------------------------------------------+
|                             /------\           |
|                             |      |           |
|/------\                     | img2 |           |
|| img1 |                     |      |           |
|\------/This is a Text String\------/           |
|                                                |
+------------------------------------------------+
HTML Browser, Browser Objects, Display Objects:
The notion of HTML browser is introduced by the HTML semantics:
an HTML browser is a software agent (i.e. software application) that
parses an HTML source file according to the HTML syntax, and
then behaves according to the HTML semantic rules. The
behavioral model specified by the HTML semantics contains the following
essential requirements:
after the browser parses (analyzes) a
complete HTML document, it must create internal representations
for every HTML element it identified inside the document; these
internal representations are called 'browser objects', or
simply 'objects'.
An HTML element is said to describe its corresponding browser object.
just like the HTML elements are each of a given type, the
browser objects must also be each of a given type called 'browser
object types', or simply 'object types'; moreover, for each
HTML type there must be one, and only one, corresponding object type.
each browser object type has a set of associated properties
and methods that will "reflect" the attributes of the
corresponding element type. By reading the attributes of an HTML
element (or assuming default values when the attributes are not
specified), the browser will assign specific values to the
corresponding browser object's properties and/or methods.
For example, for a given 'table' element in a document (i.e. an element
of type 'table') the browser will build an internal object of type
'table_object', etc. One of the attributes of a 'table' element is
'width', and this attribute will be "reflected" by a property 'width'
of the internal 'table_object'
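One way to observe this "reflection" from a script is sketched below. This is an illustration under assumptions, not part of the original text: the 'id' value 'demoTable' and the function name 'showTableWidth()' are hypothetical, and the script relies on the standard 'document.getElementById' and 'alert' functions, which are covered by the DOM and JavaScript documents rather than by this one.

```html
<html>
<head>
<script type="text/javascript">
// Hypothetical helper: looks up the browser object built for the table
// and reports the 'width' property that reflects the 'width' attribute.
function showTableWidth() {
  var tableObject = document.getElementById('demoTable');
  alert("width property of the table object: " + tableObject.width);
}
</script>
</head>
<body>
<table id='demoTable' border='1' width='50%'>
<tr>
<td onclick='showTableWidth()'>Click this cell</td>
</tr>
</table>
</body>
</html>
```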
the browser objects must be implemented as software
agents that can communicate with each other, and with the browser
the browser must offer an interface to the browser
objects such that they can request the browser to perform certain
actions. For example, a browser object may request the browser to
process an event that occurred in conjunction with it (e.g. when an
image is "clicked" the browser may be requested to process the click
event in a certain way); additionally, the browser objects may request
the browser to create,
access, and destroy (other) browser objects
the browser should implement some sort of a user
interface for each of the browser objects, such that the user can
somehow access these objects. For example, a browser object of type
'button_object' (which is the internal representation of a 'button'
element in the HTML document) must be visible to the user such
that the user can click on it, etc.
the process of making the browser objects accessible to
the user is called the rendering process of the browser
objects. The end result of the rendering process is the creation of a
unique user interface for each internal browser object; these
user interfaces of the browser objects are called display objects.
By interacting with the display objects, the user (indirectly)
interacts with the internal browser objects.
A display object is said to interface its corresponding browser
object.
A display object is said to reflect an HTML element when it is
the interface of the browser object described by that HTML element.
the collection of all the display objects as they are
rendered by the browser is called the 'document view'
corresponding to the HTML document
The rendering process
The way an
HTML document is actually rendered inside a browser window is
determined by the HTML rendering rules; these rules are part of
the HTML specifications, and they are implemented by the HTML
browsers as a key part of their functionality. In other words,
whereas displaying an HTML file inside a text editor will only show a
long string of characters, opening the same file inside an HTML browser
will actually produce the graphical interpretation of the HTML
document as it was designed by the document's author.
This paragraph is an introductory
overview of the general rendering
rules as specified by the HTML language, but it is not intended
either to be rigorous or to cover all the special cases that may occur
in a real-life HTML document.
although most modern browsers can start the rendering process
before loading an entire HTML document, for the purpose of this
introduction it will be considered that the browser actually starts
rendering only after the document has been completely loaded, and thus the
entire structure of the HTML document is "known" to the browser.
as it has been previously
mentioned, the
HTML elements contained in an HTML document are
organized in a containment hierarchy where each element is always
contained in another element, and the containment hierarchy of the HTML
elements translates into a graphical
containment hierarchy of their corresponding display objects. In
this context, there
are two essential rules that govern the overall rendering process of an
HTML document:
each element inside an HTML document gets allocated a
rectangular area for its corresponding display object inside the
browser window, starting with the
'body' element of the HTML document. When the rendering process
starts, the 'body' element is initially allocated a 0x0 (i.e.
zero-sized) area in the top-left corner of the browser window, and this
area will be incrementally increased during the rendering of the
document such that it can accommodate all the elements that the
'body' element contains.
the rendering process is an incremental process of
"inserting" display objects into one another, with the top-level
container for the entire document being the browser window (where the
'body' object gets allocated its initial 0x0 display area). As a
general rule, if during the rendering process an object must be placed
inside another object and the object to be placed cannot
fit in the display area of its container at the moment when it is
inserted, then the container area gets expanded such that it can
accommodate all the display objects that it must contain.
For example, in the case of the HTML document below, the
rendering process starts with allocating a 0x0 area in the top-left
corner of the browser window for the 'body', and then the 'image'
object has to be inserted inside the 'body' object. Because the 'image'
object cannot fit inside the initial 0x0
area of the 'body' object, the 'body' area is expanded (in both
directions) such that it can
accommodate the size of the image, and thus the 'image' object gets its
required area inside the 'body' object that was forced to expand.
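The listing for this example is not reproduced in this copy of the text. A minimal document matching the description (a 'body' that directly contains a single 'image' element) could look like the sketch below; the image address is reused from an earlier example and is only a placeholder.

```html
<html>
<body>
<!-- the only display object placed inside the 'body' area -->
<img src='http://images.com/my_image.jpg'/>
</body>
</html>
```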
because an HTML element can contain more than one other element, a
display object's screen area may be a container for more than one
display object (e.g. a table may contain more than one table row,
each table row may contain several table cells, etc). When such a
situation occurs (i.e. when more than one display object must be
inserted into another display object's area), each display object is
graphically arranged inside its corresponding container depending on the type of
objects involved.
Specifically, each type of HTML element (e.g. 'image',
'table',
'text', etc) falls into one of two categories, namely it is either
an inline element type or a block
element type, and the category to which an element belongs determines
the
way its corresponding display object is rendered in a web browser:
inline element types:
the characteristic of inline
element types is that they are displayed horizontally one after the other
until they reach the right margin of their containing object, and any
remaining elements that do not fit on a single line are displayed one
after the other on the following line, and so on.
For example, the 'image' element type is an inline
element type, and thus if an HTML document contains e.g. a succession
of
images directly inside the document's 'body' element, then said
images are displayed one after the other on a single line in the web
browser window, and if the right margin of the browser window is
reached
then the "string of images" is wrapped onto a new line, and so on:
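The listing and diagram for this example are not reproduced in this copy of the text. A minimal document of the kind described, assuming four placeholder image addresses, might be:

```html
<html>
<body>
<!-- inline elements: laid out left to right, wrapping at the right margin -->
<img src='http://images.com/img1.jpg' />
<img src='http://images.com/img2.jpg' />
<img src='http://images.com/img3.jpg' />
<img src='http://images.com/img4.jpg' />
</body>
</html>
```

Narrowing the browser window until the four images no longer fit on one line should move the trailing images onto a second line.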
Note: a
result similar to the one depicted in the diagram above will be
observed for any succession of
inline elements
placed inside any container
element.
block element types:
the characteristic of the block element types is that their
corresponding display objects are always displayed by themselves on a
separate line, i.e. with no preceding and no following display
objects on the same line.
For example, the 'table' element type is a block element
type, and consider two 'table' elements contained in an HTML document
in between a series of 'image' elements: because 'table' is a block
element type and 'image' is an inline element type, the browser will
render the HTML document as a line of images "broken" by the two
'table' elements, with each of the two tables being placed all by
themselves on a
separate line:
HTML document
<html>
<body>
<img src='http://images.com/img1.jpg' />
<img src='http://images.com/img2.jpg' />
<table> [Contents of TABLE 1] </table>
<table> [Contents of TABLE 2] </table>
<img src='http://images.com/img3.jpg' />
<img src='http://images.com/img4.jpg' />
</body>
</html>
Note: a result similar to the one depicted in the diagram above will
be seen for any block
element(s) placed inside any
container element in between other inline or block elements.
combining block elements
with inline elements: as it has been previously explained,
inline elements and block elements can be positioned inside a container
element in any order, and the way they are displayed inside a browser
window depends on the types of elements involved; however, the
following restrictions apply
with respect to the container
element of a given element:
an inline element can be contained both inside another
inline element, as well as inside a block element
a block element can only
be contained inside another block element
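The two restrictions can be illustrated with the sketch below. It assumes that the 'paragraph' element ('p') is a block element and uses the 'b' (bold text) element, an inline element not otherwise introduced in this document; the image address is a placeholder.

```html
<html>
<body>
<!-- block element ('p') containing inline elements: allowed -->
<p>
  Some text,
  <!-- inline element ('b') containing another inline element ('img'): allowed -->
  <b>bold text with an image <img src='http://images.com/img1.jpg' /> inside it</b>
</p>
<!-- the reverse nesting (e.g. a 'table' placed inside the 'b' element) is not allowed -->
</body>
</html>
```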
an HTML element can specify (via dedicated HTML attributes)
its corresponding display object's dimensions (i.e. width and/or
height) on the browser window, and the size specification can be done
as absolute
values in pixels,
or as relative values in percents;
if the values are specified in percents, then the percentage numbers
are considered to be
requested from the object's container area.
if an object is contained directly inside the 'body'
element of an HTML document, and if the corresponding HTML element
specifies the object's dimensions as percentage numbers, then
these numbers are considered to be requested from the entire
browser window's display area, i.e. because no absolute
dimension was requested, the 'body' display object is first expanded to
the entire browser window's area before the contained object will be
granted a percentage of its area.
Example: the code below specifies percentage-based
dimensions for a table that is directly contained in the 'body'
of the HTML document, and will produce a visible empty rectangular
box that occupies 25% of the browser window area (the box visibility
is indicated by the attribute border='1' of the 'table' element, and
the box size is indicated by the 'width' and
'height' attributes of the table, each measuring 50% of the
browser window's horizontal/vertical dimensions):
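The example code is not reproduced at this point in the text. A minimal sketch of what such a document could look like is given here, assuming a table with a single cell (the non-breaking space inside the cell is an assumption, added only so that browsers draw the cell):

```html
<html>
  <body>
    <!-- border='1' makes the box visible; each dimension is 50% of the window -->
    <table border='1' width='50%' height='50%'>
      <tr>
        <td>&nbsp;</td>
      </tr>
    </table>
  </body>
</html>
```

Whether the percentage-based 'height' is honored may depend on the browser and on its rendering mode, so the result should be checked on the browsers of interest.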
note: percentage specifications for object dimensions may be
tricky, and they don't always yield what seems intuitive, especially
when using nested containers-in-containers that have percentage-based
dimension specifications. Moreover, they don't always yield the same
visual results on different browser platforms.
the process of inserting a series of display objects inside a
container's display area defaults to starting from the top-left
corner of the container. However, the positioning of the objects to
be inserted inside a container can be influenced by some alignment
attributes specified for the container: specifically, some
container objects can have horizontal and/or vertical alignment
specifications (via dedicated attributes) such that when an object is
rendered inside their display area it will be positioned in a
specific way.
For example, the HTML code below creates a visible
single-cell table (the visibility is determined by the border='1'
attribute) that occupies the entire browser window
(as specified by the width='100%' and height='100%' attributes relative
to the document's 'body' element), and then
places an image in the center of the cell by "attaching"
vertical and horizontal alignment attributes to that cell (the
horizontal and vertical alignment attributes are
align='center' and valign='middle'):
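A sketch of the code described above could look as follows; the image file name 'myImage.jpeg' is a placeholder chosen for illustration only:

```html
<html>
  <body>
    <!-- the table fills the window; border='1' makes the single cell visible -->
    <table border='1' width='100%' height='100%'>
      <tr>
        <!-- the alignment attributes are attached to the cell, not to the image -->
        <td align='center' valign='middle'>
          <img src='myImage.jpeg' />
        </td>
      </tr>
    </table>
  </body>
</html>
```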
If the vertical alignment specification was omitted for the cell, then
the image would be placed vertically at the top of the cell, but
horizontally it would still be centered inside the cell:
According to the DOM, an HTML document consists of a single element
of type 'html', while the rest of the document structure results
from the containment of various other HTML elements inside the 'html'
element. Thus, the 'html' element acts as the root of a containment
hierarchy of elements which together describe the HTML document.
The 'html' element:
As with all HTML elements, the exact types, order, and number of
elements that may be contained inside the 'html' element are regulated
by the DOM specification: an 'html' element contains an
optional 'head' element, followed by either a 'body' element or a
'frameset' element (but not both). Depending on whether a 'body' or a
'frameset' element is contained inside the document's 'html' element,
the document is either a single-frame document or a multi-frame
document.
Single-frame documents:
A single-frame document's structure consists of an optional 'head'
element followed by a mandatory 'body' element, with the 'body' element
being the container for all the rest of the HTML elements inside the
document. In this case the entire graphical area of the browser
window is used to display the contents of the 'body' element of the
document.
Example: an HTML document that contains a 'head' element
and a single image.
The hierarchical containment structure:
'html' element
|
+- 'head' element
|  |
|  +- 'title' element
|
+- 'body' element
   |
   +- 'image' element
The HTML syntax for describing the above structure is:
<html>
  <head>
    <title> Single-frame Document Containing Only An Image </title>
  </head>
  <body>
    <img src="http://someServer.com/imageFolder/myImage.gif" />
  </body>
</html>
Inside this hierarchy of elements, each element is a child of
the element in which it is contained, and is the parent of the
elements that it contains.
Multi-frame documents:
A multi-frame HTML document is a document that contains an optional
'head' element, followed by a mandatory 'frameset' element. A
multi-frame document "encapsulates" a collection of several
independent HTML documents that must be displayed together inside a
single display area (e.g. inside a browser window or on a single
printed page). Multi-frame documents will be detailed later in this tutorial.
This section introduces the most frequently used types of HTML elements
that can be contained inside the 'head' element of an HTML
document, together with a very limited sub-set of attributes that are
applicable to each of them.
Syntax of the 'head' element:
<head> elements contained in the document's 'head' element </head>
The most common elements that can be contained inside the 'head'
element are the following:
The 'title' element type
This element type is used to give a title to a document, and there can
be only one instance of a 'title' element in the 'head' of a
document. The title of a document is meant to be (somehow) displayed by
the browser (for example, a browser may display the document title in
the browser window's title bar, etc).
Syntax:
The title element can only contain a string of text characters:
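The syntax line appears to be missing at this point in the text. As a hedged illustration, a 'title' element holding a plain string of text (the title text itself is a made-up example) could be written as:

```html
<head>
  <!-- only text is allowed between the 'title' start and end tags -->
  <title>My First HTML Document</title>
</head>
```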
The 'style' element type
As mentioned in the introduction section, the HTML elements contained
in a document may have a number of attributes attached to each of them,
according to what type of element they are. One such attribute
that is applicable to a large number of element types is called
'style', and it describes what is commonly referred to as the "formatting
characteristics" of the associated display object. For example, the
syntax: <img src="http://myServer.com/my_image.jpg"
style="border-style: solid; border-width:10px; border-color:blue;"
/> represents an image that is surrounded by a 10-pixel-thick blue
border.
Apart from being able to attach formatting attributes to the individual
elements contained in an HTML document (as described above), a
special-purpose 'style' element can be placed in the 'head' of an
HTML document in order to describe various "formatting characteristics"
for all (or some of) the elements contained in the document's
'body', i.e. the information contained inside a 'style' element
affects the way all/some elements contained in the document are
formatted when they are displayed (see Introduction to CSS
for a detailed description of styles and the ways they can be
associated with the HTML elements inside a document).
- language_used_for_style_specification defaults to "text/css", which
is the "Cascading Style Sheets" description language
- inline_style_description is a sequence of style descriptors (using
the syntax of the specified style description language)
- external_style_file describes the complete file name and location, or
URL, of a separate file containing style descriptors; the contents of
this file will be "imported" by the 'style' element.
Example1:
An inline style descriptor - set the background color of the entire
document to blue:
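The code for this example is not reproduced in the text. A minimal sketch, assuming the default "text/css" style language and a single CSS rule for the 'body' element, could be:

```html
<head>
  <style type="text/css">
    /* applies to the document's 'body' element, i.e. to the entire document */
    body { background-color: blue; }
  </style>
</head>
```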
Example2:
External style descriptor - the various style descriptors contained in
the file 'funnyStyle.css' are "imported" inside the 'style' element,
and will thus be used by the document:
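The original syntax for referencing the external file is not shown in this text. The sketch below uses the standard CSS '@import' rule, which is one way (not necessarily the one the authors had in mind) to pull the contents of 'funnyStyle.css' into a 'style' element; the file is assumed to reside in the same folder as the HTML document:

```html
<head>
  <style type="text/css">
    /* loads the style descriptors from the external file */
    @import url("funnyStyle.css");
  </style>
</head>
```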
The 'script' element type
An element of type 'script' acts as a container for a source code
module (in various scripting languages) that is said to be attached
to the document. The source code contained in a 'script' element
generally groups only declarations and/or definitions of
functions, variables, etc. Just like with any other HTML element, after
the browser parses the 'script' element it creates an associated
browser object of type 'script_object' which has a number of properties
and methods; these properties and methods are created based on the code
inside the 'script' element, and they can be used by other browser
objects described in the HTML document. For example, if a 'script'
element contains the code for a function that brings up a "hello world"
pop-up message, then this function will be "reflected" into a method
associated to the script browser object. The details on exactly how the
various objects described by the HTML elements in a document will be
able to access a script object's methods (and properties) are presented
in conjunction with each such object in the following sections (where
applicable).
- scripting_language_specification specifies the language contained
inside the 'script' element. The default value is "text/javascript"
which specifies the JavaScript language
- inline_script_code is a source code sequence that defines functions,
variables, etc
- external_script_location describes the complete file name and
location, or URL, of a separate file containing script code; the
contents of this file will be "imported" by the 'script' element.
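As an illustration of the "hello world" case mentioned above, a 'script' element with inline JavaScript code might look like the sketch below; the function name 'sayHello' is hypothetical and chosen only for this example:

```html
<html>
  <head>
    <script type="text/javascript">
      // a function definition only: nothing is displayed until some other
      // object in the document calls sayHello()
      function sayHello() {
        alert("hello world");
      }
    </script>
  </head>
  <body>
    some text
  </body>
</html>
```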
This section introduces the most frequently used types of HTML elements
that can be contained inside the 'body' element of an HTML
document, together with a very limited sub-set of attributes that are
applicable to each of them.
Syntax of the 'body' element:
<body bgcolor="document background color in 6-digit hex format RRGGBB, e.g. #FF8301"> [elements contained in the document's 'body' element] </body>
The following list presents the most common types of elements that can
be contained inside the 'body'
element, together with their rendering
rules-based classification as inline
or block element types:
'image_file_name' is the path to the image file (relative
path or full URL)
the 'width' and 'height' attributes can be used to
rescale/distort the image from its real size to a new set of horizontal
and vertical dimensions (the 'size_x' and 'size_y' dimensions can be
expressed both in pixels and as percentage of the size of the image's
container element).
note: if an image is rescaled, then the 'width' and
'height' attributes should always be
used together, or else they may give unexpected results
the 'align' attribute can be used if the image is
placed at the beginning of a succession of inline elements in the
HTML code: in this
case, 'align_position' specifies how the series of inline elements will
"flow around" the image
and on which side the image will be placed.
The possible values are "left" and "right".
For example, if the series of inline elements is simply a string of
text,
then:
align="left" :
HTML document
=============
<html>
  <body>
    <!-- {image+text} -->
    <img src="IMG.png" align="left"/>
    Text that will flow around the image
    and the image will be placed in the
    top-left corner of the {image+text}
    paragraph
  </body>
</html>

Display
=======
+----------------------------------+
|/---\ Text that will flow around  |
||IMG| the image and the image will|
|\---/ be placed in top-left corner|
|of the text+image paragraph       |
|                                  |
|                                  |
+----------------------------------+
align="right" :
HTML document
=============
<html>
  <body>
    <!-- {image+text} -->
    <img src="IMG.png" align="right"/>
    Text that will flow around the image
    and the image will be placed in the
    top-right corner of the {image+text}
    paragraph
  </body>
</html>

Display
=======
+----------------------------------+
|Text that will flow around   /---\|
|the image and the image will |IMG||
|be placed in top-right       \---/|
|corner of the text+image paragraph|
|                                  |
|                                  |
+----------------------------------+
'text_tip' is a text to be displayed by the browser when
the mouse cursor hovers over the image
'image_text' is a text to be printed by the browser
instead
of the image until the image loads
As described in the "Document text" chapter of the "Introductory
overview", the HTML syntax does not
provide a "true" element type for text strings; instead, the text
strings contained inside an HTML document are
represented by contiguous strings of characters organized into
"words", which reside between
two consecutive HTML tags inside the
document. For historical reasons, and because text elements
are by far the most frequent objects in an HTML document, text strings
do not have any associated text-specific HTML tags; this both reduces
the size of an average document and
allows the text sequences contained inside an HTML document to
be easily read by directly inspecting the document in a text editor.
Example:
HTML document                      HTML containment hierarchy
<html>                             document
  <body>                           +---body
    some loose text here               +---text
    <img src="myImage.jpeg"/>          +---image
  </body>
</html>
In the example above, the string of characters "some loose text here"
is a text element. As it can be seen, it has no associated tag, and it
can be identified as a text element solely by the fact that it
is placed between two other HTML tags; specifically, the text starts
right after the start tag of the 'body' element, i.e. it is the first
element inside the document body, and it is followed by an 'image'
element inside the document body. The above HTML document will be
rendered in a browser window as a line containing the text,
immediately followed by the image in the file 'myImage.jpeg' (on the
same line).
Each character in a text string can be represented in
several ways, including via its Unicode character code, while a
number of characters have to be represented in a special way
inside the HTML document. The following list is a summary of how
characters can be represented:
textual_ASCII_character - for regular ASCII
characters
&#ASCII_code; - for the ASCII character set
&#unicode_code; - for the extended Unicode
character set
escape sequences - for the situation when a
character that has special HTML significance (such as '<', '&',
etc) would be interpreted instead of taken as-is:
&lt; - for '<'
&gt; - for '>'
&nbsp; - for successive spaces (otherwise
successive spaces are collapsed into one)
&amp; - for '&' itself
etc...
the 'CR' character (i.e. new line) can be
represented, apart from its unicode code, as a standard single-tag HTML
element: '<br/>' . However, unlike most HTML elements, the
'<br/>' representation of the 'CR' character can have no
attributes attached to it.
Example (expansion of the one above):
<html>
  <body>
    some loose text here <br/>
    <img src="myImage.jpeg"/><br/>
    the HTML code sequence for the character '&lt;' is '&amp;lt;'
  </body>
</html>
The above code will be rendered in an HTML browser as one line of text
(i.e. "some loose text here"), followed under it by the image
'myImage.jpeg' (the image is placed under the first line of
text because of the <br/> that follows the first line of text),
and the last line of text will finally be placed under the
image (because of the <br/> that follows the image) and it will
read: "the HTML code sequence for the character '<' is
'&lt;'"
whitespace characters inside an HTML document
(i.e. tabs, blanks, CR) are all rendered by the browser as one space.
Also, any succession of whitespace characters within an HTML
document is also rendered as one single space character by the
browser.
for example, if an HTML document contains a text
string with two words separated by several whitespace characters (e.g.
several consecutive spaces, or several spaces alternated with newline
characters, etc), then the two words will be displayed in a web browser
separated by a single space.
The fact that the individual characters in a text are
not described using the typical HTML element syntax also implies that
there is no way to specifically "attach" attributes to an individual
character. If HTML had a special 'character' element type
(e.g. <char character_representation />) then such a
syntax could have permitted attaching attributes to each
individual character (i.e. <char attributes character_representation
/>), but since characters are represented textually in the HTML
document this possibility does not exist; instead, the characters
contained in a text string inherit attributes from their parent
element.
Example - characters inheriting the background color (the 'bgcolor'
attribute) from their parent element 'body':
<html>
  <body bgcolor="#0000ff">
    this sequence of characters inherits the background color property: <br/>
    thus, this text will be displayed on a blue background (#0000ff is blue)
  </body>
</html>
For historical reasons, this type of element constitutes the
basis of the HTML hyperlink system. The main roles of an element of
type 'anchor' are:
to create hyperlinks; this is achieved by placing a
sequence of elements inside an 'anchor' object; in this way the
entire display area on which the corresponding anchor display
object is rendered will become a hyperlink (the display objects
rendered inside an anchor object's display area may be 'text' display
objects, 'image' display objects, etc)
to serve as a "bookmark" within a document, such that a
hyperlink can target it
Note: most browsers offer some way of visually hinting objects
contained inside an anchor object if the anchor specifies a
hyperlink target; the most common scenario is the text inside an
anchor object being underlined
Syntax
<a href="target" [name="bookmark_name"] [target=_blank]> objects contained by the anchor object </a>
<a name="bookmark_name"/>
- target is a file location or a complete URL that specifies the
hyperlink target
- the target format is:
- bookmark_name is a string specifying the name of the anchor
- the optional construct 'target=_blank' will cause the
hyperlink to open in a new browser window when it is clicked (or in a
new browser tab, depending on the browser type and its default
settings)
Example1:
The text "click here" in the example below will be displayed by a
browser as a hyperlink text; when this text is clicked, most
browsers will redisplay it at the top of the page (this happens
because the document will actually be reloaded by the browser, and then
the document page will be positioned with the referenced anchor at the
top of the screen)
<a href="thisDocument.html#thisBookmark" name="thisBookmark"> click here </a>
Example2:
The anchors in the example below will be completely invisible on the
screen because no visible object is contained inside the anchor objects'
rendering areas (i.e. each anchor will still exist in the document as a
bookmark, but it will not be usable as a hyperlink):
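The code for this example is not reproduced in the text. A sketch of what it could look like is given here; the bookmark names 'chapter1' and 'chapter2' and the surrounding text are made up for illustration:

```html
<!-- two empty anchors: usable as bookmarks, but with nothing to click on -->
<a name="chapter1"></a>
some text belonging to the first chapter
<br/>
<a name="chapter2"></a>
some text belonging to the second chapter
```

A hyperlink elsewhere could then target one of these bookmarks by appending '#chapter2' to the document's location in its 'href' attribute, in the same way as in Example1.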
These two types of elements are very similar to each other,
and are the basic block-formatting types used inside an HTML
document: specifically, the
purpose of a block-formatting element is to group together an arbitrary set of
other elements that will be rendered in a browser window inside a
rectangular "box", i.e. a 'div' or 'paragraph' will always start
at a new
vertical position in the document view, and the next object
described in the HTML document will always be placed after it at a new
vertical position.
note: if no
formatting attributes are attached to a 'paragraph' or 'div'
element, it is up to the browser to choose how much space will separate
their corresponding display objects inside the browser window, or how
to indent the text contained inside them, etc; however, the typical behavior of a web browser
is to vertically separate a "paragraph box" from its preceding and
following display objects with several pixels, while a "div box" will
have no vertical spacing between it and its preceding and following
display objects.
Syntax:
<p [align="left"/"center"/"right"/"justify"]> objects contained inside the paragraph </p>
Note: there are many more 'paragraph' and 'div' formatting attributes
which are not described in this document; see the Introduction to CSS
document for a detailed description of the various block-formatting
options available in conjunction with the 'div' and 'paragraph'
elements
Example: two sets of elements, each grouped inside a 'div'
element, will be rendered inside two vertically-adjacent "boxes" in a
web browser window
HTML document
=============
<html>
  <body>
    <div>
      Div#1 and an image: <img src="IMG1.jpeg"/>
    </div>
    <div>
      Div#2 and an image: <img src="IMG2.jpeg"/>
    </div>
  </body>
</html>
note: the "div
boxes" themselves are not
visible in the web browser, and their illustration above is only
informative
The 'span' element type (inline element)
This type of element is the basic inline-formatting tool used
in HTML documents. The primary role of a 'span' element is to group
together other HTML elements without modifying in any way the
aspect of their associated display objects in the document view (as
compared to the situation when the elements would not have been grouped
inside a 'span' element). Thus, gathering a set of HTML elements inside
a 'span' element that has no attributes specified will not have
any perceivable effect in the document view; however, if various
attributes are attached to a 'span' element (e.g. specifying a font for
the text contained inside, etc), these attributes will
potentially alter the way the contained elements are displayed and/or
how they behave in a browser window.
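The difference between a 'span' with and without attributes can be sketched as follows; the particular style values (a monospace font and a red text color) are arbitrary choices made for this illustration:

```html
<html>
  <body>
    plain text,
    <!-- no attributes: the grouped text looks exactly like its surroundings -->
    <span>text grouped in a plain span,</span>
    <!-- a 'style' attribute changes how the grouped text is displayed -->
    <span style="font-family: monospace; color: red;">text in a styled span,</span>
    more plain text
  </body>
</html>
```

Because 'span' is an inline element type, all four pieces of text are expected to be laid out on the same line (as long as they fit in the browser window).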
A 'list' element groups together only elements of
type 'list_item' in order to be laid out in a list format, where every
'list_item' element will be eventually displayed preceded by some type
of visual hint (bullet, number, letter, etc).
There are two frequently used types of list elements, the un-ordered
list 'ul' which hints each list item with some non-alphanumerical
characters (bullets, squares, etc), and the ordered list 'ol'
which uses numbers, letters, or both.
Syntax:
<ul> list_item_1 ... list_item_N </ul>
Note: the above syntax describes an un-ordered list; for an ordered list, 'ol' is used instead of 'ul'.
An element of type 'list_item' can group together other
HTML elements (including 'list' elements, but excluding other
'list_item' elements), and it can only be used when contained
in a 'list' element
Syntax:
<li> HTML elements contained inside the list_item element </li>
Example: an ordered list containing one list item, where
said list item contains a link and another (un-ordered) list; the
latter list then contains one list item consisting of text
HTML code:
==========
some text before the ordered list
<ol>
  <li><a href="http://google.com">a link to google</a>
    <ul>
      <li>some more text</li>
    </ul>
  </li>
</ol>
some text after the ordered list
The 'iframe' element type (block element)
An 'iframe' ("internal frame") element is used for embedding a
complete HTML document inside another HTML document. From a visual
point of view, a rectangular area is created in the "parent" document,
and inside that area a fully-featured HTML "child" document is
displayed as specified via a file name (or a complete URL).
Note: Most browsers do not allow the "empty element"
syntax to be applied to iframe elements: The syntax: <iframe attributes
/> is not valid on most browsers!
- embedded_document_fileName is the complete path, or the URL, of the
document to be embedded inside the 'iframe'
- the 'scrolling' and 'frameborder' attributes default to 'yes' on most
browsers
Example: create a 500x300 frame and display the google home
page inside
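The code for this example is not reproduced in the text. A minimal sketch is given below, assuming the standard 'src', 'width' and 'height' attributes of the 'iframe' element and the explicit start/end tag form recommended in the note above:

```html
<!-- explicit start and end tags: the empty-element form is not reliable for 'iframe' -->
<iframe src="http://www.google.com" width="500" height="300"></iframe>
```

Some web sites refuse to be displayed inside a frame that belongs to another site, in which case the frame area may remain empty.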
The 'table' element type (block element)
A 'table' element is a container for table_cell elements, and
its visual representation consists of a rectangular display area (with
or without a border, depending on specific attributes) that contains
the table cells.
The syntax of a 'table' element is: <table> contents
of the table </table>, and the contents of the table always
consist of a succession of table_row elements ('<tr>'),
with each table_row element being a container for a succession of table_cell
elements ('<td>'):
<table [attributes]>
  <tr [attributes]>
    <td [attributes]> cell contents </td>
    [other optional cells inside the table row]
  </tr>
  [other optional rows inside the table, each containing other cells]
</table>
Table rows
the table_row elements inside a table are always
displayed vertically in the document view, one under the other
the syntax of a 'table_row' element is <tr> contents
of the row </tr>
table_row elements can only be contained inside a
'table' element
table_row elements cannot contain any other type of
element except 'table_cell'
Table cells
the table_cell elements inside a table_row are always
displayed horizontally in the document view, one after the other
the syntax of a 'table_cell' element is <td> cell
contents </td>
table_cell elements can only be contained inside a
'table_row' element
table_cell elements may contain any type of HTML
element that can be contained inside the 'body' element of an HTML
document (e.g. text, images, lists, even entire tables)
Based on their cell structure, table elements can be classified into regular
tables and irregular tables:
A regular table consists of a succession of table_row
elements, where each table_row has the same number of
table_cell elements. Thus, a regular table will be rendered as a uniform
grid of rows and columns, with a table_cell element positioned at each
row/column intersection. However, the width and/or height of
the rows and columns do not need to be the same for all rows and
columns: these are determined during the rendering process of the
table, and they will depend on the contents of the table cells.
Example: a table with two rows, each row containing three
cells
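The code for this example is not reproduced in the text. A sketch of such a regular table is shown below; the cell texts are placeholders and border="1" is added only so that the grid is visible:

```html
<table border="1">
  <tr>
    <td>row 1, cell 1</td>
    <td>row 1, cell 2</td>
    <td>row 1, cell 3</td>
  </tr>
  <tr>
    <td>row 2, cell 1</td>
    <td>row 2, cell 2</td>
    <td>row 2, cell 3</td>
  </tr>
</table>
```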
An irregular table is similar to a regular table,
but it allows a table_cell to occupy more than one column and/or row
inside the table. Specifically, each table cell can have attached
attributes that specify how many rows and/or columns that cell
"captures" during the rendering process.
the 'colspan' attribute attached to a cell
specifies that the respective cell will occupy the specified number of
columns in the table
the 'rowspan' attribute attached to a cell
specifies that the respective cell will occupy the specified number of
rows in the table
Note: because the position of the third cell in the
first row of the table is "captured" by the second cell, the definition
of the third cell inside the first table_row element is simply omitted.
Note: because the position of the second cell in the
second row of the table is "captured" by the first row's second cell,
the definition of the second cell inside the second table_row element
is simply omitted.
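The examples that the two notes above refer to are not reproduced in the text. The sketch below reconstructs what they may have looked like, assuming three-column tables with placeholder cell texts; in each table the "captured" cell is left out of its row:

```html
<!-- colspan: the second cell of the first row also takes the third column -->
<table border="1">
  <tr> <td>r1 c1</td> <td colspan="2">r1 c2+c3</td> </tr>
  <tr> <td>r2 c1</td> <td>r2 c2</td> <td>r2 c3</td> </tr>
</table>

<!-- rowspan: the second cell of the first row also takes the cell below it -->
<table border="1">
  <tr> <td>r1 c1</td> <td rowspan="2">r1+r2 c2</td> <td>r1 c3</td> </tr>
  <tr> <td>r2 c1</td> <td>r2 c3</td> </tr>
</table>
```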
Common formatting attributes for tables:
The most frequently-used attributes in conjunction with 'table'
elements (and table_rows and table_cells) are:
bgcolor, align, valign:
these attributes can be attached to the 'table' element, to a
'table_row' element, or to a single 'table_cell' element, and they
specify the background color, the horizontal alignment, and the
vertical alignment for the contents of that element. A "child
element" (e.g. a 'table_row' inside a 'table') can specify a different
value for an attribute than the parent element, but if no such
attribute is specified then it will be inherited from the parent.
'align' and 'valign' cannot be specified for 'table'
elements
the value for 'bgcolor' has the form: "#6_hex_digits"
(e.g.: bgcolor="#ffab01")
the values for 'align' are: "left", "center", "right"
the values for 'valign' are: "top", "middle", "bottom"
defaults on most browsers: valign="middle" and, for ordinary 'td' cells, align="left"
width, height:
these attributes are applicable to the entire 'table' element, to a
'table_row' element contained inside a table, or to a 'table_cell'
element, except as follows:
'width' cannot be applied to 'table_row' elements (a
row's width is the width of the entire table)
'height' cannot be applied to 'table_cell' elements (a
cell's height is determined from the height of the row(s) in which it
resides)
The values for these attributes can be either absolute numbers, or they
can be expressed in percents of the width/height of the parent
element in which they are contained (e.g. specifying that the width
of a cell is "50%" means that it is half the width of the row in which
it resides, which also means half the width of the table that contains
it).
border, cellspacing, cellpadding:
these attributes are applicable only to the 'table' element, and they
affect the overall graphical appearance of the table in the document
view.
A graphical illustration of a one-row, two-cell table:
+-------------------------------------+
|                  S                  |
|   ...............   .............   |
|   :      P      :   :     P     :   |
|   :   OOOOOOO   :   :   OOOOO   :   |
| S : P OOOOOOO P : S : P OOOOO P : S |
|   :   OOOOOOO   :   :   OOOOO   :   |
|   :      P      :   :     P     :   |
|   :.............:   :...........:   |
|                  S                  |
+-------------------------------------+

OOOOOO
OOOOOO   the contents of a cell
OOOOOO
------   table border
......   inner cells borders
S        cellspacing
P        cellpadding
the border attribute:
the value is an integer number specifying the thickness of the outside
border of the table. On most browsers, if the value is non-zero then
the internal cells' borders are also displayed as a thin (1 pixel)
line, while if the value is zero then neither the outer border nor the
inner cells' borders are displayed.
The default value on most browsers is zero, i.e. the table borders are
invisible.
the cellpadding attribute:
the value is an integer number specifying how much space will be left around
the contents of a cell, from an imaginary rectangular box tightly
surrounding the cell's contents up to the actual cell's borders.
Example: if a cell contains a 100x100 pixel image
and the 'cellpadding' is set to 10, then the cell will have an area of
120x120 pixels (i.e. 100x100, plus 10 "padding pixels" on each side of
the image).
If both the dimensions of a cell and a cellpadding are specified (i.e.
via 'width', 'height', and 'cellpadding' attributes), then the actual
size of the cell in each direction will be the maximum size that
results from the two specifications.
Example: for a 100x100 pixel image placed
inside a cell, if the size of the cell is set via 'height=105' and
'width=115' attributes, and padding is specified as 'cellpadding=5',
then the effective height of the cell will be max(100+2*5, 105)=110,
while the width will be max(100+2*5, 115)=115 (i.e. the height will
actually result from the padding, while the width will result from the
explicit cell width specification).
The default value for 'cellpadding' is browser-dependent (usually a
couple of pixels)
the cellspacing attribute:
the value is an integer number specifying the distance between the
cell borders inside the table (i.e. the thickness of the imaginary
grid lines that separate the cells inside the table).
The default value for 'cellspacing' is browser-dependent (usually a
couple of pixels)
the paragraph element in which the table is placed causes the
table to be centered horizontally on the screen
the table occupies 70% of the screen horizontally, and it
fills the screen vertically
the 'border=5' specifies a 5 pixel thickness for the table's
external border, and also specifies that the internal cells' borders are
visible
the 'cellspacing=3' determines a 3-pixel thickness of the
grid that separates the table cells
the height of the first row is set to be half the height of
the table (which for this table means half the height of the screen)
the width of the first cell automatically specifies the width
of the first column in the table, and it is set to half the width of
the table (which for this table means half of 70% of the width of the
screen); the remaining width of the table (after the first column)
is automatically distributed by the browser between the two remaining
columns
the 'height=50' specification on the second row determines a
height of 50 pixels for this row; the third row in the table will
automatically occupy the rest of the table height
the overall background color of the table is set to a
relatively dark grey (#aaaaaa)
the table structure is as follows:
the background color for the third row is specified as a
relatively light grey (#cccccc) which overrides the overall
specification of the background color of the table, and the second cell
on this row yet again overrides the row color specification with a white
background color
the third row makes a global alignment specification
align='right' to be used for displaying the elements in all the cells
inside the row; however, this specification is only used inside the
second cell, because the first and third cells defined their own
alignment which overrides the row-wide specification
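The notes above can be put together as the following sketch. It is only one possible table that matches them: the regular 3x3 structure, the cell texts, the 'align' attribute on the paragraph, and the 'left' and 'center' alignments of the first and third cells in the last row are all assumptions made for illustration. Browsers may also differ in how they treat a table placed inside a paragraph.

```html
<html>
<body>
<!-- assumption: the centering paragraph uses the 'align' attribute -->
<p align='center'>
<table width='70%' height='100%' border='5' cellspacing='3' bgcolor='#aaaaaa'>
  <!-- first row: half of the table height; first cell: half of the table width -->
  <tr height='50%'>
    <td width='50%'> row 1, cell 1 </td>
    <td> row 1, cell 2 </td>
    <td> row 1, cell 3 </td>
  </tr>
  <!-- second row: fixed height of 50 pixels -->
  <tr height='50'>
    <td> row 2, cell 1 </td>
    <td> row 2, cell 2 </td>
    <td> row 2, cell 3 </td>
  </tr>
  <!-- third row: its own background color and a row-wide alignment -->
  <tr bgcolor='#cccccc' align='right'>
    <td align='left'> row 3, cell 1 </td>    <!-- overrides the row alignment -->
    <td bgcolor='#ffffff'> row 3, cell 2 </td> <!-- uses the row alignment, white background -->
    <td align='center'> row 3, cell 3 </td>  <!-- overrides the row alignment -->
  </tr>
</table>
</p>
</body>
</html>
```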
Note on Liquid Design using tables
A document whose contents are arranged such that the proportions
between the distances separating the various document elements are
maintained when resizing the display window is referred to as a 'liquid
design'. Liquid designs have a number of advantages over
fixed-layout designs, with the most important of them being that they
allow the reader to choose the display window size and the paper size
for printing without losing the proportions of the layout (i.e. one can
position elements relative to one-another, or to the document edges,
based on golden numbers, etc, and these characteristics of the layout
will not be lost when the document is resized).
A number of variations can be made based on the "purely" liquid design
generic rules, such that certain parts of the document have fixed
sizes, or their position can be fixed relative to a corner of the
browser window, etc.
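As an illustration, a partially liquid layout built with a 'table' element might look like the sketch below. The 150-pixel navigation column, the 40-pixel footer row, the colors and the cell texts are arbitrary values chosen for the example, not recommendations.

```html
<html>
<body>
<table width='100%' height='100%' border='0' cellspacing='0' cellpadding='10'>
  <tr>
    <!-- fixed-size part: the navigation column is always 150 pixels wide -->
    <td width='150' bgcolor='#cccccc' valign='top'> navigation </td>
    <!-- liquid part: the content column takes whatever width is left -->
    <td valign='top'> main content </td>
  </tr>
  <!-- fixed-height footer row, merged across both columns -->
  <tr height='40'>
    <td colspan='2' align='center'> footer </td>
  </tr>
</table>
</body>
</html>
```

When the browser window is resized, only the content column and the first row are expected to change size, while the navigation column width and the footer height stay fixed.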
The feature set of the 'table' element (e.g. merged cells in irregular
tables, both percentage-based and pixel-based dimension specifications,
background colors, etc), together with the fact that any HTML element
that can be placed inside a document's 'body' can also be placed inside
a 'table_cell' element, make tables appropriate for designing the
"supporting framework" of liquid (or partially liquid) documents.
Note:
A 'horizontal_line' object is always rendered inside the display
area of its container display object, i.e. obeying the same rules
as any other HTML object. Thus, if a 'horizontal_line' element appears
inside a list, table cell, etc, it will be displayed inside that
object's display area only and not on the full width of the
document page.
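A minimal sketch of this behavior, assuming a one-cell table that takes half of the page width (the 50% width and the texts are illustrative):

```html
<html>
<body>
<hr/> <!-- this line is drawn across the document page -->
<table width='50%' border='1'>
  <tr>
    <td>
      text above the line
      <hr/> <!-- this line is drawn only across the table cell -->
      text below the line
    </td>
  </tr>
</table>
</body>
</html>
```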
These legacy elements have the characteristic that they inherit
almost every html attribute (i.e. they allow almost all html attributes
to propagate through them down the document hierarchy), while
setting only a few specific text formatting attributes to new values.
These elements are always invisible (i.e. they do not have any
associated display objects), and are only intended to be used as
containers for text elements, paragraphs, span, div, and lists
(note however that these elements can be nested):
Syntax:
»bold: <b> contained elements </b>
»italic: <i> contained elements </i>
»underline: <u> contained elements </u>
»strikeout: <strike> contained elements </strike>
»blinking: <blink> contained elements </blink>
»subscript: <sub> contained elements </sub>
»superscript: <sup> contained elements </sup>
»text heading: <hN> contained elements </hN> N=1,2,3,4,5,6
»preformatted: <pre> contained elements </pre>
»font: <font [face="font_name"] [size="size"] [color="#24-bit_hex_RGB"]> contained elements </font>
Example:
<html>
<body>
<font face="arial">
  <p> <!-- the paragraph inherits the "arial" font specification-->
    This text is displayed using *arial* font <br/>
    <font face="monospace">
      This text is displayed using *monospace* font <br/>
    </font>
    This text *reverts to using arial* font <br/>
  </p>
</font>
</body>
</html>
Non-wrapping
lines (inline element)
As explained in the 'Rendering process'
paragraph, a succession of inline elements will be displayed by
default on a line that wraps when it reaches the right margin of
its container element; in this context, the HTML element
'non_breaking_line' ('<nobr>') forces a series of successive inline elements to be
displayed on a single line without
wrapping when the line size exceeds the container's width, and
instead forces the containing element's width to expand as needed by the
length of the non-wrapping line of inline elements.
For example, consider a single-cell table that occupies 50%
of the browser window's width, and consider that the table cell
contains a long line of text; in this case, the default behavior of the
text line is to wrap inside the table cell without affecting the
table's width, but if the text is placed inside a <nobr>
container element then it will be displayed on a single non-wrapping
line and the table width will be expanded as required to accommodate
the entire line (if the line length forces the table to grow larger
than the width of the browser window, then the table will be expanded
horizontally "outside" the browser window area and the browser will
display a horizontal scroll bar to allow horizontal scrolling through
the entire table):
HTML without <nobr>            Display in browser
<html>                         +----------------------------+
<body>                         |+------------+              |
<table width="50%">            ||a long line |              |
<tr>                           ||of text... a|              |
<td>                           ||long line of|              |
a long line of text...         ||text... a   |              |
a long line of text...         ||long line of|              |
a long line of text...         ||text...     |              |
</td>                          |+------------+              |
</tr>                          |                            |
</table>                       |                            |
</body>                        |                            |
</html>                        +----------------------------+

HTML with <nobr>               Display in browser
<html>                         +----------------------------+
<body>                         |+---------------------------|
<table width="50%">            ||a long line of text... a lo|
<tr>                           |+---------------------------|
<td>                           |                            |
<nobr>                         |                            |
a long line of text...         |                            |
a long line of text...         |                            |
a long line of text...         |                            |
</nobr>                        |                            |
</td>                          |                            |
</tr>                          |                            |
</table>                       |                            |
</body>                        <<O=========================>>
</html>
Interactive elements
The 'interactive elements' are a special class of elements whose
main purpose is to enable the reader of a document to interactively
respond to the contents of a document. For example, an HTML page
may contain two "buttons" that the reader may "press" in response to a
message contained in the page (e.g. a button labeled with the text
"Agree" and another one with "Disagree"); another example of
interactive element is a text-entry line (e.g. as found on the search
engines' pages where the search terms are being input), etc.
As with all HTML elements, for each interactive element found in an
HTML document the browser creates an internal interactive browser
object, which in turn is then rendered by the browser as an interactive
display object when the HTML document is displayed.
Note: the 'HTML element', 'browser object', and 'display
object' terms were introduced in the Terminology
-> browser section
There are three basic ways in which an interactive object can be used
to "capture" user actions:
generating events:
In this scenario, an interactive element has a special kind of "event
catcher" attribute attached to it, which specifies that when a certain
event happens in conjunction with the corresponding interactive display
object, a specific script function (which must be contained in the
document) gets executed.
Example: a button with an associated "event catcher"
attribute which specifies what script function to be executed when the
button is clicked
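<html>
<head>
<script>
// a script that defines a JavaScript function
function doSomething() { alert("Button was clicked!"); }
</script>
</head>
<body>
<!-- 'onclick' is the "event catcher" attribute; the button label is illustrative -->
<input type='button' value="Click me" onclick="doSomething()" />
</body>
</html>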
Note: this way of using an interactive element is not
specific to interactive element types, and can be used in conjunction
with almost any type of HTML element (e.g. one can specify a certain
JavaScript function to be executed when a button is clicked, but also
when an image is clicked, or a table cell, etc)
reading the object state using a script:
A distinctive feature of some of the interactive element types
is that their corresponding display objects have a state. For
example, a checkbox display object (which is the graphical
representation of a checkbox-type interactive element) can be either
'checked' or 'unchecked' when displayed as part of an HTML document,
etc. All the interactive display objects that have a state allow
their state to be read (and modified) by scripts embedded in the
document, and thus allow the scripts to perform different actions
depending on the state of the objects.
Note: there are two ways to access the interactive
display objects' states from embedded scripts: a "JavaScript/HTML"
way and a "JavaScript/DOM" way. Accessing the interactive
objects' states via JavaScript/HTML is presented in the Introduction to JavaScript
document - the HTML Integration
-> Legacy JavaScript/HTML section, while the JavaScript/DOM
method is detailed in the Introduction to the HTML DOM
document.
using forms:
A form is a special container element for one or more interactive
elements, designed to enable user interactivity with an HTML page
by means of a dedicated application running on a remote server.
In brief, a form specifies the URL of an application running on a
remote server, where said application is designed such that it can
process the states of all the interactive elements contained inside a
form. A distinctive feature of a 'form' element is that it contains a
'submit' button: when the 'submit' button is "pressed", the states of
the interactive objects contained inside the form are automatically sent
by the browser to the remote application for processing.
Example: for a form with a checkbox, a server application
may analyze the state of the checkbox and then send a page to the
browser with the text "Checkbox is checked" or "Checkbox is unchecked",
depending on the state of the checkbox. Thus, the user can start by
either checking or unchecking the checkbox, then press the form's
'submit' button, and as a result the browser will display an HTML
document received from the remote server application containing a text
such as "Checkbox is checked" or "Checkbox is unchecked".
Form elements are discussed in greater detail in the 'Forms'
section below.
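The three ways of capturing user actions described above can be sketched in a single document. This is only an illustration: the function name 'reportState', the 'agree_box' identifier, the element labels and the server URL are hypothetical, and the checkbox state is read here using the DOM method 'document.getElementById'.

```html
<html>
<head>
<script>
// hypothetical function: reads the state of the checkbox when it is called
function reportState() {
  var box = document.getElementById('agree_box');
  alert(box.checked ? "Checkbox is checked" : "Checkbox is unchecked");
}
</script>
</head>
<body>
<!-- reading the object state using a script: the checkbox holds the state -->
<input type='checkbox' id='agree_box' /> I agree <br/>
<!-- generating events: clicking the button runs reportState() -->
<input type='button' value="Report state" onclick="reportState()" />

<!-- using forms: the checkbox state is sent to a (hypothetical) server application -->
<form action='http://myServer.com/server_application.exe'>
  <input type='checkbox' name='agree' /> I agree
  <input type='submit' />
</form>
</body>
</html>
```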
The following paragraphs introduce the most commonly used interactive
elements in HTML pages. Only a limited number of properties are
presented in conjunction with each element, with the optional
properties being shown in square brackets.
Text entry (inline element)
A text entry element is displayed by the browser as a one-line
editable text box in which one can edit a string of characters.
Note: the 'text entry' element does not have its own
dedicated
HTML tag name, and is an 'input' element type.
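A minimal sketch of a text entry, using standard attributes of the 'input' element; the field name, the initial text and the sizes are illustrative values:

```html
<html>
<body>
Your name:
<!-- 'value' is the initial text, 'size' the visible width in characters,
     'maxlength' the maximum number of characters that can be typed -->
<input type='text' name='user_name' value="initial text" size='20' maxlength='40' />
</body>
</html>
```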
Text area (inline element)
A text area element is displayed by the browser as a multi-line
editable text box in which one can edit a string of characters.
Note: most browsers do not allow the "empty element"
syntax to be applied to 'textarea', i.e. the syntax <textarea attributes
/> is not supported by most browsers.
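A minimal sketch of a text area, written with explicit start and end tags; the name, the dimensions and the initial text are illustrative values:

```html
<html>
<body>
Comments: <br/>
<!-- 'rows' and 'cols' give the visible size in text lines and characters;
     the text between the tags is the initial content -->
<textarea name='comments' rows='4' cols='40'>
initial text, which may span
several lines
</textarea>
</body>
</html>
```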
Password entry (inline element)
A password entry element is displayed by the browser as a one-line
editable text box in which one can edit a string of characters, but
where all characters are "masked" by being displayed as a '*' (or other
"masking character").
Note: the 'password entry' element does not have its own
dedicated
HTML tag name, and is an 'input' element type.
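A minimal sketch of a password entry; the field name and size are illustrative. Note that the masking only affects how the characters are displayed on the screen.

```html
<html>
<body>
Password:
<input type='password' name='user_password' size='20' />
</body>
</html>
```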
Button (inline element)
A button element is usually displayed by browsers as a pressable
button, similar to the "buttons" that are available in various
other applications.
Syntax:
<input type='button' [value="text displayed on the button"] />
Note: the 'button' element does not have its own
dedicated HTML tag name, and is an 'input' element type.
Checkbox (inline element)
A checkbox element is an interactive element that can have two
states: "checked" or "unchecked". Each checkbox state is displayed
differently by the web browser, using a "checkbox is checked" or a
"checkbox is unchecked" icon (depending on the state of the checkbox).
Note: the 'checkbox' element does not have its own
dedicated HTML tag name, and is an 'input' element type.
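A minimal sketch with two checkboxes, the first of which is displayed as checked when the document is first rendered; the names and the label texts are illustrative:

```html
<html>
<body>
<!-- the 'checked' attribute sets the initial state to "checked" -->
<input type='checkbox' name='newsletter' checked /> Send me the newsletter <br/>
<input type='checkbox' name='offers' /> Send me special offers
</body>
</html>
```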
Radio buttons (inline element)
A radio button element is an interactive element that can have two
states: "selected" or "unselected". Each radio button state is
displayed differently by the web browser, using a "button is selected"
or a "button is unselected" icon (depending on the button's state).
a radio button is intended to be used as part of a
collection of two or more radio buttons: a group of radio buttons
is created by having several radio buttons with the same 'name'
attribute, and each button in the group having a distinct
'value' attribute which acts as an identifier of the button
within the group (e.g. a radio button group name might be
'color-selector', and the 'value' property of each radio button in the
group might be 'red-button', 'green-button', 'blue-button', etc).
within a radio button group at most one button may
have a 'checked' attribute, which will cause that radio button in
the group to be displayed as "selected", while the other radio buttons
will be displayed as "unselected".
a radio button group is rendered by the browser by
displaying the individual radio buttons just as if they were any
other HTML element (i.e. they can reside successively on a single line,
or they may be placed on successive lines separated by line breaks, etc), with
the distinctive characteristic that when one radio button is
"selected" the other radio buttons in the group are always automatically
"unselected".
when used inside a 'form', a radio button cannot be
treated independently from the other buttons in the same group;
instead, the form treats an entire radio button group as a whole and
"synthesizes" a unique "value of the button group" based on the
'value' attribute of the button that is selected in that group.
for example, let us consider a button group with two
buttons, where both buttons have the 'name' attribute 'myGroup' (i.e.
the "name of the group" is 'myGroup'). Let us also consider that the
first button has the 'value' attribute "button_1", the second button
has the 'value' attribute "button_2", and the second button is the
selected one in the group. In this situation, the "value of the button
group" 'myGroup' (as transmitted to the server when the form's 'submit'
button is pressed) is "button_2". If the first radio button is pressed,
then the second button automatically gets deselected and the "value of
the button group" becomes "button_1".
Note: the 'radio' element does not have its own dedicated
HTML tag name, and is an 'input' element type.
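The 'myGroup' example described above can be sketched as follows; the label texts and the server URL are illustrative assumptions:

```html
<html>
<body>
<form action='http://myServer.com/server_application.exe'>
  <!-- both buttons share the name 'myGroup', so they form one group -->
  <input type='radio' name='myGroup' value='button_1' /> First choice <br/>
  <!-- 'checked' makes the second button the one selected by default -->
  <input type='radio' name='myGroup' value='button_2' checked /> Second choice <br/>
  <input type='submit' />
</form>
</body>
</html>
```

If the form is submitted without changing the selection, the value sent for 'myGroup' is "button_2"; selecting the first button beforehand changes it to "button_1".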
Drop-down list (inline element)
A drop-down list element is an interactive element that is usually
displayed the way a standard drop-down list is displayed in other
applications, i.e. a "hidden" list of elements out of which only one is
shown at any given time. The shown item in the list is said to be "the
selected item", and the user can interactively change which item is
selected by clicking on the list, except for the items that
have the 'disabled' attribute which are displayed in the list but
cannot be selected.
When a drop-down list element is used inside a form, the "value of
the drop-down list" (which will be sent to the server when the
form's 'submit' button is pressed) will be the 'value' attribute of the
selected element.
Syntax:
<select name='list_name'>
  <option value='item_ID' [selected] [disabled]> list item text </option>
  [more 'option' items, but only one 'option' can have 'selected' attribute]
</select>
the item marked with the 'selected' attribute is the
one that is selected by default when the HTML document is rendered
(i.e. before the user interacts with the drop-down list and changes the
selection). There can be only one item marked with a 'selected'
attribute in a drop-down list.
items marked with a 'disabled' attribute are shown in
the list, but they cannot be selected by the user.
Example with two disabled items:
<html>
<body>
<form action='http://myServer.com/server_application.exe'>
  Pick a color:
  <select name='color_choice'>
    <option value='black' disabled> Black </option>
    <option value='white' disabled> White </option>
    <option value='red' selected> Red </option>
    <option value='green'> Green </option>
    <option value='blue'> Blue </option>
  </select>
  <input type='submit' />
</form>
</body>
</html>
Selector (inline element)
A selector list element is an interactive element that is usually
displayed as a list of several items arranged one under the other, with
zero or more items being "marked as selected" in
a certain way (e.g. by being displayed in inverse-video colors). Whether
a single item in the list can be selected, or multiple-selection is
allowed, is determined by the 'multiple' attribute of the selector
element.
The number of shown items in the list may be smaller than the total
number of items, in which case the list will be provided with a
vertical scroll bar to allow access to all its items by scrolling its
contents. The user can interactively change which items are selected
such that either only one item is selected, or several items are
selected simultaneously (e.g. on Firefox 2.x browsers, clicking on an
item will force that item to be selected while deselecting any other
items if they were previously selected, and by clicking on an item
while holding the 'Control' key pressed several items can be
successively selected). As in the case of the drop-down list, an item
marked with the 'disabled' attribute is displayed in the list, but it
cannot be selected.
When a selector element is used inside a form, the "value of the
selector" (which will be sent to the server when the form's
'submit' button is pressed) will be synthesized based on the 'value'
attributes of the (zero or more) selected elements.
As can be seen from the description above, a selector element is
very similar to a drop-down list element, except for the fact that
a multiple selection is allowed, and for the way the items in the list
are displayed: all-but-one of the list items are hidden in the case of
a drop-down list, vs explicitly specifying the number of items to be
displayed in the case of a selector element. The number of elements to
be displayed (one under the other) in a selector is specified via a
'size' attribute.
Syntax:
<select name='selector_name' size=number_of_items_to_display [multiple]>
  <option value='item_ID' [selected] [disabled]> list item text </option>
  [more 'option' items, several can be 'selected' in a 'multiple' selector]
</select>
the items marked with the 'selected' attribute are
selected by default when the HTML document is rendered (i.e. before the
user interacts with the selector and changes the selection).
There can be zero or more items marked with a 'selected' attribute in a
'multiple' selector.
items marked with a 'disabled' attribute are shown in
the list, but they cannot be selected by the user.
Example with two disabled and three selected items
(derived from the drop-down list example above):
<html>
<body>
<form action='http://myServer.com/server_application.exe'>
  Pick a color:
  <select name='color_choice' size=3 multiple>
    <option value='black' disabled> Black </option>
    <option value='white' disabled> White </option>
    <option value='red' selected> Red </option>
    <option value='green' selected> Green </option>
    <option value='blue' selected> Blue </option>
  </select>
  <input type='submit' />
</form>
</body>
</html>
HTML Forms (block element)
A form is a special container element for one or more interactive
elements (e.g. checkboxes, text entry fields, etc). On most
browsers, a form does not have a dedicated graphical representation
(i.e. it is an invisible container by default), but it does
however have a distinctive visual characteristic: a 'form' container
provides a special-purpose 'Submit' button, and optionally a dedicated
'Reset' button.
Note: one way to make a form visible inside a browser window
is by placing it inside a visible-border element (e.g. inside a
one-cell table), in which case all the various elements in the form
will be displayed inside the cell's border, together with the form's
'Submit' and 'Reset' buttons.
The function of the 'form' element is to automatically
synthesize a unified 'form status' that "encapsulates" the states of
all the interactive display objects that it contains (i.e.
checkboxes, text-entry fields, etc), and send said status as a message
to an application running on a remote server (usually when the form's
'submit' button is pressed). Based on the message it receives from the
form (i.e. the state of checkboxes, text-entry fields, etc), the server
application can then perform certain actions and respond by sending
a new web page to the browser. Thus, the interactivity offered
by a 'form' element is achieved by running an application on a
remote server, and not by running a script embedded inside the HTML
document where the form resides.
The essential components of a form element are:
a '<form>' start tag and a '</form>' end tag
which together are used to "encapsulate" a number of interactive
elements (non-interactive elements may also be placed inside a form)
an 'action' attribute that specifies the complete URL of the
server application that must receive the form message, e.g.
'http://myServer.com/my_application.exe'
a 'method' attribute that specifies the name of the protocol that
should be used to send the states of the form's interactive objects to
the server application; this may be one of two standardized protocols:
'POST' or 'GET'
a 'Submit' button: when this button is "pressed" by the user,
the form sends the message containing the states of its contained
interactive elements to the server application using the specified
protocol, and the browser automatically requests a new HTML
document from the URL of the application.
For example, if the remote application URL is
'http://myServer.com/my_application.exe', then the browser will send a
message to the server application containing the states of the
interactive display objects inside the form, and then it will
immediately try to load a new HTML document from the URL
'http://myServer.com/my_application.exe'. As it can be seen, this URL
does not represent an HTML file, but rather an application name, which
in turn means that the application itself must be able to generate
an HTML document and transmit it to the browser. Because the HTML
document is generated "on the spot" by the application, the contents of
this HTML document can be created such that it represents a response to
the message received from the form (i.e. a "custom" HTML document can
be created depending on the message received from the form).
a 'Reset' button: when this button is "pressed" by the user,
the interactive display objects are brought back to their states as
they were when the document was first displayed in the browser window,
i.e. whatever modifications were made by the user to the states of the
interactive display objects, these modifications are all reversed.
Syntax:
<form action='URL_of_server_application' [method='GET' or 'POST']>
  interactive and non-interactive elements contained in the form
  <input type='submit' value="text displayed on the 'submit' button" />
  [<input type='reset' value="text displayed on the 'reset' button" />]
</form>
Example:
<html>
<body>
<hr/>
<form action='http://myServer.com/respond_to_checkbox.exe' method='GET'>
  Check or un-check the checkbox:
  <input type='checkbox' name='my_checkbox' checked />
  <input type='submit' value="Submit choice to the server application"/>
  <input type='reset' value="Reset the form"/>
</form>
<hr/>
</body>
</html>
in order to clearly mark which elements are contained in
the form, the form has been placed between two horizontal_line
'<hr/>' elements
the form contains one non-interactive element (the text
"Check or un-check the checkbox:"), and one interactive element (the
checkbox named 'my_checkbox').
the form is provided with both a 'submit' and a 'reset'
button. Because the default state of the checkbox is 'checked',
whenever the user presses the 'reset' button the checkbox will be
brought back to the 'checked' state.
the server application that must receive the state of the
elements contained in the form when the 'submit' button is pressed is
located on the server 'myServer.com', the complete URL being
'http://myServer.com/respond_to_checkbox.exe'
the protocol by which the states of the elements
contained in the form are sent to the server application is 'GET'
In the example above, the user can set the checkbox state (to checked
or unchecked) before pressing the 'submit' button. Then, when the
'submit' button is pressed, the state of the 'my_checkbox' checkbox is
sent to the 'respond_to_checkbox.exe' server application, the server
application processes the checkbox state, and generates "on the spot" a
new HTML document whose contents depend on the checkbox state (e.g. it
may be a simple HTML document that only contains one line of text:
'Checkbox was checked' or 'Checkbox was unchecked').
The GET and POST methods:
As mentioned in the paragraph above, the protocol by which the form's
interactive elements' states are sent to a server application can be
either 'POST' or 'GET'. From an HTML document developer perspective, the
main differences between these two protocols are:
The 'GET' protocol is the form's default protocol, i.e.
the one used if the 'method' attribute is not specified
The 'GET' protocol does not allow "long" messages to be
sent to the server, because the message is appended to the URL and the
maximum URL length is browser- and server-dependent (typically a few
thousand characters). If a form contains many interactive elements, or if an
interactive element may send a long state (e.g. a text entry field
where a long text can be entered), then this method is not recommended
The 'GET' method displays the states that it sends to the
server application in the browser's "address bar", which means 'GET' is
not suitable for sending sensitive or private information (e.g.
passwords, etc)
The 'POST' method places no practical limit on the length of the message
it sends to the server application, and does not show the message in the
"address bar". However, because 'POST' does not include the form data in the
"address bar", it is not possible to make a bookmark that represents an
HTML document with filled-in form data.
SUMMARY: It is strongly recommended to use 'POST' in all cases except
if one wants to be able to create a bookmark for an HTML document with
filled-in form data. Also, NEVER use 'GET' for forms where
sensitive information (e.g. passwords, etc) is being input.
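The choice between the two methods can be illustrated with the sketch below, where 'POST' is the value that HTML defines for the 'method' attribute besides 'GET'. The server URLs, the field names and the button labels are hypothetical.

```html
<html>
<body>
<!-- search form: 'GET' is used so that the result page can be bookmarked;
     submitting the text "html forms" requests the URL
     http://myServer.com/search.exe?terms=html+forms -->
<form action='http://myServer.com/search.exe' method='GET'>
  Search for: <input type='text' name='terms' />
  <input type='submit' value="Search" />
</form>

<!-- login form: 'POST' keeps the password out of the address bar -->
<form action='http://myServer.com/login.exe' method='POST'>
  User: <input type='text' name='user' /> <br/>
  Password: <input type='password' name='pass' /> <br/>
  <input type='submit' value="Log in" />
</form>
</body>
</html>
```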
A multi-frame HTML document is a container for other HTML
documents, where the contained documents may themselves be either
single-frame or multi-frame documents. In order to display multiple
documents inside a single display area (e.g. a browser window or a
document page), the graphical display area is split into adjacent
"parcels", with the dimension and positioning of the parcels being
determined by the dedicated HTML 'frameset' element.
Example:
An HTML document that is a container for three other HTML documents,
and has no 'head' section, has the following hierarchical containment
structure:
'html' element
 |
 +-'frameset' element
    |
    +- 'frame' element
    |   |
    |   +- first contained HTML document
    |
    +- 'frame' element
    |   |
    |   +- second contained HTML document
    |
    +- 'frame' element
        |
        +- third contained HTML document
The HTML syntax for describing the above structure is:
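A minimal sketch of such a document, assuming three hypothetical files 'first.html', 'second.html' and 'third.html' and an arbitrary split into three roughly equal vertical strips, could be:

```html
<html>
<frameset cols='33%,33%,*'>
  <!-- each 'frame' element names the document it displays via 'src' -->
  <frame src='first.html' />
  <frame src='second.html' />
  <frame src='third.html' />
</frameset>
</html>
```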
The graphical positioning of the display parcels in which the contained
documents are individually displayed is determined based on several
attributes attached to the 'frameset' element; similarly, the various
graphical properties of the borders that separate a multi-frame
document's display parcels (e.g. the visibility of the borders, their
thickness, etc) are also determined via attributes attached to the
'frameset' element.
The 'frameset' and 'frame' element
types
A 'frameset' element is used to describe the graphical partitioning
of a multi-frame document's display area: via a set of dedicated
attributes attached to a 'frameset' element, one can determine the
layout of the parcels that hold the HTML documents "encapsulated" in a
multi-frame document.
There are three ways in which a multi-frame document's display area can
be partitioned: vertical, horizontal, and mixed:
vertical partitioning: the 'cols'
attribute
In this case the multi-frame document's display area is partitioned in
adjacent vertical strips, with the width of each strip being specified
by the 'frameset' attribute 'cols':
The 'cols' attribute contains a list of widths, one value for each
vertical column that will display the individual documents that are
"encapsulated" in the multi-frame document. The values can be either
absolute numbers representing widths in pixels, or they can be
percentage-based specifications in which case the numbers represent
percents of the graphical container's width in which the 'frameset'
element is contained (i.e. percents of the full window width). Finally,
any one of the values (but at most one value) can be represented via
the special '*' character, in which case that value will "fill up" the
remaining space in the multi-frame document's display area.
Example: a two-frame document that partitions the browser
window in two equally sized vertical strips, and fills the two strips
with the google search page in the left half of the window, and the
yahoo search page in the right half of the window.
<html> <head> <title>Google and Yahoo Search: Vertical</title> </head>
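A complete document matching this description might look like the sketch below. The two 'src' URLs are assumptions, and many present-day sites refuse to be displayed inside frames, so two local HTML files can be substituted when trying it out.

```html
<html>
<head>
<title>Google and Yahoo Search: Vertical</title>
</head>
<!-- two vertical strips, each taking half of the window width -->
<frameset cols='50%,50%'>
  <frame src='http://www.google.com' />
  <frame src='http://www.yahoo.com' />
</frameset>
</html>
```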
horizontal partitioning: the 'rows'
attribute
In this case the multi-frame document's display area is partitioned in adjacent
horizontal strips, with the height of each strip being specified by
the 'frameset' attribute 'rows':
The 'rows' attribute contains a list of heights, one value for each
horizontal row that will display the individual documents that are
"encapsulated" in the multi-frame document. The values can be either
absolute numbers representing heights in pixels, or they can be
percentage-based specifications in which case the numbers represent
percents of the graphical container's height in which the 'frameset'
element is contained (i.e. percents of the full window height).
Finally, any one of the values (but at most one value) can be
represented via the special '*' character, in which case that value
will "fill up" the remaining space in the multi-frame document's
display area.
Example: a two-frame document that partitions the browser
window in two horizontal strips, and fills the two strips with the
google search page in the top side of the window, and the yahoo search
page in the bottom of the window. The height of the top strip is fixed
at 200 pixels, while the height of the bottom strip is the remaining
area in the browser window.
<html> <head> <title>Google and Yahoo Search: Horizontal</title> </head>
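A complete document matching this description might look like the following sketch, in which the two 'src' URLs are assumptions:

```html
<html>
<head>
<title>Google and Yahoo Search: Horizontal</title>
</head>
<!-- top strip fixed at 200 pixels, bottom strip fills the remaining height -->
<frameset rows='200,*'>
  <frame src='http://www.google.com' />
  <frame src='http://www.yahoo.com' />
</frameset>
</html>
```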
mixed partitioning:
In order to allow the display area to be partitioned into rectangular
parcels (instead of horizontal or vertical strips only), the
'frameset' element allows nesting. Specifically, a 'frameset'
element can be included in another 'frameset' element (but not in a
'frame' element), and thus a complex partitioning of the display area
into rectangular parcels of different sizes can be obtained.
Example: a three-frame document: a top rectangle with
height of 200 pixels and full window width, followed by two equal-width
frames under it.
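A sketch of such a three-frame document, assuming a hypothetical title and three hypothetical files 'top.html', 'bottom_left.html' and 'bottom_right.html':

```html
<html>
<head>
<title>Three Frames</title>
</head>
<!-- outer frameset: a 200-pixel top row, and a second row with the remaining height -->
<frameset rows='200,*'>
  <frame src='top.html' />
  <!-- inner frameset: takes the place of the second row and splits it in two columns -->
  <frameset cols='50%,50%'>
    <frame src='bottom_left.html' />
    <frame src='bottom_right.html' />
  </frameset>
</frameset>
</html>
```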
important note: the fact that a 'frame' element
can be contained "at different 'frameset' levels" in a multi-frame
document (because a 'frameset' element can be contained inside another
'frameset' element) does not create any hierarchy among the
documents that are "encapsulated" in a multi-frame document. In
other words, no matter what nested structure of 'frameset' elements is
used in order to create a certain graphical appearance in a
browser window, the documents that are contained inside a
multi-frame document are all siblings, with no hierarchy of any kind
among them.
frame borders: the 'frameborder'
attribute:
When a multi-frame document is displayed, the borders that separate the
frames that contain the individual "encapsulated" documents can be made
visible or hidden by a 'frameborder=1/0' attribute attached to a
'frameset' element. On most browsers, this attribute is '1' by
default, which causes the frame borders to be displayed, and by
clicking and dragging a frame border the sizes of the frames can be
changed by the user. By setting the 'frameborder' attribute to '0' the
frame borders will be invisible, and the sizes of the frames are no
longer adjustable by the user.
Example (modification of the above example, to include a
'frameborder' specification):
<html> <head> <title>Three Frames with Invisible Borders</title> </head>
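A complete document of this kind might look like the sketch below, which assumes three hypothetical files 'top.html', 'bottom_left.html' and 'bottom_right.html' and sets 'frameborder' on the outermost 'frameset' element:

```html
<html>
<head>
<title>Three Frames with Invisible Borders</title>
</head>
<!-- 'frameborder' is set once, on the top-level frameset -->
<frameset rows='200,*' frameborder='0'>
  <frame src='top.html' />
  <frameset cols='50%,50%'>
    <frame src='bottom_left.html' />
    <frame src='bottom_right.html' />
  </frameset>
</frameset>
</html>
```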
In the example above, the partitioning of the browser's graphical
display area is maintained as in the previous example, but the borders
that delimit each frame are no longer visible (nor adjustable by
the user).
Notes:
although the 'frameborder' attribute is attached to a
'frameset' element, this actually translates into "migrating" the
visibility specification down to the borders of each individual
frame. In other words, the 'frameborder' visibility attribute,
although specified at the 'frameset' element level, is an attribute
that is actually attached to each individual 'frame' element that
is contained in the frameset.
setting a 'frameborder' attribute for a certain
'frameset' element determines the appearance of all the borders of
all the frames contained in that frameset, both the frames directly
contained in the frameset element for which the 'frameborder' has been
specified, as well as all the frames indirectly contained in
other possible 'frameset' elements. In the example above, 'frameborder'
has been set for the "top" 'frameset' element, which causes all the
borders of all the frames in the multi-frame document to be
invisible.