An Introduction to HTML

Written by Kurt Revis, modified by Steven Fought

This document is an introduction to HTML, the Hypertext Markup Language. The vast majority of the documents that you have seen on the Web have been written using HTML.

HTML was intended to be a structural markup language. In a traditional word processor or desktop publishing program, you indicate how you want the different parts of your document to look. In a structural publishing system, you indicate what the different parts of your document are, and elsewhere, outside of the document, you indicate how each element should be shown to a user. In the case of HTML, each browser decides how it will render the HTML elements. The original developers of HTML wanted browsers to be developed on a wide variety of systems, so they kept it simple and tried to avoid including elements that made assumptions about a Web user's computer (such as the ability to display pictures). The visually oriented tags that are now in HTML were simply added to browsers by their developers, and have since become popular.

It is important to understand that HTML is designed to specify the structure of the document, and not its visual appearance, because while you might like for your document to show up in 12-point Times, double spaced, with 2 inch margins on all sides, those kinds of decisions will be made not by you, but by the person reading your document.


Tags

HTML files are just plain text, with tags mixed in for formatting. Tags are used like this:

<h1>Operating Systems</h1>

Tags usually come in pairs called containers that surround pieces of text. The tags themselves are made up of <, the tag name, and >. The first tag (<H1>) is called the start tag, the text inside ("Operating Systems") is called the content, and the second tag (</H1>) is called the end tag.

The <h1> tag signifies that the content text should be displayed as a heading (the 1 means that it is at level 1, the largest). When viewed in a browser (Netscape, Lynx, or whatever), the above line will look like:

Operating Systems

Although most tags work this way, some tags don't have end tags. For example, the <P> tag for specifying the beginning of a paragraph doesn't need a </P> at the end of the paragraph. We will discuss this tag (and others) later in this document.

Tag names are not case-sensitive, so <H1> and <h1> are equivalent.

Some tags may also have optional attributes, which modify the interpretation of the tag. For example, the <IMG> tag, which tells the browser to display an image, can have a SRC attribute to specify the image's name or an ALIGN attribute to specify its alignment relative to the surrounding text. With attributes, a tag might be used like this:

<IMG SRC="os.gif" ALIGN="top">
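As a small sketch of how attributes and case interact: attribute names, like tag names, are not case-sensitive, so the two lines below should be treated the same way by a browser. The file name "logo.gif" is a hypothetical example, not a file from this document. Note that the value of SRC is a URL, and URLs may be case-sensitive on the server, so it is safest to type file names exactly as they are stored.

```html
<!-- "logo.gif" is a hypothetical image file used only for illustration. -->
<IMG SRC="logo.gif" ALIGN="middle">
<img src="logo.gif" align="middle">
```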


Structure of an HTML file

All HTML files should follow this general format:
<HTML>

<HEAD>
<TITLE>This is the title of the document.</TITLE>
</HEAD>

<BODY>
This is the body, or main text, of the document.
</BODY>

</HTML>

The first tag, <HTML>, indicates that the contents of the file are, in fact, in HTML format. It is a good idea to enclose the whole file in <HTML></HTML>, even though it is not strictly required right now: browsers currently in use may interpret your documents correctly without it, but this may change in the future.

Each HTML file is split up into two parts: the header, marked by <HEAD></HEAD>, and the body, marked by <BODY></BODY>. (All text in the file should be enclosed within these tags.)

Note that in HTML, character and line spacing are unimportant. In the example above, I could have written (for example)

<HTML>
<HEAD><TITLE>This is the title of the document.</TITLE></HEAD>
<BODY>This is the body, or main text, of the document.</BODY>
</HTML>

and gotten the same result.

Note that there are a huge number of documents on the Web that are not written in "strict HTML". Often, the <HEAD> section will be completely missing. When Mosaic became popular, NCSA's documentation, which does not stress proper construction, was what many people used to learn HTML. The emphasis was not on using HTML correctly, but on creating documents that looked good in Mosaic, which is very forgiving of bad HTML. The result is that many documents look good in Mosaic and bad in many other browsers. The same holds true for Netscape and MSIE, which support and encourage a wide variety of features outside of the HTML spec.


Paragraphs and Spaces

One of the problems beginners have with HTML is that spaces, tabs, and newlines--known collectively as "white space"--are all equivalent. If I type
         This    is a     test.
or
This is
a
    test.
or
This is a test.
it will always be shown in the browser as:

This is a test.

So just adding a blank line to your document will not start a new paragraph. To make a paragraph, use the <P> tag at the beginning of the paragraph. For example:

<P> This is the first paragraph. It babbles on and on, never really going
anywhere. 

<P>
The second paragraph, however, is a little more weird.
It has some
very short lines,
for instance.
There are also other lines that seem to go forever; in fact, they are rather long, extending for more than 80 characters, but decidedly finite in scope.
Also, it contains blank

lines in the middle of itself.

This HTML is rendered as:

This is the first paragraph. It babbles on and on, never really going anywhere.

The second paragraph, however, is a little more weird. It has some very short lines, for instance. There are also other lines that seem to go forever; in fact, they are rather long, extending for more than 80 characters, but decidedly finite in scope. Also, it contains blank lines in the middle of itself.

Related to <P> are the <BR> and <HR> tags. <BR> produces a line break, without the extra vertical space that <P> puts in. <HR> inserts a horizontal rule--a line extending across the page.
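The difference between these tags is easiest to see side by side. The fragment below is a minimal sketch with made-up placeholder text; it would go inside the <BODY> of a document. The three lines joined by <BR> should appear on separate lines with no extra space between them, and the <HR> should draw a rule before the next paragraph.

```html
<P>
First line of a short verse<BR>
Second line of the verse<BR>
Third line of the verse

<HR>

<P>
This paragraph appears below the horizontal rule.
```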


Section Headings

The most common HTML tags are the section headings, like the <H1> tag we used earlier. In fact, there is a whole range of these tags, from <H1> for top-level headings to <H6> for the bottom. These tags tell the browser to use a different type size, indentation, or color.

Heading level 1 <H1>

Heading level 2 <H2>

Heading level 3 <H3>

Heading level 4 <H4>

Heading level 5 <H5>

Heading level 6 <H6>

These tags do not denote section numbers (as in "See Chapter 3, Section 4"), but section levels. You can, of course, use a section level tag more than once in a document.
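To illustrate the idea of levels, here is a minimal sketch of the headings for a document with two major sections. The section titles are hypothetical. Notice that <H2> and <H3> are each used more than once, and that the level shows how deeply a section is nested, not its position in the sequence.

```html
<H1>Operating Systems</H1>

<H2>Processes</H2>
<H3>Scheduling</H3>
<H3>Synchronization</H3>

<H2>Memory</H2>
<H3>Paging</H3>
```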


Type Styles

HTML allows the use of a number of different type styles. The most useful are <EM>, for adding emphasis like this, and <STRONG> for stronger emphasis. Browsers may display the text in italics, in bold characters, underlined, in all capital letters, or with asterisks around it.

In general, it is best to use <EM> and <STRONG>, since they give good results on all browsers. If you need to, however, you can use <I>, which produces italics, or <B> which produces bold text.

There is also typewriter style, like this, produced with the <TT> tag.
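Here is a short sketch that uses each of these tags once. The sentences and the file name "index.html" are made up for illustration. How each style actually appears depends on the browser.

```html
<P>
Please read the instructions <EM>carefully</EM>.
<STRONG>Do not</STRONG> delete the file <TT>index.html</TT>.
If you need to, you can ask for <I>italic</I> or <B>bold</B> text directly.
```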


Entities

Entities are names that start with & and end with ; that represent characters which cannot normally be written in ASCII, and characters used for markup. For example, because the angle bracket characters are used in tags, to avoid ambiguities the codes &lt; and &gt; are used to produce < and > respectively. Similarly, the ampersand character '&' is produced with &amp;. The semicolon at the end of the code is necessary.

Other codes include &quot; which produces the double-quote character ". (Browsers are rarely confused by a " in ordinary text; the code matters mainly inside an attribute value that is itself enclosed in double quotes.) All of the characters in the ISO-Latin-1 set can also be expressed--CERN provides a table of the codes. For example, to produce a u with an umlaut (ü) the &uuml; code is used.
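The entities described above can be combined in ordinary text. The fragment below is a minimal sketch with made-up sentences. A browser should display the first paragraph as: To start a paragraph, type <P>. The second should display as: Salt & pepper, "quoted text", and München.

```html
<P>
To start a paragraph, type &lt;P&gt;.

<P>
Salt &amp; pepper, &quot;quoted text&quot;, and M&uuml;nchen.
```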

Another useful structure is the HTML comment, produced with <!-- and -->. The contents of the comment should not be displayed by the browser. (However, some versions of Mosaic are broken and still interpret tags within comments, so you can't comment out tags. This is unfortunate.) For example, the text

This here <!-- right here --> is a comment.
should appear as

This here is a comment.


Lists

Lists are a very useful construct in HTML. There are three types of lists: ordered, unordered, and descriptive.

Ordered lists produce a list of numbered items. To make an ordered list, use the <OL> (ordered list) and <LI> (list item) tags, like this:

<OL>
<LI>Get two slices of bread, peanut butter, and jelly.
<LI>Spread jelly on one side of one slice of bread.
<LI>Spread peanut butter on the other side of the other slice.
<LI>Put the sandwich together and eat!
</OL>

The result is this:

  1. Get two slices of bread, peanut butter, and jelly.
  2. Spread jelly on one side of one slice of bread.
  3. Spread peanut butter on the other side of the other slice.
  4. Put the sandwich together and eat!

As you can see, the items are automatically numbered in order.

Unordered lists are similar, but aren't numbered; browsers typically mark each item with a bullet instead. Instead of <OL> and </OL>, use <UL> and </UL>.
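A minimal sketch of an unordered list, using the sandwich ingredients from the ordered list above as hypothetical items:

```html
<UL>
<LI>Two slices of bread
<LI>Peanut butter
<LI>Jelly
</UL>
```

Most graphical browsers show each item on its own line with a bullet in front of it; the exact marker is up to the browser.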

Descriptive lists are a little different: they are used when giving lists of items and their descriptions, as in a dictionary. The tag for descriptive lists is, predictably, <DL>, and each item has a <DT> (term to be defined) and <DD> (definition) tag. The example should make the usage clear:

<DL>
<DT>This is the first term to be defined
<DD>This is the definition of that term.
<DT>piz-za \'pe^-t-se\
<DD>an open pie made typically of thinly rolled bread dough
    spread with a spiced mixture (as of tomatoes, cheese, and
    ground meat) and baked
</DL>

The result is this:

This is the first term to be defined
    This is the definition of that term.
piz-za \'pe^-t-se\
    an open pie made typically of thinly rolled bread dough spread with a spiced mixture (as of tomatoes, cheese, and ground meat) and baked

In all of the different kinds of lists, you are not limited to short items. In fact, they can be as long as you want, and can even have multiple paragraphs.
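As a sketch of a longer list item, the example below puts a second paragraph inside the first item by using <P> after the item's opening text. The wording of the items is made up for illustration. The numbering continues normally with the next <LI>.

```html
<OL>
<LI>Get two slices of bread, peanut butter, and jelly.
    <P>
    Any kind of bread will do, but fresh bread is easier to spread on.
<LI>Spread jelly on one side of one slice of bread.
</OL>
```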


Pictures

Adding pictures to an HTML document is not difficult. The picture itself must be in GIF format, and have a filename that ends in .gif. If you have a file in some other format you can make a GIF file using a utility such as xv or the pbm programs.

To include an image, use the <IMG> tag. The usage is a little different from the other tags we have seen so far. By itself, the <IMG> tag does nothing; you must use the SRC="" attribute in the tag to specify the name of the image. Inside the quotes of the SRC="", insert the URL of the image. An example:

Here is a picture of Felix: <IMG SRC="felix.gif">

Here is a picture of Felix: [image: felix.gif]

By default, the text is lined up with the bottom of the picture. By using the ALIGN=MIDDLE or ALIGN=TOP attributes, you can change this. For instance:

Felix is the center of all things:<IMG SRC="felix.gif" ALIGN=MIDDLE>

Felix is the center of all things: [image: felix.gif, text aligned with its middle]

Felix: <IMG SRC="felix.gif" ALIGN=TOP> on top of the world!

Felix: [image: felix.gif, text aligned with its top] on top of the world!

Another important attribute that you can add to <IMG> tags is ALT="". If a browser is not capable of displaying graphics, it will instead show any text that you put inside the quotes. (Lynx, a text-only browser, is an example.) If your page would be incomprehensible without the pictures, it is a very good idea to use this alternative.

<IMG SRC="felix.gif" ALT="Felix"> was quite a skier.

In a browser that cannot display images, this appears as:

Felix was quite a skier.


Links

A full explanation of how to create links between documents is beyond the scope of this document. However, we can give a quick introduction.

Links, sometimes referred to as "anchors", are specified with the <A> tag. Like the <IMG> tag, <A> does nothing by itself. The HREF="" attribute specifies the URL that the link points to:

If you click <A HREF="link.html">here</A>, you will see
the file "link.html".

If you click here, you will see the file "link.html".

As you can see, the text inside the <A> and </A> tags is marked as a link.

To link to a document on another server, put its full URL in the HREF="" part of the anchor. For example, a link and its original HTML source:

The UWCS home page.

The <A HREF="http://www.cs.wisc.edu/">UWCS home page</A>.

Another attribute for <A> is the NAME="" attribute. This attribute specifies that the text within the anchor should be associated with a name. Other links may jump directly to that anchor by specifying URL#name as a target. For example, I might surround the heading for Chapter 3 of my book with the anchor

<A NAME="Chapter3">Chapter Three</A>

Then, from anywhere else in the document, I could provide a link to Chapter 3 by writing

<A HREF="#Chapter3">Jump to Chapter 3</A>

If Felix had a book with the URL http://www.cs.wisc.edu/~felix/book.html, any document could jump to Chapter 3 of the book by specifying

<A HREF="http://www.cs.wisc.edu/~felix/book.html#Chapter3">
Chapter 3 of Felix's book</A>
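Putting NAME and HREF together, a single document can carry its own table of contents. The fragment below is a minimal sketch of what Felix's book.html might contain; the chapter titles and text are hypothetical. Each entry in the list jumps to the heading whose anchor carries the matching name.

```html
<H1>Felix's Book</H1>

<UL>
<LI><A HREF="#Chapter1">Jump to Chapter 1</A>
<LI><A HREF="#Chapter3">Jump to Chapter 3</A>
</UL>

<H2><A NAME="Chapter1">Chapter One</A></H2>
<P>
The text of the first chapter goes here.

<H2><A NAME="Chapter3">Chapter Three</A></H2>
<P>
The text of the third chapter goes here.
```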


Other Tags

There are a number of other tags, some more useful than others. A quick survey:

<PRE>
Preformatted text: rendered in a monospace font. Unlike most HTML, white space is significant. Tags are still honored, though. This style is useful for tables of information, since tabs work.
<LISTING>
Similar to <PRE>, but in a smaller size. Meant for program listings. Supposedly, embedded tags are ignored.
<BLOCKQUOTE>
Indented text, meant for long quotes from other material.
<ADDRESS>
Address formatting--different browsers give different results. Use the <BR> tag for multiple lines.
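The following sketch shows three of these tags in use. The table contents, the quotation text, and the address (including "maintainer@example.edu") are placeholders invented for illustration. Inside <PRE>, the columns line up because the spaces are kept exactly as typed.

```html
<PRE>
Tag     Level   Typical use
H1      1       Document title
H2      2       Major section
H3      3       Subsection
</PRE>

<BLOCKQUOTE>
A long quotation from another source would go here,
set off from the surrounding text.
</BLOCKQUOTE>

<ADDRESS>
Name of the page maintainer<BR>
maintainer@example.edu
</ADDRESS>
```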

Some other tags are not very useful, since no one seems to know exactly what they should be used for:

<DFN>
Definition
<CITE>
Citation
<CODE>
Fragments of computer code
<KBD>
Keys on a keyboard, like Return.
<SAMP>
Sample output or other literal characters
<VAR>
Variable names
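For what it is worth, here is a sketch of how these tags are commonly used. Treat the usage as an assumption rather than a rule, since browsers differ in how (and whether) they distinguish these styles. The command, variable name, and statement are made up for illustration.

```html
<P>
Type <KBD>ls</KBD> and press <KBD>Return</KBD>.
The variable <VAR>count</VAR> is increased by the statement
<CODE>count = count + 1;</CODE> each time the loop runs.
For background, see <CITE>An Introduction to HTML</CITE>.
```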


Other Sources of Information

By now, you should have a good idea of what HTML is all about, and you should be able to create your own HTML documents. If you want to see more, we have links to other more complete sources of information here. Or, you might want to read some of our other documents.