Markup
Continuous speech, like a stream of computer data or characters, is a linear phenomenon. Markup is any system for distinguishing one segment of text from other segments, and in doing so it imposes an order on the underlying byte stream.
Coombs, Renear, and DeRose (1987) distinguish several types of markup, ranging from the most appearance-oriented (physical) to the most structure-oriented (logical):
- Punctuational
  Punctuational markup includes commas, dashes, and semicolons. We often think of these as an integral part of text, but some ancient languages were written without any punctuation whatsoever, not even spaces between words.
- Presentational
  Presentational markup includes horizontal and vertical spacing such as page breaks, double line spacing, and indentation. It is the kind of markup a mechanical or electric typewriter produces.
- Procedural
  Procedural markup takes the form of instructions to a computer program, which then generates presentational and/or punctuational markup. Languages such as TeX and PostScript are often considered procedural markup languages.
- Descriptive
  Descriptive markup concerns the logical structure of a document rather than its appearance. Descriptive markup is designed to be easy for both people and computers to read and understand. User-defined styles in word processors can be used for descriptive markup.
- Metamarkup
  Languages that can generate various types of descriptive markup are considered "metamarkup" languages. The Standard Generalized Markup Language (SGML) is probably the best example in this category.

These types of markup are not mutually exclusive. Punctuational and presentational markup are somewhat independent, but one can often substitute for the other, as when a list separated by commas is replaced by a bulleted list.
Procedural markup can produce all of the effects on the page that descriptive markup can, but it loses (theoretically irrecoverably) the information about the document's logical structure that descriptive markup provides.
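The loss of logical structure can be illustrated with a minimal sketch. It assumes a hypothetical style table in which several distinct descriptive tags (the names `title`, `emphasis`, and `foreign` are invented for illustration) all map to the same procedural instruction, "italic". Once two documents are rendered, nothing in the output records which tag produced the italics.

```python
# Sketch: rendering descriptive markup to procedural/presentational
# instructions is many-to-one, so the logical structure cannot be recovered.
# The tag names and the STYLE table are hypothetical, chosen for illustration.

# Each logical (descriptive) tag maps to a typesetting instruction.
STYLE = {
    "title": "italic",
    "emphasis": "italic",
    "foreign": "italic",
    "plain": "roman",
}


def render(doc):
    """Replace every descriptive tag with the instruction the style table assigns it."""
    return [(STYLE[tag], text) for tag, text in doc]


# Two documents that differ only in the *meaning* of "Moby-Dick".
doc_a = [("plain", "I read "), ("title", "Moby-Dick"), ("plain", " twice.")]
doc_b = [("plain", "I read "), ("emphasis", "Moby-Dick"), ("plain", " twice.")]

print(render(doc_a) == render(doc_b))  # True: the page looks identical

# Going backwards, "italic" could have come from any of these tags.
candidates = [tag for tag, style in STYLE.items() if style == "italic"]
print(candidates)  # ['title', 'emphasis', 'foreign']
```

Because the mapping from tags to instructions is not one-to-one, no inverse function exists; recovering the structure would require guessing, which is the sense in which the information is lost.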
With metamarkup, you can
generate specific markup
languages that will meet
particular needs.
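The idea of defining a specific markup language and then checking documents against it can be sketched in a few lines. This is only an analogy: SGML defines languages through document type definitions, which are far richer than the hypothetical `MEMO_LANGUAGE` table below (it records only which child elements each element may contain, with no ordering, occurrence counts, or attributes). The `validate` function is generic; supplying a different table defines a different language.

```python
import xml.etree.ElementTree as ET

# Hypothetical language definition: for each element, the set of child
# elements it may contain. A toy stand-in for a real document type definition.
MEMO_LANGUAGE = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": {"emphasis"},
    "emphasis": set(),
}


def validate(element, language):
    """Return a list of violations of the language definition (empty if valid)."""
    if element.tag not in language:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:  # iterates over child elements only, not text
        if child.tag not in language[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(validate(child, language))
    return errors


good = ET.fromstring(
    "<memo><to>Ann</to><from>Bob</from>"
    "<body><para>Hi <emphasis>now</emphasis></para></body></memo>"
)
bad = ET.fromstring("<memo><para>Hi</para></memo>")

print(validate(good, MEMO_LANGUAGE))  # []
print(validate(bad, MEMO_LANGUAGE))   # ['<para> not allowed inside <memo>']
```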

Even though punctuational and presentational markup could be replaced with appropriate descriptive markup generated by metamarkup, it is hard to imagine people actually encoding sentences as

    <sentence>the name of my dog is
    <propernoun>rover</propernoun>
    </sentence>

even if computer processing were easier as a result.
If we ever do get to that point, we will be just one step away from writing completely in terms of the logical significance of the document; such documents might look like

    <sentence><defnarticle>
    <name1><of3><pronounfirstsingposs>
    <canine2informal><tobe>
    <propernoun>rover</propernoun>
    </sentence>

where the only remaining text is text that refers to items not yet represented inside the computer, such as my dog Rover.
A more likely
alternative is that the "period" key
on the keyboard would actually generate
a <sentence></sentence> pair,
but even that remains unlikely for
the near future. Punctuational
markup at least will be with us for
a while.
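A keyboard whose "period" key emits a sentence pair instead of a period can be simulated with a short sketch. The function name `period_key_markup` is hypothetical. It is deliberately naive: every `.` closes a sentence, so abbreviations such as "Dr." would be split incorrectly, and any text typed after the last period is passed through unmarked because its sentence has not yet been closed.

```python
def period_key_markup(keystrokes):
    """Simulate a keyboard whose '.' key emits a <sentence></sentence> pair
    around the text typed since the previous period, instead of a period."""
    out = []
    buffer = []
    for ch in keystrokes:
        if ch == ".":
            # Close the current sentence; the period itself is not kept.
            out.append("<sentence>" + "".join(buffer).strip() + "</sentence>")
            buffer = []
        else:
            buffer.append(ch)
    # Text after the last period has not been closed as a sentence yet.
    out.append("".join(buffer).strip())
    return "".join(out)


print(period_key_markup("the name of my dog is rover. he is old."))
# <sentence>the name of my dog is rover</sentence><sentence>he is old</sentence>
```

The sketch shows why this alternative is plausible: the punctuational markup the typist already supplies carries enough information to generate the descriptive markup mechanically, at least for the simple cases.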

There is another category of markup, referential markup, which "refers to entities external to the document" (Coombs, Renear, and DeRose 1987), but it describes a different aspect of markup than the categories listed above. Referential markup is the basis of hypertext.