RTF to HTML Conversion Release 1.1
July 19, 1993 Chris Hector cjh@cray.com
Changes in version 1.1
Added suggestions from Paul DuBois:
Changes for THINK C 6.0:
- renamed TextStyle to textStyle to eliminate incompatibility with
existing Macintosh Toolbox symbol.
- Added code to put up a console window under THINK C.
- Rewrote malloc()/realloc() calls in terms of RTFAlloc(). This
eliminates the need to include malloc.h, which isn't present on some
machines anyway. (Added a terminating null byte in SaveText() as part
of this change so strcpy() can be used.)
- rtfParam is not 0 if no param value is given. (The comment before
CharAttr() was incorrect. This used to be true but no longer; the
comment in rtf2troff, from which this was copied, was incorrect, too!)
Changed Footnote processing to set the string to null after printing.
Fixed a bug in special character processing: PutHTML was being called
when the output destination was not a file.
As per suggestion from Bob Bagwill, a table row will output "\n".
(If the table is
then it should look OK)
Fixed footnote processing to put an anchor at the footnotes
New Features:
Pictures embedded in RTF used to generate a link to pic[n].multi
and a pic[n].pict file. This was because of my misunderstanding of
the .multi extension. Now the reference is to pic[n].gif.
NOTE: this still assumes that a SEPARATE program will be used
to convert the .pict file to a .gif file.
If you want to change the extension of the reference, use
the -P option.
Added a -i option which allows pictures to be viewed inline with the
text of your file. When this option is used, the reference to
the picture will be made using the <IMG> tag.
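As a rough illustration of the two kinds of picture reference described
above, here is a minimal sketch in C. It is NOT taken from the rtftohtml
source: the function name picture_ref, the "Picture n" link text, and the
exact spelling of the tags are assumptions made for the example. The ext
argument stands in for the extension that -P changes, and inline_pic
stands in for -i.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the rtftohtml source): format the HTML
 * reference for picture number n into buf.
 *   ext        - extension used in the reference, e.g. "gif"
 *   inline_pic - nonzero for an inline image, zero for an ordinary link
 * Returns the value reported by snprintf().
 */
int picture_ref(char *buf, size_t size, int n, const char *ext, int inline_pic)
{
    if (inline_pic)
        return snprintf(buf, size, "<IMG SRC=\"pic%d.%s\">", n, ext);

    /* The link text "Picture n" is an assumption made for this sketch. */
    return snprintf(buf, size, "<A HREF=\"pic%d.%s\">Picture %d</A>", n, ext, n);
}
```

Under these assumptions, picture_ref(buf, sizeof buf, 1, "gif", 1) fills
buf with an inline reference to pic1.gif, and passing 0 as the last
argument produces an ordinary link instead.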
Added a translation that turns footnotes into hypertext links.
If the -H option is used, the translator will assume that text
formatted with a dotted underline is the text of a link. The
destination of the link is expected to follow as a footnote.
If you wanted to create a link of the form:
now is the time for all <a href="file:fred.html">good</a> men
you would format "good" with a dotted underline. Immediately
following "good", you would insert a footnote of:
file:fred.html
This provides a reasonable mechanism for inserting hypertext links
into a document. The RTF version when viewed by Microsoft Word will
have all of the links at the bottom of the page or end of the document
depending on your preferences.
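The -H translation can be pictured as pairing the dotted-underline text
with the footnote that follows it. The sketch below is only an
illustration of that pairing, not the translator's actual code; the
function name make_link and the lowercase anchor markup are assumptions.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the rtftohtml source): wrap the
 * dotted-underline text in an anchor whose destination is the text of
 * the footnote that followed it.
 * Returns the value reported by snprintf().
 */
int make_link(char *buf, size_t size, const char *text, const char *footnote)
{
    return snprintf(buf, size, "<a href=\"%s\">%s</a>", footnote, text);
}
```

With text "good" and footnote "file:fred.html", this yields an anchor
around "good" that points at file:fred.html.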
This distribution contains source and documentation for rtftohtml - a simple
RTF to HTML translation tool. To build this you will also need an RTF
reader which is available at ftp.primate.wisc.edu/pub/RTF.
This tool was based on release 7 of the RTF reader.
To build this tool:
1) Obtain the RTF reader and place the files in this directory.
2) Look at the RTF reader distribution and set up its makefile
for your machine.
3) Edit Makefile.html for your machine's configuration.
4) Type make -f Makefile.html
rtfhtml - RTF to HTML translator
Initial Implementation by Chris Hector (cjh@cray.com)
This translator was based on rtfskel.
Kudos to Paul DuBois for his work in developing the
RTF reader and skeleton code.
In this translator we will capture all of the text of an RTF file
and then use the paragraph style (heading 1, Normal,...), text
style (Bold, Italic) and the destination (header, footnote,
title) to choose appropriate HTML markup. In addition to
capturing text, pictures (Macintosh PICT format) will be captured
and each will be written to its own file.
Most of the transformations are straightforward. The list
transformations (<UL>, ...) are a little more complicated. I
have created multiple RTF styles that map to the same HTML
markup. These styles differ in how tab characters are treated
within the input. The description of the various list styles
follows:
list and markedlist styles
These styles both map to the HTML markup <UL>. In the "markedlist"
style, the translator assumes that each paragraph in the list
begins with:
string<tab>
where the string is some sort of leader (like a "-" or bullet).
Since the <UL> markup will cause most browsers to add bullets to
the list, this translator will strip all characters up to and
including the first tab. If there is no tab in the paragraph, all
text will be lost! (A warning is generated in this situation.)
If you don't want anything stripped off of the input, use the
"list" style.
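The stripping rule for the "markedlist" style can be sketched as
follows. This is a simplified illustration, not the translator's own
routine; the name strip_leader is hypothetical, and the NULL return
stands in for the "all text lost, warning issued" case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the rtftohtml source): return a pointer
 * to the text that follows the first tab in para, i.e. the paragraph
 * with its leader stripped. Returns NULL if para contains no tab, which
 * is the case where the paragraph text is lost and a warning is due.
 */
const char *strip_leader(const char *para)
{
    const char *tab = strchr(para, '\t');

    return tab != NULL ? tab + 1 : NULL;
}
```

For a paragraph consisting of "-", a tab, and "first item", the result
points at "first item"; for a paragraph with no tab the result is NULL.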
numberedlist and orderedlist styles
These styles both map to the HTML markup <OL>. In the
"numberedlist" style, the translator assumes that each paragraph
in the list begins with:
string<tab>
where the string is some sort of numeric leader (like 1 or 1.4.5).
Since the <OL> markup will cause most browsers to add sequence
numbers to the list, this translator will strip all characters up
to and including the first tab. If there is no tab in the
paragraph, all text will be lost! (A warning is generated in this
situation.) If you don't want anything stripped off of the input,
use the "orderedlist" style.
glossary style
This style maps to the HTML markup <DL>. In the "glossary" style,
the translator assumes that each paragraph in the list begins
with:
string<tab>
Where the string is the term being defined. The remaining text in
the paragraph will be assumed to be the definition. If there is
no tab in the paragraph, you will get entries with terms but no
definitions. This may cause problems with some HTML browsers.
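The term/definition split for the "glossary" style can be sketched like
this. It is an illustration only: the function name glossary_entry is
hypothetical, and the use of <DT> and <DD> for the term and definition
is an assumption based on the usual definition-list markup.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the rtftohtml source): write one
 * glossary paragraph into buf as a term followed by its definition.
 * Text before the first tab is the term; the rest is the definition.
 * With no tab, only a term is written (no definition).
 * Returns the value reported by snprintf().
 */
int glossary_entry(char *buf, size_t size, const char *para)
{
    const char *tab = strchr(para, '\t');

    if (tab == NULL)
        return snprintf(buf, size, "<DT>%s\n", para);

    /* %.*s prints only the characters that precede the tab. */
    return snprintf(buf, size, "<DT>%.*s\n<DD>%s\n",
                    (int)(tab - para), para, tab + 1);
}
```

A paragraph of "RTF", a tab, and "Rich Text Format" gives a term line
for RTF followed by a definition line; a paragraph without a tab gives
the term line alone, which is the situation the warning above describes.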
dir style
This style maps to the HTML markup <DIR>. In the "dir" style, the
translator assumes that directory entries are separated by either
tabs or paragraph marks. The directory entries may also come as a
table, in which case each cell contains a separate entry.
menu style
This style maps to the HTML markup