module: html
Author: Robert Stockton (rgs@cs.cmu.edu)
synopsis: Converts a file in WWW "HyperText Markup Language" into formatted
          text.  Provides a small demo of a 'complete application' in Dylan.

//======================================================================
//
// Copyright (c) 1994  Carnegie Mellon University
// All rights reserved.
//
// Use and copying of this software and preparation of derivative
// works based on this software are permitted, including commercial
// use, provided that the following conditions are observed:
//
// 1. This copyright notice must be retained in full on any copies
//    and on appropriate parts of any derivative works.
// 2. Documentation (paper or online) accompanying any system that
//    incorporates this software, or any part of it, must acknowledge
//    the contribution of the Gwydion Project at Carnegie Mellon
//    University.
//
// This software is made available "as is".  Neither the authors nor
// Carnegie Mellon University make any warranty about the software,
// its performance, or its conformity to any specification.
//
// Bug reports, questions, comments, and suggestions should be sent by
// E-mail to the Internet address "gwydion-bugs@cs.cmu.edu".
//
//======================================================================

//======================================================================
// This program is a filter which converts text in WWW's "HyperText Markup
// Language" into simple formatted text.  Although it is a complete and useful
// application, it is included in this distribution primarily as a
// demonstration of a "real" (albeit small) Dylan (tm) program.
//
// Usage is typical for a UNIX (tm) program.  It may be invoked either with a
// set of files on the command line:
//     mindy -f html2txt.dbc file1.html file2.html ...
// or with no arguments, in which case it reads from "standard input".  At
// present, it accepts no command line switches, although its behavior may be
// changed by editing several constant declarations near the top of this
// source file.
//
// On most UNIX systems you should be able to make it into an executable
// script by prepending the line
//     #!BINDIR/mindy -f
// to the compiled "dbc" file.  You must, of course, remember to set the
// MINDYPATH environment variable so that it points to the libraries "dylan",
// "streams", "collection-extensions", and "string-extensions".
//
// The basic translation strategy used by html2txt is to scan the file line by
// line, looking for HTML "tags" and accumulating the text that lies between
// any two tags.  For each tag type, there is a set of routines (stored in
// tables) which define the appropriate actions for starting and ending the
// "environment" defined by the tag and for dumping the text collected within
// that environment as formatted text.  A basic control loop in
// "process-HTML" is responsible for calling the appropriate tag actions.
// This routine may be called recursively by some of the tag actions.
//
// The "interface" between adjacent environments is handled via the "blank"
// parameter, which is passed around extensively.  This variable states
// whether a blank line has just been printed.  Thus environments which must
// be preceded or followed by a blank line can determine whether they need to
// do anything about it, and we lessen the risk that multiple routines will
// emit blank lines when we only want a maximum of one.
//
// The primary advantage of this organization is that it allows the
// specialized actions for a single tag to be grouped together, and allows new
// tags to be cleanly added.  It benefits greatly from Dylan's ability to
// create anonymous methods and manipulate them as first-class data objects,
// as well as from the rich set of available collection types.
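//
// A minimal sketch of the dispatch scheme described above, for illustration
// only: the names "$start-actions" and "start-tag" are hypothetical and do
// not appear in this program, and the blank-line output assumes a streams
// library providing "write-line" and "*standard-output*".  Each tag symbol
// maps to an anonymous method which takes the current "blank" flag and
// returns its updated value:
//
//   define constant $start-actions :: <table> = make(<table>);
//
//   // <P> must be preceded by a blank line, but prints one only if the
//   // previous environment has not already done so.
//   $start-actions[#"p"]
//     := method (blank :: <boolean>) => (blank :: <boolean>)
//          unless (blank) write-line(*standard-output*, "") end;
//          #t
//        end method;
//
//   // Look up the start action for a tag; unknown tags get a method
//   // which simply passes "blank" through unchanged.
//   define method start-tag (tag :: <symbol>, blank :: <boolean>)
//    => (blank :: <boolean>)
//     let action = element($start-actions, tag,
//                          default: method (b :: <boolean>) b end);
//     action(blank)
//   end method;
//
// Under this scheme, supporting a new tag amounts to storing one more method
// in the table, without touching the control loop.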
//======================================================================

define generic html2text(input) => ();

define constant normal-font = "-adobe-courier-medium-r-normal--12*";
define constant H1-font = "-adobe-courier-bold-r-normal--12*";

define constant text-frame
  = make(, height: 500, fill: "both", side: "bottom", expand: #t);
define constant text-window
  = make(, in: text-frame, relief: "sunken", font: normal-font,
         fill: "both", side: "right", expand: #t);
define constant end-mark = make(, in: text-window, name: "end");
define constant bold-tag = make(, font: H1-font, in: text-window);

// This will eliminate the text window's built-in tendency to encourage text
// editing and entry.  It's probably best to simply consider it black magic
// rather than expecting it to make sense.
bind(text-window, "", "%W mark set anchor @%x,%y;focus none");

define constant text-scroll = scroll(text-window, in: text-frame, fill: "y");

define method tell-about ()
  let top = make(, name: "information");
  unmap-window(top);
  make(, in: top, aspect: 300, relief: "raised", borderwidth: 1,
       text: "Html2Text is a demonstration application which converts "
             "text in the hypertext markup language into simple formatted "
             "ascii.\n\nThe primary purpose of this application, however, "
             "is to demonstrate the feasibility of adding windowed "
             "interfaces to a Mindy program.",
       side: "top", fill: "both", pady: "3m");
  let frame = make(, in: top, fill: "both", relief: "raised",
                   borderwidth: 1, expand: #t);
  make(