Newsgroups: comp.text.sgml,comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!purdue!lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!howland.reston.ans.net!nntp.coast.net!news00.sunet.se!sunic!news99.sunet.se!newsfeed.tip.net!news.seinf.abb.se!nooft.abb.no!Norway.EU.net!nntp.uio.no!ifi.uio.no!naggum.no!comp-text-sgml
Date: Sun, 17 Dec 1995 21:25:41 GMT
From: Robin Cover <robin@utafll.uta.edu>
Message-ID: <9512172325.AA01649@utafll.uta.edu>
References: <4a3tfs$lfv@ibbr.ib.be> <ACF33D1296681E82AB@poolb10.pavilion.co.uk> <TOMAZ.95Dec13145726@alibaba.ijs.si>
Subject: Re: SGML and machine translation
Lines: 102
Xref: glinda.oz.cs.cmu.edu comp.text.sgml:12024 comp.ai.nat-lang:4307

[Maurice Bauhahn]

|   Have you visited the Web pages on www.sil.org? This month they are
|   supposed to release CELLAR a marvelous package (Windows and Mac)
|   which integrates linguistics and SGML (though not machine translation).

[Tomaz Erjavec]

|   As far as I understand, CELLAR does no such thing. In fact, they
|   explicitly say that SGML is not a good platform for current linguistic
|   practice. For example, when discussing six requirements that linguistic
|   SW must meet (and CELLAR purports to meet) they say:
|   
|   ---------------
|   But SGML-based representations of information are too abstract and too
|   cumbersome for the average researcher to work with directly. This
|   suggests a sixth fundamental requirement for a computing environment
|   that will meet the needs of the linguistic researcher:
|   
|   Users need to be able to manipulate data with tools that present
|   conventionally formatted displays of the information; the database
|   must therefore be supported by a complete computing environment that
|   can provide user-friendly tools for creating, viewing, and
|   manipulating information.
|   
|   Until this requirement is met, it will be difficult for the community
|   of linguistic researchers to realize the promise of the kind of data
|   interchange that the TEI guidelines make possible.
|   ---------------URL:http://www.sil.org/cellar/mlingdp/mlingdp.html
|
|   While they are essentiall[y] right, I think it is a shame they ignore
|   SGML/TEI and the resulting system will be of less value because of
|   it. My impression is that this is because CELLAR has been a long time
|   in the making and when starting out ('90) they adopted the then very
|   fashionable OO paradigm and did not give too much consideration to
|   data portability.

At least three statements about CELLAR made by Tomaz are misleading, and
merit correction:

 (a) "they explicitly say that SGML is not a good platform for current
     linguistic practice"

 (b) "they ignore SGML/TEI"

 (c) "[they] did not give too much consideration to data portability"

Statement (a) is true only in the sense that CELLAR is designed as a
programming environment (built in Smalltalk) and not as a markup language.
SGML is neither a programming language nor a programming environment.  As
we all know, SGML can be used to define textual markup for documents, but
marked-up documents do not DO anything.  CELLAR enforces many kinds of
integrity constraints that are impossible to enforce in SGML, and models
relationships that cannot be expressed in SGML languages.

The following summary is extracted from the most visible Web page
describing CELLAR (Computing Environment for Linguistic, Literary, and
Anthropological Research):

   CELLAR is an object-oriented database system that is being developed by
   the Academic Computing Department of SIL to meet the data management
   needs of our field workers.  Two of its special features are the ability
   to cope simultaneously with data in many languages, and design which
   separates the conceptual model of a data set from multiple
   (interchangeable) views for display and encoding formats for import and
   export.  While important aspects of the design were motivated by the
   needs of linguistic research, the system is fully programmable and can
   be used to develop text-related (as opposed to number crunching)
   applications for any discipline.

Statement (b) is completely false in that the CELLAR programming
environment uses TEI-flavored SGML markup constructs in many of the DTDs
used within the project.  Objects in CELLAR have encoding models, and the
parser definitions for object classes characteristically address textual
representations of CELLAR data in terms of SGML markup.  Statement (b) is
also false in that two papers presented by Project Director Gary Simons
have demonstrated how TEI-SGML may be generated for CELLAR objects (as a
view of the information), particularly within the realm of linguistic
databases and document databases.  Statement (b) is misleading in that two
participants in CELLAR (including the Project Director) have made
supportive contributions to TEI in design meetings, work papers, and
published papers, since about 1988.

Statement (c) is false in that encoding models -- defining markup
representations for CELLAR objects -- are quite important in the design of
CELLAR classes.  CELLAR design makes the assumption that a significant
percentage of data manipulated in the system by linguists and other
researchers will be imported from existing documents and databases.
Similarly, CELLAR design assumes that some information created within
CELLAR will need to be exported to find alternative or even ultimate
expression outside CELLAR (e.g., in desktop publishing systems).  The
preferred interchange format for both import and export is SGML.
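
As a rough illustration of what "markup as one view of an object" means,
here is a minimal sketch.  It is not CELLAR code (CELLAR is built in
Smalltalk; Python is used here only for brevity).  The class LexEntry,
the functions encode_sgml and render_plain, the element names, and the
sample data are all hypothetical and are not taken from any CELLAR DTD
or from the TEI guidelines.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class LexEntry:
    """Hypothetical conceptual object: holds data only, no markup."""
    headword: str
    language: str
    gloss: str


def encode_sgml(entry: LexEntry) -> str:
    """One encoding of the object: SGML-style markup for interchange.

    The element and attribute names are made up for this sketch.
    """
    return (
        f'<entry lang="{escape(entry.language, quote=True)}">'
        f"<form>{escape(entry.headword)}</form>"
        f"<gloss>{escape(entry.gloss)}</gloss>"
        "</entry>"
    )


def render_plain(entry: LexEntry) -> str:
    """Another view of the same object: a conventionally formatted line."""
    return f"{entry.headword} ({entry.language}): {entry.gloss}"


# Made-up sample data; the "&" shows that only the markup view escapes it.
entry = LexEntry(headword="wasi", language="qu", gloss="house & home")
print(encode_sgml(entry))
print(render_plain(entry))
```

The object itself carries no markup; the SGML string and the formatted
line are two interchangeable renderings of it, so adding a further
export format would mean adding another function, not changing the
data.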

Robin Cover

-- 
Robin Cover                Email: robin@utafll.uta.edu  ("uta-ef-el-el")
6634 Sarah Drive
Dallas, TX  75236  USA     In case of link failure, use:
Tel: (1 214) 296-1783 (h)     robin@acadcomp.sil.org
Tel: (1 214) 709-3346 (w)     
FAX: (1 214) 709-3380      SGML Page: http://www.sil.org/sgml/sgml.html
