








                       Spanish LINGER

                     J.Uren & M.Yazdani

                 Dept. of Computer Science,
                   University of Exeter,
                   Prince of Wales Road,
                          Exeter.


Abstract




     This paper describes a system currently being developed

for  the  teaching  of  Modern  Languages  that incorporates

"human like" knowledge of the domain that  it  is  teaching.

The project is a progression from the previously developed

method of constructing tutoring systems for specific tasks

(Barchan, Woodmansee and Yazdani, 1986) towards producing

tools capable of greater generality and use.  We hope to

clarify  the  issues  arising  from  the attempt to  build a

flexible tutoring system, for more than one language domain,

which  is  capable of being used by teachers in a variety of

ways.  LINGER  (Language   INdependent   Grammatical   Error

Reporter) was originally developed by Barchan (1987) as a

prototype tool and since then has  been  the  subject  of  a

series of interrelated projects to evaluate its potential as

a basis for the teaching of Modern Languages. In this  paper

we describe the architecture of LINGER in the context of the

experiences gained when  we tried to extend it to deal  with

the Spanish language.
















1. INTRODUCTION


     A neglected area of study  in  Artificial  Intelligence

(AI)  is  its  application to the teaching of languages. The

prime reasons for this appear to be :



     1. a lack of a formal linguistic description of natural

language.


     2. a lack of  formalised  skills  on  how  to  teach  a

foreign or second language


     3. a lack of understanding of what constitutes a "good"

(or "bad") language teacher.


     4. a lack of understanding of what constitutes a "good"

(or "bad") language learner.


     Many systems are currently being  developed  to  handle

knowledge  in  various  areas of human thought and activity,

from medical diagnosis to complex  engineering  design,  yet

very few are capable of handling language parsing, let alone

diagnosis or remediation of language errors.


     Artificial  Intelligence  (AI)  and  language  learning

share  a common concern in how a language can be represented

and how it can be communicated.  The appreciation that human

languages have a reasonably clear syntactic structure (Chomsky,

1956) and possible language universals has led to much











research   within  the field of natural language understand-

ing. It is within this framework (Yazdani, 1987) that LINGER

was developed.



     The architecture of LINGER (Barchan, 1987) has  several

requirements  for  its  application to cover a new language.

The system being developed  needs to incorporate the follow-

ing components:



             - a well structured linguistic grammar


             - knowledge of deviations  from  the  "correct"

grammatical  structures exhibited by novices and the associ-

ated remedial advice.


             - a flexible dictionary


      LINGER currently embraces not only these  characteris-

tics but also supports the reporting and correcting of gram-

matical errors encountered.  A further aim  is  that  LINGER

should  be easily configured for a particular language by an

"expert" in that language but not in computer science.  That

is, it should serve as a tool for the language teacher.


     LINGER is fulfilling its  language  independent  objec-

tive;  it  supports French and German and is currently being

extended for Italian, Spanish and English.


     Given the highly modular approach adopted by LINGER, how

can  a  non-computer  scientist  do anything with it?  Could











language independency still be maintained and  LINGER  still

be  capable of efficient parsing of inputs and reporting the

errors?  We shall present some tentative  indications  after

we have shared our experience of trying to extend LINGER for

Spanish.  A few examples will be given to  show  how  LINGER

has coped with this language.






















































     In the following sections we shall present the Indepen-

dent Language database as can be seen from figure 1.






























































     The intention will be to illustrate the  basic  princi-

ples  underlying  the grammar/dictionary modules of the sys-

tem. Examples will be used to show how a grammar writer  may

approach  the  task of writing his own modules. Each section

will be divided into a brief discussion  of  the  module,  a

definition  of  LINGER's specification and an example of the

approach adopted for the building of the Spanish modules.




2. Framework of the System


     Four distinct design  features  typify  the  goals  and

strengths of LINGER. These are as follows:


     1. modularity & extensibility


                -  as dictated by the requirement of language independency.


     2. generation of correct sentences


                - as desired both in a teaching environment and in the long-
                  term extensibility of the system


     3. the handling of unknown/incorrect words


     4. the grammar writer's control over issuing  and  con-

tent of error messages for pedagogical purposes.


     A configuration of the system consists  of  three  main

modules:  the  language  specific  dictionary,  the language

specific  grammar  (strong/weak  syntax)  and  the  language

independent  shell.  Each  module will be briefly described.












However, for further implementation  details  see  (Barchan,

1987).



     The distinction  of  dictionary,  grammar  and  parsing

mechanism is a virtual requirement of a language independent

system. Central to LINGER, therefore, lie the two flexible

formalisms of:


     1. language dependent grammar and dictionary


     2. language independent shell


which when combined provide a powerful tool.

We shall now move on  to  consider  the  three  main  LINGER

modules:


     1. the dictionary


     2. the grammar


     3. the shell



2.1 The Dictionary


     The dictionary is one of  the  two  language  dependent

data  files  required  by  the language independent shell to

function for a particular language.  It serves two purposes:

firstly,  and  most obviously from its name, it contains all

the words in the specific language which are  known  to  the

system; secondly, it holds information relevant to each word

including what  modifications  can  be  made  to  that  word











together  with  their significance. This information is usu-

ally, but not necessarily, grammatical in nature, but  there

is a distinction between this information and that contained

in the language's grammar file.  In the latter, the information

is concerned with what legal grammatical structures (e.g.

sentences, noun-phrases, verb-phrases etc.) may be formed in

the  language  and  how  they are put together, while in the

former the concern is with what individual words are permis-

sible  in  the  language,  how they may be modified and what

significance each such modification entails. Hence the  dis-

tinction  is  that the grammar file specifies the language's

non-terminals, while the dictionary file deals with the pre-

cise form which the terminals may take.



2.2 The Grammar


     The grammar file is the  second  of  the  two  language

dependent  files  required by the language independent shell

to function for a particular language and includes both  the

strong  and weak syntax.  It serves two functions : firstly,

to permit the grammar writer to  specify   what  grammatical

constructs  exist  in  a  given language and how they may be

combined to form legal sentences, noun phrases, verb phrases

etcetera   in the language (strong syntax); and secondly, to

allow him to indicate what rules must be obeyed  to  produce

correctly  formed  sentences within the general framework of

the constructs permitted (weak syntax), such as agreement in

number and gender.











     If the grammar writer wishes to anticipate certain com-

mon  errors  he  may include messages to be presented to the

user if the input exhibits the appropriate features.



2.3 The Language Independent Shell


     The shell is the language independent core of the  sys-

tem. The shell contains routines for such actions as accept-

ing the user's input, consulting the dictionary, interfacing

with  the  grammar, attempting to parse the input, reforming

the sentence correctly, comparing the new versions with  the

input  sentence  and  producing  its  final judgement to the

user. The behaviour of the shell can be viewed as consisting

of  three  stages  :  pre-parsing,  parsing and choosing the

'correct' sentence.



3. Using LINGER

     In order to understand how to use LINGER it is best  to

consider it from three levels:


     1. grammatical


     2. lexical


and


     3. semantic


     LINGER is based on  a  Definite  Clause  Grammar  (DCG)

notation (Pereira & Warren, 1980) and is implemented in











Prolog. It is simple to understand and easy to  modify.  One

of  its  features  is  therefore  that  basic principles are

taught before more complex concepts are considered.


     We have faced a number  of  problems  in  adapting  the

grammatical  structure  for French, which was already incor-

porated in the system, to cope with the addition of Spanish.

For  example, small modifications were found necessary, such

as in the 'empty' determiner construct found in some Spanish

noun-phrases,  in order to use the original (French) grammar

as the basis for the grammar of the new language (Spanish).


      However, it is envisaged that as the linguistic struc-

tures  for  each  language  become more complex, as with for

example  the  introduction  of  conjunctive   prepositional-

phrases,  amendments  would  need to be made not only to the

old grammar but also to the higher level "guts" of the  sys-

tem.


     In providing  a  framework  for  language  analysis  an

observation  has  been made (Barchan, 1987) that a formalism

should not be so restrictive that it prevents experimentation

with "new and diverging ideas". In LINGER this objective is

achieved by a division of the grammar into two: a "strong"

and a "weak" syntax, also referred to as a "grammar

specification" and "checks".  Grammatical categories and

specific  attributes  which  may  appear  within the grammar

rules, checks or even the dictionary are  essentially  arbi-

trary tags selected by the writer of the files to encode the











characteristics and behaviour of the language.




3.1 Strong Syntax


     To describe how a grammar can be expressed, it is  use-

ful  to  firstly  consider a context-free grammar (CFG). For

these, the following notation is used which will also  prove

to be convenient later.  Each rule has the form:


     nt --> body


where nt is a non-terminal symbol and body is a sequence  of

one  or  more  items  separated  by  commas. Each of them is

either a non-terminal symbol or a sequence of terminal  sym-

bols.  The  meaning  of  the rule is that body is a possible

form for a phrase of type nt. As in the syntax  of  clauses,

it  is  possible to allow this basic notation to be extended

by allowing alternatives to appear in the body.


     We can now show a simple CFG to  illustrate  the  nota-

tion.  The  grammar  covers sentences such as "The dog bites

the woman".


     sentence --> noun-phrase, verb-phrase.


     noun-phrase --> determiner, noun.


     verb-phrase --> trans-verb, noun-phrase.


     verb-phrase --> intrans-verb.


     determiner --> [the] ; [a].


     noun --> [dog] ; [woman].


     trans-verb --> [bites] ; [loves].


     intrans-verb --> [goes].


     CFG's can be generalised in a way  that  maintains  the

correspondence with definite-clauses to obtain the formalism

of DCG's.


     It is worth noting that rules for DCG's, in  the  words

of Pereira & Warren, are no more than "syntactic sugar for a

certain kind of definite clause". That is, terminal  symbols

are translated exactly as expected.
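

     The toy grammar above can be tried out in a Prolog that

supports DCG notation. The following is a minimal sketch,

assuming a standard system such as SWI-Prolog. The underscored

names, the predicate sentence_dc and the use of phrase/2 are

assumptions made for this illustration and are not taken from

LINGER. The last clause shows, by hand, the kind of definite

clause to which the first rule corresponds.

```prolog
% A runnable version of the toy grammar, assuming standard DCG support.
% Hyphens are replaced by underscores so that each name is a single atom,
% and each alternative terminal is written as its own one-word list.
sentence     --> noun_phrase, verb_phrase.
noun_phrase  --> determiner, noun.
verb_phrase  --> trans_verb, noun_phrase.
verb_phrase  --> intrans_verb.
determiner   --> [the] ; [a].
noun         --> [dog] ; [woman].
trans_verb   --> [bites] ; [loves].
intrans_verb --> [goes].

% Hand translation of the first rule into a definite clause: the two
% extra arguments hold the word list before and after the phrase.
% (sentence_dc is a hypothetical name, chosen to avoid a clash with
% the clause that Prolog itself generates from the rule above.)
sentence_dc(S0, S) :-
    noun_phrase(S0, S1),
    verb_phrase(S1, S).

% Example queries (both succeed):
% ?- phrase(sentence, [the, dog, bites, the, woman]).
% ?- sentence_dc([a, woman, goes], []).
```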


We shall now move on to discuss how the DCG mechanism  works

in terms of LINGER.


     In order to understand how this  representation  serves

as a straightforward top-down parser, it is best viewed as a

"sentence" building process which consists of  the  repeated

decomposition of the lhs into the rhs until appropriate ter-

minals are extracted to satisfy rules.


 The syntax for the French grammar is given as :


     <lhs> --> <rhs>


where <lhs> is <name>(formed(<name>,[<variables>]))


     where:











     <name> is the name of the non-terminal.


     <variables> is a list of names. Each one will be used in

the same order in the rhs.



     <rhs>  is  either  []  or  <name1>  (<var1>),   <name2>

(<var2>) ...


     where:


<name1>, <name2> are the names of the <lhs>.


<var1>, <var2> are the variables in the <lhs>.


To take a simple example the syntactic representation:



        noun-phr(formed(noun-phr,[D, A1, N, A2])) -->

        determiner(D),
        adj-list(A1),
        noun(N),
        adj-list(A2).


     would yield noun phrases of the French grammatical form:



                                   noun-phr


                        determiner   adj-list   noun   adj-list


                          la          belle     dame     [e]
                         (the)       (pretty)  (lady)


     This form of representation will  be  followed  as  the

basis  for the building of the grammatical representation of

the Spanish language.
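

     The noun-phr rule can be exercised on the French example

with a small sketch. It assumes SWI-Prolog; the lexicon lex/2,

the helper word//2 and the rules for the terminals are

hypothetical stand-ins (underscores replace hyphens in the

names), and only the shape of the noun_phr rule follows the

text. The empty adj_list shows a rule whose <rhs> is [].

```prolog
% Hypothetical three-word lexicon: lex(Word, Category).
lex(la, determiner).
lex(belle, adjective).
lex(dame, noun).

% word(+Category, -Word): consume one input word of that category.
word(Cat, W) --> [W], { lex(W, Cat) }.

noun_phr(formed(noun_phr, [D, A1, N, A2])) -->
    determiner(D),
    adj_list(A1),
    noun(N),
    adj_list(A2).

% An adjective list is either empty or an adjective followed by a list.
adj_list(formed(adj_list, [])) --> [].
adj_list(formed(adj_list, [A|As])) -->
    adjective(A),
    adj_list(formed(adj_list, As)).

determiner(formed(determiner, [W])) --> word(determiner, W).
adjective(formed(adjective, [W]))   --> word(adjective, W).
noun(formed(noun, [W]))             --> word(noun, W).

% Example query:
% ?- phrase(noun_phr(T), [la, belle, dame]).
```

In the answer, the first adj_list holds "belle" and the second

is empty, matching the [e] leaf in the diagram above.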











To construct the  required  interpretation  for  Spanish  an

example  will  serve  to illustrate how a grammar writer may

set about the task. Given the Spanish sentence:


     la chica es guapa.


     (The girl is pretty)



this can be broken down into the following constituents:


          [la chica] [es] [guapa].


            det noun verb  adj

              np     v      adj

                 vp

                 s

These rules taken from the tree can then be translated  into

the grammar as:



        sentence(formed(sentence, [VP])) -->
                vp(VP).



        vp(formed(vp, [NP, V, A])) -->
                np(NP),
                verb(V),
                adjective(A).























        np(formed(np, [D, N])) -->
                determiner(D),
                noun(N).


        noun(formed(noun,[N,Pos])) -->  it_is(noun,N,Pos).

        determiner(formed(determiner,[D,Pos])) -->  it_is(determiner,D,Pos).

        verb(formed(verb,[V,Pos])) -->  it_is(verb,V,Pos).

        adjective(formed(adjective, [A, Pos])) -->  it_is(adjective,A,Pos).
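

     The Spanish rules can be run end to end with the following

minimal sketch, assuming SWI-Prolog. The definition of it_is is

not given in this paper, so the version here is a hypothetical

stand-in: it consumes one Word-Pos pair and looks the word up

in a toy lexicon lex/2, and Pos is assumed to be the position

of the word in the input. The predicates parse/2 and

number_words/3 are likewise invented for the illustration, and

the terminal rules are called by the names verb and adjective.

```prolog
% Toy lexicon: lex(Word, Category). Hypothetical, for illustration only.
lex(la, determiner).
lex(chica, noun).
lex(es, verb).
lex(guapa, adjective).

% Stand-in for it_is: accepts one Word-Pos pair of the given category.
% Pos is assumed here to be the word's position in the input.
it_is(Cat, Word, Pos) --> [Word-Pos], { lex(Word, Cat) }.

sentence(formed(sentence, [VP])) --> vp(VP).
vp(formed(vp, [NP, V, A])) --> np(NP), verb(V), adjective(A).
np(formed(np, [D, N])) --> determiner(D), noun(N).
noun(formed(noun, [N, Pos])) --> it_is(noun, N, Pos).
determiner(formed(determiner, [D, Pos])) --> it_is(determiner, D, Pos).
verb(formed(verb, [V, Pos])) --> it_is(verb, V, Pos).
adjective(formed(adjective, [A, Pos])) --> it_is(adjective, A, Pos).

% parse(+Words, -Tree): number the words from 1, then parse them.
parse(Words, Tree) :-
    number_words(Words, 1, Pairs),
    phrase(sentence(Tree), Pairs).

% number_words(+Words, +N, -Pairs): pair each word with its position.
number_words([], _, []).
number_words([W|Ws], N, [W-N|Ps]) :-
    N1 is N + 1,
    number_words(Ws, N1, Ps).

% Example query:
% ?- parse([la, chica, es, guapa], Tree).
```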


     Having considered how a grammar-writer may deal with  a

strong syntactic rule specification it remains necessary for

the writer to enter  the  weak  syntactic  rules  which  are

viewed as constraints or checks upon the language.



3.2 Weak Syntax


     A clear distinction must be maintained between the weak

and  strong  syntax  not  only  for reasons of simplicity or

modularity but to ensure that the strong syntax is reflected

in  the specification of attributes of the weak syntax. That

is, given a strong syntactic rule of the form:


        sentence --> subject, verb


if the attributes of the language are such that the declara-

tive  reading is 'for a sentence to be correct the main verb

must agree with the subject' the grammar-writer  can  impose

those  features  of  the  grammaticality  of the language by

adding 'checks' into the weak syntax module.


The specification of the weak syntax is essentially that of:











     check([<structure>],


     <scope>,


     [<precondition>], <requirement>


     ).


     where:


[<structure>] is a list  of  the  structures  to  which  the

requirement is to be applied.


     <structure> is either name of <lhs> of the strong gram-

mar rule or the name of <lhs> of the strong syntax.


     <scope> is the <structure> within which the check is to

be performed.


     [<precondition>] is a list of <condition>'s which must

be true before the requirement is performed.


     <requirement> is the action to be taken by  the  check.

Each check is required to provide the following information:


     (i). the grammatical type of the terminal which  is  to

be checked.


     (ii). the  sub-structure  within  the  sentence  across

which the check is to be made.


     (iii). any preconditions which must be met if the check

is to be valid.












     (iv). the name of the check to be  performed,  together

with relevant parameters.


When the weak syntax specification is applied to the case of

adjective-noun agreement in Spanish, the example given below

ensures that within a noun-phrase the determiner and

adjective agree in gender and number with the noun.

It can be read as "check that the determiner and adjective

in a noun-phrase agree in gender and plurality with the

noun."


        check([determiner, adjective],

        noun-phrase,

        [],

        concord([gender,plurality],noun,[])

        ).


There are further syntax specifications for <condition>  and

<requirement>  which shall not be discussed here. Interested

readers should consult (Barchan, 1987).
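

     The effect of such a check can be imitated in a few lines

of Prolog. This is only a sketch of the idea, assuming

SWI-Prolog, and not LINGER's own concord mechanism: the w/3

word representation and the predicates agrees_with_noun/3 and

same_value/3 are hypothetical.

```prolog
% Hypothetical word representation: w(Category, Form, Attributes).

% agrees_with_noun(+Structures, +AttrNames, +Words): every word whose
% category is listed in Structures has the same value as the noun for
% each attribute named in AttrNames.
agrees_with_noun(Structures, AttrNames, Words) :-
    memberchk(w(noun, _, NounAttrs), Words),
    forall(( member(w(Cat, _, Attrs), Words),
             memberchk(Cat, Structures),
             member(Name, AttrNames) ),
           same_value(Name, NounAttrs, Attrs)).

% same_value(+Name, +Attrs1, +Attrs2): both lists give Name the same value.
same_value(Name, Attrs1, Attrs2) :-
    Probe =.. [Name, Value],
    memberchk(Probe, Attrs1),   % binds Value from the first list
    memberchk(Probe, Attrs2).   % the second list must hold the same value

% Example query (succeeds; it fails if guapa is replaced by a
% masculine adjective entry):
% ?- agrees_with_noun([determiner, adjective], [gender, plurality],
%        [w(determiner, la,    [gender(f), plurality(s)]),
%         w(noun,       chica, [gender(f), plurality(s)]),
%         w(adjective,  guapa, [gender(f), plurality(s)])]).
```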


3.3 Lexical Processing


     There are no problems in the adaptation of the diction-

ary  and  categorisation  of  words  to  handle  the Spanish

language. However, related problems are foreseen where the

same word may be classified in more than one way, for example

when a word with membership of one class (e.g. noun) can

also belong to another class (e.g. adjective).













     A decision also needs to be made regarding the scope of

the  dictionary.  Should  it  be  of  a  "limited domain" or

"unrestricted"?  The former would concentrate on a  particu-

lar  text limited to a topic whilst the latter would involve

the insertion of words most frequently used in the language.


     At the present time the system can  cope  with  unknown

words  and  assign  them  to grammatical classes with a high

success rate. A possible useful extension would be  to store

unknown words for a system administrator (e.g. a teacher) to

add later to the dictionary. A further worthwhile  extension

would  be to combine the specialised "topic" dictionary with

a more general dictionary.


     As any dictionary writer will confirm there is  a  need

to distinguish between words of the language and the syntac-

tic  classes  to  which  they  belong  such  as  noun,  verb

etcetera.  The  dictionary  configuration for LINGER is con-

sidered to  be  simplistic  yet  powerful.  For  reasons  of

optimal  generality  there are three distinct kinds of entry

to be found in the dictionary:


        1. information about grammatical categories

        2. individual words

        3. sets of general endings for various word-types in the language.


     The syntax specified for the dictionary is easy to fol-

low. It falls into the following 3 specifications:














3.3.1  Grammatical/Word Endings :


     info( <name>,


     [<individual information>],


     [<general information>],


     [<global information>]


     ).


     where:


     <name> is the name of the grammatical  or  word  ending

type.


     [<individual  information>]  is  a  list  of  lists  of

<attribute>'s


     [<general information>] is a list of <attribute name>'s


     [<global information>] is a list of <attribute>'s.




3.3.2 Word Specification :


     word( <reference name>,


     [<word root>],


     <grammatical type>,


     [<word ending>] OR <word ending type>,













     [<general attribute>] ).


     where:


      <reference name> is the reference  form  of  the  word

enclosed in "".


     [<word root>] is a list of word root forms enclosed  in

"".


     <grammatical type> is a name.


     [<word ending>] is a list of ending forms  enclosed  in

"", OR <word ending type>.


     [<general attribute>] is a list of <attribute body>'s.




3.3.3 Word Ending Types :


     Info for word ending types must be given as  for  gram-

matical types.


     ending_type(<word ending type>,


     [<actual ending>] OR <actual ending type>,


     [<general attribute>]


     ).


     where:


     <word ending type> is a name.












     [<actual ending>] is a list of ending forms enclosed in

"", OR <actual ending type>,


     [<general attribute>] is a list of <attribute body>'s.


     The syntax for the actual ending type is:


     actual_endings(<actual ending type>, [<actual ending>]

 ).


     where:


     <actual ending type> is a name.


     [<actual ending>] is a list of ending forms enclosed in

"".



     A simple example of a noun will suffice  to  illustrate

how this is entered. Given that a Spanish noun has the

attributes gender, number and person, the "high level" entry

for the Spanish word 'barco' (boat) would thus be:


     info(noun,


     [[plurality(s)], [plurality(p)]],


     [gender],


     [person(3)] ).


Following the syntax for the level of word  specification  a

further entry would be:













     word("barco", ["barco"], noun, ["","s"], [m] ).
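

     One way to read the two entries together is sketched

below, assuming SWI-Prolog 7 or later (where "..." denotes a

string). The pairing of the N-th ending with the N-th list of

individual information, the reference name "barco" and the

predicates word_form/3 and make_attribute/3 are assumptions

made for the illustration; for LINGER's own treatment see

(Barchan, 1987).

```prolog
% The two entries from the text, with "barco" as the reference name.
info(noun, [[plurality(s)], [plurality(p)]], [gender], [person(3)]).
word("barco", ["barco"], noun, ["", "s"], [m]).

% word_form(+Ref, -Form, -Attrs): enumerate the forms of a word.
% Assumption: endings and individual information correspond by position.
word_form(Ref, Form, Attrs) :-
    word(Ref, [Root|_], Type, Endings, GeneralValues),
    info(Type, Individual, GeneralNames, Global),
    nth1(N, Endings, Ending),
    nth1(N, Individual, IndividualAttrs),
    % Turn the names [gender] and the values [m] into [gender(m)].
    maplist(make_attribute, GeneralNames, GeneralValues, GeneralAttrs),
    append([IndividualAttrs, GeneralAttrs, Global], Attrs),
    atom_concat(Root, Ending, Form).

% make_attribute(+Name, +Value, -Attr): build the term Name(Value).
make_attribute(Name, Value, Attr) :-
    Attr =.. [Name, Value].

% Example query (enumerates the singular and the plural form):
% ?- word_form("barco", Form, Attrs).
```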


It is also possible to generalise these patterns into word-ending

types. A good example of the format specification for this

class is:


            info(gp_end,[[gender(m),plurality(s)],[gender(m),plurality(p)],

                [gender(f),plurality(s)],[gender(f),plurality(p)]],[],[]).

            ending_type(gp_end,["o","os","a","as"],[]).


which states that the group endings for the class  of  nouns

can  be  masculine  or feminine, singular or plural and that

the ending types for each of the Spanish attributes are  'o'

(masculine,  singular),  'os' (masculine, plural), 'a' (fem-

inine, singular), and 'as' (feminine, plural).
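

     The pairing of endings and attributes described above can

be sketched as follows, assuming SWI-Prolog 7 or later (where

"..." denotes a string). The predicate inflect/4 and the root

"chic" are hypothetical; the two facts repeat the entries given

in the text.

```prolog
% The ending-type entries from the text.
info(gp_end,
     [[gender(m), plurality(s)], [gender(m), plurality(p)],
      [gender(f), plurality(s)], [gender(f), plurality(p)]],
     [], []).
ending_type(gp_end, ["o", "os", "a", "as"], []).

% inflect(+Root, +EndingType, ?Form, -Attrs): pair the N-th ending
% with the N-th attribute list and attach the ending to the root.
inflect(Root, EndingType, Form, Attrs) :-
    info(EndingType, AttrLists, _, _),
    ending_type(EndingType, Endings, _),
    nth1(N, Endings, Ending),
    nth1(N, AttrLists, Attrs),
    atom_concat(Root, Ending, Form).

% Example queries:
% ?- inflect("chic", gp_end, Form, Attrs).    % enumerates four forms
% ?- inflect("chic", gp_end, chicas, Attrs).  % attributes of one form
```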


     At the lowest level little attention has been  paid  to

morphological  considerations.  It  should  be  possible  to

incorporate this "finer" requirement  in  the  module  at  a

later stage.



3.4 Semantics



     LINGER currently ignores any  semantic  considerations.

This brings us to the familiar "colourless green ideas sleep

furiously" dichotomy. If language is concerned with communication

and grammar is concerned with "correctness" of "form"

and motivation and personal achievement are prerequisites of

language  learners, is it the case that the learner discerns












for himself the "feel" for the  language  and  develops  the

semantic/syntactic  correctness  as  a  by-product  of  that

"feeling" at a later stage? Or is there some way in which we

can  constrain  and  limit  his  'learning' world until this

"feeling" has developed?
























































4. Conclusion


     The questions posed in this Spanish study indicate  the

extent of the work which lies ahead and the number of

problems still to be addressed. For the moment  we  plan  to

build  on the limited success of LINGER as a prototype which

will be tested comprehensively within the next two years.


LINGER is a workable system fulfilling its early  objectives

in its present form:


     (1) at present it has a language-independent shell  and

language  subsets  for  French, German, Italian and Spanish.

Work is developing for an English module.


     (2) it is robust


     (3) it has been designed for future work and is  adapt-

able.


However, it is limited in  its  capabilities  and  'intelli-

gence'.


Further developments specific to LINGER include:


     (i). modularity and extensibility


                - of domain knowledge


     (ii). multiple error handling


     (iii). separation of weak/strong syntactic errors













     (iv). treatment of unknown words


     (v). treatment of incorrect endings/ spelling errors


     (vi). improvements to checks and parsing techniques.


     There remain many areas/modules which must be  investi-

gated  and  incorporated  in  order for LINGER to be a fully

viable proposition for the teaching of  language.  Questions

and areas to be investigated include:



     1. User/Student Modelling.


                - adapting the system's behaviour to the individual user


     2. Explanation Module


                - why learners made the mistakes they made

                - how to explain the errors made to them


     3. Teaching Strategy Module


                - how to formulate the wide range of teaching strategies used
                  in the field of language teaching as a module which would be
                  beneficial in the learning environment

                - how to introduce LINGER as a teaching tool



     4. Machine Learning


                - how to build a catalogue of bugs automatically


     5. Contrastive Analysis Module















                - at present LINGER can only be configured for one
                  particular language at a time. Why not develop
                  'language awareness'?


     In our work the progress within LINGER can be viewed as

a move from constructing tutoring systems for specific tasks

towards creating tools of an even greater generality. LINGER

has  been  designed to be as general as possible, making few

assumptions about the nature of language for which it  might

be  used. It is expected that future identification of areas

of similarity between different languages may  lead  to  the

development of more systems which can use LINGER as their

core but augmented with  assumptions  about  a  language  or

groups  of  languages, enabling them to be more specific and

hence more powerful.



5. Acknowledgements

     The work reported here is sponsored by a grant from the

Economic  &  Social Research Council (ESRC). We are grateful

to our colleagues Paul O'Brien,  Keith  Cameron  and  Judith

Wusteman for their continuous support.

























References.



     Barchan, J. (1987) Language Independent Grammatical Error
             Reporter. M.Phil. Thesis, Dept. of Computer Science,
             University of Exeter.



     Barchan, J., Woodmansee, M. & Yazdani, M. (1986) A Prolog-Based
             Tool for French Grammar Analysis. Instructional Science,
             Vol. 15.


     Chomsky, N. (1956) Syntactic Structures. Mouton, The Hague.



     Pereira, F. & Warren, D. (1980) Definite Clause Grammars for
             Language Analysis. Artificial Intelligence, Vol. 13.



     Yazdani, M. (1987) Artificial Intelligence for Tutoring. In
             J. Whiting & O.A. Bell (eds.), Tutoring and Monitoring for
             European Open Learning. Elsevier Science Publishers.







































