YAPC | talks

Sean Burke

Two talks.

Braille Encoding: A Case Study in Regular Expressions

45 minute talk

Most people are aware that Braille is a class of writing systems, for use by the blind, where letters are made of raised dots. However, most are not aware that Braille is not letter-by-letter identical with conventional ("flat") writing. English Braille contains a great number of "contractions" -- single letters that correspond to several flat letters; for example, <in>, <en>, <er>, <ea>, <the>, <ing>, <st>, <ch>, <th>, and <sh>. So "leather" is encoded in Braille as four letters, "l_ea_the_r". Some contractions are context-specific -- for example, the contraction for <bb> can apply only in the middle of words.

Moreover, there are hundreds of exceptions to the application of these contractions. Some exceptions are systematic: for example, the "ea" in "-eable" words ("impermeable", "peaceable", "knowledgeable", et al.) never contracts. Other exceptions are specific to just one word: idiosyncratically, the "ea" in "lineage" doesn't contract; the correct spelling is "l_in_e_a_g_e", not "l_in_ea_g_e". These exceptions are formalized in a data file bundled with Braille typesetting software freely distributed by the National Federation of the Blind.
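
To make the shape of such a rule concrete -- a hypothetical Perl sketch, not the format of the NFB data file -- one general rule and its exceptions might be recorded along these lines:

# Hypothetical structure for one contraction rule plus its exceptions;
# the real NFB data file has its own format.
my %ea_rule = (
    target     => 'ea',
    # systematic exception: "ea" never contracts before "ble" (-eable words)
    not_before => qr/ble\b/,
    # word-specific exception: "lineage" keeps its "ea" uncontracted
    skip_words => { lineage => 1 },
);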

The task of Braille encoding (i.e., going from uncontracted flat text to normal, contracted text) is basically that of scanning each word, looking for sequences of characters that can be contracted, and then replacing those substrings with their contractions. The word is scanned in only one pass, going left to right, just as would be done with a common string scanning-and-replacement formalism. Moreover, I demonstrate that matching all and only the character strings that should be replaced is feasible specifically with regular expressions.

Consider compression based on a dictionary of rules consisting simply of

"replace the -> W, ea -> X, th -> Y, and er -> Z",

where these rules are unrestricted as to where in the word they can apply. An RE replacement to match all these target strings would be:

$word =~ s/(the|ea|th|er)/&lookup($1)/eg;

If $word is "leather", for example, this correctly yields "lXWr" ("l_ea_the_r"), instead of the incorrect "lXYZ" ("l_ea_th_er").
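
Filled out as a complete, runnable sketch (the lookup table and the W/X/Y/Z symbols are just the placeholders from the toy dictionary above, not real Braille cells):

use strict;
use warnings;

# Toy dictionary from the rules above: the -> W, ea -> X, th -> Y, er -> Z.
my %contraction = ( the => 'W', ea => 'X', th => 'Y', er => 'Z' );

sub lookup { return $contraction{ $_[0] } }

my $word = 'leather';
# "the" is listed before its prefix "th", so the leftmost-first
# alternation prefers the longer target.
$word =~ s/(the|ea|th|er)/&lookup($1)/eg;

print "$word\n";   # prints "lXWr", i.e. l_ea_the_r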

Further elaborations include implementing rule contexts with \b and \B, and automatic generation of a (very large) regexp from the NFB data table of general rules and exceptions.
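
As a sketch of both ideas (toy rules and hypothetical placeholder symbols, not the NFB table or the talk's actual code), the mid-word-only <bb> contraction can be anchored with \B, and the alternation can be generated from the rule table, longest targets first:

use strict;
use warnings;

# Toy rule table with placeholder symbols; 'bb' is handled separately
# because it may contract only in the middle of a word.
my %rule = ( the => 'W', ea => 'X', th => 'Y', er => 'Z', bb => 'B' );

# Build the alternation for the unrestricted rules from the table,
# longest targets first, so "the" is tried before its prefix "th".
my @unrestricted = grep { $_ ne 'bb' } keys %rule;
my $targets = join '|', sort { length($b) <=> length($a) } @unrestricted;

my $word = 'rubbery';
# One left-to-right pass: \B on both sides of "bb" restricts that rule
# to word-internal positions; the other targets may match anywhere.
$word =~ s/(\Bbb\B|$targets)/$rule{$1}/eg;

print "$word\n";   # prints "ruBZy", i.e. r_u_bb_er_y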

Similarities and Differences between Natural Languages and Programming Languages

45 minute talk

In the design of programming languages (PLs) over the decades, it has been a recurring theme (often implicit) that PLs should be designed to be like natural languages (NLs). Underlying this is the practical idea that people are very skilled at using NLs, and that greater similarity to NLs would make PLs more learnable, more intuitive, and more familiar.

So far, efforts have focused on making the syntax of PLs resemble the syntax of NLs, or at least some subset thereof. However, in this talk I seek to point out that linguistic models of NLs have several layers of complexity, and that we should consider, at /each/ of these levels, what similarities already exist between NLs and PLs, whether the basic goals and methods of NLs and PLs are comparable, and how (im)practical it would be for PLs to become more like NLs.

I focus on the levels of syntax, semantics, and pragmatics, but I include points from sociolinguistics, historical linguistics, and language typology.

At the level of syntax, I argue that PLs can be made to superficially resemble NLs, but that beyond a certain point one faces several intractable problems, including unavoidable syntactic ambiguity in NLs. For example, in the NL/PL phrase "if X is greater than Y and Z, then print it", it is inherently unclear to both a human listener and a PL parser just what "it" refers to, and whether the sentence means "if((X > Y) && (X > Z))..." or "if(X > (Y && Z))...".

At the level of pragmatics, my observations include: living NLs serve a wide range of functions -- the same NL you make a shopping list in, you can chat with a friend in, or give a conference talk in. For each of these functions, the language adapts, with peculiar lexicons and different standards for clarity, organization, and intelligibility to others; so I assert that PLs should show a similar adaptability to special tasks.

Sean is a columnist for The Perl Journal, amongst his many laudable thingies.

