Regular Expressions



The Regular Expressions Library

Designed by the Gwydion Project

1. Introduction

The Regular-expressions library exports the Regular-expressions module, which contains various functions that deal with regular expressions (regexps). The module is based on Perl (version 4) and has the same semantics unless otherwise noted. The syntax for Perl-style regular expressions can be found on page 103 of Programming Perl by Larry Wall and Randal L. Schwartz. The Regular-expressions library differs from Perl in a few ways. The biggest difference is that regular expressions in Dylan are case insensitive by default. Also, when given an unparsable regexp, the library's behavior is undefined, whereas Perl would give an error message.

A regular expression that is grammatically correct may still be illegal if it contains an infinitely quantified sub-regexp that may match the empty string. That is, if R is a regexp that can match the empty string, then any regexp containing R*, R+, or R{n,} is illegal. In this case, the Regular-expressions library will signal an <illegal-regexp> error when the regexp is parsed. Note: Perl also has this restriction, although it isn't mentioned in Programming Perl.
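A caller that accepts regexps from outside may want to guard against this error. The following is a minimal sketch, not taken from the library's own documentation: it assumes that <illegal-regexp> can be caught with an ordinary exception clause and that the enclosing module uses the Regular-expressions module. The pattern "(a*)*" is chosen because a* can match the empty string and is itself starred.

```dylan
block ()
  // (a*)* applies * to a sub-regexp that can match the empty string,
  // so parsing it is expected to signal <illegal-regexp>.
  regexp-position("some text", "(a*)*");
exception (err :: <illegal-regexp>)
  // Assumption: treat an illegal regexp the same as "no match".
  #f;
end block;
```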

In previous versions of the regular-expressions library, each basic function had a companion function that would pre-compute some information needed to use the regular expression. By using the companion function, one could avoid recomputing the same information. In the present version, the regular-expressions library caches this information, so the companion functions are no longer necessary and should be considered obsolete. However, they have been kept for backwards compatibility.

Companion functions differ in details, but they all essentially return curried versions of their corresponding basic function. For example, the following two pieces of code yield the same result:

            regexp-position("This is a string", "is");

            let is-finder = make-regexp-positioner("is");
            is-finder("This is a string");

Both pieces of code should have roughly the same performance, even if the code is inside a loop.
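The currying idea can be sketched by hand. The code below is an illustration only, not the library's implementation: my-make-positioner is a hypothetical helper that closes over the regexp and forwards each call to regexp-position. It assumes the enclosing module uses the Regular-expressions module.

```dylan
// Hypothetical helper, not part of the library: returns a closure that
// remembers the regexp and forwards each call to regexp-position.
define method my-make-positioner (regexp :: <string>)
 => (positioner :: <function>)
  method (big-string :: <string>)
    // All values returned by regexp-position are passed through unchanged.
    regexp-position(big-string, regexp);
  end method;
end method my-make-positioner;

begin
  let is-finder = my-make-positioner("is");
  // Same call shape as the make-regexp-positioner example above.
  is-finder("This is a string");
end;
```

Because the library is described as caching the pre-computed information itself, a wrapper like this should cost little more than calling regexp-position directly.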

2. Exported Names

The following names are exported by the Regular-Expressions module of the Regular-Expressions library:

regexp-position [Function]

(big-string, regexp, #key start, end, case-sensitive)
=> variable-number-of-marks-or-#f

            regexp-position("This is a string", "is");
            regexp-position("This is a string", "(is)(.*)ing");
            regexp-position("This is a string", "(not found)(.*)ing");
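Since regexp-position returns a variable number of values, a caller would typically bind them with a multiple-value let. The sketch below assumes, as the signature suggests but this page does not spell out, that the first two marks delimit the whole match and that #f comes back as the first value when nothing matches. The variable names are illustrative.

```dylan
begin
  // Bind the first two marks; any further marks (for groups) are ignored here.
  let (match-start, match-end)
    = regexp-position("This is a string", "is");
  if (match-start)
    // Assumption: the marks are usable as start/end indices into the string.
    copy-sequence("This is a string", start: match-start, end: match-end);
  else
    #f;  // no match
  end if;
end;
```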

make-regexp-positioner [Function]

(regexp, #key byte-characters-only, need-marks, maximum-compile, case-sensitive)
=> an anonymous positioner
method (big-string, #key start, end)

regexp-replace [Function]

(big-string, search-for-regexp, replace-with-string, #key count, case-sensitive, start, end)
=> new-string

            regexp-replace("The rain in Spain and some other text",
                           "the (.*) in (\\w*\\b)", "\\2 has its \\1")
            regexp-replace("Hi there", "Hi there(, Bert)?",
                           "What do you think\\1?")

make-regexp-replacer [Function]

(regexp, #key replace-with, case-sensitive)
=> an anonymous replacer function that is either
method (big-string, #key count, start, end)
or
method (big-string, replace-string, #key count, start, end)
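The two shapes of replacer can be sketched as follows. This assumes that supplying replace-with: selects the first form and omitting it selects the second, which the signatures imply but do not state outright. The binding names are illustrative, and the patterns are borrowed from the regexp-replace examples.

```dylan
begin
  // Replacement fixed at creation time: the replacer takes only the string.
  let fixed-replacer
    = make-regexp-replacer("the (.*) in (\\w*\\b)",
                           replace-with: "\\2 has its \\1");
  fixed-replacer("The rain in Spain and some other text");

  // No replace-with: the replacement string is passed on each call.
  let open-replacer = make-regexp-replacer("Hi there(, Bert)?");
  open-replacer("Hi there", "What do you think\\1?");
end;
```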

translate [Generic Function]

(big-string, from-string, to-string, #key delete, start, end)
=> new-string

            translate("any string", "a-z", "A-Z")
            translate("any string", "a-z", "z-a")
            translate("any string", ".aeiou", ",", delete: #t)
            translate("any string", ",./:;[]{}()", " ");

translate [G.F. Method]

(big-byte-string, from-byte-string, to-byte-string, #key delete, start, end)
=> new-string

make-translator [Generic Function]

(from-string, to-string, #key delete)
=> an anonymous translator
method (big-string, #key start, end) => new-string

make-translator [G.F. Method]

(from-byte-string, to-byte-string, #key delete)
=> an anonymous translator
method (big-string, #key start, end) => new-byte-string
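A translator is useful when the same mapping is applied to many strings. A minimal sketch, assuming make-translator accepts the same range notation as the translate examples; the binding name is illustrative:

```dylan
begin
  // Build the translator once, then reuse it on several strings.
  let upcase-ascii = make-translator("a-z", "A-Z");
  upcase-ascii("any string");
  upcase-ascii("another string");
end;
```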

split [Function]

(regexp, big-string, #key count, remove-empty-items, case-sensitive, start, end)
=> a variable number of strings

            split("-", "long-dylan-identifier")
            split("-", "long--with--multiple-dashes")
            split("-", "really-long-dylan-identifier", count: 3)
            split("-", "really-long-dylan-identifier", start: 8)

make-splitter [Function]

(pattern :: <string>, #key case-sensitive)
=> an anonymous splitter
method (big-string, #key count, remove-empty-items, start, end) => buncha-strings
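By analogy with the other companion functions, a splitter can be created once and then called with the remaining arguments. A minimal sketch based only on the signature above; the binding name is illustrative:

```dylan
begin
  let dash-splitter = make-splitter("-");
  // Equivalent in intent to split("-", "really-long-dylan-identifier", count: 3).
  dash-splitter("really-long-dylan-identifier", count: 3);
end;
```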

join [Function]

(delimiter :: <string>, #rest strings) => big-string

            join(":", word1, word2, word3)
            concatenate(word1, ":", word2, ":", word3)
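Because split returns its pieces as multiple values and join accepts them as #rest arguments, the two can be combined to change a delimiter. A minimal sketch, assuming both functions behave as their signatures suggest; the binding name pieces is illustrative:

```dylan
begin
  // Collect every value returned by split into a single sequence.
  let (#rest pieces) = split("-", "long-dylan-identifier");
  // Spread the collected pieces back out as join's #rest arguments.
  apply(join, ":", pieces);
end;
```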

<illegal-regexp> [Class]


3. Known bugs

The regular expression parser does a very poor job with syntactically invalid regular expressions. Depending on the expression, the parser may signal an error, improperly parse it, or simply crash.

A regular expression that matches a large enough substring can produce a stack overflow. This happens much more easily under d2c than under Mindy: as few as two dozen lines of 80-column text can trigger it under d2c for Windows.


Copyright 1994, 1995, 1996, 1997 Carnegie Mellon University. All rights reserved.

Send comments and bug reports to gwydion-bugs@cs.cmu.edu