## CS 15-212: Fundamental Structures of Computer Science II

Many applications require some form of *tokenization* or
*lexical analysis* to be carried out as a preprocessing step.
Examples include compiling programming languages,
processing natural language, and manipulating HTML pages to extract
their structure. The computational framework for lexical analysis is best
described using finite state machines. After recalling and extending
our previous use of finite state machines and regular expressions, we
study an example of a lexical analyzer for a simple language
of arithmetic expressions.
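The lexical analyzer from the lecture is not reproduced here. The following is a minimal Python sketch of the same idea. It assumes the expression language contains only non-negative integer literals, the operators `+ - * /`, parentheses, and whitespace. The function `tokenize`, the state names `START` and `IN_NUM`, and token names such as `NUM` and `PLUS` are hypothetical. The function runs a two-state finite state machine over the input and emits a token each time the machine leaves an accepting configuration.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()   # between tokens
    IN_NUM = auto()  # inside the digits of an integer literal


# Assumption: only decimal digits, these six symbols, and whitespace are legal.
DIGITS = "0123456789"
SYMBOLS = {"+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "DIV",
           "(": "LPAREN", ")": "RPAREN"}


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split an arithmetic expression into (kind, lexeme) pairs."""
    tokens: list[tuple[str, str]] = []
    state = State.START
    start = 0  # index where the current number began
    i = 0
    while True:
        at_end = i == len(text)
        c = "" if at_end else text[i]
        if state is State.IN_NUM:
            if not at_end and c in DIGITS:
                i += 1  # stay in IN_NUM and consume the digit
            else:
                # Leave the number state WITHOUT consuming c, so c is
                # reprocessed from START on the next iteration.
                tokens.append(("NUM", text[start:i]))
                state = State.START
        elif at_end:
            break
        elif c in DIGITS:
            start = i
            state = State.IN_NUM
            i += 1
        elif c in SYMBOLS:
            tokens.append((SYMBOLS[c], c))
            i += 1
        elif c.isspace():
            i += 1  # whitespace separates tokens but is not a token itself
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
    return tokens
```

For example, `tokenize("12 + 3*(40-5)")` yields `NUM "12"`, `PLUS`, `NUM "3"`, `TIMES`, `LPAREN`, `NUM "40"`, `MINUS`, `NUM "5"`, `RPAREN`. Because the machine stays in `IN_NUM` as long as digits arrive, `123` becomes one `NUM` token rather than three (the usual "longest match" rule).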

- Tokenization
- Lexical analysis
- Regular grammar
- Finite state machine
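Regular grammars and finite state machines describe the same class of languages. As an illustration, take a hypothetical right-linear grammar for identifiers, `I -> letter R` and `R -> letter R | digit R | ε`. Each nonterminal becomes a state, each production `A -> a B` becomes a transition from `A` to `B` on `a`, and a nonterminal with an ε-production becomes accepting. For this grammar the resulting machine happens to be deterministic. The names `TRANSITIONS`, `char_class`, and `accepts` are illustrative, not from the course.

```python
# Hypothetical right-linear grammar for identifiers:
#   I -> letter R
#   R -> letter R | digit R | epsilon
# Nonterminals become states; R is accepting because R -> epsilon.

def char_class(c: str) -> str:
    """Map a character to the terminal class used by the grammar."""
    if c.isascii() and c.isalpha():
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"


# One transition per production of the form A -> a B.
TRANSITIONS = {
    ("I", "letter"): "R",
    ("R", "letter"): "R",
    ("R", "digit"): "R",
}
START = "I"
ACCEPTING = {"R"}


def accepts(word: str) -> bool:
    """Run the table-driven machine; a missing entry means the dead state."""
    state = START
    for c in word:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:
            return False
    return state in ACCEPTING
```

Keeping the transitions in a table rather than in nested conditionals makes the correspondence to the grammar direct: each table entry is one production.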

- Overview of Standard ML (contains a sample parser for regular expressions)
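The regular-expression parser in the Standard ML overview is not reproduced here. Below is an independent Python sketch of a recursive-descent parser for regular expressions. It assumes the usual precedence: `*` binds tightest, then concatenation, then `|`. The class `RegexParser`, the function `parse_regex`, and the tuple-based tree shapes (`("char", c)`, `("cat", a, b)`, `("alt", a, b)`, `("star", a)`, `("eps",)`) are hypothetical. There is one method per precedence level.

```python
# Assumed grammar (one method per rule):
#   alt    ::= concat ('|' concat)*
#   concat ::= star*            (an empty sequence is epsilon)
#   star   ::= atom '*'*
#   atom   ::= char | '(' alt ')'

class RegexParser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def eat(self, expected: str):
        if self.peek() != expected:
            raise ValueError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        tree = self.alt()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def alt(self):
        tree = self.concat()
        while self.peek() == "|":
            self.eat("|")
            tree = ("alt", tree, self.concat())  # left-associative
        return tree

    def concat(self):
        tree = ("eps",)
        # A concatenation ends at '|', ')', or end of input.
        while self.peek() is not None and self.peek() not in "|)":
            factor = self.star()
            tree = factor if tree == ("eps",) else ("cat", tree, factor)
        return tree

    def star(self):
        tree = self.atom()
        while self.peek() == "*":
            self.eat("*")
            tree = ("star", tree)
        return tree

    def atom(self):
        c = self.peek()
        if c == "(":
            self.eat("(")
            tree = self.alt()
            self.eat(")")
            return tree
        if c is None or c in "|)*":
            raise ValueError(f"unexpected {c!r} at position {self.pos}")
        self.pos += 1
        return ("char", c)


def parse_regex(text: str):
    return RegexParser(text).parse()
```

For instance, `parse_regex("(a|b)*c")` builds a concatenation whose left part is the star of an alternation and whose right part is the character `c`.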

John Lafferty, lafferty@cs.cmu.edu