YAPC | talks

Michael McClennen

Two talks.

VGL/Perl: a metalanguage for information-gathering

45 minute talk

The legendary flexibility and adaptability of Perl make it uniquely suitable as a base for specialized application languages. VGL/Perl, which I have been developing over the past two years, is a "little language" that provides high-level primitives for gathering information from sites on the Internet and organizing it into relational, hierarchical, or object-oriented frameworks. VGL/Perl provides the full power of Perl, with the addition of a more capable exception-handling mechanism and an object-management capability. At the same time, VGL/Perl allows (but does not mandate) a simplified syntax that is (in theory) easier for beginning programmers to use. The language is useful for applications such as website mirroring, web spidering, and distributed information gathering and sharing.

The implementation of the VGL/Perl interpreter is interesting in itself, making use of advanced features of Perl such as run-time code generation, dynamic exception handling, and typeglob aliasing. The talk will focus on the following two topics:

The design of the VGL/Perl language, as a model for designing Perl-based application-specific languages.
The implementation of the VGL/Perl interpreter, covering in particular the use of advanced features of Perl, and the different implementation strategies that were considered and tried.

Handling very large data sets in Perl

45 minute talk

When dealing with data sets in the hundreds of megabytes, some of Perl's usual strengths can become weaknesses if you're not careful. On the other hand, there are many techniques available to the would-be data-jockey. To illustrate some of these, I will draw examples from my experience in building a comprehensive suite of tools for manipulating, summarizing, and analyzing web server access logs.

Techniques I will cover include:

pitfalls to avoid
use of DB and DBM to store large hashes
methods for effective use of temporary files
multiple techniques for indexing large data sets
writing efficient main loops, including run-time code generation
transparently working with compressed data
making effective use of existing tools such as sort and gzip
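
As a rough illustration of two of the techniques listed above (a disk-backed hash and transparent reading of compressed data), here is a minimal sketch. It is not taken from the talk: the helper names open_log and count_hosts are hypothetical, the input is assumed to be an access log whose first whitespace-separated field is the client host, and SDBM_File is used only because it ships with Perl (DB_File offers a similar tie interface where Berkeley DB is installed). The compressed branch assumes a gzip executable on the PATH.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT flags for tie
use SDBM_File;    # bundled DBM implementation; an assumption, not the talk's choice

# Hypothetical helper: open a log for reading. Names ending in .gz are
# piped through "gzip -dc", so the caller reads plain lines either way.
sub open_log {
    my ($path) = @_;
    my $fh;
    if ($path =~ /\.gz$/) {
        open($fh, '-|', 'gzip', '-dc', $path)
            or die "cannot run gzip on $path: $!";
    }
    else {
        open($fh, '<', $path) or die "cannot open $path: $!";
    }
    return $fh;
}

# Hypothetical helper: count requests per client host. $hits is a hash
# reference; when the hash is tied to a DBM file, the counts live on
# disk rather than in memory.
sub count_hosts {
    my ($fh, $hits) = @_;
    while (my $line = <$fh>) {
        my ($host) = split ' ', $line;    # first field is assumed to be the host
        next unless defined $host;
        $hits->{$host}++;
    }
    return;
}
```

A caller might tie a hash with tie(my %hits, 'SDBM_File', 'hits', O_RDWR | O_CREAT, 0666) and then run count_hosts(open_log('access.log.gz'), \%hits). Note that SDBM limits the size of each key/value pair, so for long keys or values another DBM back end would be the safer choice.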

Michael McClennen is a founder and staff member of the Internet Public Library at the University of Michigan.

Kevin Lenzo

Last modified: Fri May 7 15:45:37 EDT 1999