YAPC | talks
Michael McClennen
Two talks.
VGL/Perl: a metalanguage for information-gathering
45-minute talk
The legendary flexibility and adaptability of Perl make it uniquely
suitable as a base for specialized application languages. VGL/Perl,
which I have been developing over the past two years, is a "little
language" which provides high-level primitives for gathering
information from sites on the Internet and organizing it into
relational, hierarchical, or object-oriented frameworks. VGL/Perl
provides the full power of Perl, with the addition of a more capable
exception-handling mechanism and an object-management capability. At
the same time, VGL/Perl allows (but does not mandate) a simplified
syntax that is (in theory) easier for beginning programmers to use.
This language is useful for applications such as website mirroring,
web spidering, and distributed information gathering and sharing.
The implementation of the VGL/Perl interpreter is interesting in
itself, making use of advanced features of Perl such as run-time code
generation, dynamic exception handling, and typeglob aliasing. The
talk will focus on the following two topics:
- The design of the VGL/Perl language, as a model for designing
Perl-based application-specific languages
- The implementation of the VGL/Perl interpreter, covering in
particular the use of advanced features of Perl, and the different
implementation strategies that were considered and tried.
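As a rough illustration of two of the Perl features named above (run-time code generation and typeglob aliasing), here is a minimal sketch in plain Perl. It is not taken from the VGL/Perl interpreter; the names make_field_getter and second_field are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Run-time code generation: build the source text of a subroutine as a
# string and compile it with string eval.
# make_field_getter is a hypothetical helper, not part of VGL/Perl.
sub make_field_getter {
    my ($index) = @_;
    my $code = sprintf 'sub { my @f = split /\t/, $_[0]; return $f[%d]; }',
                       $index;
    my $sub = eval $code or die "could not compile generated code: $@";
    return $sub;
}

my $getter = make_field_getter(1);

# Typeglob aliasing: assigning a code reference to a glob installs the
# generated subroutine under a name in the symbol table.
*second_field = $getter;

my $value = second_field("alpha\tbeta\tgamma");
print "$value\n";
```

Generating the subroutine once and calling it by name avoids re-interpreting the field index on every call, which is one common reason to reach for string eval.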
Handling very large data sets in Perl
45-minute talk
When dealing with data sets in the hundreds of megabytes, some of
Perl's usual strengths can become weaknesses if you're not careful.
On the other hand, there are many techniques available to the would-be
data-jockey. To illustrate some of these, I will draw examples from my
experience in building a comprehensive suite of tools for
manipulating, summarizing, and analyzing web server access logs.
Techniques I will cover include:
- pitfalls to avoid
- use of DB and DBM to store large hashes
- methods for effective use of temporary files
- multiple techniques for indexing large data sets
- writing efficient main loops, including run-time code generation
- transparently working with compressed data
- making effective use of existing tools such as sort and gzip
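Two of the techniques listed above, storing a large hash in a DBM file and reading compressed data transparently, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which DBM module the talk uses, so SDBM_File (bundled with Perl) is an assumption here, the sample log lines are made up, and a gzip executable is assumed to be on the PATH.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # assumed DBM flavor; bundled with Perl
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Write a tiny compressed sample log (made-up data) through a pipe to gzip.
my $log = "$dir/access.log.gz";
open(my $out, '|-', "gzip -c > '$log'") or die "cannot run gzip: $!";
print {$out} "$_\n" for qw(/index.html /about.html /index.html);
close($out) or die "gzip failed: $?";

# Tie a hash to an on-disk DBM file so the counts need not fit in memory.
tie(my %hits, 'SDBM_File', "$dir/hits", O_RDWR | O_CREAT, 0666)
    or die "cannot tie DBM file: $!";

# Read the compressed log through a gzip pipe, one line at a time,
# so the uncompressed data never has to exist on disk.
open(my $in, '-|', 'gzip', '-dc', $log) or die "cannot run gzip: $!";
while (my $path = <$in>) {
    chomp $path;
    $hits{$path}++;
}
close($in);

# Copy the (small) result into an ordinary hash before untying.
my %summary = %hits;
untie %hits;

print "$_ $summary{$_}\n" for sort keys %summary;
```

The main loop looks the same as it would for an in-memory hash and an uncompressed file; only the tie and open calls change.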
Michael McClennen is a founder and staff member of the
Internet Public Library at the
University of Michigan.
Kevin Lenzo
Last modified: Fri May 7 15:45:37 EDT 1999