Software and Datasets
Jangada is an API for extracting signature blocks and reply lines from email messages. It follows the approach of the paper "Learning to Extract Signature and Reply Lines from Email" (CEAS 2004), but performance was slightly improved by using a new set of features not mentioned in that paper.
Some Features: Extracts signature blocks and reply lines from email messages with very good accuracy. Can be easily integrated into other Java applications (for instance, the entire email message can be passed as a String). Can be easily integrated into other Minorthird applications (using the TextLabels format, it accepts as input email messages that carry other annotations, such as dates, personal names, speech acts, etc.).
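To make the task concrete, here is a minimal sketch of what labeling signature and reply lines means. It is not Jangada's API and not its learned classifier: it is a hand-written heuristic that assumes quoted reply lines start with ">" and that the signature begins at the conventional "-- " delimiter line. The class and method names (SigReplySketch, labelLines, and so on) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Heuristic stand-in for illustration only; Jangada itself uses a trained model.
public class SigReplySketch {

    /** Assumption: lines quoted from an earlier message start with '>'. */
    public static boolean isReplyLine(String line) {
        return line.trim().startsWith(">");
    }

    /** Index of the last "-- " delimiter line, or -1 if there is none. */
    public static int signatureStart(String[] lines) {
        for (int i = lines.length - 1; i >= 0; i--) {
            // Ignore trailing whitespace so "--" and "-- " are both accepted.
            String stripped = lines[i].replaceAll("\\s+$", "");
            if (stripped.equals("--")) {
                return i;
            }
        }
        return -1;
    }

    /** Labels every line of the message as "reply", "sig", or "body". */
    public static List<String> labelLines(String message) {
        String[] lines = message.split("\r?\n", -1);
        int sigStart = signatureStart(lines);
        List<String> labels = new ArrayList<String>();
        for (int i = 0; i < lines.length; i++) {
            if (sigStart >= 0 && i >= sigStart) {
                labels.add("sig");
            } else if (isReplyLine(lines[i])) {
                labels.add("reply");
            } else {
                labels.add("body");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String message = "> Are you coming on Friday?\n"
                + "Yes, see you there.\n"
                + "-- \n"
                + "Jane Doe\n"
                + "Example Corp";
        System.out.println(labelLines(message));
    }
}
```

Many real messages have signatures without a delimiter and replies without ">" markers, which is why a learned extractor is used instead of rules like these.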
Licensing: University of Illinois/NCSA Open Source License
Documentation: Very poor. An initial javadocs page is here. There is some documentation on how to use Jangada in the example files below.
Requires: j2sdk1.4 or later. Uses MinorThird.jar.
Recommended: When using email files as input, results will be better if the messages are in MIME (.eml) format.
1. create a new directory (for instance, jangadaDir)
3. Unzip (gunzip Demos.tar.gz) and untar (tar -xvf Demos.tar) the example files, as well as the email files.
4. add jangadaDir, jangadaDir/minorThird.jar and jangadaDir/jangada.jar to the CLASSPATH
6. For a quick demo,
7. compile the example files, for instance: “javac Demo2.java” (in case of errors, please check your CLASSPATH again)
8. run the examples on the email files directory: “java Demo2 emails/*”
9. Check the documentation on the DemoX.java files and try your own application.
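If compilation fails, a quick way to check the CLASSPATH is to print what the JVM actually sees. The sketch below is not part of the Jangada distribution; the class name ClasspathCheck is made up, and the jar names are taken from the steps above. It compares file names case-insensitively, since the jar appears on this page as both MinorThird.jar and minorThird.jar.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reports which required jars are absent from a classpath.
public class ClasspathCheck {

    /** Returns the required file names that no classpath entry points to. */
    public static List<String> missingEntries(String classpath, String separator, String[] required) {
        String[] entries = classpath.split(Pattern.quote(separator));
        List<String> missing = new ArrayList<String>();
        for (String name : required) {
            boolean found = false;
            for (String entry : entries) {
                // Compare only the last path component, ignoring case.
                if (new File(entry).getName().equalsIgnoreCase(name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath the running JVM was started with.
        String classpath = System.getProperty("java.class.path");
        String[] required = {"minorThird.jar", "jangada.jar"};
        List<String> missing = missingEntries(classpath, File.pathSeparator, required);
        if (missing.isEmpty()) {
            System.out.println("Both jars are on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Compile and run it from jangadaDir in the same shell in which the CLASSPATH was set, so that it sees the same value as javac and java.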
Reminder 1: if you’d like to have access to the source code, please send me an email.
Reminder 2: If you use this package, please cite the following reference:
Learning to Extract Signature and Reply Lines from Email, Vitor R. Carvalho and William W. Cohen, CEAS-2004 (Conference on Email and Anti-Spam).
Ciranda is a Java application that predicts the Email-Acts (or email speech acts) of email messages. It follows the ideas of two papers (emnlp04 and sigir05), but performance was significantly improved by careful feature selection and additional features.
Predicts the following acts: Request, Commit, Deliver, Propose, Meet, dData.
Provides the confidence in each prediction.
Easy way to use these acts as features in your application.
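As an illustration of using the acts as features, the sketch below turns per-act confidences into a fixed-order numeric vector that another classifier could consume. It does not call Ciranda: the class ActFeaturesSketch, the Map input, and the assumption that confidences are plain numbers (with 0.0 standing in for a missing act) are all hypothetical. See Example.java for how the package itself exposes predictions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical glue code; the input map stands in for whatever Ciranda returns.
public class ActFeaturesSketch {

    /** The six acts listed above, in a fixed order so vector positions are stable. */
    public static final String[] ACTS = {"Request", "Commit", "Deliver", "Propose", "Meet", "dData"};

    /** One feature per act; acts missing from the map get 0.0 (an assumption). */
    public static double[] toFeatureVector(Map<String, Double> confidences) {
        double[] features = new double[ACTS.length];
        for (int i = 0; i < ACTS.length; i++) {
            Double confidence = confidences.get(ACTS[i]);
            features[i] = (confidence == null) ? 0.0 : confidence.doubleValue();
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up confidences for a single message.
        Map<String, Double> confidences = new HashMap<String, Double>();
        confidences.put("Request", 0.9);
        confidences.put("Meet", 0.25);
        System.out.println(Arrays.toString(toFeatureVector(confidences)));
    }
}
```

Keeping the act order fixed means the same position always refers to the same act, which most learning toolkits require of a feature vector.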
Licensing: No guarantees are provided. Lots of bugs for sure. Use at your own risk!
Documentation: Very poor. An initial javadocs page is here. Please check Example.java on how to use it.
Requires: j2sdk1.4 or later. Uses MinorThird.jar (see below)
Questions: I’ll be happy to help, especially if you tell me what a good Ciranda is :-)
1. create a new directory called ciranda, and ciranda/lib
3. add ciranda/ and lib/ciranda.jar to the CLASSPATH
4. download the example file Example.java to ciranda/
5. compile it: “javac Example.java” (in case of errors, please check your CLASSPATH again)
6. run the example: “java Example”
7. or run the main application on a directory with emails in text format (without headers)
8. create the test directory ciranda/testdir
10. run “java -jar lib/ciranda.jar testdir”
11. or try your own application.
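The main application expects emails in text format without headers. If your messages still carry headers, one way to prepare the test directory is sketched below. It is not part of Ciranda: the class StripHeadersSketch is made up, and it assumes the usual mail layout in which the header block ends at the first blank line. Messages that do not start with a "Name: value" line are treated as already header-free and copied unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical preprocessing helper for building a header-free test directory.
public class StripHeadersSketch {

    /** Returns the message body, assuming headers end at the first blank line. */
    public static String stripHeaders(String raw) {
        String text = raw.replace("\r\n", "\n");
        // No leading "Name: value" line: assume there is no header block.
        if (!text.matches("(?s)^[A-Za-z0-9-]+:.*")) {
            return text;
        }
        int blank = text.indexOf("\n\n");
        // Headers with no blank line after them means the body is empty.
        return (blank < 0) ? "" : text.substring(blank + 2);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StripHeadersSketch <inputDir> <outputDir>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Files.createDirectories(out);
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(in)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                // ISO-8859-1 maps every byte to a character, so no byte is lost.
                String raw = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
                byte[] body = stripHeaders(raw).getBytes(StandardCharsets.ISO_8859_1);
                Files.write(out.resolve(file.getFileName()), body);
            }
        }
    }
}
```

For example, “java StripHeadersSketch rawmail testdir” would write header-free copies of the files in a (hypothetical) rawmail directory into testdir.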
Reminder: Send me an email if you'd like the source code. If you use this package, please cite the following reference:
Learning to Classify Email into “Speech Acts”, William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell, EMNLP-2004 (Conference on Empirical Methods in Natural Language Processing).
These 617 email messages are annotated with signature lines and reply lines. The messages are a subset of the 20 Newsgroups dataset (produced by Ken Lang at CMU in the mid-90s).