Cut Once
A Thunderbird extension for Recipient Prediction and Leak Detection

Cut Once is an extension for Mozilla Thunderbird, a popular open-source email client. Cut Once implements methods for Email Leak Detection and Recipient Recommendation based on the papers [SDM-2007][ECIR-2008] by Vitor Carvalho and William Cohen from Carnegie Mellon University. The extension is written entirely in JavaScript. Some initial comments on Cut Once can be found here. Details on the associated user study can be found here.

Authors & Contact

Ramnath Balasubramanyan, Vitor R. Carvalho and William Cohen from Carnegie Mellon University. Please send all questions to email.research.cmu@gmail.com. 

Usage

Cut Once can be downloaded from here: latest version (04/06/2008), cut_once-0.0.7-tb.xpi. Save it as a .xpi file, NOT as a .zip file.
It is compatible with Thunderbird 2.0.0.0 or later. Thunderbird extensions are distributed as .xpi packages. To install Cut Once:
  1. Open Thunderbird
  2. From the top menu, select "Tools"
  3. In the "Tools" menu, select "Add-ons"
  4. In the "Add-ons" window, click on the "Install" button
  5. Select the .xpi file (CutOncev0.0.xx.xpi)
  6. The installation will take a few seconds. If successful, you'll be asked to restart Mozilla Thunderbird.
  7. After the restart, the main window should show the "Train" and "Send Feedback" buttons at the top right.
     [Screenshot: Cut Once buttons in the Thunderbird main window]
After installation, Cut Once must be trained before it can make recipient predictions. To train it:
  1. With your mouse, select the folder that contains your sent messages (typically called "Sent Mail", "Outbox", or "sent").
  2. Click on the "Train" button at the top right of the window.
  3. Click "Okay".
  4. The training window opens. The time taken for training depends on the number of messages in the sent folder, the speed of the processor, etc. A rough estimate is 150 messages per minute.
     Note: DO NOT READ OR WRITE EMAILS USING THUNDERBIRD DURING TRAINING.
     [Screenshot: Cut Once training window]
  5. After the message "Trained Successfully" is displayed, click on the "Close" button.

Once the training procedure is completed, a model file (called thunderbird_infoleak_model.dat) is created in the user’s home directory. Cut Once then reads the model file every time Thunderbird starts up. A weekly reminder encourages users to retrain on a regular basis.

Cut Once recipient predictions can be seen in two different ways. In the first method, the user explicitly asks for recommendations by hitting the "Recommend Recipient" button on the toolbar in the Compose window. Clicking a recommended email address adds the address to the recipient list in the Compose window.

In the second method, a dialog box pops up when the user hits the Send button. This dialog box highlights possible leaks and also lists other recommended recipients. A leak is an email address that the user has chosen as a recipient but that is unlikely to be a valid recipient for the composed message, based on the history of past communication with that address. A countdown timer ensures that the message is sent automatically after 10 seconds if the user does not wish to use the dialog. The "Pause" button freezes the 10-second counter. The "Cancel" button closes the dialog and returns to the message under composition.
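
The idea behind both views can be sketched in a few lines of JavaScript. This is only an illustration, not the method from the papers or the extension's actual code: it assumes that a message and each known address are represented as word-weight vectors, scores each address by cosine similarity, and uses a made-up threshold. The names cosine, reviewRecipients and leakThreshold, and the toy data, are all hypothetical.

```javascript
// Illustrative sketch only: names, threshold and data are assumptions.

// Cosine similarity between two sparse vectors stored as Map(word -> weight).
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [word, x] of a) {
    normA += x * x;
    if (b.has(word)) dot += x * b.get(word);
  }
  for (const x of b.values()) normB += x * x;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Score every known address against the message, then split the result into
// chosen recipients that look unlikely (possible leaks) and unchosen
// addresses that look likely (recommendations).
// Addresses that are not in the model are not scored at all.
function reviewRecipients(messageVector, chosen, centroids, leakThreshold = 0.1) {
  const scored = [...centroids]
    .map(([address, centroid]) => ({ address, score: cosine(messageVector, centroid) }))
    .sort((a, b) => b.score - a.score);
  const chosenSet = new Set(chosen);
  return {
    possibleLeaks: scored
      .filter((s) => chosenSet.has(s.address) && s.score < leakThreshold)
      .map((s) => s.address),
    recommendations: scored
      .filter((s) => !chosenSet.has(s.address) && s.score >= leakThreshold)
      .map((s) => s.address),
  };
}

// Toy model: one vector per address the user has written to before.
const toyCentroids = new Map([
  ["alice@example.com", new Map([["budget", 1.0], ["review", 0.5]])],
  ["bob@example.com", new Map([["budget", 0.8], ["numbers", 0.6]])],
  ["carol@example.com", new Map([["lunch", 1.0], ["friday", 0.7]])],
]);

// A message about budget numbers, addressed to Alice and Carol.
const toyMessage = new Map([["budget", 1.2], ["numbers", 0.4]]);
const review = reviewRecipients(
  toyMessage,
  ["alice@example.com", "carol@example.com"],
  toyCentroids
);
console.log(review);
```

In this toy example Carol has only ever received messages about lunch, so she is flagged as a possible leak, while Bob, who was not chosen but has received similar messages, is recommended.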

An example is illustrated below: (1) the Compose window; (2) the predictions made by Cut Once for this message.
[Screenshots: pred1, pred2]

Another example is shown in the picture below:
[Screenshot: shot2]

The model file (thunderbird_infoleak_model.dat) created during the training process stores the following pieces of information about the user’s Sent folder.
 • Centroids: A centroid is computed for each email address to which a message was sent, as the mean vector over all the messages addressed to that address. Each email is represented by a TFIDF vector over the words in the subject and body.
 • Document frequencies: A table of words and their document frequencies, where the document frequency of a word is the number of messages in which it occurs. This is needed to compute TFIDF vectors for messages at runtime.
 • Recency and Frequency Ranks: Candidate email addresses in the Sent folder are ranked by recency and frequency to establish a baseline ranking. The ranks assigned to each email address are saved in the model file so that Cut Once can display a baseline ranking at runtime.
The training procedure trims the size of the model by discarding words whose document frequency is below a threshold and by discarding email addresses which have very few messages addressed to them.
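
As a rough illustration of how such a model could be assembled, here is a minimal JavaScript sketch. It is not the extension's code. The TFIDF variant (term count times log(N/df)), the definition of recency as the latest send time, the thresholds minDf and minMessages, and every function and field name are assumptions made for the example.

```javascript
// Illustrative sketch only: names, thresholds and the TFIDF variant are assumptions.

// Lowercase the text and split it into alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Rank addresses by a numeric score, highest first (rank 1 = best).
function rankByScore(scores) {
  const ranks = new Map();
  [...scores]
    .sort((a, b) => b[1] - a[1])
    .forEach(([address], i) => ranks.set(address, i + 1));
  return ranks;
}

function buildModel(messages, minDf = 2, minMessages = 2) {
  // Document frequency: number of messages in which each word occurs.
  const df = new Map();
  const docs = messages.map((m) => {
    const tf = new Map();
    for (const w of tokenize(m.subject + " " + m.body)) tf.set(w, (tf.get(w) || 0) + 1);
    for (const w of tf.keys()) df.set(w, (df.get(w) || 0) + 1);
    return { recipients: m.recipients, sentAt: m.sentAt, tf };
  });
  // Trim the model: drop words whose document frequency is below the threshold.
  for (const [w, n] of df) if (n < minDf) df.delete(w);

  // Per address: sum of TFIDF vectors, message count, latest send time.
  const sums = new Map(), counts = new Map(), lastSent = new Map();
  for (const doc of docs) {
    for (const address of doc.recipients) {
      counts.set(address, (counts.get(address) || 0) + 1);
      lastSent.set(address, Math.max(lastSent.get(address) || 0, doc.sentAt));
      if (!sums.has(address)) sums.set(address, new Map());
      const sum = sums.get(address);
      for (const [w, f] of doc.tf) {
        if (!df.has(w)) continue; // word was trimmed
        const weight = f * Math.log(messages.length / df.get(w));
        sum.set(w, (sum.get(w) || 0) + weight);
      }
    }
  }

  // Centroid = mean TFIDF vector; drop addresses with very few messages.
  const centroids = new Map();
  for (const [address, sum] of sums) {
    const n = counts.get(address);
    if (n < minMessages) {
      counts.delete(address);
      lastSent.delete(address);
      continue;
    }
    const centroid = new Map();
    for (const [w, total] of sum) centroid.set(w, total / n);
    centroids.set(address, centroid);
  }
  return {
    numDocs: messages.length,
    df,
    centroids,
    frequencyRank: rankByScore(counts),
    recencyRank: rankByScore(lastSent),
  };
}

// Toy sent folder; sentAt is any increasing number (e.g. a timestamp).
const sentMessages = [
  { recipients: ["alice@example.com"], sentAt: 1, subject: "budget", body: "budget review draft" },
  { recipients: ["alice@example.com", "bob@example.com"], sentAt: 2, subject: "budget", body: "final budget numbers" },
  { recipients: ["bob@example.com"], sentAt: 3, subject: "lunch", body: "lunch on friday" },
  { recipients: ["carol@example.com"], sentAt: 4, subject: "lunch", body: "lunch moved" },
];
const model = buildModel(sentMessages);
console.log([...model.centroids.keys()], [...model.df.keys()]);
```

With the default thresholds, only the words "budget" and "lunch" survive trimming, and Carol, who received a single message, is dropped from the model.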

The "Einstein" Button: Helping Researchers at Carnegie Mellon

User actions within the recipient recommendations dialog and the dialog box opened after the Send button is hit are logged. This includes information such as the rank of a recommendation that the user clicks on, the time taken by the user to accept a recommendation, and the position in the list of a leak that is removed by the user. The log does not contain any personal or private information (such as email content or recipients) from you or from any of your contacts. Users are asked every week if they would like to send this log file to the researchers who developed the extension. Log files can also be sent explicitly by hitting the "Mail statistics" button (the Einstein button) on the main window.
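
To make the privacy property concrete, here is a small hypothetical sketch of a logger that records only the kinds of values listed above. The record format, the action names, and the field names are invented for illustration; the actual Cut Once log format is not described here.

```javascript
// Hypothetical log record builder. Action and field names are assumptions.
// Only whitelisted numeric fields are copied, so addresses or message text
// passed in by mistake never reach the log.
const LOGGED_FIELDS = {
  recommendationAccepted: ["rank", "secondsToAccept"],
  leakRemoved: ["position"],
};

function makeLogEntry(action, details) {
  const fields = LOGGED_FIELDS[action];
  if (!fields) throw new Error("Unknown action: " + action);
  const entry = { action };
  for (const field of fields) {
    if (typeof details[field] !== "number") {
      throw new Error("Missing numeric field: " + field);
    }
    entry[field] = details[field];
  }
  return entry;
}

// The address is deliberately passed in to show that it is dropped.
const logEntry = makeLogEntry("recommendationAccepted", {
  rank: 2,
  secondsToAccept: 3.5,
  address: "alice@example.com",
});
console.log(JSON.stringify(logEntry));
```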

Under protocol number HS08-026, this research was approved by the IRB (Institutional Review Board), a group formally designated by the United States government to approve, monitor and review research studies with the aim of protecting the rights of research subjects. Please contact email.research.cmu@gmail.com with any questions or concerns.

Comments, Reviews and Related Links: