jangada
Class SigFilePredictor

java.lang.Object
  extended by jangada.SigFilePredictor

public class SigFilePredictor
extends java.lang.Object

Signature file extraction algorithm. It follows the description in "Learning to Extract Signature and Reply Lines from Email", V. R. Carvalho and W. W. Cohen, CEAS (Conference on Email and Anti-Spam), 2004.

Author:
vitor|AT|cs.cmu.edu, May 2004
Note: this implementation assumes the incoming message has a signature file.

Nested Class Summary
static class SigFilePredictor.WindowRepresentation
          Inner class that represents the message as a sequence of feature sets, using window features (features of neighboring lines).
 
Field Summary
 int CURRENT_VERSION_NUMBER
           
static long serialVersionUID
           
 
Constructor Summary
SigFilePredictor()
           
 
Method Summary
static void createModel(java.lang.String[] args, java.lang.String linetag)
           
 java.util.ArrayList DetectAndPredict(java.lang.String wholeMessage)
          Detects whether there is a signature in the email message and predicts (extracts) the signature lines.
static boolean detectFromName(java.lang.String tmp, java.lang.String testLine)
          From-line feature function: extracts a "name" from the From line of an email message and attempts to match any of its components against the words in the target line. In other words, it returns true if a piece of the sender's name is detected in this line.
 java.lang.String getMsgWithoutSignatureLines(java.lang.String doc)
          Returns the original message without the signature lines.
 java.lang.String getSignatureLines(java.lang.String doc)
          Returns the signature file lines (usually the last lines of the message), if any signature is found.
static void main(java.lang.String[] args)
           
 java.util.ArrayList Predict(java.lang.String wholeMessage)
          Predicts the sig file lines in the email message.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

public static final long serialVersionUID
See Also:
Constant Field Values

CURRENT_VERSION_NUMBER

public final int CURRENT_VERSION_NUMBER
See Also:
Constant Field Values
Constructor Detail

SigFilePredictor

public SigFilePredictor()
Method Detail

Predict

public java.util.ArrayList Predict(java.lang.String wholeMessage)
Predicts the sig file lines in the email message.

Returns:
ArrayList with instances (set of features)
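
To make "instances (set of features)" more concrete, the following is a minimal sketch of window features: each line gets a few features of its own plus the features of its neighbors, prefixed by their relative position. The class name `WindowFeatureSketch`, the feature names, and the window size are hypothetical and chosen for illustration. They are not taken from the jangada source or from `SigFilePredictor.WindowRepresentation`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Illustrative only: feature names and window handling are assumptions, not jangada's.
final class WindowFeatureSketch {
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
    private static final Pattern URL = Pattern.compile("(?i)\\b(https?://|www\\.)\\S+");

    // Features computed from a single line in isolation.
    static Set<String> lineFeatures(String line) {
        Set<String> features = new TreeSet<>();
        if (line.trim().isEmpty()) features.add("blank");
        if (line.matches("--\\s*")) features.add("sigDelimiter");
        if (EMAIL.matcher(line).find()) features.add("hasEmail");
        if (URL.matcher(line).find()) features.add("hasUrl");
        return features;
    }

    // One instance per line: its own features plus those of up to `window`
    // lines before and after it, prefixed with prevN_ or nextN_.
    static List<Set<String>> windowFeatures(String message, int window) {
        String[] lines = message.split("\\r?\\n", -1);
        List<Set<String>> perLine = new ArrayList<>();
        for (String line : lines) {
            perLine.add(lineFeatures(line));
        }
        List<Set<String>> instances = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            Set<String> instance = new TreeSet<>(perLine.get(i));
            for (int d = 1; d <= window; d++) {
                if (i - d >= 0) {
                    for (String f : perLine.get(i - d)) instance.add("prev" + d + "_" + f);
                }
                if (i + d < lines.length) {
                    for (String f : perLine.get(i + d)) instance.add("next" + d + "_" + f);
                }
            }
            instances.add(instance);
        }
        return instances;
    }
}
```

With a window of 1, a name line that sits between a "-- " delimiter and an email address line has no features of its own in this sketch, but it picks up `prev1_sigDelimiter` and `next1_hasEmail` from its neighbors, which is the kind of context a per-line classifier could use.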

DetectAndPredict

public java.util.ArrayList DetectAndPredict(java.lang.String wholeMessage)
Detects whether there is a signature in the email message and predicts (extracts) the signature lines.

Returns:
ArrayList with instances (set of features)

getMsgWithoutSignatureLines

public java.lang.String getMsgWithoutSignatureLines(java.lang.String doc)
Returns the original message without the signature lines (if signatures are found).

Returns:
String (parsed message)

getSignatureLines

public java.lang.String getSignatureLines(java.lang.String doc)
Returns the signature file lines (usually the last lines of the message), if any signature is found.

Returns:
String (signature lines)
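
The calling pattern for getSignatureLines and getMsgWithoutSignatureLines can be sketched as follows. So that the example runs without jangada on the classpath, it uses a hypothetical interface `SignatureExtractor` that mirrors the two documented signatures, and a toy stand-in `DelimiterExtractor` that splits on the conventional "-- " delimiter line. The stand-in is an assumption made for illustration and is not how SigFilePredictor decides which lines are signature lines. Returning an empty string when no signature is found is also an assumption.

```java
import java.util.Arrays;

// Hypothetical interface mirroring the two documented instance methods.
interface SignatureExtractor {
    String getSignatureLines(String doc);
    String getMsgWithoutSignatureLines(String doc);
}

// Toy stand-in: everything from a "-- " delimiter line onward counts as signature.
final class DelimiterExtractor implements SignatureExtractor {
    private static int delimiterIndex(String[] lines) {
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].matches("--\\s*")) return i;
        }
        return -1; // no delimiter line found
    }

    @Override
    public String getSignatureLines(String doc) {
        String[] lines = doc.split("\\r?\\n", -1);
        int at = delimiterIndex(lines);
        if (at < 0) return ""; // assumption: empty result when nothing is found
        return String.join("\n", Arrays.copyOfRange(lines, at, lines.length));
    }

    @Override
    public String getMsgWithoutSignatureLines(String doc) {
        String[] lines = doc.split("\\r?\\n", -1);
        int at = delimiterIndex(lines);
        if (at < 0) return doc; // nothing to strip
        return String.join("\n", Arrays.copyOfRange(lines, 0, at));
    }
}

final class SignatureUsageSketch {
    public static void main(String[] args) {
        String doc = "Hi Bob,\nSee you.\n-- \nJane Doe\njane@example.org";
        // With jangada available, this would be: new jangada.SigFilePredictor()
        SignatureExtractor extractor = new DelimiterExtractor();
        System.out.println("Signature:\n" + extractor.getSignatureLines(doc));
        System.out.println("Body:\n" + extractor.getMsgWithoutSignatureLines(doc));
    }
}
```

Both methods take the whole message as a single String, so the same `doc` value can be passed to each call.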

detectFromName

public static boolean detectFromName(java.lang.String tmp,
                                     java.lang.String testLine)
From-line feature function: extracts a "name" from the From line of an email message and attempts to match any of its components against the words in the target line. In other words, it returns true if a piece of the sender's name is detected in this line, and false otherwise.

Parameters:
testLine - the target line, as a String
Returns:
true, if any part of the sender's name is found.
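
The matching described above can be illustrated with a minimal sketch. It is a hypothetical re-implementation, not the jangada source. It assumes the first argument (named `tmp` in the signature and not documented) carries a From header line shaped like `From: Jane Doe <jdoe@example.org>`. The class name `FromNameSketch` and the rule of skipping name parts shorter than three characters are also assumptions.

```java
import java.util.Locale;

// Hypothetical illustration of a From-line name match; not the library's code.
final class FromNameSketch {

    // Returns true if any component of the sender's display name
    // appears as a whole word in testLine (case-insensitive).
    static boolean detectFromName(String fromLine, String testLine) {
        if (fromLine == null || testLine == null) {
            return false;
        }
        // Drop the "From:" label and the bracketed address, keep the display name.
        String name = fromLine.replaceFirst("(?i)^\\s*from:\\s*", "")
                              .replaceAll("<[^>]*>", " ")
                              .toLowerCase(Locale.ROOT);
        // Normalize the target line to space-separated lowercase words.
        String target = " "
                + testLine.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ")
                + " ";
        for (String part : name.split("[^a-z0-9]+")) {
            // Assumption: skip initials and empty tokens to limit accidental matches.
            if (part.length() < 3) {
                continue;
            }
            if (target.contains(" " + part + " ")) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with the From line `From: Jane Doe <jdoe@example.org>`, the target line `Jane - Research Assistant` matches on "jane", while `See you tomorrow.` contains neither name component and does not match.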

createModel

public static void createModel(java.lang.String[] args,
                               java.lang.String linetag)
                        throws java.io.IOException
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)