Teng et al, ICMLC 2004

From ScribbleWiki: Analysis of Social Media

E-Mail Authorship Mining Based On SVM For Computer Forensic

Authors: Gui-Fa Teng, Mao-Sheng Lai, Jian-Bin Ma, Ying Li

Paper: [1]

The ability to identify the original author of a misused e-mail can help prosecute an offender, and the authors of this paper focus on this particular application of authorship attribution. Various e-mail features (e.g. linguistic features, header features, and structural characteristics) are used as input to an SVM, with a co-location based kernel, to attribute e-mail messages to an author.

The authors adopted the Vector Space Model (VSM) to store document information, representing each document as a vector of (term, weight) pairs. The weights are calculated in the standard fashion, as term frequency-inverse document frequency (TF-IDF). Not much is said about their feature selection process, except that they adopted chi-squared as the feature selection criterion.
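
As an illustration of the representation described above, here is a minimal sketch of TF-IDF weighting and a chi-squared score for one term and one author. The paper summary does not say which TF-IDF variant or chi-squared formulation was used, so the formulas below (raw term frequency times log(N/df), and the usual 2x2 contingency-table statistic) are assumptions, as are the function names and the toy messages.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: weight} dict (tf * log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def chi_squared(docs, labels, term, author):
    """Chi-squared statistic for the association between a term and an author."""
    n = len(docs)
    pairs = list(zip(docs, labels))
    a = sum(1 for d, l in pairs if term in d and l == author)      # term, author
    b = sum(1 for d, l in pairs if term in d and l != author)      # term, others
    c = sum(1 for d, l in pairs if term not in d and l == author)  # no term, author
    d = n - a - b - c                                              # no term, others
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

# Hypothetical toy corpus: four tokenized messages by two authors.
docs = [
    ["the", "cheers", "see", "you"],
    ["the", "cheers", "thanks"],
    ["the", "regards", "see", "you"],
    ["the", "regards", "thanks"],
]
labels = ["alice", "alice", "bob", "bob"]

vectors = tfidf_vectors(docs)
score_cheers = chi_squared(docs, labels, "cheers", "alice")
score_see = chi_squared(docs, labels, "see", "alice")
```

In this toy corpus "cheers" appears only in one author's messages and so scores high, while "see" is spread evenly across both authors and scores zero; a term such as "the" that occurs in every message gets a TF-IDF weight of zero.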

Particular Attributes Used

The From message
The To message
Whether or not the message has a title
Whether or not the message has attachments
Whether or not the message is a reply
Uses a greeting acknowledgement
Uses a farewell acknowledgement
Contains signature text
Mean sentence length
Mean paragraph length
Number of blank lines / total number of lines
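
Most of the attributes listed above can be read off a raw message with Python's standard `email` package. The sketch below is only an illustration: the cue-word lists, the reply test (an `In-Reply-To` header or a subject starting with "Re:"), and the choice of words as the unit for sentence and paragraph length are all assumptions not taken from the paper, and signature detection is left out.

```python
import re
from email import message_from_string

GREETINGS = ("hi", "hello", "dear")          # hypothetical cue words
FAREWELLS = ("regards", "cheers", "thanks")  # hypothetical cue words

def plain_body(msg):
    """Return the first text/plain part that is not an attachment."""
    for part in msg.walk():
        if (part.get_content_type() == "text/plain"
                and part.get_content_disposition() != "attachment"):
            return part.get_payload()
    return ""

def structural_features(raw):
    msg = message_from_string(raw)
    subject = msg["Subject"] or ""
    body = plain_body(msg)
    lines = body.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    n_words = len(body.split())
    first_words = re.findall(r"[a-z]+", non_blank[0].lower()) if non_blank else []
    tail_words = re.findall(r"[a-z]+", " ".join(non_blank[-3:]).lower())
    return {
        "from": msg["From"],
        "to": msg["To"],
        "has_title": bool(subject.strip()),
        "has_attachment": any(p.get_content_disposition() == "attachment"
                              for p in msg.walk()),
        "is_reply": "In-Reply-To" in msg or subject.lower().startswith("re:"),
        "has_greeting": bool(first_words) and first_words[0] in GREETINGS,
        "has_farewell": any(w in FAREWELLS for w in tail_words),
        "mean_sentence_length": n_words / max(len(sentences), 1),
        "mean_paragraph_length": n_words / max(len(paragraphs), 1),
        "blank_line_ratio": (len(lines) - len(non_blank)) / max(len(lines), 1),
    }

# Hypothetical example message.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Re: lunch\n"
    "\n"
    "Hi Bob,\n\nLunch works for me. See you at noon.\n\nCheers,\nAlice\n"
)
features = structural_features(raw)
```

The sentence splitter here is deliberately crude (it splits on runs of `.`, `!` and `?`), so the length features are rough estimates rather than linguistically accurate counts.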

The paper gives a brief overview of SVM. The authors used LibSVM for their evaluation, in a 'one against all' binary classification model. They do not publish results and say only that their preliminary work is promising. Proposed further research on the topic includes combining SVM with other ML algorithms, additional feature extraction, and authorship characterization.
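
The 'one against all' scheme mentioned above trains one binary classifier per author (that author versus everyone else) and assigns a message to the author whose classifier gives the highest score. The sketch below illustrates the idea with a small linear SVM trained by subgradient descent on the hinge loss. It stands in for LibSVM rather than reproducing it, and the kernel, hyperparameters, function names, and toy feature vectors are all assumptions.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, epochs=500, lr=0.1, lam=0.01):
    """Binary linear SVM: subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            if yi * (dot(w, x) + b) < 1:
                # Inside the margin or misclassified: hinge term is active.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, x)]
                b += lr * yi
            else:
                # Correct with margin: only the regularizer shrinks w.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def train_one_vs_all(X, labels):
    """Train one 'this author vs. the rest' classifier per author."""
    models = {}
    for author in sorted(set(labels)):
        y = [1 if label == author else -1 for label in labels]
        models[author] = train_linear_svm(X, y)
    return models

def predict(models, x):
    """Pick the author whose binary classifier scores x highest."""
    return max(models, key=lambda a: dot(models[a][0], x) + models[a][1])

# Hypothetical toy feature vectors (two messages per author).
X = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],
]
labels = ["alice", "alice", "bob", "bob", "carol", "carol"]

models = train_one_vs_all(X, labels)
predictions = [predict(models, x) for x in X]
```

With N authors this scheme needs N binary classifiers, and each one sees a training set in which its own author is outnumbered by all the others combined.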
