Results
In a 5-user environment, with a training set of 1950 alphabetical characters (390 from each user) and a test set of 530 characters (106 from each user), we correctly identified the writers of 80.6% of the 530 test characters. The optimized SVM model achieved a kappa statistic of .748; kappa is a measure of the degree of nonrandom agreement between observers and/or measurements of a specific categorical variable.
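As a rough illustration of what the kappa statistic captures, the sketch below computes Cohen's kappa from a confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohen_kappa` and the small two-writer matrix are hypothetical and are not taken from the experiment; the tool used to produce the reported figure may compute kappa differently.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are the actual writers, columns are the predicted writers.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n = len(confusion)

    # Observed agreement: fraction of characters on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: sum over writers of
    # (share of actual labels) * (share of predicted labels).
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 2 writers, 10 test characters each.
toy_confusion = [[8, 2],
                 [1, 9]]
toy_kappa = cohen_kappa(toy_confusion)
```

A kappa of 0 means the classifier does no better than chance, and a kappa of 1 means perfect agreement, which is why it is a useful companion to raw accuracy when there are several candidate writers.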

In the more practical case of writing samples from only a single user, however, per-character accuracy varied from 53% to 91%. Even so, we were able to confidently identify the author of each handwriting sample, since the number of letters associated with the most likely candidate was at least double that of the next most likely candidate. Measured per handwriting sample rather than per character, the accuracy of this process would be very close to 100%, since we predict the writer of each sample as the user with the highest percentage of associated letters relative to the other users.
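The sample-level decision described above can be sketched as a simple vote over per-character predictions. This is a minimal sketch, not the project's actual code: the function name `identify_writer`, the `min_ratio` parameter, and the example counts are assumptions made for illustration.

```python
from collections import Counter


def identify_writer(char_predictions, min_ratio=2.0):
    """Pick the writer predicted for the most characters in one sample.

    Returns (writer, share_of_characters, confident), where `confident`
    is True when the top writer has at least `min_ratio` times as many
    characters as the runner-up.
    """
    if not char_predictions:
        raise ValueError("no character predictions given")

    ranked = Counter(char_predictions).most_common(2)
    best_writer, best_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0

    confident = best_count >= min_ratio * runner_up_count
    return best_writer, best_count / len(char_predictions), confident


# Hypothetical sample: 100 characters, 53 of them attributed to writer "A".
sample = ["A"] * 53 + ["B"] * 25 + ["C"] * 22
writer, share, confident = identify_writer(sample)
```

In this made-up sample only 53% of the characters are attributed to the true writer, yet the vote still selects "A" with a margin of more than two to one over the runner-up, which is the effect the paragraph above relies on.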

With the previous collector and identifiers, I achieved a kappa statistic of .70 and correctly predicted 73.8% of the characters from the test set using an optimized SVM model. The previous test was identical to the current one, with 5 users and the same training and test sets.

Varying the number of users by one or two did not have a drastic effect on accuracy, but with more than 8 users, accuracy dropped to around 65% on a multi-user test set.


Zuye Zheng | Ananda Gunawardena