SPAM assignment

Part 1) Develop a program that classifies spam vs. non-spam email. The program will be evaluated on a test set that includes spam and non-spam email sent to Atkeson and the TAs. Your program should take a list of filenames, classify each file (each file is a single email message), and produce a text file that lists each file name followed by SPAM or NON-SPAM, in the format

email1 SPAM
email2 NON-SPAM
email3 SPAM
...

Your writeup of this part should include an explanation of how the classifier works (what features you selected, how the features are used to make a decision, etc.).

Part 2) Given the characteristics of your SPAM classifier (error rates, etc.), explain how the SPAM classifier can best be used by a human. How does the combined human/program system handle errors, get better with experience, etc.?

Extra Credit Part 3) Compare the performance of several classification approaches on SPAM.

Extra Credit Part 4) Develop a classifier that returns a confidence in its classification, and explain how to use that confidence value to handle SPAM more effectively.

**********************************************************************

Assignment FAQ:

1) Can we work in groups? Alone?
Yes. The maximum group size is 3. You can work alone.

2) Can we use stuff off the web?
Yes. As long as you clearly indicate what your contribution is, using other resources is fine. You will be graded on the "value" you add to whatever resources you use.

3) How do we turn this in?
I would like a URL pointing to your writeup (and code), so we can make a class web page, and everyone can learn from what others do. Ideally, you can make your writeup available to the world, so others can build on what you do.

4) Where can I get some data?
Example SPAM is on the class web page. Latanya Sweeney recently solicited SPAM. You need to generate real email (NON-SPAM).
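
For orientation only, here is a minimal sketch of one way the input/output format of Part 1 and the confidence value of Part 4 could fit together. It assumes a naive Bayes model over word presence, which is just one possible approach and not a required one. The names NaiveBayesFilter, tokenize, classify_files, and the threshold parameter are all hypothetical, and the sketch treats each file as raw text rather than parsing email headers.

```python
import math
import re
from collections import Counter

LABELS = ("SPAM", "NON-SPAM")
# Assumption: a token is a run of letters, digits, and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$!']+")


def tokenize(text):
    """Return the set of distinct lowercase tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class NaiveBayesFilter:
    """Hypothetical helper: naive Bayes over token presence."""

    def __init__(self):
        self.messages = {label: 0 for label in LABELS}
        self.tokens = {label: Counter() for label in LABELS}

    def train(self, text, label):
        """Record one labeled message (label is "SPAM" or "NON-SPAM")."""
        self.messages[label] += 1
        self.tokens[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(SPAM | message); this is the confidence value."""
        spam_n = self.messages["SPAM"]
        ham_n = self.messages["NON-SPAM"]
        # Prior odds, with add-one smoothing so empty classes do not break.
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        # Simplification: only tokens present in the message contribute.
        for token in tokenize(text):
            p_spam = (self.tokens["SPAM"][token] + 1) / (spam_n + 2)
            p_ham = (self.tokens["NON-SPAM"][token] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def classify_files(nb, filenames, out_path, threshold=0.5):
    """Write one "<filename> SPAM" or "<filename> NON-SPAM" line per file."""
    with open(out_path, "w") as out:
        for name in filenames:
            with open(name, errors="replace") as f:
                p = nb.spam_probability(f.read())
            label = "SPAM" if p >= threshold else "NON-SPAM"
            out.write(f"{name} {label}\n")
```

In this sketch, raising threshold above 0.5 makes the program label a message SPAM only when it is more confident, which trades missed spam for fewer real messages being misfiled. That trade-off is one thing a Part 2 or Part 4 writeup could discuss.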