Our Cascaded Learning Framework for Phish Detection and an Online Demo
Essentially, our online cascaded phish detector consists of a client-side component and a server-side component. The client side is a Chrome extension that injects a content script into web pages and extracts their HTML DOMs. The server side is a Java web application that runs in the Java servlet environment provided by Google App Engine (GAE).
The system diagram of our online cascaded phish detector is shown above. Classifying a given web page involves four major steps: the first extracts the HTML DOM through the Chrome extension, and the third performs the classification in the server-side backend.
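As a rough, hypothetical illustration of the classification in step 3, a classifier cascade is often organized as a sequence of stages. Each stage scores the page and either decides confidently or defers to the next, usually more expensive, stage. In the sketch below, the class names (`CascadeClassifier`, `Stage`, `Page`), the thresholds, and the two toy scorers are all assumptions made for illustration. They are not the detector's actual features or models.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

/** Minimal sketch of a classifier cascade (hypothetical; not the detector's real models). */
public class CascadeClassifier {

    /** Possible outcomes of classifying a page. */
    public enum Verdict { PHISH, LEGITIMATE }

    /** Hypothetical page representation: the URL plus the serialized HTML sent by the client. */
    public record Page(String url, String html) { }

    /**
     * One cascade stage: a scorer mapping a page to a phishiness score in [0, 1] and two thresholds.
     * Scores at or below lowThreshold are accepted as legitimate, scores at or above highThreshold
     * are flagged as phish, and anything in between is deferred to the next stage.
     * The name is only a label for logging or debugging.
     */
    public record Stage(String name, ToDoubleFunction<Page> scorer,
                        double lowThreshold, double highThreshold) { }

    private final List<Stage> stages;
    private final double finalThreshold;

    public CascadeClassifier(List<Stage> stages, double finalThreshold) {
        this.stages = List.copyOf(stages);
        this.finalThreshold = finalThreshold;
    }

    public Verdict classify(Page page) {
        double score = 0.0;
        for (Stage stage : stages) {
            score = stage.scorer().applyAsDouble(page);
            if (score <= stage.lowThreshold()) return Verdict.LEGITIMATE;
            if (score >= stage.highThreshold()) return Verdict.PHISH;
            // Not confident: fall through to the next (typically more expensive) stage.
        }
        // No stage was confident, so decide with the last stage's score.
        return score >= finalThreshold ? Verdict.PHISH : Verdict.LEGITIMATE;
    }

    /** Toy two-stage cascade: a cheap URL check first, then a check on the HTML. */
    public static CascadeClassifier demo() {
        Stage urlStage = new Stage("url",
                p -> p.url().contains("@") ? 1.0 : 0.5, 0.0, 0.9);
        Stage domStage = new Stage("dom",
                p -> p.html().toLowerCase(Locale.ROOT).contains("password") ? 0.7 : 0.0, 0.1, 0.9);
        return new CascadeClassifier(List.of(urlStage, domStage), 0.5);
    }

    public static void main(String[] args) {
        CascadeClassifier cascade = demo();
        System.out.println(cascade.classify(new Page("http://bank.example@evil.example/", "")));
        System.out.println(cascade.classify(new Page("https://news.example/", "<p>hello</p>")));
        System.out.println(cascade.classify(new Page("https://login.example/", "<input type=\"password\">")));
    }
}
```

Running `main` prints `PHISH`, `LEGITIMATE`, and `PHISH`. The cheap URL stage flags the first page on its own, and the HTML stage clears the second. The third page falls through both stages and is decided by the final threshold. Ordering stages from cheap to expensive lets most pages exit early, which is the usual motivation for a cascade.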
The client-side Chrome extension can be found here; installing it in Chrome takes a single click.
Here is a screenshot of an example phish that our cascaded detector successfully detected.
And here is the result of our classification.
We observed that a significant fraction of the server-side computation is sometimes spent parsing the web page content string into an HTML DOM. We currently use JTidy as the server-side DOM parser; a faster alternative might reduce the runtime of this step.
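Before swapping parsers, it may help to time the parse step in isolation. The sketch below is a small timing harness (the `ParseTimer` class and its methods are hypothetical) that accepts any `String`-to-result parser function. JTidy is a third-party library, so the runnable demo plugs in the JDK's built-in Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`) purely as a stand-in. That parser streams tags through a callback instead of building a DOM, so it is not a drop-in replacement for JTidy, and the numbers it produces are only indicative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Function;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical harness for timing an HTML parsing step on the server side. */
public class ParseTimer {
    // Results are stored here so the JIT cannot discard the parsing work as dead code.
    private static volatile Object sink;

    /** Average wall-clock milliseconds per call of parser on html, after warmup untimed calls. */
    public static <T> double averageMillis(Function<String, T> parser, String html, int warmup, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmup; i++) sink = parser.apply(html);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink = parser.apply(html);
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / runs;
    }

    /** Stand-in parser: streams the page through the JDK's Swing HTML parser and counts tags. */
    public static int countTags(String html) {
        int[] count = {0};
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) { count[0]++; }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) { count[0]++; }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        // Synthetic page with a few thousand elements; replace with real captured pages if available.
        String html = "<html><body>"
                + "<div><a href=\"#\">link</a><p>text</p></div>".repeat(2000)
                + "</body></html>";
        System.out.printf("Swing parser: %.3f ms per page%n", averageMillis(ParseTimer::countTags, html, 5, 20));
    }
}
```

A JTidy-based parser could be timed the same way by wrapping it in a lambda (for example, one that calls JTidy's `Tidy.parseDOM` on the page bytes) and comparing the averages on the same pages. For more trustworthy numbers, a dedicated harness such as JMH would handle warm-up and JIT effects better than this simple `System.nanoTime` loop.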