Contact Form Spam Detection with Machine Learning (Part 1)

The old anti-spam tricks that take a few seconds to implement have long been outclassed by smarter bots. Hidden honeypot fields and time-based validation are still useful for stopping the most primitive, and therefore most numerous, scripts that mindlessly hammer your forms, but state-of-the-art spam bots evade these traps with ease.
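To make the two classic traps concrete, here is a minimal Python sketch. The honeypot field name `website`, the three-second threshold and the function name `looks_like_primitive_bot` are assumptions picked for illustration; they do not come from any particular framework.

```python
import time

# Assumed threshold: a human rarely fills in and submits a form faster than this.
MIN_FILL_SECONDS = 3.0

# Assumed name of a field that is hidden via CSS, so humans never fill it in.
HONEYPOT_FIELD = "website"


def looks_like_primitive_bot(form, rendered_at, now=None):
    """Return True if a submission trips the honeypot or the timing check.

    form        -- dict of submitted field names to values
    rendered_at -- Unix timestamp of when the form was served
    now         -- Unix timestamp of the submission (defaults to the current time)
    """
    if now is None:
        now = time.time()

    # Honeypot: simple bots fill in every field they find, including hidden ones.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True

    # Time-based validation: the form came back faster than a human could type.
    if now - rendered_at < MIN_FILL_SECONDS:
        return True

    return False
```

A bot that leaves the hidden field empty and waits a few seconds before submitting passes both checks, which is exactly the weakness described above.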

To keep them from flooding your inbox, I believe it has become indispensable to analyse the content of each submission and classify it accordingly.

For that purpose, one can usually rely on existing services that classify the submissions for you. These services are well tested and there are not many reasons to avoid them, but I wanted to take a shot at solving the problem myself.

Currently I am experimenting with a Naive Bayes spam filter, but I am looking at a neural network approach for the future.

The Naive Bayes filter is a technique that has been in use since the 1990s. The results I get with it are generally acceptable, but I have also seen a lot of false positives. That's why I am going to try my luck with a simple neural network approach. In my next post I will describe how I implemented the Naive Bayes classifier.
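For readers who have not met the technique before, a generic Naive Bayes text classifier can be sketched in a few lines of Python. This is a textbook illustration only and not the implementation the next post covers; the class name `NaiveBayesFilter`, the tokenizer and the add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one labelled message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        """Log of P(label) times the product of P(word | label)."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())

        # Smoothed prior, so an untrained class never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))

        # Smoothed likelihoods; the extra +1 reserves a slot for unseen words.
        denominator = total_words + len(vocabulary) + 1
        for word in tokenize(text):
            score += math.log((self.word_counts[label][word] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the label with the higher log score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))
```

Training it on a handful of labelled messages with `train()` and then calling `classify()` on a new submission is enough to try it out. Because every word is scored independently, a legitimate message that happens to share vocabulary with spam can easily be misclassified, which is one way false positives arise.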