Classification of phishing websites using machine learning techniques

Hadi Zamani; Muhamad Kamal Mohammed Amin

Phishing detection is an important problem that has been studied by many
researchers using numerous advanced approaches. Current anti-phishing mechanisms,
such as blacklist-based and heuristic-based anti-phishing, suffer from low
detection accuracy and high false alarm rates. There is a need for an efficient
mechanism to protect users from phishing websites. The purpose of this study is to
investigate the capability of six machine learning algorithms, i.e., Multi-Layer
Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), Decision Tree
(DT), Logistic Regression (LR), and Naïve Bayes (NB), to classify phishing and
non-phishing websites. These algorithms were trained with two different groups of
training data in the WEKA environment and were then tested in terms of accuracy,
precision, TP rate, and FP rate on three different datasets containing different
proportions of phishing and non-phishing instances. The results showed that the
Naïve Bayes classifier had the best detection accuracy among the classifiers for
predicting phishing websites, while the Multi-Layer Perceptron gave the worst
result in terms of detection accuracy. The results also showed that the Support
Vector Machine had the best FP rate among the classifiers. In addition, Random
Forest, Decision Tree, and Naïve Bayes classified all phishing websites correctly
as phishing; that is, the TP rate is 100% for these classifiers. In conclusion, this
paper suggests using NB as the best classifier for predicting phishing and
non-phishing websites.
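
The four evaluation measures named in the abstract (accuracy, precision, TP rate, and FP rate) all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions, treating "phishing" as the positive class. The function name and the example counts are illustrative assumptions and are not taken from the study, which ran its experiments in WEKA.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp: phishing sites flagged as phishing
    fp: legitimate sites flagged as phishing (false alarms)
    tn: legitimate sites flagged as legitimate
    fn: phishing sites flagged as legitimate (misses)
    """
    total = tp + fp + tn + fn
    return {
        # Share of all sites that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of sites flagged as phishing that really are phishing.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of phishing sites that were caught (also called recall).
        "tp_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of legitimate sites that were wrongly flagged.
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = binary_metrics(tp=50, fp=5, tn=45, fn=0)
print(example)
```

With these hypothetical counts no phishing site is missed (fn = 0), so the TP rate is 100% even though the accuracy is below 100%. This is why a classifier can have a perfect TP rate without being the most accurate one overall.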
