Criminality recognition using machine learning on Malay language tweets

Nurul Hashimah Ahamed Hassain Malim; Sagadevan, Saravanan; Nurul Izzati Ridzuwan

Extensive research has been carried out to predict the personality or, more precisely, the behaviour of online users from user-generated texts such as Tweets and status messages. Nevertheless, only a handful of machine learning (ML) studies have applied a personality model to assess criminal behaviour, particularly in the context of Malay social network messages. Based on the concept of sentiment valence, this study annotated a list of Malay Tweets that might contain criminal or illicit messages from the standpoint of the Psychoticism trait. Supervised text classification was then performed on the Tweets using Naïve Bayes (NB), Sequential Minimal Optimisation (SMO), and Decision Tree (DT) classifiers, with features selected via Chi-Square (χ²). The results indicated that SMO outperformed the other classifiers, although not significantly, achieving 92.85% accuracy. Based on χ², several swear terms, such as bontot, melancap, and kote, displayed a significant correlation with Psychoticism Tweets, owing to the nature of the trait, which has been linked to criminal behaviour through, for instance, aggressive and antisocial attributes. The findings illustrate the possibility of adapting several personality aspects to enhance the effectiveness of detecting illicit social network messages.
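As a rough illustration of the Chi-Square term scoring mentioned above, the following minimal sketch computes χ² for each term from a 2x2 term/class contingency table. It is not the authors' pipeline: the function name `chi_square_scores`, the whitespace tokenisation, the class labels, and the placeholder tokens in the toy data are all assumptions made for illustration only.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    Hypothetical helper for illustration; tokenisation is a plain whitespace split.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term, counted separately per class.
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())
        (pos_df if y == positive else neg_df).update(terms)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive-class documents containing the term
        b = neg_df[term]   # other documents containing the term
        c = n_pos - a      # positive-class documents without the term
        d = n_neg - b      # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data with placeholder tokens (not real Tweets); "P" marks the assumed
# Psychoticism class and "N" everything else.
docs = ["alpha beta", "alpha gamma", "delta beta", "delta gamma"]
labels = ["P", "P", "N", "N"]
scores = chi_square_scores(docs, labels, positive="P")

# Terms sorted from most to least class-dependent.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting, terms that occur in only one class receive the highest scores, while terms spread evenly across both classes score zero. The top-ranked terms would then serve as features for a classifier such as NB, SMO, or DT.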
