Surajit Giri, Sayak Das, Sutirtha Bharati Das, and Siddhartha Banerjee
Department of Computer Science, Ramakrishna Mission Residential College, Narendrapur, West Bengal, India
Received: March 16, 2022 Accepted: November 30, 2022 Publication Date: February 21, 2023
Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.
Unwanted text messages are called spam SMSs. Machine learning models have been shown to classify spam messages efficiently and with high accuracy. However, spam detection applications have not become popular, because proper spam-filtering software is lacking and existing software misclassifies genuine SMSs as spam. In this paper, we propose multiple deep neural network models to classify spam messages. Tiago's Dataset (the SMS Spam Collection) is used for this research. First, a preprocessing step is applied to the messages in the dataset: the text is lowercased, tokenized, and lemmatized, and numbers, punctuation, and stop words are removed. The preprocessed messages are fed into two deep learning models with simple architectures, namely a Convolutional Neural Network (CNN) and a hybrid CNN with a Long Short-Term Memory (LSTM) network, for classification. To increase the accuracy of these two simple architectures, the BUNOW and GloVe word embedding techniques are incorporated into the deep learning models. BUNOW and GloVe are popular choices in sentiment analysis; in this work, these two word embedding techniques are applied to text classification to improve accuracy. The best accuracy, 98.44%, is achieved by the CNN LSTM BUNOW model after 15 epochs on a 70% - 30% train-test split. The proposed model can be used in many practical applications, such as real-time SMS spam detection, email spam detection, sentiment analysis, and text categorization.
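The preprocessing steps and the 70% - 30% split described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list and the lemma table are tiny hypothetical stand-ins for full resources such as NLTK's stop-word corpus and WordNet lemmatizer, the regular-expression tokenizer is a simplification, and `train_test_split_sketch` is a hypothetical helper.

```python
import random
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# full English list such as NLTK's stop-word corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "for", "of",
              "and", "or", "in", "on", "now", "have", "has", "be"}

# Hypothetical lemma table standing in for a real lemmatizer
# (e.g. NLTK's WordNetLemmatizer); it covers only a few example words.
LEMMAS = {"won": "win", "winning": "win", "prizes": "prize",
          "messages": "message", "calls": "call", "called": "call"}


def preprocess(message: str) -> list[str]:
    """Lowercase, tokenize, lemmatize, then drop numbers, punctuation and stop words."""
    text = message.lower()
    # Simple regex tokenizer: runs of ASCII letters, runs of digits,
    # or single non-word, non-space characters (punctuation, symbols).
    tokens = re.findall(r"[a-z]+|\d+|[^\w\s]", text)
    # Map each token to its lemma when the table knows it.
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens]
    # Keep only purely alphabetic tokens (this removes numbers and
    # punctuation) that are not stop words.
    return [tok for tok in lemmas if tok.isalpha() and tok not in STOP_WORDS]


def train_test_split_sketch(items, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle, then split into (train, test).

    test_fraction=0.3 gives a 70% - 30% train-test split.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    print(preprocess("WINNER!! You have won 2 prizes, call 09061701461 now."))
    # ['winner', 'win', 'prize', 'call']
```

This split shuffles without stratification; stratifying by label (spam vs. ham) would keep the class proportions the same in the training and test portions.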
[1] What is Text Message Marketing. https://www.tatango.com/.
[2] A. A. Helmy, Y. M. Omar, and R. Hodhod. “An innovative word encoding method for text classification using convolutional neural network”. In: 2018 14th international computer engineering conference (ICENCO). IEEE. 2018, 42–47. DOI: 10.1109/ICENCO.2018.8636143.
[3] J. Pennington, R. Socher, and C. D. Manning. “GloVe: Global vectors for word representation”. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014, 1532–1543.
[4] T. Almeida and J. Hidalgo. SMS Spam Collection v.1. http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/.
[6] T. A. Almeida, J. M. G. Hidalgo, and A. Yamakami. “Contributions to the study of SMS spam filtering: new collection and results”. In: Proceedings of the 11th ACM symposium on Document engineering. 2011, 259–262. DOI: 10.1145/2034691.2034742.
[7] P. Sethi, V. Bhandari, and B. Kohli. “SMS spam detection and comparison of various machine learning algorithms”. In: 2017 international conference on computing and communication technologies for smart nation (IC3TSN). IEEE. 2017, 28–31. DOI: 10.1109/IC3TSN.2017.8284445.
[8] P. Navaney, G. Dubey, and A. Rana. “SMS spam filtering using supervised machine learning algorithms”. In: 2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE. 2018, 43–48. DOI: 10.1109/CONFLUENCE.2018.8442564.
[9] A. Alzahrani and D. B. Rawat. “Comparative study of machine learning algorithms for SMS spam detection”. In: 2019 SoutheastCon. IEEE. 2019, 1–6. DOI: 10.1109/SoutheastCon42311.2019.9020530.
[10] T. Xia and X. Chen, (2020) “A discrete hidden Markov model for SMS spam detection” Applied Sciences 10(14): 5011. DOI: 10.3390/app10145011.
[11] T. Xia and X. Chen, (2021) “A weighted feature enhanced Hidden Markov Model for spam SMS filtering” Neurocomputing 444: 48–58. DOI: 10.1016/j.neucom.2021.02.075.
[12] B. Diallo, J. Hu, T. Li, G. Khan, and C. Ji. “Concept-enhanced multi-view clustering of document data”. In: 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). IEEE. 2019, 1258–1264. DOI: 10.1109/ISKE47853.2019.9170436.
[13] B. Diallo, J. Hu, T. Li, G. A. Khan, and A. S. Hussein, (2022) “Multi-view document clustering based on geometrical similarity measurement” International Journal of Machine Learning and Cybernetics 13(3): 663–675. DOI: 10.1007/s13042-021-01295-8.
[14] R. Taheri and R. Javidan. “Spam filtering in SMS using recurrent neural networks”. In: 2017 Artificial Intelligence and Signal Processing Conference (AISP). IEEE. 2017, 331–336. DOI: 10.1109/AISP.2017.8515158.
[15] G. Jain, M. Sharma, and B. Agarwal, (2019) “Optimizing semantic LSTM for spam detection” International Journal of Information Technology 11(2): 239–250. DOI: 10.1007/s41870-018-0157-5.
[16] M. Popovac, M. Karanovic, S. Sladojevic, M. Arsenovic, and A. Anderla. “Convolutional neural network based SMS spam detection”. In: 2018 26th Telecommunications Forum (TELFOR). IEEE. 2018, 1–4. DOI: 10.1109/TELFOR.2018.8611916.
[17] S. Annareddy and S. Tammina. “A comparative study of deep learning methods for spam detection”. In: 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). IEEE. 2019, 66–72. DOI: 10.1109/I-SMAC47947.2019.9032627.
[18] P. K. Roy, J. P. Singh, and S. Banerjee, (2020) “Deep learning to filter SMS spam” Future Generation Computer Systems 102: 524–533. DOI: 10.1016/j.future.2019.09.001.
[19] A. Chandra and S. K. Khatri. “Spam SMS filtering using recurrent neural network and long short term memory”. In: 2019 4th International Conference on Information Systems and Computer Networks (ISCON). IEEE. 2019, 118–122. DOI: 10.1109/ISCON47742.2019.9036269.
[20] S. Kotni, D. Chandrasekhar Potala, and L. Sahoo, (2022) “Spam Detection Using Deep Learning Models” International Journal of Advanced Research in Engineering and Technology 13(5): 55–64. DOI: 10.17605/OSF.IO/NT4.
[21] O. Abayomi-Alli, S. Misra, and A. Abayomi-Alli, (2022) “A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset” Concurrency and Computation: Practice and Experience 34(17): 1–15. DOI: 10.1002/cpe.6989.
[22] B. Diallo, J. Hu, T. Li, G. A. Khan, X. Liang, and Y. Zhao, (2021) “Deep embedding clustering based on contractive autoencoder” Neurocomputing 433: 96–107. DOI: 10.1016/j.neucom.2020.12.094.
[23] M. A. Shaaban, Y. F. Hassan, and S. K. Guirguis, (2022) “Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text” Complex & Intelligent Systems: 1–13. DOI: 10.1007/s40747-022-00741-6.
[24] A. Ghourabi, M. A. Mahmood, and Q. M. Alzubi, (2020) “A hybrid CNN-LSTM model for SMS spam detection in Arabic and English messages” Future Internet 12(9): 156. DOI: 10.3390/fi12090156.
[25] Z. Jianqiang, G. Xiaolin, and Z. Xuejun, (2018) “Deep convolution neural networks for twitter sentiment analysis” IEEE Access 6: 23253–23260. DOI: 10.1109/ACCESS.2017.2776930.
[26] S. M. Rezaeinia, R. Rahmani, A. Ghodsi, and H. Veisi, (2019) “Sentiment analysis based on improved pre-trained word embeddings” Expert Systems with Applications 117: 139–147. DOI: 10.1016/j.eswa.2018.08.044.
[27] A. K. Uysal and S. Gunal, (2014) “The impact of preprocessing on text classification” Information Processing & Management 50(1): 104–112. DOI: 10.1016/j.ipm.2013.08.006.
[28] J. Camacho-Collados and M. T. Pilehvar, (2017) “On the role of text preprocessing in neural network architectures: An evaluation study on text categorization and sentiment analysis” arXiv preprint arXiv:1707.01780. DOI: 10.48550/arXiv.1707.01780.
[29] S. Weidman. Deep learning from scratch: building with Python from first principles. O’Reilly Media, 2019.
[30] A. C. Michalos. Encyclopedia of quality of life and wellbeing research. Springer Netherlands Dordrecht, 2014.