Political sentiment analysis using natural language processing on social media
DOI:
https://doi.org/10.58190/ijamec.2024.108
Keywords:
Sentiment Classification, Sentiment Analysis Applications, Sentiment Analysis, Natural Language Processing
Abstract
Widespread internet use has made social media an essential part of daily life. This paper explores sentiment analysis of political topics using social media comments. We collected a dataset of over 14,000 political comments and applied five machine learning classifiers (logistic regression, linear support vector classification, random forest, decision tree, and naive Bayes) to classify the sentiment expressed. The models were assessed on accuracy, precision, recall, and F1 score; Linear SVC achieved the highest accuracy at 91.18%, closely followed by logistic regression at 90%. Beyond evaluating model performance on political sentiment data, the study addresses data imbalance and offers actionable insight into each algorithm's suitability. Our study refines political sentiment analysis by selecting models for high accuracy and robustness, laying a foundation for understanding political sentiment on social media platforms.
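The evaluation metrics named in the abstract can be illustrated with a short sketch. It is not the authors' code: the label names and the toy predictions are hypothetical, and macro-averaging is an assumption chosen here because it makes the effect of class imbalance visible. It shows how a classifier that ignores a rare class can still score high accuracy while its macro F1 drops sharply.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was t
            fn[t] += 1  # a true t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced toy data: 9 negative comments, 1 positive.
y_true = ["neg"] * 9 + ["pos"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["neg"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("per class:", per_class_metrics(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this toy input the accuracy is 0.9 even though the positive class is never detected, which is why precision, recall, and F1 are worth reporting alongside accuracy on imbalanced data.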
License
Copyright (c) 2024 International Journal of Applied Methods in Electronics and Computers
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.