IMPROVING ARABIC SENTIMENT ANALYSIS ON SOCIAL MEDIA: A COMPARATIVE STUDY ON APPLYING DIFFERENT PRE-PROCESSING TECHNIQUES
Abstract
Despite the clear growth of Arabic text on social networking sites (SNSs), it remains difficult to understand or summarize users' opinions or perspectives on a specific topic. Arabic text classification is therefore one of the most challenging topics, owing to several issues related to the nature of the Arabic language and to words whose meanings vary. In this paper, after tokenizing the Arabic words, we investigate the role of several pre-processing techniques applied before classifying Arabic text into different categories. Arabic words were converted into vectors using the term frequency-inverse document frequency (TF-IDF) technique. The findings show that a Linear Support Vector Machine classifier (LSVC), applied with stop words and without stemming, outperforms the Decision Tree (DT) and Random Forest (RF) methods. The proposed LSVC achieved an effectiveness of 99.37%. These outcomes are significant for identifying users' opinions on SNSs and have implications for the political, social, economic, and business sectors.
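To make the vectorization step concrete, the following is a minimal sketch of TF-IDF weighting in plain Python. It is an illustration under stated assumptions, not the study's implementation: the whitespace tokenizer, the unsmoothed formula tf × ln(N/df), the function names `tokenize` and `tf_idf`, and the three toy posts are all hypothetical. For simplicity, the sketch applies no stop-word filtering and no stemming.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a stand-in (assumption) for the study's tokenizer.
    return text.split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    (one common unsmoothed variant; the paper's exact formula is not given here).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Toy posts (hypothetical, not taken from the study's dataset).
docs = [
    "الخدمة ممتازة جدا",
    "الخدمة سيئة جدا",
    "التطبيق ممتاز",
]
vectors = tf_idf(docs)

for doc, vec in zip(docs, vectors):
    print(doc, {term: round(weight, 3) for term, weight in vec.items()})
```

Terms that occur in fewer documents receive a larger weight than terms shared across documents, which is the property a downstream classifier such as a linear SVM relies on when separating categories.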