Please use this identifier to cite or link to this item: http://dspace.aiub.edu:8080/jspui/handle/123456789/1681
Title: Strategies for enhancing the performance of news article classification in Bangla: Handling imbalance and interpretation
Authors: Khan Md Hasib, Nurul Akter Towhid, Kazi Omar Faruk, Jubayer Al Mahmud, M. F. Mridha
Issue Date: Oct-2023
Publisher: Elsevier
Abstract: The rapid increase in available online text data has made text categorization an important tool for data analysts to extract relevant information from the web. However, using biased text data may result in incorrect or incomplete classification of marginalized groups. To remedy the disparity in available data, this research proposes a system for classifying and analyzing Bangla news articles. The proposed approach first uses both Random Under-Sampling (RUS) and the Synthetic Minority Oversampling Technique (SMOTE) to balance a massive imbalanced Bangla news dataset consisting of 4,37,948 instances. Secondly, the proposed system employs three machine learning models (Logistic Regression, Decision Tree, and Stochastic Gradient Descent) along with three deep learning models (Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and Bidirectional Encoder Representations from Transformers (BERT)) for Bangla text categorization. The experimental results show that BERT outperforms the other classification models of the system as well as other existing methods in this domain. Using BERT, the proposed system achieves a maximum accuracy of 99.04% on the balanced dataset and 72.23% on the imbalanced dataset. K-fold cross-validation with varied K values is used to determine the performance consistency of BERT. Finally, both LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) techniques are applied to interpret each prediction made by BERT.
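The balancing step named in the abstract (RUS followed by SMOTE) can be illustrated with a minimal sketch. This record does not describe the authors' implementation, so everything below is an assumption made for illustration: the function names, the per-class `target` count, the neighbour count `k`, and the toy 2-D feature vectors are all hypothetical. The sketch also assumes the articles have already been converted to numeric feature vectors, since SMOTE-style interpolation works on numbers rather than raw text.

```python
import random
from collections import Counter, defaultdict


def group_by_class(rows, labels):
    """Collect feature rows under their class label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return grouped


def random_under_sample(rows, labels, target, rng):
    """Randomly keep at most `target` rows per class (RUS)."""
    out_rows, out_labels = [], []
    for label, members in group_by_class(rows, labels).items():
        kept = rng.sample(members, target) if len(members) > target else members
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_style_over_sample(rows, labels, target, rng, k=3):
    """Grow classes below `target` by interpolating between a real row
    and one of its k nearest same-class neighbours (SMOTE-style)."""
    out_rows, out_labels = [], []
    for label, members in group_by_class(rows, labels).items():
        synthetic = []
        # Interpolation needs at least two real rows in the class.
        while len(members) > 1 and len(members) + len(synthetic) < target:
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: squared_distance(base, m))[:k]
            partner = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> partner
            synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
        out_rows.extend(members + synthetic)
        out_labels.extend([label] * (len(members) + len(synthetic)))
    return out_rows, out_labels


def balance(rows, labels, target, seed=0):
    """Under-sample large classes, then over-sample small ones."""
    rng = random.Random(seed)
    rows, labels = random_under_sample(rows, labels, target, rng)
    return smote_style_over_sample(rows, labels, target, rng)


# Toy 2-D feature vectors (hypothetical data, not from the paper).
toy_rows = [[float(i), float(i % 3)] for i in range(8)] + [[10.0, 10.0], [12.0, 14.0]]
toy_labels = ["sports"] * 8 + ["politics"] * 2
balanced_rows, balanced_labels = balance(toy_rows, toy_labels, target=4)
print(Counter(balanced_labels))
```

In practice a maintained library implementation would normally be preferred over a hand-written one. In any cross-validated evaluation, resampling is usually applied to the training folds only, so that synthetic rows do not leak into the held-out data.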
URI: http://dspace.aiub.edu:8080/jspui/handle/123456789/1681
Appears in Collections: Publications: Journals

Files in This Item:
File: Dspace 4.docx
Size: 4.66 MB
Format: Microsoft Word XML


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.