Please use this identifier to cite or link to this item: http://dspace.aiub.edu:8080/jspui/handle/123456789/2661
Title: Enhancing Bangla Local Speech-to-Text Conversion Using Fine-Tuning Wav2vec 2.0 with OpenSLR and Self-Compiled Datasets Through Transfer Learning
Authors: Hossain, S.K. Muktadir
Rihan, Rahat
Ahmed, Imtiaz
Boni, Pritam Khan
Gomes, Dipta
Keywords: Bangla Speech Recognition
wav2vec 2.0
Transfer Learning
Speech Technology
Automatic Speech Recognition (ASR)
Issue Date: 15-Mar-2025
Publisher: IEOM Society International
Abstract: This paper presents an improved method for converting standard and local Bangla speech to text. The wav2vec 2.0 model was fine-tuned on OpenSLR data together with additional self-compiled datasets. Transcription accuracy improved by as much as eleven percent, a notable gain for a low-resource language, which demonstrates the merits of transfer learning and fine-tuning. The research aims to expand the knowledge base on applying novel deep learning algorithms to speech technology for low-resource languages. Evaluation used Word Error Rate (WER) and Character Error Rate (CER); the fine-tuned model achieved an overall WER of 11.27% and a CER of 6.03%. Comparative analysis with previous work shows a significant improvement over baseline models, highlighting the efficacy of wav2vec 2.0 in leveraging large and diverse datasets. The experiments ran in a cluster computing environment with NVIDIA CUDA-compatible GPUs, underscoring the computational resources required for effective Automatic Speech Recognition (ASR) model training. The results demonstrate substantial advances in ASR performance for Bangla, with the fine-tuned model outperforming previous benchmarks and showcasing the benefits of self-supervised learning approaches.
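Note on the metrics: the abstract reports WER and CER but this record does not say how they were computed. The sketch below shows the conventional definition, assuming both are Levenshtein edit distance divided by reference length, with words split on whitespace and characters counted as Unicode code points. The function names are illustrative and are not taken from the paper; the authors' tokenization and text normalization may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: code-point edit distance over reference length.

    Assumption: characters are Unicode code points. For Bangla text a
    grapheme-based count may be preferred, which would change the value.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, a value such as WER = 11.27% would mean roughly 11 word-level edits per 100 reference words, aggregated over the test set.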
Description: None
URI: http://dspace.aiub.edu:8080/jspui/handle/123456789/2661
ISBN: 979-8-3507-4443-9
ISSN: 2169-8767
Appears in Collections: Publications: Journals

Files in This Item:
File                            Description             Size       Format
Enhancing Bangla Local STT.pdf  First paper Manuscript  151.85 kB  Adobe PDF

