Please use this identifier to cite or link to this item:
http://dspace.aiub.edu:8080/jspui/handle/123456789/2662
Title: | Enhancing Bangla Local Speech-to-Text Conversion Using Fine-Tuning Wav2vec 2.0 with OpenSLR and Self-Compiled Datasets Through Transfer Learning |
Authors: | Hossain, SK Muktadir; Rihan, Md Rahat; Imtiaz, Ahmed; Boni, SK Muktadir; Gomes, Dipta |
Keywords: | Bangla Speech Recognition; Automatic Speech Recognition (ASR); Speech Technology; wav2vec 2.0; Transfer Learning |
Issue Date: | 15-Mar-2025 |
Publisher: | IEOM Society International |
Citation: | Hossain, S., Rihan, R., Imtiaz, A., Boni, P., & Gomes, D. (2024, December). Enhancing Bangla Local Speech-to-Text Conversion Using Fine-Tuning Wav2vec 2.0 with OpenSLR and Self-Compiled Datasets Through Transfer Learning. In 7th IEOM Bangladesh International Conference on Industrial Engineering and Operations Management, https://doi.org/10.46254/BA07.20240161. |
Abstract: | This work presents an improved method for Bangla speech-to-text conversion covering both standard and local speech. The wav2vec 2.0 model was fine-tuned using additional datasets collected alongside OpenSLR data. Our findings show gains in transcription accuracy of as much as eleven percent, which is notable given the low-resource setting and languages involved, and which supports the merits of transfer learning and fine-tuning. The research aims to expand the knowledge base on applying novel deep learning algorithms to low-resource languages in the field of speech technology. The evaluation metrics were Word Error Rate (WER) and Character Error Rate (CER), with the fine-tuned model achieving an overall WER of 11.27% and a CER of 6.03%. Comparative analysis with previous work shows a significant improvement over baseline models, highlighting the efficacy of the wav2vec 2.0 model in leveraging large and diverse datasets. The experiments ran in a cluster computing environment with NVIDIA CUDA-compatible GPUs, underscoring the computational resources required for effective Automatic Speech Recognition (ASR) model training. The results demonstrate substantial advancements in ASR performance for Bengali, with the fine-tuned model outperforming previous benchmarks and showcasing the benefits of self-supervised learning approaches. |
URI: | https://index.ieomsociety.org/index.cfm/article/view/ID/28375 http://dspace.aiub.edu:8080/jspui/handle/123456789/2662 |
ISBN: | 979-8-3507-4443-9 |
ISSN: | 2169-8767 |
Appears in Collections: | Publications: Journals |
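
The abstract reports results as Word Error Rate (WER) and Character Error Rate (CER). As a reader aid, the following is a minimal sketch of how these two metrics are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. It is not the authors' evaluation code. The function names and the romanized sample sentences are hypothetical, and counting spaces as characters in CER is an assumption (conventions vary). Python strings iterate over Unicode code points, so for Bangla script this sketch counts code points rather than grapheme clusters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length.

    Assumption: whitespace counts as a character.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical romanized example, not taken from the paper's data.
    ref = "ami bhat khai"
    hyp = "ami bat khai"
    print(f"WER = {wer(ref, hyp):.4f}, CER = {cer(ref, hyp):.4f}")
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, which is why they are reported as error rates rather than as accuracies.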
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Enhancing Bangla Local STT-wav2vec.pdf | First Page Manuscript | 151.85 kB | Adobe PDF | View/Open |