Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features

Mahmud, S. M. Hasan; Michael Goh, Kah Ong; Hosen, Md. Faruk; Nandi, Dip; Shoombuatong, Watshara

Please use this identifier to cite or link to this item: http://dspace.aiub.edu:8080/jspui/handle/123456789/2398

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mahmud, S. M. Hasan	-
dc.contributor.author	Michael Goh, Kah Ong	-
dc.contributor.author	Hosen, Md. Faruk	-
dc.contributor.author	Nandi, Dip	-
dc.contributor.author	Shoombuatong, Watshara	-
dc.date.accessioned	2024-09-22T04:07:34Z	-
dc.date.available	2024-09-22T04:07:34Z	-
dc.date.issued	2024-02-05	-
dc.identifier.citation	0	en_US
dc.identifier.issn	2045-2322	-
dc.identifier.uri	http://dspace.aiub.edu:8080/jspui/handle/123456789/2398	-
dc.description.abstract	DNA-binding proteins (DBPs) play a significant role in all phases of genetic processes, including DNA recombination, repair, and modification. They are often utilized in drug discovery as fundamental elements of steroids, antibiotics, and anticancer drugs. Predicting them is among the most challenging tasks in proteomics research. Conventional experimental methods for DBP identification are costly and sometimes biased. Therefore, there is an urgent need for powerful computational methods that can accurately and rapidly identify DBPs from sequence information. In this study, we propose a novel deep learning-based method called Deep-WET to accurately identify DBPs from primary sequence information. In Deep-WET, we employed three powerful feature encoding schemes, namely Global Vectors (GloVe), Word2Vec, and fastText, to encode the protein sequence. Subsequently, these three features were sequentially combined and weighted using weights learned through the differential evolution (DE) algorithm. To enhance the predictive performance of Deep-WET, we applied the SHapley Additive exPlanations (SHAP) approach to remove irrelevant features. Finally, the optimal feature subset was input into convolutional neural networks (CNNs) to construct the Deep-WET predictor. Both cross-validation and independent tests indicated that Deep-WET achieved superior predictive performance compared to conventional machine learning classifiers. In addition, in an extensive independent test, Deep-WET was effective and outperformed several state-of-the-art methods for DBP prediction, with an accuracy of 78.08%, an MCC (Matthews correlation coefficient) of 0.559, and an AUC (area under the ROC curve) of 0.805. This performance shows that Deep-WET has a strong capacity to predict DBPs. The web server of Deep-WET and the curated datasets used in this study are available at https://deepwet-dna.monarcatechnical.com/. The proposed Deep-WET is anticipated to serve the community-wide effort for large-scale identification of potential DBPs.	en_US
dc.language.iso	en	en_US
dc.publisher	Nature	en_US
dc.relation.ispartofseries	14;2961	-
dc.subject	deep learning	en_US
dc.subject	DNA‑binding proteins	en_US
dc.subject	embedding techniques	en_US
dc.subject	weighted features	en_US
dc.title	Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features	en_US
dc.type	Article	en_US
Appears in Collections:	Publications: Journals
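The abstract states that the GloVe, Word2Vec, and fastText encodings are combined sequentially and weighted with weights learned by differential evolution (DE). This record does not give the DE variant, its bounds, or its fitness function, so what follows is only a minimal sketch under stated assumptions: each encoding is already a fixed-length vector, the weights lie in [0, 1], "sequentially combined" means concatenation, and the optimizer is the classic DE/rand/1/bin scheme of Storn and Price. The names fuse_weighted, differential_evolution, and toy_fitness are hypothetical, and the quadratic objective is a stand-in for the classifier-based fitness a real pipeline would use.

```python
import random
from typing import Callable, List, Sequence


def fuse_weighted(blocks: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Scale each feature block by its weight, then concatenate the blocks in order.

    Assumption: "sequentially combined" is read as plain concatenation.
    """
    if len(blocks) != len(weights):
        raise ValueError("expected exactly one weight per feature block")
    fused: List[float] = []
    for block, w in zip(blocks, weights):
        fused.extend(w * x for x in block)
    return fused


def differential_evolution(
    fitness: Callable[[List[float]], float],
    dim: int,
    lo: float = 0.0,
    hi: float = 1.0,
    pop_size: int = 20,
    f: float = 0.8,
    cr: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Minimize `fitness` over the box [lo, hi]^dim with classic DE/rand/1/bin.

    Bounds, pop_size, f, cr, and generations are illustrative assumptions,
    not values reported for Deep-WET.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct donors, all different from the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            # Binomial crossover; j_rand forces at least one mutated coordinate.
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip back into bounds
                else:
                    trial.append(pop[i][j])
            # Greedy selection: keep the trial if it is no worse than the target.
            s = fitness(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best]


if __name__ == "__main__":
    # Hypothetical stand-in objective with a known optimum at TARGET; a real
    # pipeline would score a classifier trained on the fused features instead.
    TARGET = [0.2, 0.5, 0.9]

    def toy_fitness(w: List[float]) -> float:
        return sum((x - t) ** 2 for x, t in zip(w, TARGET))

    weights = differential_evolution(toy_fitness, dim=3)
    print([round(x, 3) for x in weights])

    glove, word2vec, fasttext = [1.0, 2.0], [3.0], [4.0, 5.0]  # toy vectors
    print(fuse_weighted([glove, word2vec, fasttext], [0.5, 2.0, 1.0]))
    # prints [0.5, 1.0, 6.0, 4.0, 5.0]
```

In a real pipeline each fitness evaluation would train or score a classifier on the fused features (for example, returning 1 minus cross-validated accuracy), which is far more expensive; the small population and fixed seed here only keep the sketch fast and reproducible.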

Files in This Item:

File	Description	Size	Format
Dspace.docx		4.66 MB	Microsoft Word XML
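The abstract reports accuracy, MCC, and AUC on the independent test. To make these three numbers concrete for a binary DBP/non-DBP task, the following minimal sketch computes them from labels (assumed 1 = DBP, 0 = non-DBP) and predicted scores. The 0.5 decision threshold and the helper names (confusion, accuracy, mcc, roc_auc, threshold) are assumptions for illustration; they are not taken from the Deep-WET code, and the toy data are not from its datasets.

```python
import math
from typing import List, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, where 1 = DBP and 0 = non-DBP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cut: float = 0.5) -> List[int]:
    """Turn scores into hard labels; the 0.5 cut-off is an assumption."""
    return [1 if s >= cut else 0 for s in scores]


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]              # toy labels
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # toy predicted probabilities
    tp, tn, fp, fn = confusion(y_true, threshold(scores))
    print(tp, tn, fp, fn)                     # prints 2 2 1 1
    print(round(accuracy(tp, tn, fp, fn), 4)) # prints 0.6667
    print(round(mcc(tp, tn, fp, fn), 4))      # prints 0.3333
    print(round(roc_auc(y_true, scores), 4))  # prints 0.8889
```

The pairwise AUC loop is O(n_pos × n_neg), which is fine for illustration; on full benchmark datasets a sort-based rank computation or an established library routine would be preferable. Note also that accuracy and MCC depend on the chosen threshold, while AUC does not.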
