Please use this identifier to cite or link to this item:
http://dspace.aiub.edu:8080/jspui/handle/123456789/240
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ahmed, Shakil | - |
dc.contributor.author | Saif, A. F. M. Saifuddin | - |
dc.contributor.author | Hanif, Md. Imtiaz | - |
dc.contributor.author | Shakil, Md. Mostofa Nurannabi | - |
dc.contributor.author | Jaman, Md. Mostofa | - |
dc.contributor.author | Haque, Md. Mazid Ul | - |
dc.contributor.author | Shawkat, Siam Bin | - |
dc.contributor.author | Hasan, Jahid | - |
dc.contributor.author | Sonok, Borshan Sarker | - |
dc.contributor.author | Rahman, Farzad | - |
dc.contributor.author | Sabbir, Hasan Muhommod | - |
dc.date.accessioned | 2022-01-05T07:55:36Z | - |
dc.date.available | 2022-01-05T07:55:36Z | - |
dc.date.issued | 2021-12-29 | - |
dc.identifier.citation | Ahmed, S.; Saif, A.F.M.S.; Hanif, M.I.; Shakil, M.M.N.; Jaman, M.M.; Haque, M.M.U.; Shawkat, S.B.; Hasan, J.; Sonok, B.S.; Rahman, F.; Sabbir, H.M. Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation. Appl. Sci. 2022, 12, 317. https://doi.org/10.3390/app12010317 | en_US |
dc.identifier.issn | 2076-3417 | - |
dc.identifier.uri | http://dspace.aiub.edu:8080/jspui/handle/123456789/240 | - |
dc.description.abstract | With the advancement of technology, people around the world have increasingly easy access to internet-enabled devices, and as a result video data is growing rapidly. The spread of portable recording devices such as action cameras, mobile-phone cameras, and motion cameras further accelerates this growth. Data from these many sources require substantial processing to serve different usage needs, and at such volumes video collections cannot be fully navigated by end users. To address this issue, many recent studies have worked on generating descriptions from images or recorded visual scenes. This description generation, also known as video captioning, is more complex than single-image captioning, and a variety of advanced neural networks have been applied to it. In this paper, we propose an attention-based Bi-LSTM and sequential LSTM (Att-BiL-SL) encoder-decoder model for describing video in textual form. The model consists of a two-layer attention-based bi-LSTM and a one-layer sequential LSTM for video captioning, and it also extracts the universal and native temporal features from the video frames for smooth sentence generation from optical frames. The paper combines word embedding with a soft attention mechanism and a beam search optimization algorithm to improve the quality of the generated captions (a minimal illustrative sketch of such an encoder-decoder follows this record). We find that the proposed architecture performs better than various existing state-of-the-art models. | en_US |
dc.description.sponsorship | Self | en_US |
dc.language.iso | en | en_US |
dc.publisher | MDPI | en_US |
dc.subject | Video captioning | en_US |
dc.subject | Bi-directional long short-term memory | en_US |
dc.subject | Attention-mechanism | en_US |
dc.subject | Video to text | en_US |
dc.subject | Video description generation | en_US |
dc.title | Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation | en_US |
dc.type | Article | en_US |
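For readers skimming the record, the abstract is concrete enough to sketch the model's general shape. Below is a minimal PyTorch sketch of an attention-based Bi-LSTM encoder with a sequential LSTM decoder, assuming pre-extracted per-frame CNN features as encoder input; every class name, dimension, and default value here is an illustrative assumption, not the paper's released implementation.

```python
# Minimal, illustrative PyTorch sketch of the kind of model the abstract
# describes: a two-layer bidirectional LSTM encoder over per-frame CNN
# features, soft (additive) attention, and a one-layer sequential LSTM
# decoder over word embeddings. All names, dimensions, and hyperparameters
# are assumptions for illustration, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttBiLSL(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, embed_dim=300, vocab=10000):
        super().__init__()
        # Encoder: two-layer bidirectional LSTM over frame features.
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # Word embedding for previously generated caption tokens.
        self.embed = nn.Embedding(vocab, embed_dim)
        # Decoder: one-layer LSTM; input = [word embedding ; attention context].
        self.decoder = nn.LSTMCell(embed_dim + 2 * hidden, hidden)
        # Additive soft attention over the encoder states.
        self.w_enc = nn.Linear(2 * hidden, hidden)
        self.w_dec = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, frames, captions):
        # frames: (B, T, feat_dim); captions: (B, L) token ids, teacher-forced.
        enc, _ = self.encoder(frames)                    # (B, T, 2*hidden)
        B, L = captions.shape
        h = frames.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(L - 1):
            # Score every frame against the current decoder state.
            scores = self.v(torch.tanh(self.w_enc(enc)
                                       + self.w_dec(h).unsqueeze(1)))  # (B, T, 1)
            alpha = F.softmax(scores, dim=1)             # attention weights
            context = (alpha * enc).sum(dim=1)           # (B, 2*hidden)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.decoder(x, (h, c))
            logits.append(self.out(h))                   # predicts token t+1
        return torch.stack(logits, dim=1)                # (B, L-1, vocab)
```

At inference time the abstract pairs such a decoder with beam search: instead of greedily emitting the argmax token at each step, the k partial captions with the highest cumulative log-probability are kept and extended, which typically yields more fluent sentences than greedy decoding.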
Appears in Collections: Publications: Conference
Files in This Item:
File | Description | Size | Format
---|---|---|---
DSpace_Att-BiL-SL Attention-Based Bi-LSTM and Sequential LSTM.docx |  | 3.57 MB | Microsoft Word XML
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.