Please use this identifier to cite or link to this item: http://dspace.aiub.edu:8080/jspui/handle/123456789/2919
Full metadata record
DC Field | Value | Language
--- | --- | ---
dc.contributor.author | Kazi, Redwan | -
dc.contributor.author | Sourav, Datto | -
dc.contributor.author | Mustakim, Ahmed | -
dc.contributor.author | Habibur Rahman, Masum | -
dc.contributor.author | Md. Faruk Abdullah Al, Sohan | -
dc.contributor.author | Abu, Shufian | -
dc.date.accessioned | 2025-12-14T08:56:12Z | -
dc.date.available | 2025-12-14T08:56:12Z | -
dc.date.issued | 2025-10-30 | -
dc.identifier.citation | 932 | en_US
dc.identifier.issn | 0142-0615 | -
dc.identifier.uri | http://dspace.aiub.edu:8080/jspui/handle/123456789/2919 | -
dc.description.abstract | Accurate product price prediction is a major challenge in modern e-commerce. Product value depends on images, textual descriptions, and categorical attributes, yet many methods underutilize these modalities jointly. This paper presents a multimodal deep learning framework for price regression using two architectures. The first is an attention-based Functional API model that applies EfficientNetB1 for visual features, pretrained GloVe embeddings with a bidirectional LSTM for text, and trainable embeddings for categorical inputs, combined via late fusion. The second is a lightweight Sequential API model that uses a compact convolutional network for image features and a dense layer to merge modalities, targeting computational efficiency for resource-limited deployments. Both models are trained on the same category-filtered dataset (≥20 per class) with a single price-decile-stratified train/validation/test split (seed 42). Evaluation uses Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²) on the original price scale after inverse transformation. Headline numbers are aggregated over repeated initializations on the same split (seeds {13, 21, 42, 77, 123}) and reported as mean ± SD with 95% confidence intervals. The Functional API model shows stronger performance under the same split and seeds, achieving MAE 5.37 ± 0.08 (95% CI [5.27, 5.47]), RMSE 7.32 ± 0.08 (95% CI [7.22, 7.42]), and R² = 0.702 ± 0.007 (95% CI [0.693, 0.711]). The Sequential API model attains lower accuracy (MAE = 8.13, RMSE = 10.88, R² = 0.43) but reduced training time and memory footprint. Ablation studies on the same split with repeated seeds isolate the effects of visual backbones, text encoders, fusion with/without attention, and loss functions. Preprocessing details, calibration checks, and decile and category-wise error summaries support transparency and reproducibility. The results establish a clear benchmark for multimodal regression in retail pricing, balancing predictive accuracy with operational feasibility. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier Array | en_US
dc.relation.ispartofseries | 28;3 | -
dc.subject | Category embedding | en_US
dc.subject | Functional API | en_US
dc.subject | GloVe embedding | en_US
dc.subject | Multimodal price prediction | en_US
dc.subject | Sequential API | en_US
dc.title | A multimodal deep learning framework for integrating visual, textual and categorical features in retail price estimation | en_US
dc.type | Article | en_US
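
The abstract reports each headline metric as mean ± SD over five seeds together with a 95% confidence interval, but it does not say how the interval was constructed. The sketch below assumes a two-sided Student-t interval with n = 5 (4 degrees of freedom). That assumption is an inference from the reported numbers, not something the record states, and the helper name `t_interval` is hypothetical.

```python
import math

# 0.975 quantile of Student's t with 4 degrees of freedom (n = 5 seeds),
# used for a two-sided 95% interval. Hardcoded to avoid a SciPy dependency.
T_CRIT_DF4 = 2.776


def t_interval(mean, sd, n=5, t_crit=T_CRIT_DF4):
    """Return (low, high) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Mean and SD values copied from the abstract (Functional API model).
reported = {"MAE": (5.37, 0.08), "RMSE": (7.32, 0.08), "R2": (0.702, 0.007)}

for name, (mean, sd) in reported.items():
    low, high = t_interval(mean, sd)
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

Under this assumption the computed bounds round to the intervals quoted in the abstract, which suggests (but does not prove) that the authors used a t-based interval over the five seeds.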
Appears in Collections: Publications From Faculty of Engineering

Files in This Item:
File | Description | Size | Format
--- | --- | --- | ---
Shufian_2025_Elsevier (Array).docx | Shufian_2025_Elsevier (Array) | 4.14 MB | Microsoft Word XML

