ADVANCED MULTIMODAL AI FRAMEWORK FOR ENHANCED DIABETIC RETINOPATHY DIAGNOSIS AND SEVERITY CLASSIFICATION
DOI:
https://doi.org/10.55197/qjoest.v6i4.268
Keywords:
diabetic retinopathy, deep learning, multimodal fusion, convolutional neural networks, transformers, attention mechanisms
Abstract
Diabetic Retinopathy (DR) remains a leading cause of visual impairment and blindness among people with diabetes worldwide, affecting millions and underscoring the need for robust, scalable screening. This extended study presents a multimodal deep learning framework that combines fundus photography and Optical Coherence Tomography (OCT) imaging to detect DR and grade its severity. The hybrid architecture uses Convolutional Neural Networks (CNNs) for spatial feature extraction and Transformers for capturing long-range dependencies in sequential data, and it fuses the two modalities through attention mechanisms. Evaluated on the benchmark datasets EyePACS, IDRiD, and Duke OCT, the framework achieves 98% accuracy, 96% sensitivity, 97% specificity, and an Area Under the Curve (AUC) of 0.99 on binary classification, which the authors report as surpassing existing state-of-the-art approaches. For interpretability, the model incorporates Explainable AI (XAI) techniques, including Gradient-weighted Class Activation Mapping (Grad-CAM), to localize lesions. Class imbalance is mitigated through data augmentation strategies, including the Synthetic Minority Over-sampling Technique (SMOTE) and synthesis based on generative adversarial networks (GANs). The work provides detailed mathematical derivations for its custom loss functions, evaluation metrics, and optimization algorithms, accompanied by visualizations such as confusion matrices, Receiver Operating Characteristic (ROC) curves, precision-recall curves, and training convergence plots.
From a Computer Science Engineering (CSE) perspective, the framework emphasizes computational efficiency, enabling real-time inference on edge devices and potential deployment in resource-constrained environments, which could reduce healthcare costs and improve accessibility in underserved regions. This extension builds on the original contributions by adding in-depth ablation studies, comparative analyses with models published in 2025, ethical considerations, and deployment strategies.
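The abstract does not specify how the attention-based fusion is implemented, so the following is only a minimal sketch of one common form of it: each modality's feature vector receives a scalar score, the scores are normalized with a softmax, and the fused representation is the weighted sum. The function name `attention_fuse`, the scoring vector `w_score`, the feature dimension, and the random inputs are all hypothetical and stand in for the outputs of the paper's CNN and Transformer branches.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attention_fuse(fundus_feat, oct_feat, w_score):
    """Fuse two modality feature vectors with softmax attention weights.

    fundus_feat, oct_feat: arrays of shape (batch, dim), assumed to come
        from separate fundus and OCT encoders (hypothetical here).
    w_score: array of shape (dim,), a hypothetical scoring vector that
        would be learned during training.
    Returns (fused, weights) with shapes (batch, dim) and (batch, 2).
    """
    stacked = np.stack([fundus_feat, oct_feat], axis=1)  # (batch, 2, dim)
    scores = np.tanh(stacked) @ w_score                  # (batch, 2)
    weights = softmax(scores, axis=1)                    # each row sums to 1
    fused = np.sum(weights[..., None] * stacked, axis=1)  # (batch, dim)
    return fused, weights


# Illustrative usage with random features in place of real encoder outputs.
rng = np.random.default_rng(0)
fundus = rng.normal(size=(4, 8))
oct_features = rng.normal(size=(4, 8))
w = rng.normal(size=8)

fused, weights = attention_fuse(fundus, oct_features, w)
```

In this sketch the weights show, per image, how much each modality contributes to the fused vector. The published model may use a different mechanism, such as multi-head cross-attention.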
License
Copyright (c) 2025 VIRENDRA KUMAR TIWARI, JITENDRA AGRAWAL, SANJAY BAJPAI, KAVITA KANATHEY

This work is licensed under a Creative Commons Attribution 4.0 International License.