Abstract:
Lung cancer is among the deadliest cancers, and early detection depends on advanced diagnostic tools. This study evaluates a novel deep learning approach that combines EfficientNet with an attention-based Transformer to classify lung cancer images from Kaggle’s RSNA Pulmonary Embolism dataset. The dataset contains 250,000 CT scan images labeled as malignant or benign and requires extensive preprocessing, including contrast enhancement and lung segmentation with U-Net architectures. The EfficientNet backbone extracts high-resolution features, while the Transformer improves contextual understanding by attending to critical regions of interest. The model is trained with a batch size of 64 for 100 epochs using the AdamW optimizer and cosine annealing learning rate scheduling. The hybrid EfficientNet-Transformer model achieves an accuracy of 94.2%, a precision of 93.8%, and a recall of 96.4%, outperforming standalone CNN-based methods. The attention mechanism improves classification robustness, underscoring the potential of Transformer-based architectures in medical imaging. To assess the model's reliability, Grad-CAM heatmaps are used to visualize the regions that influence predictions, supporting interpretability in clinical settings. Domain adaptation techniques are also explored to improve generalization across different scanner modalities. These findings indicate that combining EfficientNet with attention mechanisms can refine lung cancer diagnosis and support AI-driven radiological assessment, early detection, and clinical decision-making.
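
For readers unfamiliar with the learning rate schedule named in the training setup, the following is a minimal sketch of cosine annealing over the 100 epochs reported above. It uses the standard closed-form cosine schedule without restarts. The abstract does not state the learning rates, so `lr_max` and `lr_min` are placeholder assumptions, and the function name `cosine_annealing_lr` is hypothetical rather than taken from the study's code.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, lr_max=1e-3, lr_min=0.0):
    """Return the learning rate for a given epoch under cosine annealing.

    total_epochs=100 matches the abstract; lr_max and lr_min are
    illustrative assumptions, since the abstract does not report them.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    # Cosine term falls from 1 at epoch 0 to -1 at epoch total_epochs.
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)


# Learning rate at the start of each of the 100 training epochs.
schedule = [cosine_annealing_lr(e) for e in range(100)]
```

The rate starts at `lr_max`, reaches the midpoint between `lr_max` and `lr_min` halfway through training, and decays smoothly toward `lr_min` by the final epoch. In practice this schedule would typically be supplied by the training framework's built-in scheduler rather than written by hand.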