Acta Anatomica Sinica ›› 2025, Vol. 56 ›› Issue (1): 22-29. doi: 10.16098/j.issn.0529-1356.2025.01.003

A melanoma diagnosis method based on large-scale vision-language models

ZHAO Jia-yue1,2  LI Shi-man1,2  ZHANG Chen-xi1,2*   

  1. Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China;
  2. Shanghai Key Laboratory of Medical Imaging Computing and Computer-Assisted Intervention, Shanghai 200032, China
  • Received: 2024-06-13 Revised: 2024-10-01 Online: 2025-02-06 Published: 2025-02-06
  • Contact: ZHANG Chen-xi E-mail: chenxizhang@fudan.edu.cn

Abstract:

Objective To develop a melanoma diagnosis framework based on large-scale vision-language models and to evaluate its feasibility and accuracy for melanoma diagnosis. Methods The publicly available Derm7pt dataset was used, divided into a training set (346 cases), a validation set (161 cases), and a test set (320 cases). A melanoma diagnosis framework based on large-scale vision-language models was proposed, comprising two text branches and one visual branch. One text branch processed fixed clinical prompts, while the other handled learnable prompts, so that the fixed clinical prompts guided the optimization of the learnable prompts. The visual branch processed dermoscopic images and enhanced melanoma feature recognition by fine-tuning the image encoder. Results On the Derm7pt dataset, the proposed method outperformed other existing methods, achieving an area under the receiver operating characteristic curve (AUC) of 87.35%, an accuracy of 84.17%, and an F1-score of 84.01%. Conclusion With appropriate fine-tuning strategies, methods based on large-scale vision-language pre-trained models can adapt effectively to melanoma diagnosis tasks. This approach can serve as a powerful auxiliary tool for doctors, helping them make more accurate diagnostic decisions.
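
The abstract does not specify how the fixed clinical prompts guide the learnable prompts. The following is only a minimal sketch of one plausible reading, assuming CLIP-style classification (cosine similarity between image and text features) plus a cosine-consistency penalty that pulls each learnable-prompt feature toward its fixed-prompt counterpart. All function names, the temperature, and the guidance weight are hypothetical, and short toy vectors stand in for the outputs of the text and image encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(image_feat, learnable_text_feats, temperature=0.07):
    """Score an image against one learnable-prompt feature per class.

    Assumption: similarity-based classification as in CLIP-like models;
    the temperature value is illustrative, not taken from the paper.
    """
    logits = [cosine(image_feat, t) / temperature for t in learnable_text_feats]
    return softmax(logits)

def guided_loss(image_feat, label, learnable_text_feats, fixed_text_feats,
                guidance_weight=1.0):
    """Cross-entropy on the learnable branch plus a guidance penalty.

    Assumption: the guidance term is the mean (1 - cosine) distance between
    each learnable-prompt feature and the fixed clinical-prompt feature of
    the same class. The paper may use a different formulation.
    """
    probs = class_probabilities(image_feat, learnable_text_feats)
    cross_entropy = -math.log(probs[label])
    guidance = sum(
        1.0 - cosine(l, f) for l, f in zip(learnable_text_feats, fixed_text_feats)
    ) / len(fixed_text_feats)
    return cross_entropy + guidance_weight * guidance

# Toy features: class 0 = melanoma, class 1 = non-melanoma (hypothetical).
image = [1.0, 0.0]
learnable = [[1.0, 0.0], [0.0, 1.0]]
fixed_aligned = [[1.0, 0.0], [0.0, 1.0]]
fixed_misaligned = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = guided_loss(image, 0, learnable, fixed_aligned)
loss_misaligned = guided_loss(image, 0, learnable, fixed_misaligned)
```

In this toy setup the classification term is identical in both calls, so the difference between `loss_misaligned` and `loss_aligned` comes entirely from the guidance penalty. During training, such a penalty would discourage the learnable prompts from drifting away from the clinically meaningful fixed prompts.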

Key words: Melanoma, Large-scale vision-language model, Fine-tuning, Diagnosis, Deep learning, Human
