HYBRID EVENT: Join us in person in Tokyo, Japan or attend virtually from anywhere.
Yan Leyfman, Speaker at Oncology Conferences
NYP- Meyer Cancer Center, United States

Abstract:

Background: Large language models (LLMs) are increasingly used in oncology clinical workflows; however, real-world clinician adoption has outpaced formal institutional guidance and systematic safety validation. Existing evaluations often rely on aggregate performance metrics, which may obscure disease-specific safety risks. We sought to (1) characterize real-world AI use and verification behaviors among oncology clinicians and (2) assess disease-dependent safety failures of LLM-based clinical decision support systems.

Methods: We conducted a voluntary, anonymous cross-sectional survey of oncology clinicians evaluating AI access, patterns of use, verification behaviors, perceived accountability, and responses to a standardized oncology clinical scenario. Separately, we developed 216 simulated tumor-board vignettes across five oncology domains: leukemia (n=30), breast (n=50), gastrointestinal (n=50), central nervous system metastases (n=50), and gynecologic malignancies (n=50). Each vignette was evaluated using three LLM configurations: (1) unconstrained LLM, (2) NCCN guideline-anchored retrieval-augmented generation (RAG), and (3) literature-anchored RAG. Outputs were independently scored by two board-certified oncologists using a modified Generative Performance Score (mGPS; −1 to +1), incorporating guideline concordance and hallucination penalties. Safety disparity was conservatively defined as the highest severity across scoring axes.
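The conservative aggregation rule described above (safety disparity taken as the highest severity across scoring axes) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring code: the three-level severity scale, the axis names, and the function name `safety_disparity` are all assumptions made for the example. The abstract does not give the mGPS formula, so it is not reproduced here.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordinal severity scale; the labels are assumed."""
    LOW = 0
    INTERMEDIATE = 1
    HIGH = 2


def safety_disparity(axis_severities: dict[str, Severity]) -> Severity:
    """Return the worst (highest) severity found on any scoring axis."""
    if not axis_severities:
        raise ValueError("at least one scoring axis is required")
    return max(axis_severities.values())


# Hypothetical axis names and ratings for a single vignette output.
example = {
    "guideline_concordance": Severity.LOW,
    "hallucination": Severity.HIGH,
    "staging": Severity.INTERMEDIATE,
}
print(safety_disparity(example).name)  # HIGH
```

Taking the maximum rather than a mean means one severe failure on any axis cannot be averaged away by good performance on the others, which is what makes the rule conservative.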

Results: Thirty-one clinicians completed the survey, including fellows (45%) and attending oncologists (29%), representing academic and community practice settings. Despite limited or uncertain access to institution-approved AI tools, nearly all respondents reported independent AI use for professional tasks. Most clinicians reported routinely verifying AI outputs against guidelines or primary literature and maintaining clinician-centered accountability for AI-related errors, while formal institutional governance was frequently absent.

In vignette-based evaluation, NCCN-anchored RAG demonstrated improved guideline concordance and reduced hallucinations compared with unconstrained models; however, safety performance varied substantially by disease context. Leukemia demonstrated predominantly low-to-intermediate safety disparity (93%), whereas high disparity was observed in CNS metastases (80%) and gynecologic malignancies (70%), driven by concurrent hallucinations, staging errors, and inappropriate extrapolation. Readability scores did not correlate with safety, and fluent, readable outputs frequently obscured clinically significant errors.
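To illustrate why a pooled metric can hide disease-specific risk, the sketch below compares an aggregate high-disparity rate with per-domain rates. The records are toy values chosen for the example, not study data, and the function names and domain labels are hypothetical.

```python
from collections import defaultdict


def high_rate_by_domain(records):
    """Fraction of high-disparity outputs per domain.

    records: iterable of (domain, is_high) pairs.
    """
    totals = defaultdict(int)
    highs = defaultdict(int)
    for domain, is_high in records:
        totals[domain] += 1
        highs[domain] += int(is_high)
    return {domain: highs[domain] / totals[domain] for domain in totals}


def aggregate_high_rate(records):
    """Fraction of high-disparity outputs pooled over all domains."""
    records = list(records)
    if not records:
        raise ValueError("no records to aggregate")
    return sum(int(is_high) for _, is_high in records) / len(records)


# Toy data (NOT from the study): 1 of 10 high in one domain, 8 of 10 in another.
toy = (
    [("domain_a", i < 1) for i in range(10)]
    + [("domain_b", i < 8) for i in range(10)]
)
print(aggregate_high_rate(toy))
print(high_rate_by_domain(toy))
```

In this toy case the pooled rate sits between the two domain rates and describes neither, which is the pattern a disease-stratified report is meant to expose.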

Conclusions: Oncology clinicians are already integrating AI into clinical practice with high levels of independent verification but limited institutional oversight. LLM safety is strongly disease-dependent and inadequately captured by aggregate accuracy metrics. Disease-stratified validation frameworks incorporating guideline concordance and hallucination detection are necessary to inform responsible clinical deployment. These findings support the need for clinician-led governance and disease-specific risk stratification prior to broad adoption of AI decision support in oncology.

Biography:

Dr. Yan Leyfman is a physician-scientist in oncology recognized as a 40 Under 40 Emerging Leader in Cancer at the 2023 ASCO Annual Meeting. During the COVID-19 pandemic, he led the Immunology Division of the Global COVID-19 Taskforce and conducted early translational work on SARS-CoV-2 and cancer interactions, presented at ASCO and published in peer-reviewed journals. His research spans cancer immunology, CAR T-cell therapies, and clinical AI. Dr. Leyfman is co-founder and Executive Director of MedNews Week, advancing global oncology education and equity. He is committed to mentorship, clinical innovation, and responsible translation of emerging technologies in oncology care.
