Generative AI models are rapidly entering clinical practice, promising to transform healthcare by reducing administrative burden and improving efficiency. But there’s a problem: once these models go live, we rarely know how well they continue to perform—or whether they remain safe, ethical, and trustworthy over time. That’s why the REAiHL Lab team from Erasmus MC presented a compelling challenge during the SAS Benelux HackSprint at the KNVB Campus in Zeist: how can we monitor and evaluate Large Language Models (LLMs) to ensure their safe, ethical, and effective use in clinical settings?
The team developed the Generative AI Control Center — a hospital-wide framework and dashboard that continuously monitors and validates the LLMs used in clinical settings, such as ambient AI scribes that capture and summarize doctor–patient conversations.
Four Dimensions of Evaluation
To ensure a holistic and responsible approach, the “Generative AI Control Center” dashboard evaluates LLMs across four key domains:
Building the Dashboard
Using SAS Viya technology, the prototype dashboard combines transcript data, AI-generated summaries, and survey feedback from clinicians. A key innovation is the “LLM-as-a-judge” module, which automatically assesses summary quality. By combining this with sentiment analysis, the dashboard computes both performance and ethical metrics.
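To make the idea concrete, here is a minimal Python sketch of how an "LLM-as-a-judge" module could score a summary and feed a dashboard record. This is illustrative only: the rubric, function names (`judge_summary`, `dashboard_row`), and the flagging threshold are assumptions for the example, not the team's actual SAS Viya implementation, and the judge model is passed in as a plain callable so any LLM client (or a test stub) can be plugged in.

```python
import json

# Rubric sent to the judge model; the criteria here are illustrative,
# not the REAiHL team's actual evaluation rubric.
RUBRIC = (
    "You are reviewing an AI-generated summary of a doctor-patient "
    "conversation. Rate the summary from 1 (poor) to 5 (excellent) on "
    "'faithfulness' (no hallucinated facts) and 'completeness' (no "
    "omitted clinical details). Reply with JSON only, e.g. "
    '{"faithfulness": 4, "completeness": 3}.'
)

def build_judge_prompt(transcript: str, summary: str) -> str:
    """Assemble the full prompt passed to the judge LLM."""
    return f"{RUBRIC}\n\nTranscript:\n{transcript}\n\nSummary:\n{summary}"

def judge_summary(transcript: str, summary: str, llm_call) -> dict:
    """Ask the judge model to score one summary.

    `llm_call` is any function mapping a prompt string to a response
    string, so the judge model stays swappable (and mockable in tests).
    """
    response = llm_call(build_judge_prompt(transcript, summary))
    scores = json.loads(response)
    # Clamp to the 1-5 scale in case the model drifts outside it.
    return {k: min(5, max(1, int(v))) for k, v in scores.items()}

def dashboard_row(scores: dict, clinician_sentiment: float) -> dict:
    """Combine judge scores with survey sentiment (-1..1) into one record."""
    performance = (scores["faithfulness"] + scores["completeness"]) / 2
    return {
        "performance": performance,        # 1..5 summary-quality score
        "sentiment": clinician_sentiment,  # from clinician survey feedback
        "flagged": performance < 3 or clinician_sentiment < 0,
    }
```

The key design choice in a setup like this is keeping the judge call injectable: the same pipeline can then be run against different judge models, or replayed offline against stored transcripts, when validating the monitoring itself.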
This intuitive, interactive dashboard gives evaluators and clinicians an at-a-glance overview of each model’s strengths, risks, and trade-offs—making AI evaluation not only transparent but actionable. The team is also exploring an explanatory LLM that can interpret and contextualize results for deeper insights.
Looking Ahead
Next steps include:
Long-term goals involve scaling the framework beyond AI scribes and establishing a maintenance team to oversee LLM implementations across hospital departments.
Why It Matters
Without robust monitoring, even promising AI tools may never reach the bedside. By uniting expertise in medical engineering, data science, and AI ethics, and drawing on experience building ICU dashboards with SAS software, the REAiHL Lab team excels at turning complex data into actionable insights for healthcare professionals. Their approach bridges the gap between innovation and implementation, ensuring that AI in healthcare is not only effective, but also ethical, sustainable, and trusted.
About the REAiHL Lab
The REAiHL Lab is a joint initiative of Erasmus MC, TU Delft, and SAS, united by a single mission: conduct research, design and implement AI systems, and translate the World Health Organization (WHO) ethical principles into clinically and technically feasible guidelines for the development and deployment of AI technologies in healthcare.
By combining technical depth with ethical awareness, the REAiHL Lab aims to make AI in healthcare transparent, trustworthy, and truly usable at the bedside.