Generative AI agents built on large language models (LLMs), such as ChatGPT, can produce large and complex outputs when they take on challenging tasks such as scientific data analysis and medical diagnosis.
How do we evaluate these agents? Is one better than another for a given task? Can we trust them? And, importantly, how can we automate agent evaluation to match the pace of AI development, where new LLMs and improved methods arrive every week?
Introducing our new COGnition seminar series. This monthly series will address the challenges of evaluating complex outputs from LLMs and explore innovative strategies for automating agent evaluation to keep pace with AI advancements.
Measuring the performance and trustworthiness of agents is central to unlocking the full potential of generative AI in biomedicine.
Learn more about COGnition.
If you do not want to receive information about this seminar series, click on "Unsubscribe" at the bottom of this email.