Generative AI agents built on large language models (LLMs), such as ChatGPT, can produce large and complex outputs when they take on challenging tasks such as scientific data analysis and medical diagnosis.
How do we evaluate these agents? Is one better than another for a given task? Can we trust them? And, importantly, how can we automate agent evaluation to match the pace of AI development, where new LLMs and improved methods arrive every week?
Introducing our new COGnition seminar series. This monthly series will address the challenges of evaluating complex outputs from LLMs and explore innovative strategies for automating agent evaluation to keep pace with AI advancements.
Measuring the performance and trustworthiness of agents is central to unlocking the full potential of generative AI in biomedicine.
Learn more about COGnition.
If you do not want to receive information about this seminar series, click on "Unsubscribe" at the bottom of this email.