Summary
The secret ingredient for impactful AI products is “evals”—an architecture for ongoing evaluation of quality. Without evals, you don’t know if your output is good. You don’t know when you’re done. Because outputs are non-deterministic, it’s very hard to figure out if you are creating real value for your users, and when something goes wrong, it’s really tricky to figure out why. Simply Put’s Peter van Dijck will demystify evals, and share a simple framework for planning for and building useful evals, from qualitative user research to automated evals using LLMs as a judge.
Key Insights
• AI products require a systematic framework for evaluation, involving three layers: model, context, experience.
• Automated eval systems are essential for efficiently testing AI quality with open-ended inputs and outputs.
• Iterative testing can be very time-consuming; thus, automation helps accelerate the process without sacrificing reliability.
• Defining 'what good looks like' is critical to the success of AI systems and requires ongoing refinement.
• Domain expertise plays a significant role in creating effective eval systems and datasets.
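As a rough illustration of the automated eval idea in the insights above, the following is a minimal sketch, not the speaker's method. It assumes a small hand-written dataset of cases, each carrying a rubric of terms that a "good" answer should mention. Every name here (`EvalCase`, `run_evals`, `keyword_judge`, `toy_system`) is hypothetical. The keyword check is only a stand-in for the judge step; in an LLM-as-a-judge setup, that function would instead call a model with a grading prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One entry in a hypothetical eval dataset."""
    prompt: str
    must_mention: list[str]  # assumed rubric: terms a good answer should contain


def keyword_judge(case: EvalCase, output: str) -> bool:
    """Stand-in judge: passes if every rubric term appears in the output.

    A real LLM-as-a-judge would replace this with a model call.
    """
    text = output.lower()
    return all(term.lower() in text for term in case.must_mention)


def run_evals(
    system: Callable[[str], str],
    judge: Callable[[EvalCase, str], bool],
    cases: list[EvalCase],
) -> float:
    """Run every case through the system and return the fraction that pass."""
    results = [judge(case, system(case.prompt)) for case in cases]
    return sum(results) / len(results)


def toy_system(prompt: str) -> str:
    """Placeholder for the AI product under test; always gives the same answer."""
    return "You can reset your password from the Settings page."


# Domain experts would normally write and refine these cases over time.
CASES = [
    EvalCase("How do I reset my password?", ["password", "settings"]),
    EvalCase("How do I cancel my plan?", ["cancel"]),
]

pass_rate = run_evals(toy_system, keyword_judge, CASES)
print(f"pass rate: {pass_rate:.0%}")
```

Rerunning `run_evals` after each prompt or model change gives a quick regression signal without retesting everything by hand. The rubric in each case is where "what good looks like" gets written down and refined.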
Notable Quotes
"Eval systems help scale the testing of AI product quality."
"The challenge is how to evaluate when inputs and outputs are open-ended."
"You need a semi-automated system that can help you test any change you make."
"It's crazy to retest everything every time you make a change; we need faster iteration processes."
"What you're trying to do is define what is good for your system."