Building impactful AI products for design and product leaders, Part 2: Evals are your moat
Wednesday, July 23, 2025 • Rosenfeld Community
Speakers: Peter Van Dijck

Summary

The secret ingredient for impactful AI products is “evals”—an architecture for ongoing evaluation of quality. Without evals, you don’t know if your output is good. You don’t know when you’re done. Because outputs are non-deterministic, it’s very hard to figure out if you are creating real value for your users, and when something goes wrong, it’s really tricky to figure out why. Simply Put’s Peter van Dijck will demystify evals, and share a simple framework for planning for and building useful evals, from qualitative user research to automated evals using LLMs as a judge.

Key Insights

  • AI product development involves three layers: model capabilities, context management, and user experience, with evals central to experience quality assurance.

  • Automated evals help scale testing of AI with inherently open-ended inputs and outputs, enabling faster iteration cycles with confidence.

  • LLMs can serve as judges (evaluators) of other LLM outputs, which works because classification is cognitively easier than generation.

  • Defining what 'good' means for an AI system is a detailed, evolving process informed by research, domain expertise, and observed risks.

  • A three-option evaluation (e.g., yes/no/maybe) works better than fine-grained scales for consistent automated scoring by LLMs.

  • Synthetic data, generated by LLMs based on manually created examples, efficiently expands dataset breadth and usefulness.

  • Domain experts are essential for tagging data and establishing quality criteria, especially for high-stakes areas like healthcare or legal.

  • Building effective evals requires substantial effort—expect 20-40% of project resources devoted to this work.

  • Cultural differences impact subjective evals like politeness, requiring localization and careful domain definition.

  • AI product quality management is a strategic ongoing commitment, extending beyond initial development into production monitoring and iteration.
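As a rough illustration of the LLM-as-judge and three-option points above, here is a minimal Python sketch. It is not code from the talk: the prompt wording, the function names (`parse_verdict`, `run_eval`, `fake_judge`), and the sample cases are all hypothetical, and the judge is passed in as a plain callable so that a real model call can replace the stub.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Three coarse options rather than a 1-10 scale, following the talk's advice.
VERDICTS = ("yes", "no", "maybe")

# Hypothetical judge prompt; a real one would encode your detailed definition of "good".
JUDGE_PROMPT = (
    "You are grading an AI assistant's answer.\n"
    "Criterion: {criterion}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Does the answer meet the criterion? Reply with one word: yes, no, or maybe."
)


def parse_verdict(raw: str) -> str:
    """Normalize the judge's reply; anything unexpected is counted as 'maybe'."""
    word = raw.strip().lower().strip(".!\"'")
    return word if word in VERDICTS else "maybe"


def run_eval(
    cases: Iterable[Dict[str, str]],
    criterion: str,
    judge: Callable[[str], str],
) -> Counter:
    """Ask the judge about every case and tally the three verdicts."""
    counts = Counter({verdict: 0 for verdict in VERDICTS})
    for case in cases:
        prompt = JUDGE_PROMPT.format(
            criterion=criterion,
            question=case["question"],
            answer=case["answer"],
        )
        counts[parse_verdict(judge(prompt))] += 1
    return counts


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: swap in your model client here)."""
    return "Yes." if "30 days" in prompt else "No"


# Hypothetical test cases; in practice these come from research and domain experts.
cases = [
    {"question": "What is the refund window?",
     "answer": "Refunds are accepted within 30 days."},
    {"question": "What is the refund window?",
     "answer": "I cannot help with that."},
]
criterion = "The answer states the refund window."

counts = run_eval(cases, criterion, fake_judge)
print(dict(counts))
```

Treating any unparseable reply as "maybe" keeps the tally to the three agreed options and makes malformed judge output visible instead of silently dropping it.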

Notable Quotes

"AI products almost always have both open-ended inputs and outputs, which makes testing really hard."

"You have to build a detailed definition of what is good for my system to do meaningful automated evals."

"It’s much easier to classify an answer than to generate an answer, and that’s why LLM as a judge works."

"You don’t want to give too many options like rating from one to ten because consistency gets lost between different LLM calls."

"Synthetic data is useful because it’s easier to generate more examples of something you already have than to create entirely new data."

"If you launch in the US and politeness is an issue, first try to fix it with prompts; only if that fails should you build an eval."

"Evals are really your intellectual property—they define what good looks like in your domain."

"Domain experts are crucial for tagging data because users might say ‘that’s great,’ but experts can tell it’s totally wrong."

"You should plan 20 to 40 percent of your project budget on evals—it’s a lot more work than most people expect."

"This is where UX and product strategy bring huge value—defining what good means rather than leaving it to engineers alone."

