Data and Evaluation Driven Development of AI Agents

Apr 17, 2025 | Workshop 3 | Gateway Pavilion - Pier 2 | 2 Marina Blvd, San Francisco, CA 94123

About this session

In this practical, step-by-step session, we’ll dive deep into the data- and evaluation-driven development process of AI agents using a real-world scenario. Participants will start by defining clear business objectives and translating them into targeted evaluation metrics critical for agent success. Next, we’ll collaboratively generate and curate high-quality synthetic data designed specifically to meet these business-driven metrics. Using this dataset, attendees will set up structured evaluations, emphasizing real-world performance indicators like accuracy, reliability, and alignment with business KPIs. With our data and metrics firmly established, we’ll move to hands-on development, leveraging popular frameworks to build, evaluate, and refine an actual AI agent live. Participants will gain practical insights into common pitfalls and learn best practices for iterative agent improvement using continuous evaluation feedback. By the end of the session, attendees will have experienced the full lifecycle (planning, data generation, evaluation setup, development, and optimization), equipping them with actionable strategies and tangible skills to implement immediately in their own AI agent projects.
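As a rough illustration of the lifecycle described above, the sketch below fixes a metric target first, builds a small dataset, and then scores an agent against it. It is not taken from the session materials: the names `EvalCase`, `make_synthetic_cases`, `toy_agent`, `evaluate`, and `ACCURACY_TARGET` are all hypothetical, the dataset is hand-written rather than generated, and the "agent" is a keyword lookup standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str    # input given to the agent
    expected: str  # substring a correct answer must contain


def make_synthetic_cases() -> list[EvalCase]:
    # Hypothetical hand-written cases standing in for curated synthetic data.
    return [
        EvalCase("What is the refund window?", "30 days"),
        EvalCase("Which plan includes SSO?", "Enterprise"),
        EvalCase("How do I reset my password?", "Settings"),
    ]


def toy_agent(prompt: str) -> str:
    # Placeholder agent: a keyword lookup, not a real LLM or framework call.
    answers = {
        "refund": "Refunds are accepted within 30 days.",
        "sso": "SSO is part of the Enterprise plan.",
    }
    for keyword, answer in answers.items():
        if keyword in prompt.lower():
            return answer
    return "I am not sure."


def evaluate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Accuracy: fraction of cases whose expected substring appears in the answer.
    hits = sum(case.expected.lower() in agent(case.prompt).lower() for case in cases)
    return hits / len(cases)


# Assumed business-driven target, chosen before any agent code is written.
ACCURACY_TARGET = 0.9

accuracy = evaluate(toy_agent, make_synthetic_cases())
meets_target = accuracy >= ACCURACY_TARGET
print(f"accuracy={accuracy:.2f} meets_target={meets_target}")
```

In this toy run the agent misses the password question, so it falls short of the target. In an iterative workflow, that failing case is the feedback that drives the next change to the agent.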

Session Speaker

Nikhil Pareek

Founder & CEO

Future AGI

Access Prerequisites

Access Slides

More sessions

Developer Day

Apr 17, 2025

3:30 pm

Harnessing AI Context for Superior Code Quality

Discover how agentic workflows can revolutionize software development with Qodo's AI-powered tools. Learn to use context awareness and automated review workflows to safeguard code integrity. This session will explore real-world applications, focusing on best-practice integration and AI-driven insights to optimize your development process.

David Parry

Qodo




Developer Day

Apr 17, 2025

12:40 pm

The Full Stack of Open Generative AI

Join an AI expert from Meta for an in-depth look at the latest advancements in the open generative AI stack. This session will cover, from the metal to the agent, how large-scale AI systems are built, the tools used, and how you can build your own with open source AI from Meta, including PyTorch, Triton, Llama, and more.

Joe Spisak

Ensuring Responsible AI with Advanced Model Evaluation

This session will cover Sama's comprehensive Model Evaluation services, emphasizing the importance of aligning AI systems with ethical guidelines and improving their performance. Attendees will learn how to systematically assess AI outputs, identify inaccuracies and vulnerabilities, and employ strategies for accelerated time-to-market with robust model validation. Leveraging Sama's expert-driven platform, participants will discover how to build reliable, high-performing generative AI models while adhering to industry standards.

Duncan Curtis

Sama

