A Journey into Responsible AI Design

Designing Intelligence You Can Trust

Responsible Multi-Agent AI Systems for Real-World Impact

Nicole (Nicki) Florio | Zarin Lokhandwala

Trust

Transparency

Collaboration

The Challenge We All Face

Whether you're an AI expert or just getting started, one question unites us all:

"How do we unlock AI's incredible potential while ensuring accuracy, reliability, and trustworthy outcomes?"

The Reality

It doesn't matter how powerful or impressive an AI model is—if people can't trust the output, they simply won't use it.

The Opportunity

Designing trustworthy AI isn't just an ethical obligation—it's a strategic requirement for adoption, efficiency, and long-term value.

Why Intentional Design Matters

Understanding the risks helps us build better safeguards

Errors at Scale

AI doesn't just make mistakes—it can amplify them. What would be a small human error can become thousands of incorrect outputs within seconds.

Data Reflects Us

AI learns from human-generated data. If that data is biased, incomplete, or skewed, the model will replicate—and often magnify—those patterns.

Not "Set & Forget"

AI is only as good as the instructions we provide. Clear prompting, continuous training, and ongoing review are essential for consistent performance.

Every risk we've discussed can be mitigated with thoughtful, intentional design.

Understanding AI Agents

An AI agent is a system that can perceive, decide, and act toward a goal with minimal human input.

Non-Agentic AI

Standard Chatbot

Reactive—responds only when prompted
Stateless—no persistent memory across sessions
Single-turn—handles one request at a time

VS

Agentic AI

Autonomous Workflow Partner

Proactive—plans and pursues goals across multiple steps
Stateful—maintains context, memory & progress
Autonomous—executes end-to-end workflows
Tool-enabled—uses external tools, APIs & systems
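
To make the stateless versus stateful contrast concrete, here is a minimal Python sketch. The class names `StatelessChatbot` and `StatefulAgent` are hypothetical and invented for illustration; real agent frameworks manage memory in far richer ways.

```python
from dataclasses import dataclass, field


class StatelessChatbot:
    """Reactive and stateless: each call sees only the current message."""

    def reply(self, message: str) -> str:
        return f"You asked: {message!r}. I have no record of earlier requests."


@dataclass
class StatefulAgent:
    """Keeps a running memory so later steps can build on earlier ones."""

    memory: list[str] = field(default_factory=list)

    def handle(self, event: str) -> str:
        # Every event is stored before the agent responds to it.
        self.memory.append(event)
        return f"Handled {event!r} (step {len(self.memory)} of this case)"

    def recall(self) -> list[str]:
        # Return a copy so callers cannot mutate the agent's memory.
        return list(self.memory)


agent = StatefulAgent()
agent.handle("fraud alert raised")
agent.handle("customer confirmed purchase")
history = agent.recall()
```

The chatbot can only answer the message in front of it, while the agent can answer a later question about the case from `history`.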

The Difference in Practice

Scenario: A customer's card is charged $2,400 at an electronics store in a city they don't live in

Non-Agentic AI

Answers questions

1 Fraud system flags transaction & freezes card

2 Customer opens chatbot: "Why is my card frozen?"

3 AI responds: "Please call 1-800-555-0123 to verify your identity."

4 Customer calls → 20 min hold → re-explains → agent manually reviews → unfreezes card

5 Customer asks later about the alert — AI has no memory of it

~30–45 min · multiple handoffs · frustrated customer

Agentic AI

Resolves problems

1 Fraud system flags transaction

2 AI pulls the customer profile and spending patterns, and detects that the customer's phone GPS is in that city

3 AI sends push notification: "$2,400 at Best Buy in Denver. Was this you?"

4 Customer taps "Yes, that's me" — AI verifies, unfreezes card, updates travel profile

5 Days later, customer asks about it — AI recalls the full interaction

~2 min · one touchpoint · delighted customer

Non-Agentic AI answers. Agentic AI resolves.

Before You Build: Essential Questions

Stop. Think. Design. Then open your systems.

Let's walk through this together — imagine we're designing a multi-agent fraud detection system for a bank.

1 What is the core purpose of this agent?

Our Fraud Detection Agent has one job: monitor every incoming transaction in real time and flag anything suspicious before it impacts the customer.

2 What should it own and be responsible for?

It owns risk scoring: pulling transaction history, comparing spending patterns, and assigning a confidence-rated risk level. It does NOT own the decision to freeze an account.

3 What decisions can and should it make autonomously?

Auto-approve low-risk transactions that match known patterns. Auto-block purchases from confirmed fraudulent merchants. But gray-zone cases? Those need a second opinion.

4 Where should it stop and escalate to a human?

For transactions over $5,000, first-time international purchases, or sudden behavioral shifts, the agent packages full context and routes the case to a human fraud analyst.

Agent Handles

Scoring every transaction against historical patterns

Auto-approving low-risk transactions in real time

Sending push notifications for customer verification

Logging every decision with a full audit trail

Human Reviews

High-value flagged transactions (over $5,000)

Customer disputes & edge-case appeals

Updating fraud detection rules & thresholds

Final account freeze / unfreeze decisions
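
The boundaries above can be written down as an explicit routing policy. The sketch below is one possible encoding, not a production rule set. The $5,000 limit comes from the text; the `LOW_RISK_MAX` cutoff, the 0-to-1 score scale, the merchant id, and every class and function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    AUTO_BLOCK = "auto_block"
    ESCALATE = "escalate_to_human"


@dataclass(frozen=True)
class Transaction:
    amount: float
    merchant_id: str
    risk_score: float                # assumed scale: 0.0 (safe) to 1.0 (risky)
    first_international: bool = False
    behavior_shift: bool = False


ESCALATION_AMOUNT = 5_000.00         # from the text: transactions over $5,000
LOW_RISK_MAX = 0.30                  # hypothetical cutoff, not from the source
CONFIRMED_FRAUD_MERCHANTS = {"m-fraud-001"}  # hypothetical merchant id


def route(tx: Transaction) -> Route:
    # Purchases from confirmed fraudulent merchants are blocked automatically.
    if tx.merchant_id in CONFIRMED_FRAUD_MERCHANTS:
        return Route.AUTO_BLOCK
    # Hard escalation rules always go to a human, whatever the score.
    if tx.amount > ESCALATION_AMOUNT or tx.first_international or tx.behavior_shift:
        return Route.ESCALATE
    # Low-risk transactions are approved without human involvement.
    if tx.risk_score <= LOW_RISK_MAX:
        return Route.AUTO_APPROVE
    # Everything else is a gray-zone case and needs a second opinion.
    return Route.ESCALATE


low_risk = route(Transaction(45.00, "m-grocery", 0.05))
gray_zone = route(Transaction(2_400.00, "m-electronics", 0.62))
```

Note that the policy never freezes or unfreezes an account; it only decides who handles the transaction next. Checking the blocklist before the escalation rules is a design choice that a team may want to reverse.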

Multi-Agent Workflow in Action

See how agents collaborate in our fraud detection system — from transaction to resolution

Real-Time Data Sources

Transaction Stream: $2,400 · Best Buy · Denver
Spending History: Avg electronics $180/mo

AI Agents

Ingestion Agent: Captures Transaction
Enrichment Agent: Adds Context

Enriched Output: $2,400 Best Buy · GPS: Denver · 13x avg spend

3 Transactions Incoming: $45 grocery · $800 online · $2,400 Best Buy

Fraud Detection Agent

Risk Scoring: Pattern & History Analysis

Decisions

$45: Approved
$800: Blocked
$2,400: Flagged (Score: 0.62), Escalate to Human Review

Flagged Transaction: $2,400 · Best Buy · Denver · Score: 0.62 · 13x avg spend

Human Fraud Analyst

Reviews Context: GPS matches · Customer traveling

Resolution: Card Unfrozen · "Was this you?" · Audit Logged

Transaction Ingestion

Capture & Enrich in Real Time

Every incoming transaction is captured in real time. The Ingestion Agent pairs it with the customer's spending history and behavioral patterns, then the Enrichment Agent adds context — location, device, merchant category — so the next agent can score it accurately.

Sample Enriched Transaction

$2,400 at Best Buy — Denver, CO

Customer GPS: Denver (traveling)

Avg electronics spend: $180/mo
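
As a rough illustration of the enrichment step, the sketch below attaches location context and a spend ratio to the sample transaction. The field and function names are hypothetical. Dividing the amount by the monthly category average is an assumption about how the "13x avg spend" figure is derived ($2,400 / $180 is about 13.3, which the slide rounds down to 13x).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawTransaction:
    amount: float
    merchant: str
    city: str
    category: str


@dataclass(frozen=True)
class EnrichedTransaction:
    raw: RawTransaction
    customer_gps_city: str
    avg_monthly_spend: float
    spend_ratio: float
    gps_matches_merchant: bool


def enrich(raw: RawTransaction, customer_gps_city: str,
           avg_monthly_spend: float) -> EnrichedTransaction:
    # Ratio of this purchase to the customer's usual monthly spend in the category.
    if avg_monthly_spend > 0:
        ratio = raw.amount / avg_monthly_spend
    else:
        ratio = float("inf")  # no spending history in this category
    return EnrichedTransaction(
        raw=raw,
        customer_gps_city=customer_gps_city,
        avg_monthly_spend=avg_monthly_spend,
        spend_ratio=ratio,
        # Case-insensitive comparison of merchant city and phone location.
        gps_matches_merchant=raw.city.casefold() == customer_gps_city.casefold(),
    )


sample = enrich(RawTransaction(2_400.00, "Best Buy", "Denver", "electronics"),
                customer_gps_city="Denver", avg_monthly_spend=180.00)
print(f"{sample.spend_ratio:.1f}x average spend")  # prints "13.3x average spend"
```

Keeping the raw transaction untouched inside the enriched record means the scoring agent and any later audit can see both the original data and the added context.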

Validating Your Design

How do you know you built it right? Ask these critical questions.

Auditability

Can you trace how it reached its conclusions?

Reproducibility

Can you repeat the same outcome given the same inputs?

Accuracy

Is it actually producing correct results?

Consistency

Does it perform reliably over time?
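
One way to test reproducibility is to fingerprint the inputs and confirm that repeated runs on the same inputs give the same score. The sketch below assumes a deterministic scorer. The `score` function, its weights, and the feature names are hypothetical and do not reproduce the 0.62 score shown in the workflow.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal inputs hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score(features: dict) -> float:
    """Hypothetical deterministic scorer: no randomness, no clock, no I/O."""
    ratio_part = min(features["spend_ratio"] / 20.0, 1.0)
    location_part = 0.0 if features["gps_matches_merchant"] else 1.0
    return round(0.7 * ratio_part + 0.3 * location_part, 4)


def is_reproducible(features: dict, runs: int = 5) -> bool:
    # Identical inputs must always yield exactly one distinct score.
    results = {score(features) for _ in range(runs)}
    return len(results) == 1


features = {"spend_ratio": 13.3, "gps_matches_merchant": True}
input_id = fingerprint(features)
stable = is_reproducible(features)
```

Storing `input_id` next to each score lets a reviewer later confirm that a replayed decision used exactly the same inputs.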

The Audit Test

If you can't audit it, you can't trust it.

Implement logging, traceability, and systematic checks. Document how decisions are made so anyone can understand the path from input to output.
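
A minimal sketch of such a log, assuming an in-memory, append-only list where each entry stores the hash of the previous one so that later tampering is detectable. The `AuditLog` class and its field names are hypothetical; a real system would persist entries to durable, access-controlled storage.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log; each entry is chained to the one before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, transaction_id: str, actor: str, decision: str,
               reason: str, inputs: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "transaction_id": transaction_id,
            "actor": actor,          # which agent or human made the decision
            "decision": decision,
            "reason": reason,
            "inputs": inputs,        # the data the decision was based on
            "prev_hash": prev_hash,
        }
        body["hash"] = _digest(body)  # hash is computed before the key is added
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        # Recompute every hash and check that the chain is unbroken.
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.record("tx-001", "fraud_detection_agent", "escalate",
           "score in gray zone", {"risk_score": 0.62, "spend_ratio": 13.3})
log.record("tx-001", "human_fraud_analyst", "unfreeze",
           "GPS matches, customer traveling", {"gps_city": "Denver"})
```

Each entry answers who decided, what they decided, why, and on what data, which is the path from input to output that an auditor needs.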

The Fairness Check

Optimization without fairness creates risk.

Speed and efficiency mean nothing if outcomes are biased or inequitable. Build fairness checks into your validation process from day one.
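
A simple starting point for a fairness check is to compare how often transactions are flagged across customer groups. The sketch below uses synthetic data and hypothetical names throughout. The 0.8 review threshold is an illustrative assumption, not a standard taken from the source, and a rate comparison like this is only one of many possible fairness measures.

```python
from collections import defaultdict


def flag_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of transactions flagged, per group."""
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for group, was_flagged in decisions:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def rate_ratio(rates: dict[str, float]) -> float:
    """Lowest flag rate divided by the highest; 1.0 means identical rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was flagged, so there is no disparity to measure
    return min(rates.values()) / highest


REVIEW_THRESHOLD = 0.8  # hypothetical: ratios below this trigger a human review

# Synthetic example: group_a has 2 of 10 flagged, group_b has 5 of 10 flagged.
decisions = ([("group_a", True)] * 2 + [("group_a", False)] * 8
             + [("group_b", True)] * 5 + [("group_b", False)] * 5)

rates = flag_rates(decisions)
needs_review = rate_ratio(rates) < REVIEW_THRESHOLD
```

A low ratio does not prove the system is unfair, but it is a signal that a human should look at why one group is flagged more often than another.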

Designing with Agentic AI

Four principles to carry into every project

Design First, Build Second

Define purpose, ownership, and boundaries before you write a single line of code. The tools are ready; your thinking has to be too.

Earn Trust by Default

Auditability, fairness, and clear goals aren't add-ons — they're the foundation. If people can't trust the output, they won't use it.

Humans Stay in the Loop

AI lacks judgment, context, and accountability. Human oversight isn't a limitation — it's what makes the system reliable.

Complexity doesn't equal capability.
Build systems that are understandable, auditable, and worthy of trust.

Now Go Build.

You've got the framework. You've seen the workflow. The only thing left is to start.

Design with intent

Build in trust

Keep humans close
