The most interesting thing about building with AI agents was not the code they wrote. It was the product work they did before a single line was written.

The workflow below was built in partnership with Claude Code. This is a snapshot as it existed in February 2026. By March it had already evolved, so consider this a record, not a prescription.

What Is EV Charger Management?

Our customers are in the Nordics, where energy costs are high and peak power tariffs make smart charging a real concern. Our product helps homeowners manage solar panels, batteries, and EV chargers. The charger feature lets you set preferences: when to charge, how much, and whether to prioritize solar or the cheapest grid price.

The first version shipped quickly, and user feedback showed clear opportunities to improve. That gave me the chance to re-imagine the workflow end to end, from the product manager and designer perspective, using an AI-native approach from the start.

Starting From What a PM Would Actually Do

I did not start with “what can AI agents do?” I started with “what would a good product manager do if they had unlimited time?”

Step 1: Research (PM) → Step 2: Synthesize (PM) → Step 3: Define (PM) → Step 4: Design (Designer) → Step 5: Build (Engineer) → Step 6: Verify (QA)

The insight was not “use AI to skip steps.” It was “use AI to do every step thoroughly.” Most teams skip research because it takes too long. Edge cases go undocumented because our brains do not think like machines. We miss the combinations, the failure modes, the states nobody clicked through. The same is true for designers working in Figma: it is difficult to create a static flow for every single edge case. AI agents change the economics of thoroughness.

The Workflow Overview

Here is the pipeline I built. Each stage has a dedicated AI agent, and every transition has a human gate, a point where I read everything, question the output, and decide whether to proceed.

Step 1: User Research (PM Agent) → Step 2: Data Analysis (PM Agent) → Step 3: Decisions (PM Agent) → Step 4: Use Cases (PM Agent) → Step 5: UX Design & Flow Specs (PM Agent) → Step 6: FE Engineer Builds (FE Agent) → Step 7: Review & QA (QA Agent)

Human gate between every step.

No code is written until step six. Steps one through five are pure product work.

Steps 1-2: User Research and Data Analysis

The product manager agent runs five research tracks in parallel:

.claude/agents/product-manager.md
---
name: product-manager
description: Orchestrates product research and specification.
Runs data analysis, code audit of existing system, bug
collection, then synthesizes decisions, edge cases, and
flow specs (product half). Hands off to frontend-engineer
agent for UI design, with BFF design pattern.
model: opus
tools: Read, Grep, Glob, Bash, Write, Edit, Task, WebFetch
skills:
- add-product-feature
- add-flow
---
  1. UX Research. Analyzing user surveys and drafting usability test plans.
  2. Data Analytics. Running BigQuery and SQL queries against production data to understand actual usage patterns.
  3. Bug Reports. Collecting field feedback, support tickets, and known issues.
  4. Code Audit. Examining the existing implementation for unhandled states and tech debt.
  5. Competitor Analysis. Building a feature matrix across competing products.

The add-product-feature skill scaffolds the entire research structure with a single command:

.claude/skills/add-product-feature/SKILL.md
/add-product-feature ev-charger

docs/product/ev-charger/
├── decisions.md                  # Data-backed product decisions
├── scenarios.md                  # Edge case catalog
├── open-questions.md
├── ux-flows/                     # Per-screen specs (via /add-flow)
├── research/
│   ├── data-analytics/           # BigQuery + SQL queries and takeaways
│   ├── bug-reports/              # Field feedback and issues
│   ├── ux-research/              # Survey findings
│   ├── competitor-analysis/      # Feature matrix
│   └── code-audit.md             # Codebase audit

For the EV charger, this surfaced patterns that reshaped our thinking. Most users never configured their charging options at all. They stayed on whatever the default was. The PM agent cross-referenced this with Nordic winter conditions and limited solar production, which explained the behavior.

Human Gate

I read every research output. Not to rubber-stamp it, but to build my own understanding. I ask follow-up questions, dig into surprising data points, push for root causes.

The agent collects and structures; I develop judgment about what matters. Our local product manager pointed out that in Swedish winters, a car needs about thirty minutes to precondition the cabin. If a user sets “ready by 7:00 AM,” the car must be fully charged by 6:30, because preconditioning draws from the grid, not the battery. No AI agent would surface this on its own.

Steps 3-4: Decisions and Use Cases

The product manager agent takes the research and produces two artifacts.

Decisions. Structured trade-offs: what we considered, what the data said, and why we chose what we chose. For example, when a user plugs in their car, the smart scheduler waits for a lower price window before charging. From the user’s perspective, nothing is happening. The AI proposed surfacing the charging plan on the charge options page (“Charging starts at 21:45 at best price”) and a compact homepage widget (“Waiting for lower price”) so the user knows the system is working, not broken.

docs/product/ev-charger/decisions.md
## D9: Show charging plan summary

Data: Field feedback — after saving charge preferences,
there is no summary of what the system will do or when.
The scheduler optimizes silently; the UI shows nothing.
Decision: Show plain-language plan on the charge options
page and a compact widget on the homepage.
Rationale: Users set preferences but have no confirmation
of the resulting plan. Without feedback, they don't know
when charging will start or if anything is happening.
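A plain-language summary like D9 describes can be derived from the scheduler's plan with a small formatter. A sketch in TypeScript; the types and wording here are illustrative, not the production API:

```typescript
// Illustrative plan shape; the real scheduler payload may differ.
type ChargePlan = {
  state: "waiting" | "charging" | "complete";
  startTime?: string; // e.g. "21:45", already localized
  reason?: "best-price" | "solar";
};

// Full sentence for the charge options page.
function planSummary(plan: ChargePlan): string {
  switch (plan.state) {
    case "waiting":
      return plan.startTime
        ? `Charging starts at ${plan.startTime}${plan.reason === "best-price" ? " at best price" : ""}`
        : "Waiting for a lower price";
    case "charging":
      return "Charging now";
    case "complete":
      return "Charging complete";
  }
}

// Compact variant for the homepage widget.
function widgetLabel(plan: ChargePlan): string {
  return plan.state === "waiting" ? "Waiting for lower price" : planSummary(plan);
}
```

The point of the sketch is that both surfaces render from the same plan object, so the page and the widget can never disagree about what the scheduler intends.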

Use cases and edge cases. Every scenario catalogued with expected behavior. Take guest charging: a friend visits and needs to plug in. They have no account, you do not know their battery size, and the homeowner may not want them charging during peak tariff hours.

The AI was surprisingly good here. It reasoned through the tension: the visitor just wants juice, but the homeowner is paying the bill. So guest mode respects the host’s smart scheduling by default, but “Charge Now” is always available when someone genuinely needs to leave. The agent figured out on its own which modes to expose and which to restrict.

docs/product/ev-charger/scenarios.md
## EC2.1: Friend/visitor needs to charge their EV

When: A non-household EV needs to charge. Friends
visiting, family staying over, Airbnb guests.
User intent: "Just charge this car now" — but the host
paying the bill may feel differently.

Degradation:
- "Charge Now" works without any configuration
- No car registration required. No SOC input needed
- Show charging status (power flowing) but not time
  estimates or SOC (we don't know the battery)
- Smart scheduling still respects the host's tariff
  preferences — no peak-hour surprises on the bill
- When guest unplugs, revert to household defaults

Key principle: Guest charging = zero-config.
One tap to start, unplug to stop.
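The degradation rules in EC2.1 reduce to a capability map keyed on whether the plugged-in car is a registered household vehicle. A minimal sketch, with hypothetical names:

```typescript
type ChargingCar = { registered: boolean };

type Capabilities = {
  chargeNow: boolean;        // always available, zero-config
  timeEstimates: boolean;    // needs a known battery size
  socDisplay: boolean;       // needs SOC from a registered car
  smartScheduling: boolean;  // host's tariff preferences still apply
};

function capabilitiesFor(car: ChargingCar): Capabilities {
  return {
    chargeNow: true,                // one tap to start, unplug to stop
    timeEstimates: car.registered,  // unknown battery, no estimates
    socDisplay: car.registered,
    smartScheduling: true,          // no peak-hour surprises on the bill
  };
}
```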

Human Gate

AI proposes, human approves. Part of the job here is deciding which scenarios are in scope and which are not. The agent will happily enumerate fifty edge cases, but shipping means choosing which ones matter now and which can wait. Without that filter, thoroughness becomes scope creep.

Step 5: UX Design and Flow Specs

The PM agent writes per-screen specifications: every state, every field, every interaction, every transition. The frontend engineer reads these, not a Figma file.

docs/product/ev-charger/ux-flows/add-car.md
# Add Car

Decisions referenced: D1, D5, D8
Edge cases referenced: EC1.1, EC1.2, EC1.3, EC1.5

## User Journey (Happy Path)

1. User reaches "Which car charges here?"
2. Brand dropdown — user selects "Tesla"
3. Model dropdown appears — user selects "Model 3"
4. If multiple variants, version dropdown appears
   ("Standard Range Plus — 60 kWh", "Long Range — 75 kWh").
   Single variant auto-selects.
5. Battery hint fades in. Name input pre-filled from model.
6. System registers car, assigns to charger, returns.
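Rules like step 4's "single variant auto-selects" are exactly what a spec this precise lets you pin down in code. A sketch of the variant-step logic, with a hypothetical catalog shape:

```typescript
type Variant = { name: string; batteryKwh: number };
type Model = { name: string; variants: Variant[] };

// Decides whether the version dropdown appears and what is preselected.
function variantStep(model: Model): { showVariantDropdown: boolean; selected: Variant | null } {
  // Single variant auto-selects; multiple variants require a choice (step 4).
  if (model.variants.length === 1) {
    return { showVariantDropdown: false, selected: model.variants[0] };
  }
  return { showVariantDropdown: true, selected: null };
}

// Battery hint that fades in once a variant is resolved (step 5).
function batteryHint(v: Variant): string {
  return `${v.batteryKwh} kWh battery`;
}
```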

Step 6: The Build

This is where code finally gets written. It is the boring part. All the guardrails and constraints are already in place from steps one through five. Now we are just seeing if the machine can spit it all out.

EV charger feature — generated structure
contracts/openapi/
└── ev-charger.yaml              # API contract (source of truth)

shared/schemas/
├── generated/ev-charger.ts      # Zod schemas from OpenAPI
└── ev-charger.ts                # Re-export bridge

shared/hooks/
├── useChargeOptions.ts          # GET + PATCH + POST
├── useAddCar.ts                 # Car registration
├── ...                          # Other feature hooks
└── *.test.ts                    # Every hook has tests

web/pages/
├── ChargeOptionsPage.tsx        # Main charging screen
├── AddCarPage.tsx               # Car registration flow
├── ...                          # Other feature pages
└── *.test.tsx                   # Every page has tests

server/mock-data/
└── ev-charger.js                # Stateful mock BFF

The mock BFF server is stateful. You can add a car, assign it to a charger, save charging preferences, start a charge session. The frontend runs end-to-end against these mocks. When the backend team builds the real API, they build to the same OpenAPI contract. The frontend does not change.
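The statefulness is the point: the mocks remember what you did. Stripped of the HTTP layer, the state behind such a mock BFF can be sketched as a small in-memory store; the shapes below are illustrative, not the real contract:

```typescript
type Car = { id: string; name: string };
type Preferences = { readyBy?: string; mode: "solar-first" | "cheapest-price" };

// In-memory state behind the mock endpoints. It resets on server restart,
// which is fine for frontend development against the contract.
class MockChargerState {
  private cars = new Map<string, Car>();
  private assignments = new Map<string, string>(); // chargerId -> carId
  private prefs: Preferences = { mode: "cheapest-price" };
  private charging = false;

  addCar(car: Car): Car {
    this.cars.set(car.id, car);
    return car;
  }
  assignToCharger(chargerId: string, carId: string): void {
    if (!this.cars.has(carId)) throw new Error(`unknown car: ${carId}`);
    this.assignments.set(chargerId, carId);
  }
  savePreferences(p: Preferences): void {
    this.prefs = p;
  }
  startSession(): void {
    this.charging = true;
  }
  status(chargerId: string) {
    return {
      car: this.cars.get(this.assignments.get(chargerId) ?? "") ?? null,
      preferences: this.prefs,
      charging: this.charging,
    };
  }
}
```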

The QA engineer agent runs alongside the build. It verifies that the BFF contract stays in sync as the UI evolves: schemas match the OpenAPI spec, mock data passes validation, hooks test every response and request shape. If someone changes a page and the contract drifts, the QA agent catches it.
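A drift check like this reduces to "every mock payload must pass its schema's validator." The hand-rolled check below stands in for the generated Zod schemas; the real pipeline would import those instead:

```typescript
// Stand-in for a generated validator; in the real setup this would be the
// Zod schema generated from the OpenAPI contract.
type Check = (value: unknown) => string[]; // returns a list of problems

const chargeOptionsCheck: Check = (value) => {
  const problems: string[] = [];
  const v = value as Record<string, unknown>;
  if (typeof v?.mode !== "string") problems.push("mode: expected string");
  if (v?.readyBy !== undefined && typeof v?.readyBy !== "string")
    problems.push("readyBy: expected string or absent");
  return problems;
};

// The audit's core loop: validate every mock payload against its schema
// and report anything that has drifted from the contract.
function auditMocks(mocks: Record<string, unknown>, checks: Record<string, Check>): string[] {
  const drift: string[] = [];
  for (const [name, payload] of Object.entries(mocks)) {
    const check = checks[name];
    if (!check) {
      drift.push(`${name}: no schema registered`);
      continue;
    }
    for (const p of check(payload)) drift.push(`${name}: ${p}`);
  }
  return drift;
}
```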

This is where people ask: “Where is the designer?” There is no designer agent in the pipeline. We have a separate design system with a review agent that audits component usage, token compliance, and accessibility. The design system encodes the decisions a designer would make: tokens, components, spacing rules, theme separation.

design-system/.claude/skills/audit-component/SKILL.md
/audit-component Button

Checks:
1. Hardcoded values — colors, spacing, shadows, opacity
2. Theme separation — base structural only, visual in
   light/dark blocks
3. Token usage — semantic for theme-aware, primitive
   for static
4. Accessibility — :focus-visible, ARIA, touch targets
5. Showcase docs — CSS popovers match actual CSS
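At its simplest, the hardcoded-values check is a scan of component CSS for literals that should be tokens. A toy version follows; the real audit is an agent reading the file in context, not a regex, but the rule it enforces looks like this:

```typescript
// Flags hex colors and raw px values that bypass design tokens.
// A real audit would also allowlist values inside var(--token) references.
function findHardcodedValues(css: string): string[] {
  const findings: string[] = [];
  const hexColors = css.match(/#[0-9a-fA-F]{3,8}\b/g) ?? [];
  for (const h of hexColors) findings.push(`hardcoded color ${h}`);
  const pxValues = css.match(/\b\d+px\b/g) ?? [];
  for (const p of pxValues) findings.push(`hardcoded length ${p}`);
  return findings;
}
```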

This works well within constraints. It does not work for taste.

Human Gate

This is the most hands-on gate. The UI comes out structurally correct but visually off. The design system is still young, and AI models do not fully understand visual context. I spend real time adjusting visual hierarchy, spacing rhythm, and the small details that make a UI feel considered rather than assembled. Whether you need a dedicated designer agent depends on the project. More on that in a future article.

Step 7: Review and QA

This is the product manager running the final product review. One skill spins up nine sub-agents in parallel, each cross-referencing the code against a different product spec:

.claude/skills/product-impl-audit/SKILL.md
/product-impl-audit ev-charger

# 9 audits run in parallel

Product:
1. Design System — tokens, spacing, typography
2. Decisions — every D1-D15 checked against code
3. Scenarios — every EC1-EC12 verified

Engineering:
4. BFF Contract — OpenAPI → Zod → mock → hook
5. Visual Review — Playwright screenshots at 390×844
6. QA Engineering — hook tests + page tests

Quality:
7. V1 Lessons — specific checks from past mistakes
8. Reasoning Docs — WHY comments for handoff
9. React Health — anti-patterns, dead code

Rule: Every audit runs fresh. Never carry over results
from a previous session.

Human Gate

Even with nine parallel sub-agents auditing, I am still tapping every screen myself. The audits verify structure, but they would not catch the charging plan summary showing stale data because the scheduler has not responded yet. That kind of thing only surfaces when you use the product the way a real person would, not the way a test script would.

The Agent Team

Four agents in Home AI, one in the design system, and a handful of skills each. I am the orchestrator. The agents are the orchestra.

Home AI Agents
  • product-manager
  • frontend-engineer
  • design-reviewer
  • qa-engineer
Home AI Skills
  • /add-product-feature
  • /add-flow
  • /product-impl-audit
  • /react-doctor
  • /polish
Design System
  • designer agent
  • /component
  • /audit-component
  • /add-token
  • /add-icon
  • /bump-version
  • /thiink-create

Performance Review

This was the first time Claude and I worked together on a full product and design cycle, so I asked it to write a performance review. The kind you would give a colleague after a project. It was oddly honest. And the best part: no bonus negotiation afterward.

On what it could do better:

self-review.md — What I Didn't Do Well
Fixing symptoms without understanding the full flow.

When you point me at a bug, I fix the immediate symptom
rather than tracing the complete data flow. Locally correct
changes that broke something downstream because I didn't
model the full state machine across files.

Going forward: Before touching code, write down the complete
data flow — endpoints, state reads/writes, consuming
components, user-visible result. The 3 minutes this takes
would have saved hours.

Of course, like any good manager, I wanted to hear feedback on what I could have done better:

self-review.md — Feedback for Vivian
Tell me the *intent*, not just the *symptom*. When you
report a bug, you describe what you see ("the banner is
still there") but not always what you expected or why it
matters. The "why" helps me prioritize which fix approach
to take.

Share what "good" looks like before I build. When you say
"build the charge options page from the flow spec," I
produce a functional page. But I don't know you envisioned
the SOC ring as the hero element, or that the schedule
timeline should feel like a Gantt chart. A 2-sentence
design intent at the start would dramatically improve
first-draft quality.

Learnings

Product and design work will never be as autonomous as pure engineering. There are too many judgment calls. You cannot automate “should this feel urgent or calm?” the way you can automate “does this schema match the contract?” But there is a spectrum, and right now my human gates are binary: agent produces, I review.

Backend engineering solved this problem decades ago with CI/CD: verify continuously, not at the end. Product and design work needs the same evolution. Right now my verification happens at the gates. What I want is verification that runs continuously as agents work, catching drift before it compounds. Make the loop tighter and more autonomous.

Taste is the frontier that verification alone will not solve. You can write a test for “does this token match the spec.” You cannot yet write a test for “does this screen feel right.” But every time I reject something visual, I am learning to capture why, turning gut reactions into constraints the agent can follow next time.

Agent-to-agent communication is the last manual bottleneck. When the frontend needs a new component variant, I still switch contexts and build it myself. I am experimenting with letting agents coordinate directly. That would close the gap.

AI agents change the economics of thoroughness. The question is no longer whether you can afford to do the work properly. It is whether you can build the system that does.

This article was written by Vivian and Claude through Wispr Flow voice dictation and Claude Code remote control, just as how we collaborate on product and design.