Premise
I've been thinking a lot about what it means to scale a designer's time when agents are doing most of the building.
It started with a simple frustration. Our team was shipping fast with AI agents, and the UI kept coming back inconsistent. Not broken, just... off. Wrong surface colours. Components used in the wrong context. Patterns that existed in our system getting ignored in favour of whatever the agent had seen most often in its training data. And I kept having to catch it in review.
The thing I realised is that agents don't know the design system we use internally. Not because it's poorly documented. Because it was never built to be read by machines. The docs were written for engineers who already have context. The components live in a library that an agent can't browse. And unlike something like shadcn, there's no public corpus of examples for it to have learned from. The agents were essentially working blind every time they touched our UI.
First pass
So I started experimenting with design.md, a spec for writing design documentation in a format agents can actually consume. My first instinct was to write one big file. The logic felt sound at the time: capture everything in one place, make it the source of truth, give agents one document to read.
I got pretty deep into it before the problem became obvious. The moment you try to make a markdown file the source of truth for your entire design system, you've just created another thing to maintain. Every time a component changes, every time a token gets updated, every time a pattern evolves, that file needs to change too. And it won't. Not consistently. Not across a whole team moving fast.
Breaking it down by component felt like a better direction: /components/table/design.md, /components/sidebar/design.md. Smaller surface area, easier to keep current, closer to the thing it's describing. That part was genuinely interesting. Rethinking the design language through the lens of how an agent reads it is a different exercise than writing docs for engineers.
Review signal
GUI Design Review - PR #2811
Score: 2 / 5
This PR introduces significant new UI components, but is missing CSS definitions for the majority of new visual elements. The code architecture is sound, but the visual layer is incomplete.
Critical: Missing CSS Definitions
| Class | Used in | Purpose |
|---|---|---|
.settings-skills-tab-bar |
SkillsTab | Tab container |
.settings-skills-tab-badge |
SkillsTab | Count badge inside tabs |
.settings-remote-source-card |
RemoteSourceCard | Card wrapper for remote sources |
The useful signal was not just that the PR had issues. It was that the critique became specific enough to act on: missing CSS definitions, variant drift, and unstyled states.
What changed
But then a colleague pointed out a problem I hadn't fully sat with yet. The moment you put token values and component specs in a markdown file, you have two sources of truth. The component library changes. The markdown doesn't. Agents start working from stale data. That might actually be worse than working blind.
So I ran two approaches in parallel. One was more of a steering doc: three files covering what to avoid, an on-demand review skill you could load when auditing a diff. Gotcha guides rather than blueprints. The second followed the design.md spec more faithfully, YAML tokens as authoritative values, full component coverage. After testing both I landed somewhere I didn't expect: the second direction isn't right. At least not in the way I was thinking about it.
Review loop
7d910ef
Added the missing origin badge variant.
This is the difference between broad taste feedback and portable judgement. The review named the visual problem, explained the system expectation, and suggested a concrete repair.
Two layers
What I think actually works is two layers.
The markdown captures judgement. Why this pattern and not that one. When to use a sheet versus a dialogue. What "own the polish" actually means in practice on our team. The stuff that can't be inferred from looking at the component source. Slow-changing, principle-driven, written by a designer. I kept it strict: no hex values, no token definitions, nothing that stales. Just the why.
The structural layer is something else entirely. In our case it's an MCP server with live access to our component library, current schemas, actual prop values. Agents query it at inference time rather than reading a markdown copy of it. The drift problem mostly disappears because they're reading the real thing.
Together those two layers are what makes our existing design system usable by agents without rebuilding it from scratch. We're not replacing what we have. We're adding an interface it was never designed to have.
Leverage
And the thing I keep coming back to is this: I'm not trying to review more PRs. I'm trying to write my judgement down once in a form agents can carry forward. That's a different kind of leverage. And I think designers who figure it out early are going to have their taste show up in a lot more places than designers who don't.
[DESIGN] medium -
.settings-badge.originhas no CSS variant defined. This will render as the unstyled base badge.Consider either reusing the info badge color or adding:
This should work correctly in both dark and light mode because
--gui-linkis theme-aware.