design.md vs Design Tokens for AI UI Workflows

I’m Dora. I spend most of my week inside coding agents and AI UI tools — Cursor, Claude Code, Stitch, the usual lineup — building and rebuilding interfaces faster than I have time to document them. About a month ago I started seeing the same file appear in every other repo I touched: DESIGN.md. Same name, same YAML-on-top-prose-on-bottom shape. By the third project I realized it wasn’t a coincidence. It was the thing replacing what most of us used to ship as a tokens.json.

So I rebuilt one of my own component libraries twice — once with a classic DTCG-style token file, once with a DESIGN.md — and ran the same coding agent against both. This is the part of the comparison I couldn’t find written down: not what each format is, but what each one is actually optimizing for, and which one belongs in your stack right now.

design.md vs Traditional Design Tokens

What each format is optimizing for

Design tokens, in the classic sense, are a methodology. The term was coined at Salesforce around 2014 to solve a very specific scaling problem: how do you keep a color decision in sync across web, iOS, Android, and four codebases without filing four tickets? The answer was a platform-agnostic name-value pair, stored in JSON or YAML, transformed at build time into whatever each platform needed. That methodology is now codified by the Design Tokens Community Group at the W3C, and as of late 2025 the DTCG format has a stable v1 specification.
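The build-time transform described above is easy to picture in code. Here is a minimal sketch, assuming a flat dictionary of dotted token names; the token names, the `to_css` and `to_kotlin` helpers, and the output shapes are all illustrative, not what Style Dictionary or any other real pipeline emits.

```python
# Hypothetical token set: names and values are illustrative only.
TOKENS = {
    "color.primary": "#1A1C1E",
    "color.surface": "#FFFFFF",
}


def to_css(tokens: dict[str, str]) -> str:
    """Emit CSS custom properties, e.g. color.primary -> --color-primary."""
    lines = [f"  --{name.replace('.', '-')}: {value};" for name, value in tokens.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"


def to_kotlin(tokens: dict[str, str]) -> str:
    """Emit Kotlin constants, e.g. color.primary -> COLOR_PRIMARY.

    Assumes every value is a six-digit hex color.
    """
    lines = []
    for name, value in tokens.items():
        const = name.replace(".", "_").upper()
        # Prepend a fully opaque alpha byte to form an ARGB literal.
        lines.append(f"    const val {const} = 0xFF{value.lstrip('#').upper()}")
    return "object Tokens {\n" + "\n".join(lines) + "\n}"


print(to_css(TOKENS))
print(to_kotlin(TOKENS))
```

One source dictionary, two platform outputs, and the hex value is never retyped by hand. That is the whole idea in miniature.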

Tokens optimize for deterministic distribution. A hex code goes in, the same hex code comes out on every platform, every build, forever. There’s no ambiguity. There’s also no narrative — a tokens file tells you primary: #1A1C1E but it doesn’t tell you why that color exists or when not to use it.

DESIGN.md, open-sourced by Google Labs in April 2026, optimizes for something different: giving a coding agent enough context to make decisions the token file doesn’t cover. It’s a single markdown file with YAML front matter for tokens and prose below for rationale. Same file, two audiences — the deterministic part for parsers, the narrative part for whatever LLM is reading the repo.
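To make the "same file, two audiences" idea concrete, here is a rough sketch of splitting such a file into its two halves. The sample content and key names (`primary`, `radius`) are my own placeholders, not the official schema, and the parser only handles flat `key: value` lines; a real file would need a proper YAML parser.

```python
# Hypothetical file content: the keys are placeholders, not the official schema.
SAMPLE = """---
primary: "#1A1C1E"
radius: 8px
---
# Brand voice

Restrained and editorial; favor whitespace over decoration.
"""


def split_design_md(text: str) -> tuple[dict[str, str], str]:
    """Split a front-matter document into (tokens, prose).

    Only flat `key: value` lines are handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter: everything is prose
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # unterminated front matter: treat it all as prose
    tokens: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if sep:
            tokens[key.strip()] = value.strip().strip("\"'")
    prose = "\n".join(lines[end + 1:]).strip()
    return tokens, prose


tokens, prose = split_design_md(SAMPLE)
print(tokens)
print(prose)
```

The dictionary is what a parser or build step would consume; the prose string is what ends up in the agent's context.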

That’s the actual split. Not “old vs new.” Not “JSON vs Markdown.” It’s values vs values plus reasoning in the same file.

Why AI agents create a new requirement set

When a human implements a design, the gap between “the token says #1A1C1E” and “this empty state needs a tone of voice” gets filled by the human. They’ve seen the Figma file. They sat in the brand workshop. They know the secondary button is supposed to feel quiet, not assertive.

A coding agent has none of that. It has whatever you put in the repo and whatever it can infer from filenames. So when you ask it to generate a screen the token file doesn’t fully specify — an edge case, a new component, a layout decision — it either guesses or it defaults to whatever it saw most often in training. That’s the source of the “AI beige” aesthetic everyone complains about: not bad models, just missing context.

This is what DESIGN.md is solving. The official spec on GitHub is explicit about it — tokens give agents exact values, prose tells them why those values exist and how to apply them. The format expects both halves.

Where design.md Adds Value

Persistent narrative context

The thing I noticed in the first 48 hours of testing: the same agent, given the same brief, generates noticeably different output when prose context is present. Not “slightly better colors.” Different layout choices, different copy register, different component density. The token values were identical in both runs — what changed was whether the agent had a paragraph saying “the brand voice is restrained and editorial; favor whitespace over decoration.”

This is the part the traditional token pipeline doesn’t carry. A DTCG JSON file can describe --color-primary precisely, but it can’t tell an agent that the primary color is meant to be used sparingly. DESIGN.md carries that judgment into every generation pass, persistently, without anyone re-typing it into a prompt.

It works.

Better multi-screen consistency for generation workflows

In my second test I generated eight screens for the same app across two days. With tokens-only context, screens 5–8 started drifting — same palette, but the layout language loosened. With DESIGN.md present, the drift was much smaller. Not zero. Smaller.

My read on why: the prose section acts like a re-anchor every time the agent reads the file. Tokens alone give an agent enough to be correct on individual values. The narrative gives it enough to be consistent across decisions the tokens didn’t anticipate. For one-off generation that gap doesn’t matter. For multi-screen output and ongoing iteration, it compounds.

This is also where DESIGN.md plays nicely with the broader agent-instruction stack — most setups now reference it from an AGENTS.md alongside SKILL.md files, so the design system sits in the same context layer as the rest of the agent’s persistent instructions.

Where Traditional Tokens Still Win

Two scenarios, both real.

Cross-platform distribution beyond the web. If you’re shipping the same design system into iOS, Android, a React Native app, and a marketing site, the DTCG pipeline through Style Dictionary or Terrazzo is still the path of least resistance. DESIGN.md’s YAML can export to DTCG JSON via the official @google/design.md CLI, but the source-of-truth question still matters — if your token graph is large, multi-themed, and consumed by non-AI tooling, keeping DTCG as the canonical format is the cleaner setup.

Mature design systems with established governance. Tokens are not just a file format. They’re a methodology with about a decade of accumulated practice — primitive layers, semantic layers, aliasing, theming, the whole taxonomy that Nathan Curtis laid out in Tokens in Design Systems. If your team already operates that way, DESIGN.md doesn’t replace it. It sits on top of it, or alongside it, as a context layer for agents. Tokens stay the canonical source; the markdown becomes the AI-facing translation.
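For anyone who hasn't worked with the primitive/semantic split, aliasing is the mechanism that holds it together: a semantic token points at a primitive instead of repeating its value. Below is a small sketch of alias resolution, assuming the curly-brace reference style (`{group.token}`) and a flat dictionary. The token names are hypothetical, and real token tooling handles far more than this (nested groups, typed values, themes).

```python
import re

# Matches a value that is entirely one reference, e.g. "{color.gray.900}".
ALIAS = re.compile(r"^\{([^{}]+)\}$")

# Hypothetical tokens for illustration.
TOKENS = {
    # Primitive layer: raw values.
    "color.gray.900": "#1A1C1E",
    # Semantic layer: names that point at other tokens.
    "color.text.default": "{color.gray.900}",
    "color.button.primary.label": "{color.text.default}",
}


def resolve(name: str, tokens: dict[str, str], _seen: frozenset = frozenset()) -> str:
    """Follow alias references until a raw value is reached."""
    if name in _seen:
        raise ValueError(f"circular alias at {name!r}")
    if name not in tokens:
        raise KeyError(f"unknown token {name!r}")
    value = tokens[name]
    match = ALIAS.match(value)
    if match is None:
        return value  # a raw value, not a reference
    return resolve(match.group(1), tokens, _seen | {name})


print(resolve("color.button.primary.label", TOKENS))
```

Change the primitive once and every semantic token that points at it follows. That indirection is what a decade of token practice is built on.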

The mistake would be reading DESIGN.md as a replacement for the token pipeline. It isn’t. It’s a different layer with a different consumer.

A Decision Framework for Teams Building AI UI Pipelines

I keep going back to four questions when deciding what to put in a repo:

1. Who’s reading this file? If the primary consumer is a build pipeline that emits CSS, Swift, and Kotlin, you want tokens in a canonical format. If the primary consumer is a coding agent generating UI on demand, you want DESIGN.md. If it’s both, you keep both, and let the markdown file’s YAML mirror a subset of the tokens.
2. How often does your UI surface get regenerated? Low-frequency teams (a stable product, occasional new screens) get most of their value from tokens. High-frequency teams (rapid prototyping, agent-driven iteration, new screens every week) feel the missing-context gap acutely. The higher the regeneration frequency, the more the prose layer earns its keep.
3. How many platforms? Web-only or web-primary with agent-driven generation: DESIGN.md is the simpler stack. Three-plus platforms with serious native presence: tokens-first, with DESIGN.md as a downstream artifact.
4. Is the rationale already documented somewhere? If your brand guidelines, voice doc, and component philosophy live in a Notion page no agent will ever read, DESIGN.md is the single highest-leverage move you can make this quarter. You’re not creating new documentation; you’re moving existing documentation into a file the agent actually opens.
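If you land on "keep both" for the first question, the mirrored subset will drift from the canonical tokens unless something checks it. Here is a minimal sketch of that check, assuming both sides have already been loaded into flat dictionaries; `mirror_drift` and the token names are hypothetical, and this isn't a feature of any tool mentioned here.

```python
def mirror_drift(canonical: dict[str, str], mirror: dict[str, str]) -> list[str]:
    """Report mirrored tokens that are missing from, or differ from, the canonical set."""
    problems = []
    for name, value in sorted(mirror.items()):
        if name not in canonical:
            problems.append(f"{name}: not in canonical tokens")
        # Case-insensitive so #1a1c1e and #1A1C1E match; tighten this for
        # values where case matters.
        elif canonical[name].lower() != value.lower():
            problems.append(f"{name}: canonical {canonical[name]} != mirror {value}")
    return problems


# Hypothetical data: the canonical set is larger than the mirrored subset.
canonical = {"color.primary": "#1A1C1E", "color.surface": "#FFFFFF", "space.md": "16px"}
mirror = {"color.primary": "#1a1c1e", "color.accent": "#FF0000"}

for problem in mirror_drift(canonical, mirror):
    print(problem)
```

The check only runs in one direction on purpose: the mirror is allowed to be a subset, so tokens missing from the markdown side are not reported.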

That’s my framework. Yours might differ. The thing I’d flag: don’t pick a format because it’s new. Pick it because of who’s reading the file.

FAQ

Is design.md a replacement for design tokens?

No. DESIGN.md is a wrapper that contains design tokens (in YAML front matter) plus the rationale around them (in markdown prose). The tokens inside it are still design tokens in the conventional sense. If you already have a DTCG-format token file, DESIGN.md doesn’t replace it — it sits as a parallel artifact for AI agents, or you can have the markdown export DTCG JSON when needed.

Why would AI agents need more than numeric tokens?

Because most UI generation requests aren’t fully specified by the token graph. “Generate a pricing page” requires hundreds of micro-decisions — hierarchy, density, tone, what to emphasize — that no token file covers. Without narrative context, the agent fills those gaps with whatever it saw in training data, which produces the generic look most AI-generated UIs share. Prose in DESIGN.md is what closes that gap.

Which workflows benefit most from design.md?

Three patterns I’ve seen pay off most:

- Solo builders and small teams using Cursor, Claude Code, or Stitch to ship UI faster than they can hand-write it.
- Design system teams maintaining several internal products where consistency across AI-generated screens is becoming a real problem.
- Agencies and contract teams who want a single drop-in file that encodes a client’s design language for any coding agent.

If your workflow is mostly hand-coded with occasional AI assistance, the marginal value drops.

When is classic design-token infrastructure still enough?

When you’re not generating UI with agents, or when your platform reach extends well beyond the web. Products that lean heavily on native mobile, multi-theme white-label products, and teams with mature design ops practices still get more from the DTCG ecosystem than from a markdown file. The two aren’t mutually exclusive, but if you have to pick one to invest in, the answer depends on where your generation friction actually is.

Conclusion

The honest version: DESIGN.md is not a paradigm shift. It’s a focused solution to a specific gap — coding agents lacking the rationale that token files don’t carry. For the workflows where that gap is real, the gain is immediate and obvious. For the workflows where it isn’t, traditional tokens still do the job.

I’m two months into using DESIGN.md on every AI-generation project I run. It’s stayed in the workflow, which is the only test I trust. The token files haven’t gone anywhere either — they’re still doing what they’ve always done, just with a sibling file now for the audience that needs more than numbers.

Run it yourself on a project. Two days will tell you more than this article can.
