There’s a pattern I keep noticing in the community. Someone starts working heavily with LLMs - and gradually drifts off track. They build themselves a complex system of agents with roles, slash commands, swarms, orchestrators, step-by-step workflows, “plan mode”, etc. Peter Steinberger nailed it with the term “Claude-pilled” - the person gets so deep into the tooling around Claude that they start believing: the more complex the system, the better the result.

I don’t think that’s true. And the further I go, the more convinced I am of the opposite.

What a “Claude-pilled” workflow looks like

One popular example is the “gstack” from Garry Tan, CEO of Y Combinator, which has been widely mocked on Twitter.

Tons of slash commands. Agents with roles. Prompts for those agents that were apparently auto-generated by Claude a few months ago and haven’t really been revisited since. The instructions are full of noise, contradictions, and obvious statements that just get in the model’s way. Tasks are decomposed down to the level of “which package to put a function in” and “which line to declare an interface on.”

I get where it comes from. Claude with its plugins, agents, and structured workflows practically nudges you toward this. And at some point you think: since I’ve already set up the agent - might as well have it follow strict rules. Step by step. With clear instructions for every move.

The problem is that this doesn’t help the model. It gets in the way.

Why micromanaging agents is a step backward

Frontier models - Opus, GPT-5.5, etc. - can make decisions on their own. They can write 8,000 lines of code in one shot and decide on the fly which package something belongs in, based on the spec and project context. They don’t need you to spoon-feed every step.

When you create a rigid workflow whose rules apply to every task the same way, you strip the model of the flexibility that makes it useful. Real tasks always contain something unplanned - something that surfaces mid-process, something the user left unsaid. A rigid agent system can’t adapt to that: it will execute strictly to the instructions. And now you’ve got non-compiling code in the repo, five QA cycles, a rolled-back task - and you’ve wasted far more time than if you’d given the model a bit more freedom from the start.

Agents don’t need separation of responsibilities. They need unknown variables removed.

How I work

My approach looks much more modest on paper, but works better in practice.

I talk to Builder’s agent the way I’d talk to a senior engineer, with myself in the role of product owner. I don’t spell out every step - instead, I spend 5-7 minutes discussing the task, write up a single text document (for the agent’s convenience and to preserve context), and then the agent works for 2-8 hours and gets it done.
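
For concreteness, here’s the rough shape such a document might take. This is an invented example, not a template Builder ships - the point is the level of detail: goals and constraints, not step-by-step instructions.

```
# Task: add CSV export to the reports page

Context: users keep asking for raw data; today they screenshot charts.

What I want:
- An "Export CSV" button on /reports
- One row per transaction, same filters as the current view

Constraints:
- Reuse the existing report query; avoid a new endpoint unless necessary
- Large exports can stream, but don't block the UI

Open questions - decide yourself and note the decision:
- File naming convention
- Date format in the export
```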

No endless slash commands, no trio of JavaScript specialist agents, no “agent swarms.” No decomposing down to the level of individual lines of code. No “plan mode” - just planning.

Sure, the first draft can be rough - the prototype might not be perfect. But 4 times out of 5, my problems come from a bad spec, not from the model not knowing which package to put an interface in - meaning the problem is me. People who run a strict workflow get a more predictable first result, but they spend 30+ minutes planning instead of 5.

On tokens - because it matters

Complex workflows burn through a huge number of tokens. Claude Code with its Markdown planning, GitHub tickets, subagents, and re-reading of the same files 10 times on every run - all of that costs tokens, and not a small number of them.
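
A back-of-envelope calculation shows the scale. Every number below is an illustrative assumption (a rough 4 characters per token, a handful of medium-sized files, each pass re-reading them), not a measurement of any particular tool:

```python
# Back-of-envelope token math - every number here is an assumption.
CHARS_PER_TOKEN = 4          # common rule of thumb; varies by tokenizer

file_tokens = 60_000 // CHARS_PER_TOKEN  # one ~1,500-line file ~ 15,000 tokens
files = 5                    # files the agent needs for context
rereads = 10                 # planner + subagents each re-reading them

multi_agent = files * rereads * file_tokens   # 750,000 tokens of re-reads
single_session = files * file_tokens          # 75,000 tokens, read once

print(f"multi-agent workflow: {multi_agent:,} tokens just on re-reads")
print(f"single session:       {single_session:,} tokens")
# A 10x gap before the model has written a single line of code.
```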

In my Builder setup, context lives in a single session. The compact step is configured to carry over product-level information - decisions, intentions, task context - rather than low-level code details. That’s dramatically more token-efficient.
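
Mechanically, it amounts to something like the minimal sketch below - this is not Builder’s actual code, and `llm_complete` is a hypothetical stand-in for whatever completion call your harness makes:

```python
# Minimal sketch of product-level compaction - not Builder's actual code.
# llm_complete(prompt: str) -> str is a hypothetical stand-in for whatever
# completion call your harness uses.

COMPACT_PROMPT = """Summarize this coding session for a fresh context window.
KEEP: product decisions, user intentions, task constraints, open questions.
DROP: file contents, diffs, stack traces, raw tool output - the agent can
re-read code from disk, but it cannot re-derive why a decision was made.

Session transcript:
{transcript}
"""

def compact(transcript: str, llm_complete) -> str:
    """Replace a long transcript with a short product-level summary."""
    return llm_complete(COMPACT_PROMPT.format(transcript=transcript))
```

The asymmetry is the whole trick: code is cheap to re-read from disk, but intent is expensive to lose.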

In theory, with a well-defined multi-agent workflow, simpler phases could be handed to cheaper models. But transferring context between sessions and compressing it correctly is something the AI agent world still hasn’t properly solved - so I don’t see the point in fighting the model over it.

Why all of this is a relic of 2025

All the context management strategies currently used in harnesses like Claude Code exist for a specific reason: older versions of Opus would lose around 40 IQ points simply because the context filled up. That’s where compacts, tickets, and step-by-step planning came from - attempts to work around model limitations at the tooling level. Models from GPT-5.5 onward don’t need this.

I think this should be solved at the model level: proper 300K-1M token context windows, solid native compacts, less sensitivity to context going stale. Not layer after layer of crutches in the harness.

It’s a shame to see people paying serious money, dealing with caching bugs, and building increasingly complex systems on top of what they assume are fundamental model limitations - when the whole issue is that their approach still carries baggage from a year ago.

What’s next

Next, I’m going to try codifying a more structured approach in Builder - taking the best of both worlds: the speed and simplicity of my current approach, plus some of the predictability of a more defined workflow. Whether it works out - we’ll see.

The main point I want to get across: if you’re building a complex agent system, stop and ask yourself - does the task actually need this, or have you just gotten caught up in the tooling? Frontier models in 2026 are smarter than the systems people are trying to pack them into.