The Stormcloud

AI in Software - Argue With It in Design, Direct It in Implementation

I started out a pessimist about AI in software development. The pitch was that everyone would get lazy, that the field would lose the muscle it spent forty years building, that “tool” was a polite word for crutch.

I’m not exactly there anymore.

A lot of what I worried about will still happen. Some of it is happening. But the line I thought AI would draw - between people who use it and people who don’t - isn’t the line I see now. The line I see now sits one layer deeper, between people who respect the fundamentals and people who skip them.

The crutch failure mode

I sat in on a code review recently for some internal systems a coworker had built. The code worked. The pieces fit together well enough on a first read. But when I started asking why - why this pattern, why this layering, why this trade-off rather than the obvious other one - the answers didn’t track. He could tell me what the code did. He couldn’t tell me why the choices had been made.

That’s the failure mode I worry about. Not the tool. The handoff of judgment to the tool, and the implications of those choices walking out the door without anyone holding them.

Code that you can’t defend in a review is code you can’t maintain. You can ship it once. The second time something goes wrong with it, you’re reading your own codebase like a stranger.

The new line

The split isn’t AI users versus non-AI users. The vibe-coders who skip the fundamentals were going to skip them anyway - AI just made it cheaper and faster to ship things they don’t understand. The senior engineers who already had judgment are using AI to move faster through work that doesn’t need their full attention, so they can spend more of it on the parts that do.

What’s actually changed is the floor and the ceiling. The floor for shippable-looking code dropped through the basement. The ceiling for what a single engineer with a real grasp of the fundamentals can produce went up. Whether the field nets out positive depends on whether you think the rising ceiling matters more than the falling floor.

What responsible use looks like for me

One thing upfront, because it matters where I’m writing from: I use these tools a lot. Over the past year or so I’ve put what I’d guess is roughly a hundred hours into building a set of Claude Skills covering my own workflow for design, implementation, and debugging. The mechanics of that setup are their own post and I’ll get to it eventually. What follows isn’t speculation. It’s where I’ve landed after running these tools through real work, not after reading about them.

Where I’ve landed is that the right way to use these tools looks like two completely different things depending on what part of the work you’re doing.

Design - argue with it

In design I want the LLM hostile. I want it to push back. I want it aggressively using search tools to find counter-evidence to my own theory and come back to tell me where I’m wrong.

The goal in design isn’t to ship code. It isn’t even to feel good about your idea. The goal is to find a better idea than the one you walked in with, and the only way to do that is to have the idea attacked from angles you didn’t think of. The LLM is patient in a way no human reviewer is. It will keep looking for holes for as long as you keep asking it to.

So I treat it as an aggressive sparring partner. I tell it what I’m thinking. I tell it to find the weak spots. I either defend the position or revise it. Then I tell it to do the whole thing again with the new shape. The aim is something like a global maximum instead of the first local one I would have stopped at.

Me: Here is the design. Why is it wrong?
LLM: (searches for counter-evidence) It breaks under X. Have you considered Y?
Me: Y doesn't apply because Z. But X is fair - here is a revision.
LLM: (searches again) Still concerns about W.
...until both are out of ammo.

The whole point of the loop is friction. If the LLM is agreeing with you, you didn’t ask the right question.
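For anyone who thinks better in code, the loop can be sketched roughly like this. It is a minimal illustration, not my actual setup: `spar`, `toy_critic`, and `toy_revise` are hypothetical names, the critic stands in for whatever LLM call you use, and the reviser stands in for you defending or revising the design.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a critic wraps an LLM call and returns objections;
# a reviser is the human, who defends the design or revises it.
Critic = Callable[[str], List[str]]
Reviser = Callable[[str, List[str]], str]


def spar(design: str, critic: Critic, revise: Reviser,
         max_rounds: int = 5) -> Tuple[str, int]:
    """Argue-revise loop: stop when the critic has no objections left
    or after max_rounds revisions. Returns (final design, revisions made)."""
    for revisions in range(max_rounds):
        objections = critic(design)
        if not objections:
            return design, revisions  # the critic is out of ammo
        design = revise(design, objections)
    return design, max_rounds


def toy_critic(design: str) -> List[str]:
    # Stand-in for the LLM: objects until the design addresses X and W.
    return [f"breaks under {w}" for w in ("X", "W")
            if f"handles {w}" not in design]


def toy_revise(design: str, objections: List[str]) -> str:
    # Stand-in for the human: address only the first objection each round.
    weakness = objections[0].rsplit(" ", 1)[-1]
    return f"{design}; handles {weakness}"


if __name__ == "__main__":
    final, revisions = spar("v1", toy_critic, toy_revise)
    print(final, revisions)
```

The detail that matters is the exit condition: the loop ends when the critic runs dry, not when the design first looks acceptable. The `max_rounds` cap is only there so the sketch terminates.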

Implementation - direct it

Implementation looks nothing like that. By the time I’m writing code, the design is settled. I know the patterns I want, the layering I want, the trade-offs I’ve already made and the ones I’ve decided to swallow. The LLM stops being my sparring partner. It becomes the one doing the typing.

I’m not handing it judgment here. I’m handing it execution. The decisions are mine. The keystrokes are not.

That’s a very different relationship from the design one, and conflating the two is where I think most of the “AI is making everyone lazy” stories actually come from. People are letting it drive in design, where it should be hostile. And they’re reviewing every diff line in implementation, where they should be moving fast.

The asymmetry is the whole thing. The mode you should use isn’t “let it drive” or “don’t let it drive” - it’s drive when it matters, let it type when it doesn’t.

None of this is settled

One more thing while I’m being honest about where I’ve landed: none of this is settled. The shape of how we use AI in software is still moving. What works right now isn’t necessarily what’ll work in two years, or even one.

The economics in particular look unstable to me. The frontier model providers are running at structural losses that the public reporting puts in the tens of billions of dollars per year, with breakeven targets that slide further out every time someone updates the spreadsheet. The plans companies hand to their developers - “unlimited” Claude through this provider, generous Copilot allowances, near-free GPT through that one - exist because the providers are trying to win the market, not because the math works.

The reckoning is already starting. GitHub Copilot is transitioning from premium-request limits to token-metered usage-based billing on June 1, 2026, and Cursor’s June 2025 shift to a credit-pool system caused enough community heartburn that it’s still being relitigated. Industry analysts have started using phrases like “the era of subsidized AI model usage is over”. I’d be surprised if what we’re paying now is what we’ll be paying in two years.

Where it goes after that is anyone’s guess. One direction I’m watching is local models. Qwen and Gemma are already good enough for a lot of real work on hardware a developer can actually own, and the open source agentic harnesses are getting there fast - Opencode is the one I’d point at if someone asked. A future where the “agentic” piece runs locally on a workstation instead of streaming tokens out to a cloud vendor’s GPU farm doesn’t sound crazy to me. It might not happen. It might happen and turn out to not matter. But it’s the kind of shift that would change a lot of the calculus around what I described above.

So here’s where I actually am. The marketing hype is overdone - it’s not as transformative as the breathless takes claim, and a lot of what’s being sold as revolutionary is repackaged autocomplete. But it’s also a real change, and pretending it isn’t would be just as silly. The best a working professional or a hobbyist can do is stay responsible enough not to hurt themselves long-term, and pick up whatever real efficiency gains are actually there.

What I’m leaving for later

There’s a lot more in this neighborhood I want to come back to. Whether the cost of running all of this is worth what we’re getting back - both the literal infrastructure cost on a power grid that’s already creaking, and the headcount cost as companies thin out engineering staff in the name of “efficiency.” Whether the new floor I described above is actually new, or whether we always had vibe-coders and AI just gave them better camouflage. Whether the people I’m watching get lazy now will catch up later, or whether the muscle never gets built in the first place.

Different posts. For now: argue with it in design, direct it in implementation, don’t hand it your judgment in either one.