AI in Software - Argue With It in Design, Direct It in Implementation

I started out a pessimist about AI in software development. The worry was that everyone would get lazy, that the field would lose the muscle it spent forty years building, that “tool” was a polite word for crutch.

I’m not exactly there anymore.

A lot of what I worried about will still happen. Some of it is happening. But the line I thought AI would draw - between people who use it and people who don’t - isn’t the line I see now. The line I see now sits one layer deeper, between people who respect the fundamentals and people who skip them.

The crutch failure mode

I sat in on a code review recently for some internal systems a coworker had built. The code worked. The pieces fit together well enough on a first read. But when I started asking why - why this pattern, why this layering, why this trade-off rather than the obvious other one - the answers didn’t track. He could tell me what the code did. He couldn’t tell me why the choices had been made.

That’s the failure mode I worry about - not the tool itself but the handoff of judgment to the tool, with the choices shipping and nobody left who can say what they imply.

Code that you can’t defend in a review is code you can’t maintain. You can ship it once. The second time something goes wrong with it, you’re reading your own codebase like a stranger.

The new line

Frame it differently. The vibe-coders who skip the fundamentals were going to skip them anyway, with or without AI - the tool just made it cheaper and faster to ship things they don’t understand. The senior engineers who already had judgment are using AI to move faster through work that doesn’t need their full attention, so they can spend more of it on the parts that do.

What’s actually changed is the floor and the ceiling. The floor for shippable-looking code dropped through the basement. The ceiling for what a single engineer with a real grasp of the fundamentals can produce went up. Apples to oranges versus where we were two years ago. Whether the field nets out positive depends on whether you think the rising ceiling matters more than the falling floor.

What responsible use looks like for me

One thing upfront, because it matters where I’m writing from: I use these tools a lot. Over the past year or so I’ve put what I’d guess is roughly a hundred hours into building a set of Claude Skills covering my own workflow for design, implementation, and debugging. The mechanics of that setup are their own post, and I’ll get to them eventually. What follows isn’t speculation. It’s where I’ve landed after running these tools through real work, not after reading about them.

Where I’ve landed is that the right way to use these tools looks like two completely different things depending on what part of the work you’re doing.

Design - argue with it

In design I want the LLM hostile - pushing back, aggressively using search tools to find counter-evidence to my own theory, coming back to tell me where I’m wrong.

Design isn’t aimed at shipping code, or at feeling good about your idea. The goal is to find a better idea than the one you walked in with, and the only way to do that is to have the idea attacked from angles you didn’t think of. The LLM is patient in a way no human reviewer is. It will keep looking for holes for as long as you keep asking it to.

So I treat it as an aggressive sparring partner. I tell it what I’m thinking. I tell it to find the weak spots. I either defend the position or revise it. Then I tell it to do the whole thing again with the new shape. The aim is something like a global maximum instead of the first local one I would have stopped at.

The whole point of the loop is friction. If the LLM is agreeing with you, you didn’t ask the right question.
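
For the concretely minded, the loop above can be sketched in a few lines of Python. This is a minimal sketch, not my actual setup: `ask_model` is a hypothetical stand-in for whatever LLM client you use, `revise` is the human step where you defend or change the position, and the stopping rule (quit once a round of critique leaves the position unchanged) is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: any function that takes a prompt and returns the model's reply.
AskModel = Callable[[str], str]
# Hypothetical: the human step. Given the position and the critique,
# return the same position (defended) or a revised one.
Revise = Callable[[str, str], str]


@dataclass
class Round:
    position: str  # what I walked into the round believing
    critique: str  # what the model came back with


def build_critique_prompt(position: str) -> str:
    """Ask for pushback explicitly, so agreement is not the easy answer."""
    return (
        "Here is my current design position:\n"
        f"{position}\n\n"
        "Do not agree with me. Find the weakest assumptions, look for "
        "counter-evidence, and tell me where this is wrong."
    )


def spar(position: str, ask_model: AskModel, revise: Revise,
         max_rounds: int = 3) -> List[Round]:
    """Run critique/revise rounds and return the full history."""
    history: List[Round] = []
    for _ in range(max_rounds):
        critique = ask_model(build_critique_prompt(position))
        history.append(Round(position, critique))
        new_position = revise(position, critique)
        if new_position == position:
            break  # position survived this round unchanged (assumed stop rule)
        position = new_position
    return history
```

The part that matters is that `revise` stays a human decision. The model supplies the friction, and the call on whether a critique lands is still yours.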

Implementation - direct it

Implementation looks nothing like that. By the time I’m writing code, the design is settled. I know the patterns I want, the layering I want, the trade-offs I’ve already made and the ones I’ve decided to swallow. The LLM stops being my sparring partner. It becomes the one doing the typing.

Execution is what I’m handing it here, not judgment. The decisions stay mine; only the keystrokes are getting handed over.

That’s a very different relationship from the design one, and conflating them is where I think most of the “AI is making everyone lazy” stories actually come from. People are letting it drive in design, where it should be hostile. And they’re reviewing every diff line in implementation, where they should be moving fast.

The asymmetry is the whole thing. The question isn’t whether to use it, but who’s driving - you drive when it matters, and it types when it doesn’t.

None of this is settled

One more thing while I’m being honest about where I’ve landed: none of this is settled. The shape of how we use AI in software is still moving. What works right now isn’t necessarily what’ll work in two years, or even one.

The economics in particular look unstable to me. The frontier model providers are running at structural losses that the public reporting puts in the tens of billions of dollars per year, with breakeven targets that slide further out every time someone updates the spreadsheet. The plans companies hand to their developers - “unlimited” Claude through this provider, generous Copilot allowances, near-free GPT through that one - exist because the providers are trying to win the market, not because the math works.

The reckoning is already starting. GitHub Copilot is transitioning from premium-request limits to token-metered usage-based billing on June 1, 2026, and Cursor’s June 2025 shift to a credit-pool system caused enough community heartburn that it’s still being relitigated. Industry analysts have started using phrases like “the era of subsidized AI model usage is over.” I’d be surprised if what we’re paying now is what we’ll be paying in two years.

Where it goes after that is anyone’s guess. One direction I’m watching is local models. Qwen and Gemma are already good enough for a lot of real work on hardware a developer can actually own, and the open-source agentic harnesses are getting there fast - Opencode is the one I’d point at if someone asked. A future where the “agentic” piece runs locally on a workstation instead of streaming tokens out to a cloud vendor’s GPU farm doesn’t sound crazy to me. It might not happen. It might happen and turn out not to matter. But it’s the kind of shift that would change a lot of the calculus around what I described above.

So here’s where I actually am. The marketing hype is overdone - the technology isn’t as world-changing as the breathless takes claim, and a lot of what’s being sold as revolutionary is repackaged autocomplete. But it’s also a real change, and pretending it isn’t would be just as silly. The best a working professional or a hobbyist can do is stay responsible enough not to hurt themselves long-term, and pick up whatever real efficiency gains are actually there.

What I’m leaving for later

There’s a lot more in this neighborhood I want to come back to. Whether the cost of running all of this is worth what we’re getting back - both the literal infrastructure cost on a power grid that’s already creaking, and the headcount cost as companies thin out engineering staff in the name of “efficiency.” Whether the new floor I described above is actually new, or whether we always had vibe-coders and AI just gave them better camouflage. Whether the people I’m watching get lazy now will catch up later, or whether the muscle never gets built in the first place.

Different posts. For now: argue with it in design, direct it in implementation, don’t hand it your judgment in either one.