What's Separating Good AI-Assisted Teams From Great Ones

When it is acceptable to let AI run, when it is not and the discipline that separates teams who ship confidently from teams who spend Fridays (or worse, weekends) cleaning up after AI.

Let me clear the air: vibe coding is not the problem. Used in the right situation, it is one of the most useful things to happen to software development in years. The problem arises when teams start using a prototyping mindset on production systems and act surprised when things break in ways that are hard to trace and much harder to fix.

This post is about where to draw the line: when vibe coding is the right call, when it is not and what context engineering actually looks like in practice on a real team with a real production codebase.

What Classifies as Vibe Coding

Andrej Karpathy coined the term in February 2025: “fully give in to the vibes, embrace exponentials and forget that the code even exists.” You describe what you want in plain language, the AI builds it, you iterate on feel rather than specification.

That was always a prototyping workflow. The problem is that the tools got good fast and teams started applying the same approach to systems that needed to hold up under real users, real data and real production pressure.

Veracode’s Spring 2026 GenAI Code Security report, spanning over 150 models, found that across all models and all tasks, only 55% of generation tasks result in secure code. Syntax pass rates have climbed steadily from around 50% to 95% since 2023, but security pass rates have remained essentially flat, hovering between 45% and 55% regardless of model generation or release date.

Models have got dramatically better at writing code that compiles. They have not yet got better at writing code that is safe.

When Vibe Coding Is the Right Call

Prototypes and proofs of concept. Speed to working demo is what counts. The code gets thrown away or rebuilt before it goes anywhere near production.

Solo developers with full system context. You wrote every line. You know the whole codebase. Your review is you reading the output and deciding if it makes sense.

Learning a new framework or API. Vibe coding to understand how something behaves is completely legitimate. You are experimenting, not shipping.

Narrow scope, low stakes. Internal scripts, developer utilities, one-off data migrations, throwaway dashboards. Not everything needs to be built like a payment system.

The common thread: one person understands the output, the code is not being maintained long-term by people who were not there when it was built and the cost of getting something wrong is low.

Where It Starts Breaking Down

Nobody owns the intent. The committer is clear, but why the code was written that way, what the AI was given as context and what alternatives it considered are not captured anywhere. Six months later, when something breaks, the person debugging it is reading code nobody on the team genuinely understands.

Inconsistency adds up. Ask the AI to build a data-fetching function on Monday and you get async/await. Ask for something similar on Wednesday and you get promise chains. Neither is wrong in isolation. Mixed together across a codebase, they make every code review a debate about style choices that should have been settled by convention.

The prototype trap. Teams vibe-code something that works well enough to demo. Stakeholders get excited. Now there are two options: rebuild it with proper architecture or try to harden what exists. Neither is fast. The speed advantage disappears and you are left with code that was never designed to be maintained.

Security does not self-correct. A Veracode study from October 2025 found that over three years, LLMs had got dramatically better at generating functional code, but the security quality of that code had not improved. Bigger models were not more secure than smaller ones.

Experienced developers can actually slow down. The METR randomised controlled trial from July 2025 found that experienced open-source developers were 19% slower using AI tools, despite predicting they would be 24% faster and still believing afterward they had been faster. One plausible contributor: AI tools treat every prompt as if the developer is encountering the codebase for the first time. Without context engineering, that gap does not close regardless of how good the model gets.

Context Engineering to the Rescue

Most people think “context engineering” means writing more careful prompts. It does not.

Prompting is what you say to the AI in the moment. Context engineering is everything you set up before the AI starts working: the files it reads automatically, the rules it follows, the architectural knowledge it carries into every session. You build that infrastructure once, maintain it over time and every session benefits from it.

Cursor’s January 2026 research on Dynamic Context Discovery found that shifting from static to dynamic context loading reduced total agent tokens by 46.9% in sessions using multiple MCP tools, with no drop in output quality. Less noise in the context window, better results from the model.

What This Actually Looks Like in Practice

Think carefully about your CLAUDE.md instruction budget. HumanLayer’s analysis found that Claude Code’s own system prompt takes up around 50 of the 150–200 instructions a frontier model can reliably follow. That leaves roughly 100–150 for your instructions. Code style rules are better enforced by a linter. File references pointing Claude at the authoritative source tend to hold up better than inline code snippets, which go stale.
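A rough way to keep an eye on that budget is to count instruction-like lines in the file. The sketch below assumes each bullet or numbered item is one instruction, which is only a heuristic, and takes its default limit of 100 from the lower bound above. The names count_instructions, budget_report and BUDGET are illustrative, not part of any tool.

```python
import re

# Assumed budget: lower bound of the 100-150 range left after the system prompt.
BUDGET = 100

# Built this way so the marker does not appear literally inside this example.
FENCE = "`" * 3

# Heuristic: a line starting with "-", "*" or "1." counts as one instruction.
_ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def count_instructions(markdown_text: str) -> int:
    """Count bullet and numbered-list items, skipping fenced code blocks."""
    in_fence = False
    count = 0
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and _ITEM.match(line):
            count += 1
    return count


def budget_report(markdown_text: str, budget: int = BUDGET) -> str:
    """Summarise how much of the assumed instruction budget the file uses."""
    used = count_instructions(markdown_text)
    status = "over budget" if used > budget else "within budget"
    return f"{used}/{budget} instructions ({status})"
```

Running budget_report over the contents of CLAUDE.md in CI gives an early signal that the file is growing past what the model is likely to follow, though the count is only as good as the one-bullet-per-instruction assumption.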

Scoped Cursor Rules are worth the setup time. Path scoping via YAML frontmatter means a rule file only activates when the agent is working on files in matching directories. An API conventions rule that does not load during React component work keeps context lean and relevant.
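The scoping idea can be sketched as a small matcher. This illustrates the concept only, not how Cursor or any other tool implements it: the rule file names, the patterns and the use of Python's fnmatch (where * also matches /) are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical rule files and the path patterns that would sit in their frontmatter.
RULES = {
    "api-conventions.md": ["src/api/*"],
    "react-components.md": ["src/components/*", "*.tsx"],
    "always-on.md": ["*"],
}


def active_rules(file_path: str, rules: dict[str, list[str]] = RULES) -> list[str]:
    """Return the rule files whose patterns match the file being edited."""
    return [
        name
        for name, patterns in rules.items()
        if any(fnmatchcase(file_path, pattern) for pattern in patterns)
    ]
```

Under these assumptions, editing src/api/users/routes.ts would load the API conventions rule and the always-on rule, while the React rule stays out of the context window.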

AGENTS.md becomes important once you are running parallel agents. It is the cross-tool standard that works across Claude Code, Cursor and Copilot. Shared architectural constraints go here; tool-specific additions stay in each tool’s own config.

Slash commands are a cleaner version of a shared prompt library. Every markdown file in .claude/commands/ becomes a team-wide slash command versioned in git. The PRP (Product Requirements Prompt) workflow is worth considering: /generate-prp produces an implementation blueprint from a brief, a human reviews and approves it, then /execute-prp implements from the spec. The human review step comes before execution, not after.
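To see what the team currently exposes, the directory can be listed directly. A minimal sketch, assuming each .md file directly under .claude/commands/ maps to a command named after the file; list_commands is a hypothetical helper and subdirectories are ignored.

```python
from pathlib import Path


def list_commands(repo_root: str = ".") -> list[str]:
    """Map each markdown file in .claude/commands/ to its slash-command name."""
    commands_dir = Path(repo_root) / ".claude" / "commands"
    if not commands_dir.is_dir():
        return []
    return sorted(f"/{path.stem}" for path in commands_dir.glob("*.md"))
```

A listing like this is also a cheap review aid: a command that shows up here without a matching PR discussion is a prompt nobody agreed on.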

Hooks give you deterministic enforcement where CLAUDE.md is advisory. CLAUDE.md instructions get followed roughly 70% of the time. A PreToolUse hook runs 100% of the time, outside the model entirely. For rules that genuinely cannot have exceptions (never write to .env files, always run the linter after an edit), hooks are worth the setup.
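A minimal sketch of the .env rule as a hook script. It assumes the hook is handed a JSON payload on stdin containing tool_name and tool_input.file_path, and that exiting with code 2 blocks the call and returns stderr to the model; both should be confirmed against the current Claude Code hooks documentation before relying on this. The tool names in WRITE_TOOLS and the function names are likewise assumptions.

```python
import json
import sys
from pathlib import PurePosixPath

BLOCK = 2  # assumed: exit code that blocks the pending tool call
ALLOW = 0

# Assumed names of the tools that write files.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one pending tool call."""
    if payload.get("tool_name") not in WRITE_TOOLS:
        return ALLOW, ""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    name = PurePosixPath(file_path).name
    # Matches ".env" and variants such as ".env.local".
    if name == ".env" or name.startswith(".env."):
        return BLOCK, f"Blocked: {file_path} is an environment file."
    return ALLOW, ""


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one payload, print any refusal to stderr, return the exit code."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code

# In the hook script itself, finish with: raise SystemExit(main())
```

Keeping the decision in a pure function means the rule can be unit-tested like any other code, which matters for something that is supposed to have no exceptions.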

Treat sessions as disposable. As a session runs long, the context window fills with intermediate results and superseded decisions. Committing frequently and starting fresh sessions for unrelated work pays off more than most people expect.

Run a context audit every quarter. Ask your AI to describe the whole system. What it gets wrong is a useful to-do list. Wrong ORM goes into CLAUDE.md. Misunderstood architectural patterns go into AGENTS.md. Violated old constraints point to a missing architecture decision record (ADR).
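The triage at the end of an audit is mechanical enough to write down. A sketch, with the three destinations taken from the paragraph above; the Finding type and the category labels are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Where each kind of audit finding gets fixed, per the routing described above.
DESTINATION = {
    "tooling": "CLAUDE.md",        # e.g. the AI names the wrong ORM
    "architecture": "AGENTS.md",   # e.g. a misunderstood architectural pattern
    "constraint": "new ADR",       # e.g. an old constraint the AI violates
}


@dataclass(frozen=True)
class Finding:
    category: str
    note: str


def triage(findings: list[Finding]) -> dict[str, list[str]]:
    """Group audit findings by the file or record that should absorb the fix."""
    todo: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        # Unknown categories are kept visible rather than silently dropped.
        todo[DESTINATION.get(finding.category, "unsorted")].append(finding.note)
    return dict(todo)
```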

Getting the Team Onboard

Pushback is expected when this feels like process overhead on top of work that was already moving. The framing you need is not “we need better documentation.” It is: “our AI tools are not working as well as they should and this is the fix.”

Run a context sprint. Half a day, the whole team, writing the initial CLAUDE.md and AGENTS.md together. The conversation surfaces knowledge that has never been written down, which is useful well beyond the context files themselves.

Attach context updates to PRs. Any PR that introduces a new pattern, changes a convention or creates a no-go zone should update the relevant context file. Same decision, documented alongside the code that implements it.
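One way to make that a habit is a CI check that flags PRs touching convention-bearing paths without touching a context file. A sketch only: the watched prefixes and context file names below are placeholders for whatever a given repository uses, and the list of changed files is assumed to come from something like git diff --name-only.

```python
# Hypothetical paths where a change usually implies a new pattern or convention.
WATCHED_PREFIXES = ("src/api/", "src/db/", "docs/adr/")

# Context files that should move together with those changes.
CONTEXT_FILES = {"CLAUDE.md", "AGENTS.md"}


def missing_context_update(changed_files: list[str]) -> bool:
    """True when watched paths changed but no context file did."""
    touches_watched = any(
        path.startswith(WATCHED_PREFIXES) for path in changed_files
    )
    touches_context = any(
        path.rsplit("/", 1)[-1] in CONTEXT_FILES for path in changed_files
    )
    return touches_watched and not touches_context
```

This probably works better as a warning on the PR than as a hard failure, since plenty of changes under those paths introduce no new convention.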

Do not ban vibe coding. Route it. The goal is a clear team norm: vibe coding for exploration and prototyping, context engineering for production. Both have a place. The problem has always been applying one to a situation that requires the other.

Key Takeaway

Vibe coding made it possible to get from idea to working software faster than anyone thought was realistic a few years ago. That is genuinely valuable, especially for work that may never reach production.

But production systems are a different environment. Code that ships needs to be understood, maintained, debugged and extended by people who were not there when it was written. The AI tools are good enough for that work. What most teams are missing is the context infrastructure that makes them work well in a specific codebase, rather than generically across any codebase.

The AI is not the variable. The context you give it is.

Prototype fast. Ship with discipline.