AI · 03/03/2026 · 7 min read

How We Introduced Claude Code to a Legacy Engineering Team

When we joined a staffing platform as their engineering partner, the codebase had a problem documentation couldn't fix.

Nine years of development. Twelve heavily interconnected repositories. Features layered on top of features, nothing ever deprecated. The engineers who originally built most of it had all the context - and siloed it. The system worked - mostly - but only made sense if you had been present from the beginning.

New engineers spent weeks trying to understand why a route existed, what a service was actually responsible for, or whether a file was still active or just never cleaned up. There was no decision trail. Meetings happened, architectural choices got made, and two weeks later nobody could reconstruct why.

The LLM Problem

The team had tried AI tools before we arrived. It didn't work well. The codebase complexity defeated generic assistance. An LLM given a slice of one repo had no context about how it connected to the other eleven. It would suggest things that were technically correct in isolation but wrong for this specific system.

Engineers got burned a few times - AI-generated code that looked right but conflicted with assumptions baked into another repo - and became skeptical. Not of AI broadly, but of the specific experience of getting confidently wrong answers about their own platform.

That skepticism was reasonable. The tools weren't calibrated to their context.

One Rule That Changed Behavior

Before introducing any tooling, we established a single accountability rule: whoever pushes code is responsible for it. Not the AI, not the tool, not the suggestion. The engineer who reviews and merges the change owns that decision.

This sounds obvious. In practice, it shifted behavior immediately. Engineers stopped treating AI output as something to paste and ship. They started treating it as a first draft - which it is. The rule made the risk of vibe coding visible and personal. Once that was in place, the tooling could actually help.

We've written more about the broader implications of this pattern in The Accountability Wrapper.

Making Knowledge Persistent with Claude Mem

The core problem wasn't the codebase itself. It was that nine years of institutional knowledge was trapped in people's heads, old Slack threads, and meetings that never got documented.

We started capturing everything with Claude Mem. Every technical meeting had its transcript analyzed. Decisions, trade-offs, things that were tried and abandoned - all indexed. Claude Mem stores memories in a git-backed format, which meant the entire team shared the same context pool. When an engineer started a new session, they weren't beginning from scratch. They had access to why the auth service was structured the way it is, why a particular API endpoint was never deprecated, what was attempted six months ago and why it failed.

The thing that made this stick in practice: mistakes stopped repeating. When an engineer hit a problem that someone else had already solved, or tried to make a decision that had already been made and reversed, that context surfaced before they went further.

The team's institutional knowledge became queryable rather than conversational.
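A rough sketch of what "queryable rather than conversational" means in practice - the types, field names, and sample entry below are our own illustration, and a plain array stands in for Claude Mem's actual git-backed storage format:

```typescript
// Hypothetical sketch: institutional decisions captured as plain records.
// In a git-backed store these would live in versioned files shared by the
// whole team; an in-memory array stands in for that context pool here.
interface MemoryEntry {
  topic: string;     // e.g. a service or repo name
  decision: string;  // what was decided
  rationale: string; // why, including what was tried and abandoned
  date: string;
}

const memories: MemoryEntry[] = [
  {
    topic: "auth-service",
    decision: "Keep the legacy session endpoint alive",
    rationale: "Two downstream repos still call it; a deprecation attempt was reversed",
    date: "2025-06-10",
  },
];

// Querying turns "ask whoever was in the room" into a lookup.
function recall(query: string): MemoryEntry[] {
  const q = query.toLowerCase();
  return memories.filter(
    (m) =>
      m.topic.includes(q) ||
      m.decision.toLowerCase().includes(q) ||
      m.rationale.toLowerCase().includes(q)
  );
}

console.log(recall("auth").length); // 1 - the auth-service entry
```

The design point is less the lookup itself than where the records live: because the store is versioned alongside the code, a new engineer's first session starts with the same decision trail as everyone else's.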

Planning That Surfaces Problems Early with GSD

The other friction point was planning. On a codebase this distributed, an engineer starting a new feature without full context would regularly discover - mid-implementation - that something they were building conflicted with logic in another repo. Work would get partially done, blocked, and either abandoned or restarted.

We introduced GSD's discuss phase into every piece of work. Before any plan is written, GSD asks structured questions about the feature: edge cases, cross-repo dependencies, assumptions, things that could go wrong. On this codebase, those questions reliably surfaced issues that weren't visible from the ticket alone.

The outputs - a structured context document - got posted to Slack through GSD's integration. Anyone on the team could read through the context and leave comments before work began. Once there was alignment, execution moved on autopilot: GSD broke the plan into phases, executed them with atomic commits, and maintained state across sessions so context didn't reset between days.

The shift was from "engineer discovers a conflict on day three" to "conflict surfaces in the discuss phase before a line is written."
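As a sketch of that shift, here is roughly what a discuss phase feeding a structured context document can look like - the question list, field names, and document layout are our own illustration, not GSD's actual implementation:

```typescript
// Hypothetical discuss-phase shape: structured questions answered before
// any plan is written, with risky answers flagged for review in Slack.
interface DiscussAnswer {
  question: string;
  answer: string;
  flagged: boolean; // true when the answer reveals a cross-repo risk
}

const DISCUSS_QUESTIONS = [
  "Which other repos does this feature touch?",
  "What assumptions does the ticket make that aren't verified?",
  "What edge cases could break existing behavior?",
  "Has anything like this been attempted and reversed before?",
];

// The output is a plain document anyone on the team can read and comment on.
function buildContextDocument(answers: DiscussAnswer[]): string {
  const risks = answers.filter((a) => a.flagged);
  return [
    "## Feature Context",
    ...answers.map((a) => `- **${a.question}** ${a.answer}`),
    "",
    `## Risks surfaced before implementation: ${risks.length}`,
    ...risks.map((r) => `- ${r.answer}`),
  ].join("\n");
}
```

The value is in the timing: a flagged answer here costs a Slack comment; the same discovery on day three of implementation costs restarted work.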

The Proof: 3 Days to an Agentic Chat POC

About two months in, we needed to show stakeholders what was possible - specifically, an agentic chat interface for the platform's recruitment workflow.

The requirements were real: job search results rendered as interactive cards inside the conversation thread, resume parsing that triggered an inline document upload widget, multi-step tool calling that could search, match, and surface a job application within a single chat session. The kind of UX where the conversation itself becomes the product.

A build like that would typically take a development team several weeks to a month. We shipped a working POC in three days using Claude and the AI SDK. The demo moved a funding conversation forward.
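The tool layer behind a build like this can be sketched roughly as follows - tool names, result kinds, and data are hypothetical, and the AI SDK's model loop (which decides which tool to call at each step) is omitted:

```typescript
// Hypothetical sketch of the chat POC's tool layer. Each tool returns a
// result "kind" that the chat UI knows how to render inline in the thread.
type ToolResult = {
  kind: "job-cards" | "upload-widget" | "application";
  payload: unknown;
};

const tools: Record<string, (args: Record<string, string>) => ToolResult> = {
  searchJobs: ({ query }) => ({
    kind: "job-cards", // rendered as interactive cards in the conversation
    payload: [{ id: "job-1", title: `${query} role` }],
  }),
  parseResume: () => ({
    kind: "upload-widget", // triggers the inline document upload widget
    payload: { accepted: ["pdf", "docx"] },
  }),
  applyToJob: ({ jobId }) => ({
    kind: "application",
    payload: { jobId, status: "submitted" },
  }),
};

// A single chat turn can chain several tool calls: search, match, apply.
function runToolCalls(
  calls: { name: string; args: Record<string, string> }[]
): ToolResult[] {
  return calls.map(({ name, args }) => {
    const tool = tools[name];
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool(args);
  });
}
```

In the real build the model drives the sequence; the point of the shape above is that every tool result maps to something the conversation thread renders, which is what makes the conversation itself the product.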

That timeline wasn't exceptional speed. It was what happens when the context layer is working. Every decision from the prior two months was accessible. Every constraint in the codebase was documented. The engineer building the POC didn't spend time reconstructing what already existed. They built.

What This Actually Requires

The accountability rule, Claude Mem, and GSD address three different failure modes on a legacy codebase:

  • Accountability keeps humans reviewing AI output rather than shipping it blind
  • Claude Mem makes institutional knowledge persistent across sessions and across engineers - not just for the person who was in the room
  • GSD surfaces complexity during planning, where it's cheap to address, rather than during implementation, where it isn't

None of these work in isolation. The memory layer is only useful if engineers actually start sessions with it. The planning structure only helps if the discuss phase is taken seriously, not skipped. The accountability rule only holds if it's enforced when something gets pushed without review.

Together, they made a 9-year-old, 12-repo codebase somewhere a new engineer could work productively and an experienced engineer could ship quickly.

The codebase didn't change. The context layer around it did.
