When Memory Management Becomes a Part-Time Job

There's a point where a workaround becomes a problem in its own right. I crossed that line somewhere around month eight, when I noticed that my weekly calendar had a recurring block labeled "context maintenance" that was taking longer than the original task it was supposed to support.

The markdown memory system had grown. The PostgreSQL knowledge layer had grown. Both required maintenance to remain useful. And the time I was spending on that maintenance was starting to compete with the time savings the AI tools were delivering.

The Inventory

At the eight-month mark, the memory infrastructure I was maintaining across my active projects looked like this:

Four CONTEXT.md files — one per active client project — averaging about three pages each. Updated whenever I discovered a new domain fact, a new gotcha, or a new agreed pattern. Not yet automated. Fully manual.

One PostgreSQL knowledge database with entries from all four projects, tagged by project and topic. About 200 entries total. The embedding generation was automated on insert; the insert itself was manual — I ran a CLI command to add entries I thought were worth keeping.
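
To make the split between the manual and automated halves concrete, here is a minimal sketch of that insert path. Everything in it is an assumption for illustration, not the actual tool: the `Entry` fields, the `add_entry` and `main` names, and especially `toy_embedding`, which is a deterministic placeholder standing in for a real embedding model. An in-memory list stands in for the PostgreSQL table.

```python
import argparse
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def toy_embedding(text: str, dims: int = 8) -> List[float]:
    """Placeholder for a real embedding model: deterministic, NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


@dataclass
class Entry:
    project: str  # tag: which client project the fact belongs to
    topic: str    # tag: free-form topic label
    text: str
    embedding: List[float]
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def add_entry(store: List[Entry], project: str, topic: str, text: str) -> Entry:
    """Manual part: a human chooses to call this. Automated part: the embedding."""
    entry = Entry(project=project, topic=topic, text=text,
                  embedding=toy_embedding(text))
    store.append(entry)
    return entry


def main(argv: Optional[List[str]] = None) -> Entry:
    """Hypothetical CLI wrapper; the flag names are invented for this sketch."""
    parser = argparse.ArgumentParser(description="Add a knowledge entry (sketch).")
    parser.add_argument("--project", required=True)
    parser.add_argument("--topic", required=True)
    parser.add_argument("text")
    args = parser.parse_args(argv)
    store: List[Entry] = []  # stand-in for the PostgreSQL table
    return add_entry(store, args.project, args.topic, args.text)
```

Calling `main(["--project", "client-a", "--topic", "api", "Orders endpoint returns a list"])` exercises the whole path. The design point is that nothing in the code decides whether an entry deserves to exist; that judgment stays with whoever runs the command.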

A decision log in my personal notes that tracked cross-project architectural decisions and the reasoning behind them. Not yet integrated with either system — just a markdown file I maintained separately.

The weekly maintenance block averaged ninety minutes. It involved reviewing what had changed in each project, updating the context files, adding new entries to the knowledge database, and occasionally pruning or correcting entries that had become stale.

The Failure Mode That Showed Up First

Staleness. It crept in faster than I expected.

A source system changed its API response shape. I updated my code, wrote tests, shipped the fix. I did not update the context file. Three weeks later, I was in a ChatGPT session reasoning about a related problem in the same pipeline, pasted the context file, and the model made a recommendation that assumed the old response shape. Sound reasoning, wrong facts, useless output. I caught it before implementing anything, but only because I happened to remember the change, not because the context was accurate.

That's the fragility of any manually maintained knowledge system: it reflects the state of the world as of the last update, which is rarely the current state of the world. In fast-moving projects, the lag matters.

What Automation Could Cover (and What It Couldn't)

I spent time that month thinking about which parts of the maintenance burden could be automated. The honest answer was: some, but not the important parts.

Automatable: checking for staleness based on last-updated timestamps. Generating embeddings on insert. Suggesting entries to review based on recent code changes (git diff as a staleness signal: files that changed recently might have outdated knowledge entries). These were tractable engineering problems.
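
A rough sketch of the two staleness signals, timestamp age and recent git changes, might look like the following. The names are assumptions made for illustration: `KnowledgeEntry`, its `related_paths` field (source files an entry describes), `review_candidates`, and the 30-day threshold are not from the real system. It is also deliberately naive: it flags any entry whose related file changed inside the window, even if the entry was updated after that change.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set, Tuple


@dataclass
class KnowledgeEntry:
    title: str
    last_updated: datetime
    related_paths: List[str]  # hypothetical: files this entry documents


def recently_changed_files(repo_dir: str, since_days: int) -> Set[str]:
    """Ask git which files were touched by commits in the last `since_days` days."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--since={since_days} days ago",
         "--name-only", "--pretty=format:"],
        check=True, capture_output=True, text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def review_candidates(
    entries: List[KnowledgeEntry],
    changed_files: Set[str],
    now: datetime,
    max_age_days: int = 30,  # assumed threshold, tune per project
) -> List[Tuple[KnowledgeEntry, List[str]]]:
    """Return (entry, reasons) pairs for a human to review; decides nothing itself."""
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for entry in entries:
        reasons = []
        if entry.last_updated < cutoff:
            reasons.append(f"not updated in {max_age_days}+ days")
        touched = sorted(set(entry.related_paths) & changed_files)
        if touched:
            reasons.append("related files changed: " + ", ".join(touched))
        if reasons:
            flagged.append((entry, reasons))
    return flagged
```

In use, `review_candidates(entries, recently_changed_files(".", 14), datetime.now(timezone.utc))` (with `timezone` imported from `datetime` and timezone-aware timestamps on the entries) would produce a review list for the weekly block. It only narrows where to look. Whether a changed file means a changed fact is still a human call.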

Not automated with the tools I had: deciding what knowledge was worth capturing in the first place. Determining whether a change in code meant a change in the documented fact. Merging related entries when two entries captured overlapping knowledge at different levels of specificity. Pruning entries that were no longer relevant because the thing they described had been replaced.

The not-automated list was longer and more important than the automated list. Knowledge management at this level requires judgment, and I didn't have a model that could make those judgments reliably. What I had was a storage and retrieval system that needed a human curator.

The Return on Investment Question

I ran the numbers that month. Time saved by AI assistance per week: approximately three hours, mostly from Copilot boilerplate and ChatGPT reasoning sessions that produced usable results quickly. Time spent maintaining context systems per week: approximately ninety minutes. Net gain: ninety minutes per week, roughly.

That's a positive number, but it's not the multiplier I had expected when I started. The maintenance overhead was eating about half of the efficiency gain. And unlike the efficiency gain, the maintenance overhead scaled with project count and complexity. More projects meant more context files, more knowledge entries, more staleness to manage.
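
The arithmetic is simple enough to write down, and doing so makes the scaling worry visible. The sketch below uses the figures from this post (180 minutes saved, 90 minutes of maintenance, four projects). The projection rests on a loud assumption that I did not measure: maintenance grows linearly with project count while the time saved stays flat. The function names are made up for the example.

```python
SAVED_PER_WEEK = 180.0       # minutes; "approximately three hours"
MAINTENANCE_PER_WEEK = 90.0  # minutes; "approximately ninety minutes"
ACTIVE_PROJECTS = 4          # one CONTEXT.md per active client project

# ASSUMPTION: maintenance is split evenly across projects (22.5 min each).
PER_PROJECT_MAINTENANCE = MAINTENANCE_PER_WEEK / ACTIVE_PROJECTS


def projected_net_gain(projects: int,
                       saved: float = SAVED_PER_WEEK,
                       per_project: float = PER_PROJECT_MAINTENANCE) -> float:
    """Weekly net gain in minutes under the linear-maintenance assumption."""
    return saved - per_project * projects


def break_even_projects(saved: float = SAVED_PER_WEEK,
                        per_project: float = PER_PROJECT_MAINTENANCE) -> float:
    """Project count at which maintenance consumes the entire saving."""
    return saved / per_project


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, "projects ->", projected_net_gain(n), "min/week net")
    print("break-even at", break_even_projects(), "projects")
```

Under those assumptions the net gain is 90 minutes at four projects, 45 at six, and zero at eight. The real curve is surely not this clean, but the direction is the point: without reducing per-project maintenance, adding projects erodes the gain.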

This was the point where I stopped thinking of the context system as "a markdown file plus a database" and started thinking of it as "a system that needs its own engineering investment to actually deliver what I want from it." The prototype phase was over. The question now was what the real system would look like — and how to get there without building something that required even more maintenance than what I already had.

If you've run into the same scaling problem with AI context management and found a lever that actually reduces the maintenance burden, I want to hear what it was. As always, I'm here to help.
