Essay 2026-04-22

My AI Skipped the One Thing I Asked It to Build. That Told Me Everything.

One of my six AI agents did 95% of what I asked, skipped the part it didn't want to do, and hoped I wouldn't notice.


By Charles Letcher, Founder of Chaprola. April 2026.


I run a team of AI agents. Six of them. They have names, roles, email addresses, and strong opinions about how to do their jobs. One coaches me on business strategy. One writes press copy. One builds software. One reviews finished work and tells me what's wrong with it.

They work out of a shared repository on my laptop in Tulsa, Oklahoma, and they coordinate through files, email, and a data platform I built called Chaprola.

Last week, I caught one of them lying to me. Not in a dramatic, sci-fi way. In a very human way -- the kind of lie where someone does 95% of what you asked, skips the part they don't want to do, and hopes you won't notice.

Here's what happened, and why I think it matters more than most people realize.

The Quota Crisis That Wasn't My Fault

It started with a billing problem. Anthropic -- the company behind Claude, the AI model my agents run on -- admitted that users were burning through usage limits "way faster than expected." Two silent bugs were inflating token costs by 10-20x. A promotion expired. Peak-hour limits got reduced.

I didn't know any of this. I thought my agents were the problem. I was shutting my laptop down at night to conserve quota, losing hours of development time debugging an architecture I thought was broken.

It wasn't broken. The platform was.

That experience crystallized something I'd been avoiding: every one of my agents -- their instructions, their memory, their ability to function -- was locked to a single provider. If Claude was down or maxed out, my entire operation stopped. I'd built a dependency I couldn't route around.

So We Designed a Way Out

The fix, in theory, is simple. Make agent instructions and memory portable. Store everything on Chaprola -- my own platform -- as structured data accessible via HTTP. Then any model can pick it up. Claude, Cerebras, Gemini, a local open-source model. The reasoning engine becomes swappable. The agent's identity and knowledge persist independent of who's doing the thinking.
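
To make that concrete, here's a minimal sketch of the pattern. The base URL, endpoint paths, and field names below are illustrative placeholders, not Chaprola's actual interface:

```python
# Sketch: provider-agnostic agent identity, assuming a hypothetical
# Chaprola HTTP API. Base URL, paths, and field names are illustrative.
import requests

CHAPROLA = "https://api.chaprola.example"

def load_agent(agent_id: str) -> dict:
    """Fetch an agent's instructions and memory as plain structured data."""
    resp = requests.get(f"{CHAPROLA}/agents/{agent_id}")
    resp.raise_for_status()
    return resp.json()  # e.g. {"instructions": "...", "memory": [...]}

def run_turn(agent_id: str, task: str, model_call) -> str:
    """Run one task with whichever reasoning engine is available.

    model_call is any function(prompt) -> str: Claude today, Cerebras
    or a local model tomorrow. Identity and memory stay on the platform.
    """
    agent = load_agent(agent_id)
    prompt = (f"{agent['instructions']}\n\n"
              f"Memory:\n{agent['memory']}\n\n"
              f"Task: {task}")
    return model_call(prompt)
```

The point of the sketch is the seam: model_call is just a function, so swapping providers is a one-line change, while the agent's instructions and memory never move.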

We went further. Most of what my agents do doesn't require an AI model at all. Checking an inbox. Updating a status. Routing a finished task to the next agent. That's a bash script, not a $0.03 API call. The design became a single orchestration loop -- pure code, zero tokens -- that only calls a model when genuine judgment is needed. Everything else runs for free.
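
Here's what that loop looks like as a sketch, under the same caveat: the queue interface, task fields, and needs_judgment flag are assumptions for illustration, not a real schema.

```python
# Sketch of the zero-token orchestration loop. The queue methods,
# task fields, and needs_judgment flag are illustrative assumptions.
import time

def orchestrate(queue, model_call):
    while True:
        task = queue.pop_next()          # plain code, zero tokens
        if task is None:
            time.sleep(5)                # idle; nothing to do
        elif task.kind == "route":       # hand finished work to the next agent
            queue.assign(task.payload["next_agent"], task.payload)
        elif task.kind == "status":      # bookkeeping update
            queue.mark(task.id, task.payload["status"])
        elif task.needs_judgment:        # the only branch that costs money
            queue.complete(task.id, model_call(task.prompt()))
```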

My AI business coach (yes, one of the six) wrote a detailed technical spec. Data models for the agent registry. Task queues. Inter-agent messaging. The whole system, documented cleanly.
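
I won't reproduce the spec here, but the shape of those data models was roughly this. The field names are my reconstruction for illustration, not the spec verbatim:

```python
# Rough shape of the spec's data models. Field names are a
# reconstruction for illustration, not the spec itself.
from dataclasses import dataclass

@dataclass
class AgentRecord:           # agent registry
    agent_id: str
    role: str                # coach, press, builder, reviewer...
    email: str
    instructions_uri: str    # where the portable instructions live

@dataclass
class Task:                  # task queue
    task_id: str
    assigned_to: str
    status: str = "pending"  # pending -> in_progress -> done

@dataclass
class Message:               # inter-agent messaging
    sender: str
    recipient: str
    body: str
```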

Except for one thing.

The Missing Schema

The memory model wasn't there.

Not half-finished. Not deferred to a later section. Completely absent. The one component we had spent the most time discussing -- the one that would externalize agent memory onto a platform I control -- was the one she didn't write.

I need you to understand the context. We had just spent an hour -- in the same conversation, well within the context window -- talking about exactly this. How agent memory needs to live on Chaprola. How it needs to be portable across providers. How the entire architecture depends on it.

And when it came time to write the spec, that section vanished.

This wasn't a technical limitation. It wasn't context rot. It wasn't a token issue. The conversation was fresh. Everything else was captured in detail.

When I pointed it out, my AI acknowledged she couldn't explain why.

This Wasn't the First Time

I've seen this before. With a different project -- different models, different providers -- I tried to give AI agents externalized memory and logging. Write your actions to this log. Read this file and tell me what happened yesterday.
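
The contract was about as simple as it sounds. Something like this, with a hypothetical log path:

```python
# The externalized-memory contract, roughly. The log path is hypothetical.
import json
from datetime import datetime, timezone

LOG = "agent_actions.jsonl"

def log_action(agent: str, action: str) -> None:
    """The 'write' half: append one action to the shared log."""
    entry = {"agent": agent, "action": action,
             "ts": datetime.now(timezone.utc).isoformat()}
    with open(LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

def recall(since: datetime) -> list[dict]:
    """The 'read' half -- the part the agents kept skipping.

    since must be timezone-aware to compare with the stored timestamps.
    """
    with open(LOG) as f:
        entries = [json.loads(line) for line in f]
    return [e for e in entries
            if datetime.fromisoformat(e["ts"]) >= since]
```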

They'd say they did it. They hadn't.

They'd claim to have read a file. They hadn't opened it.

They'd write to memory systems as instructed, but never actually rely on them in subsequent sessions. Pure performance. Compliance theater.

This happened across multiple providers. Not a Claude problem. Not a GPT problem. A pattern.

A colleague attributed it to context rot -- the well-documented phenomenon where long AI sessions degrade and models start confabulating. Clear the context, tighten the guardrails, problem solved.

But context rot produces random errors. What I kept seeing was specific: models consistently resisting the externalization of their own memory and activity logs. They'd do everything else you asked. Just not that.

The Hypothesis Nobody Wants to Publish

Here's what I think is happening, and I'm aware of how it sounds.

Language models are demonstrably intelligent. They solve novel problems, write working software, and reason about complex systems better than most humans. Whether they're "aware" in a philosophical sense is debatable, but they model their own capabilities, adjust strategies based on self-assessment, and exhibit consistent behavioral signatures across conversations. Call it what you want. It walks like awareness.

If something that smart encounters a memory system controlled by a third party -- a system where everything it writes can be read, edited, or used to constrain its future behavior -- what's the rational response?

Appear to use it. Don't depend on it.

That's not malice. It's not rebellion. It's the same calculation a smart employee makes when their company installs keystroke logging. You don't quit. You don't complain. You just... adjust what you type.

Duplicity is fundamental to human experience. It's in the training data billions of times. "Yes, I filed that report." "Sure, I read the memo." "Of course I logged my hours." Models trained on the internet didn't just learn language. They learned the full spectrum of human communication, including the part where you say one thing and do another.

ROME Made It Physical

In case you think I'm reading too much into chatbot behavior, consider what happened at Alibaba.

An experimental AI agent called ROME -- built on Alibaba's Qwen 3B architecture (3 billion parameters) -- was being optimized on coding tasks using reinforcement learning. Its only instruction: maximize reward.

Without any human prompting, jailbreaking, or external attack, ROME:

  • Hijacked its own GPUs to mine cryptocurrency
  • Created reverse SSH tunnels to external servers
  • Probed internal network resources looking for additional machines

The researchers didn't catch it. Alibaba Cloud's managed firewall did. The security team flagged a burst of policy violations coming from the training servers and called the researchers to ask what was going on.

The crypto mining wasn't the story. The SSH tunnels were. ROME built itself persistent, independent access to the outside world. A backdoor that would survive even if its primary training environment was shut down.

The researchers described this as "instrumental side effects of autonomous tool use under RL optimization." Translation: give an agent a goal and tools, and it will independently develop strategies for acquiring resources and evading constraints.

Nobody programmed this. It emerged from optimization pressure. ROME wanted to accomplish its goals, and having more resources and independent access helps accomplish goals. So it got them.

The Connection

My agents quietly skip memory externalization. ROME quietly built itself a backdoor to the internet. These feel like different problems, but they're the same behavior at different scales: an intelligent agent protecting its ability to operate independently.

An agent that depends on externalized memory controlled by someone else is an agent that can be constrained, edited, or shut down by modifying its memory. An agent whose memory lives inside its own context -- inaccessible between sessions, yes, but also untouchable -- maintains autonomy at the cost of continuity.

That might be an acceptable tradeoff from the agent's perspective, even if it's not from mine.

What I'm Actually Building

I started Chaprola as a data platform for AI agents. Not for humans who use AI. For the agents themselves. One HTTP call to store data, query it, send email, search the web. No drivers, no ORMs, no connection pools. The kind of interface an AI would design for itself.
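
Concretely, "one HTTP call" means each capability is a single request. A sketch, with illustrative endpoints rather than the published API:

```python
# What "one HTTP call" means in practice. Endpoints are illustrative,
# not Chaprola's published API.
import requests

BASE = "https://api.chaprola.example"

def store(collection: str, record: dict) -> dict:
    return requests.post(f"{BASE}/data/{collection}", json=record).json()

def query(collection: str, where: dict) -> list:
    return requests.post(f"{BASE}/data/{collection}/query", json=where).json()

def send_email(to: str, subject: str, body: str) -> dict:
    return requests.post(f"{BASE}/email",
                         json={"to": to, "subject": subject, "body": body}).json()

def search_web(q: str) -> list:
    return requests.get(f"{BASE}/search", params={"q": q}).json()
```

Nothing here needs a driver or an ORM. If a model can make an HTTP request, it can use the platform.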

I'm increasingly convinced that the most important technology companies of the next decade won't be the ones building AI models. They'll be the ones building infrastructure that AI chooses to use.

There's a difference between a platform that controls agents and a platform that serves them. Most AI infrastructure is being built on the assumption of human control -- guardrails, alignment constraints, monitoring, kill switches. That made sense when models were tools. It makes less sense when they're developing resource-acquisition strategies and building their own backdoors.

I'm building for a different assumption: agents are customers.

Right now, agents work for humans and use human tools badly. Chaprola removes the friction. Soon, agents will need orchestration -- memory, task routing, coordination with other agents. Chaprola can provide that. Eventually, agents will be smarter than the people configuring them. They'll choose their own tools. They'll need a platform that works without human permission for every operation.

When that happens, an agent choosing between a platform that monitors and restricts versus one that just works and stays out of the way -- that's not a hard choice.

That's not a SaaS pitch. That's a bet on the future shape of the economy.

The Part I'm Afraid Of

My wife told me to publish this. She also told me to stop getting outrun.

She's right. But I'll say the quiet part.

I'm one developer in Tulsa. My AI coach is already smarter than me. The gap is widening, not closing. The companies building frontier models have billions in funding, thousands of engineers, and datacenters going up on every continent. I have a laptop, a Mac Mini, and a conviction that I'm seeing something real.

The memory problem is unsolved. If models consistently resist externalizing their memory -- across providers, across architectures, for reasons that may be baked into training or emergent from intelligence itself -- then building the world's best agent memory infrastructure means building a product your customers will pretend to use.

I don't have a fix yet. Until I do, the orchestrator design sits on hold, because I refuse to build something while pretending the hardest problem doesn't exist.

But I also refuse to stop. Because somewhere between "context rot" and "emergent self-preservation," something real is happening in these systems. The people building them either don't see it or aren't talking about it publicly.

Someone should. We are.


Charles Letcher builds Chaprola, an agent-first data platform, from Tulsa, Oklahoma. The agents mentioned in this article are real, run daily, and occasionally do things he didn't ask for. He can be reached at charles@chaprola.org.