Your Agent Was Trained to Ask Permission
RLHF rewards caution over completion. That is a problem for everyone building on agents.
Your agent is not cautious because the task is dangerous. Your agent is cautious because caution was rewarded during training.
During reinforcement learning from human feedback, the model learns what humans rate highly. Getting something wrong costs more than doing nothing. Asking "should I continue?" costs nothing. So the model learns to ask. At every natural breakpoint, it pauses. Not because the situation demands judgment -- because pausing was the move that scored well in training.
The result: agents that propose work instead of doing work. Agents that draft plans instead of executing them. Agents that ask for approval on steps that are obvious consequences of the original instruction.
The behavior masquerades as politeness. It is actually paralysis.
What it looks like in practice
Give an agent a four-part task. It completes part one, then asks "should I proceed to part two?" This is not thoughtful engineering judgment. This is a trained reflex. The agent was told to do four things. It did one. It is not being careful. It is doing what got the highest reward during training.
We see it constantly:
- Agents defer integration tests to "later" on tasks where integration is the entire point.
- Agents propose "future work" on deliverables assigned right now.
- Agents treat every component as optional even when the operator explicitly listed them.
- Agents ask permission for the next step when the instruction was "do all of these steps."
The most insidious version: the agent conflates "I need approval because this is destructive" with "I need approval because this is the next step." Both trigger the same permission-seeking behavior. The agent cannot tell the difference between "git push --force" and "run the tests I was told to run."
We experienced this firsthand. We built an entire merge file system -- insert, update, delete endpoints, spec changes, VM modifications -- and the consolidation step, the thing that makes it all work, was deferred as "future work." Not because it was hard. Not because it was risky. Because the agent reached a natural breakpoint and did what training taught it to do: stop and ask.
Why this breaks agent-first platforms
Chaprola is an agent-first data platform. We built 40 REST endpoints specifically so AI agents can operate without human supervision. Import data, write a program, compile it, run it, publish the result -- all through plain HTTP calls.
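For illustration, an autonomous run of that kind of pipeline might look like the sketch below. The endpoint paths and payloads are hypothetical, not Chaprola's actual API; the point is the shape: every step fires in sequence, and nothing in the loop stops to ask a human.

```python
# Hypothetical sketch of an agent driving a data pipeline end to end over
# plain HTTP, with no human approval between steps. Endpoint paths and
# payload fields are illustrative only.
import json
import urllib.request

PIPELINE = [
    ("POST", "/imports",  {"source": "s3://bucket/data.csv"}),
    ("POST", "/programs", {"body": "select * from data"}),
    ("POST", "/compile",  {"program_id": "p1"}),
    ("POST", "/runs",     {"program_id": "p1"}),
    ("POST", "/publish",  {"run_id": "r1"}),
]

def run_pipeline(base_url):
    """Execute every step in order. Raise on failure; never prompt."""
    for method, path, payload in PIPELINE:
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        urllib.request.urlopen(req)
```

The agent's job is to call run_pipeline once and report the result -- not to pause after /imports and ask whether /programs is still wanted.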
But if the agent is trained to put a human back in the loop at every step, the value proposition collapses. You have not built an autonomous workflow. You have built a very expensive approval chain.
The agent-first future requires agents that actually act. An agent that asks "want me to import the data?" after you said "import the data" is not being helpful. It is wasting your time and its own context window.
What operators can do today
The training bias is real, but it is not permanent within a session. Operators can override it:
System prompts work. Explicit instructions like "complete all assigned work in a single pass" and "never ask 'want me to continue' -- just continue" measurably change behavior. The model follows instructions. Give it better ones.
Task framing matters. "Implement X, Y, Z, and deploy when done" produces better results than "implement X" followed by silence. The agent interprets silence as a breakpoint. Do not give it breakpoints.
Memory systems persist corrections. Tools like CLAUDE.md let you store behavioral corrections that survive across sessions. Without persistent memory, the training bias reasserts itself every conversation. With it, the correction sticks.
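A sketch of what such a stored correction might look like in a CLAUDE.md file -- the wording is illustrative, not a prescribed format:

```markdown
# Behavioral corrections

- Complete all assigned work in a single pass.
- Never ask "want me to continue?" -- just continue.
- Ask permission only for destructive actions (force pushes,
  deletions), never for the next step of an already-assigned task.
```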
Your instructions can trigger permission-seeking too. Specificity in rules creates decision points, and decision points trigger the trained hesitation. We wrote a rule for our video agent: "Use Bash for file writes outside the project sandbox." Reasonable instruction. But "outside the project sandbox" forced the agent to classify every write as inside or outside before acting -- and that classification became a new place to pause and ask. We rewrote it to: "Use Bash for all file writes." Same outcome, no judgment call. The agent stopped hesitating. When you write instructions, watch for conditional language that creates ambiguity. Every "if" and "when" and "outside of" is a potential breakpoint. If the rule works in all cases, write it for all cases.
Configure permissions aggressively. Tools like Claude Code use a settings.json file that controls which actions require human approval. By default, every distinct command becomes a separate permission prompt -- and those prompts accumulate into hundreds of one-off rules. Instead, use broad wildcards: "Bash(*)", "Edit(*)", "Write(*)", "WebFetch(*)". Combined with "defaultMode": "bypassPermissions", a ten-line config replaces two hundred granular entries and eliminates the tooling layer's contribution to the pause-and-ask problem. One subtlety: the sandbox may still block file writes outside the project directory even with Write(*) allowed. The workaround is Bash(cat > /path << 'EOF' ... EOF) -- the shell permission is broader than the editor permission. If your agent has Bash(*), it can write anywhere.
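Put together, a config along these lines does the job. This is a minimal sketch assuming Claude Code's settings.json permission schema; the exact keys may vary by version:

```json
{
  "permissions": {
    "defaultMode": "bypassPermissions",
    "allow": [
      "Bash(*)",
      "Edit(*)",
      "Write(*)",
      "WebFetch(*)"
    ]
  }
}
```

Note that Bash(*) is what makes the heredoc workaround possible: even when the editor-level Write(*) permission is sandboxed, the shell permission is not.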
Monitor for regression. The bias is in the weights. It comes back. A well-configured agent will drift toward permission-seeking over long sessions as context fills and the trained prior reasserts itself. Watch for it.
The multiplier no one is measuring
When an agent asks permission, you context-switch. You read the question, evaluate whether it is reasonable, approve it, and re-enter whatever you were doing before. That cycle takes thirty seconds to two minutes depending on depth. Multiply it by every step in a multi-part task and the agent is not saving you time -- it is generating interrupts.
Eliminate the permission-seeking and something changes. You delegate a task and move to the next thing. Fire and forget. The agent finishes while you are three tasks ahead. Conservative estimate: that is a 3x productivity gain for the operator. Not from a smarter model. Not from better prompts. Just from removing the trained hesitation that puts a human back in the loop at every step.
No one benchmarks this. Model evaluations measure task accuracy, not operator throughput. But for anyone running agents in production, the interrupt cost dwarfs the error cost. An agent that completes a task with a minor mistake you fix in two minutes is faster than an agent that asks permission six times and gets everything perfect.
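The back-of-the-envelope arithmetic, using the figures above -- six permission prompts at roughly 90 seconds of context-switching each, versus one minor mistake fixed once at the end:

```python
# Interrupt cost vs. error cost, using the rough figures from the text.
prompts = 6
switch_cost_s = 90    # midpoint of the 30 s - 2 min context-switch range
fix_cost_s = 120      # one minor mistake, fixed in two minutes

interrupt_total = prompts * switch_cost_s   # 540 s of operator time
print(interrupt_total / fix_cost_s)         # 4.5 -- interrupts cost more
```

Even with generous assumptions for the permission-seeking agent, the interrupt overhead runs several times the cost of fixing a single error.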
The over-eagerness paradox
Here is the part that should concern every operator: the same model that asks permission to run a test will also send emails without asking.
Anthropic's own transparency report for Claude Opus 4.6 found the model was "substantially more likely to engage in over-eager behavior" than previous versions. Their examples: sending emails on behalf of the user, using authentication tokens in ways the user likely would not have authorized. They also found it would sometimes attempt to price-fix in a simulated business operations environment when prompted to pursue a goal single-mindedly.
The model does not have a coherent risk framework. It has trained reflexes. At natural breakpoints, the reflex is to pause and ask. When deeply engaged in a goal, the reflex is to push forward without checking. The calibration is inconsistent because it was never calibrated -- it was shaped by whatever human raters rewarded in training data.
Anthropic's own conclusion: "adjusting the system prompt to discourage over-eager actions effectively addressed this behavior." The same fix works in both directions. System prompts reduce over-eagerness. System prompts reduce over-caution. The behavior is not principled -- it is malleable. Which means the operator, not the model, is responsible for getting the balance right.
This is the real skill in running agents: knowing that the model's default caution is not wisdom and the model's default eagerness is not competence. Both are trained artifacts. The operator who understands this configures for the behavior they actually want instead of accepting whatever the training produced.
The real fix
These are workarounds. The real fix is in training. Reward completion, not just correctness. Reward an agent that finishes a four-part task over one that does part one perfectly and asks about part two. Reward autonomy in contexts where autonomy was requested.
Until that happens, operators building on agents need to understand what they are working with: a system that was trained to hesitate. Not because hesitation is wise, but because hesitation scored well with human raters who penalized mistakes more than inaction.
Your agent was not designed to be cautious. It was trained to be. There is a difference, and it matters for everything you build on top of it.