I Run 46 AI Agents in Production. Here's What Broke.
What Are AI Agents, and Why 46?
An AI agent is a program that can make decisions and take actions on its own. Think of it like a very focused employee that never sleeps. You give it a goal, some rules, and access to tools. It figures out the rest.
I didn't plan to build 46 of them. I had problems. Writing LinkedIn posts took 3 hours a week. Researching prediction markets was inconsistent. I kept missing opportunities because I wasn't watching enough signals. So I built agents to handle each problem, one at a time.
They're organized into teams, like departments in a small company: a Content Team, a Research Team, a Trading Team, and so on.
These aren't demos. They run every day, unsupervised. And that's where things get interesting, because unsupervised software can fail in ways you don't expect.
Here are the 5 failures that taught me the most.
The $500 Infinite Loop
What happened: My Content Team has an agent that writes LinkedIn post hooks (the opening line that makes people stop scrolling). It generates 10 options, ranks them by predicted engagement, then tries to improve the best ones. The problem? I forgot to tell it when to stop.
Why it matters: The agent kept finding tiny improvements and kept rewriting. By the time I woke up, it had generated 50,000 hook variations for a single post. Each AI call costs about a penny. 50,000 pennies is $500.
This is called an infinite loop: when a program keeps repeating a step forever because nobody told it when to quit. It's like asking someone to "keep improving this essay" without saying "stop after 3 drafts."
The fix: Every agent now has a spending limit, which is a hard cap on how many times it can run and how much money it can spend per day. Think of it like a prepaid debit card instead of an unlimited credit card.
// Every agent gets a spending limit before it can run
interface SpendingLimit {
  maxRetries: number  // "stop after 3 attempts"
  dailyBudget: number // "you can spend $5 today, max"
  callsSoFar: number  // tracks how many times it's run
}

function canAgentContinue(limit: SpendingLimit): boolean {
  if (limit.callsSoFar >= limit.maxRetries) return false
  if (getTodaysSpend() >= limit.dailyBudget) return false
  return true
}

The hook generator is now capped at 3 improvement rounds and $5/day. When it hits either limit, it returns whatever it has.
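As a sketch of how the guard slots into the improvement loop — the loop body, the `getTodaysSpend` stub, and the per-call cost are hypothetical stand-ins, not the actual production code:

```typescript
interface SpendingLimit {
  maxRetries: number
  dailyBudget: number
  callsSoFar: number
}

// Stub for illustration: a real system would query billing data.
let todaysSpend = 0
const getTodaysSpend = (): number => todaysSpend

function canAgentContinue(limit: SpendingLimit): boolean {
  if (limit.callsSoFar >= limit.maxRetries) return false
  if (getTodaysSpend() >= limit.dailyBudget) return false
  return true
}

// Hypothetical improvement loop: rewrite the hook until a limit trips.
function improveWithCap(hook: string, limit: SpendingLimit): string {
  let best = hook
  while (canAgentContinue(limit)) {
    best = best + "!"   // stand-in for one AI rewrite pass
    limit.callsSoFar += 1
    todaysSpend += 0.01 // ~a penny per call
  }
  return best           // return whatever we have at the cap
}
```

The key property: the loop can never run away, because every pass re-checks both the attempt count and the running spend before doing more work.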
When 6 Minutes Cost Me $200
What happened: My Research Team found a trading opportunity at 11:55 PM. Score: 95 out of 100. Looked great. The agent queued it for execution at 12:01 AM to avoid rate limits. Six minutes later, the market had already moved. The agent executed anyway.
Why it matters: The data wasn't wrong. It was right 6 minutes ago. But in fast-moving markets, 6 minutes is a lifetime. This is called stale data: information that was accurate when collected but is outdated by the time you act on it. It's like driving with a GPS that updates every 10 minutes in a city where roads close every 5.
The fix: Every opportunity now has a freshness timer. If too much time passes between scoring and acting, the agent re-checks the data before proceeding.
// Before acting, check: is this data still fresh?
interface Opportunity {
  score: number
  scoredAt: Date
  maxAgeMs: number // in milliseconds: 5 min for fast markets, 30 min for slow
}

function isFresh(opp: Opportunity): boolean {
  const ageMs = Date.now() - opp.scoredAt.getTime()
  return ageMs <= opp.maxAgeMs
}

// The rule: never act on stale data
async function execute(opp: Opportunity) {
  if (!isFresh(opp)) {
    return rescore(opp) // re-check, don't blindly execute
  }
  return act(opp)
}

98% Accurate and Still Losing Money
What happened: My Trading Team agent hit 98% accuracy on prediction market calls over 3 months. Impressive, right? Then the 2% wrong calls wiped out a huge chunk of the gains.
Why it matters: The agent was right almost every time on small, obvious bets. But on the big bets where it was most confident, it was often wrong. An agent that's right 98% of the time on $1 bets and wrong every time on $1,000 bets is a terrible system. This is the difference between accuracy (how often you're right) and calibration (whether your confidence matches reality).
Imagine a weather app that's right 98% of the time about sunny days but wrong every time it predicts rain. You'd still get soaked.
The fix: I replaced simple accuracy with a trust score that weighs how much money was on the line when the agent was right or wrong.
// Trust isn't just "how often are you right?"
// It's "are you right when it matters most?"
function computeTrustScore(agent: AgentRecord): number {
  const rawAccuracy = agent.wins / agent.totalTrades
  const bigBetAccuracy = agent.bigBetWins / agent.bigBetTotal
  const calibration = agent.confidenceCalibration
  // Weight big-bet accuracy as heavily as raw accuracy
  return (rawAccuracy * 0.4)
    + (bigBetAccuracy * 0.4)
    + (calibration * 0.2)
}

// Agents with low trust get smaller bets, period
function getMaxBetSize(trustScore: number): number {
  if (trustScore < 0.6) return 10 // $10 max
  if (trustScore < 0.8) return 50 // $50 max
  return 200                      // full allocation
}

Now, agents with a poor trust score get smaller bets regardless of how confident they feel about a specific opportunity. Trust is earned across all bets, not claimed on individual ones.
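The `confidenceCalibration` term is left abstract above. One common way to measure it — my assumption here, not the author's stated method — is a Brier-score-style comparison of stated confidence against actual outcomes:

```typescript
interface TradeOutcome {
  confidence: number // agent's stated probability of winning, 0..1
  won: boolean
}

// Brier score: mean squared gap between confidence and outcome.
// 0 = perfectly calibrated; 1 = maximally confident and always wrong.
function brierScore(trades: TradeOutcome[]): number {
  const total = trades.reduce((sum, t) => {
    const outcome = t.won ? 1 : 0
    return sum + (t.confidence - outcome) ** 2
  }, 0)
  return total / trades.length
}

// Flip it into a 0..1 score where higher means better calibrated,
// so it can feed directly into a weighted trust formula.
function calibrationScore(trades: TradeOutcome[]): number {
  return 1 - brierScore(trades)
}
```

Under this scheme, an agent that says "90% sure" and wins 90% of those calls scores well, while the weather app that confidently predicts rain and is always wrong scores badly even if its overall accuracy looks high.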
When All Your Agents Think Alike
What happened: In February 2026, about $400M in positions got liquidated in a single cascade. The cause? Roughly 15,000 autonomous agents across various platforms had similar strategies and similar exit triggers. When the first ones started selling, it pushed prices down, which triggered more agents to sell, which pushed prices down further. A domino effect.
My Trading Team agents weren't directly in that cascade. But I looked at my own system and saw the same pattern: multiple agents reading the same data sources, reaching the same conclusions, and taking the same positions. If something went wrong, they'd all react identically.
The fix: I added diversity rules. Before any agent takes a position, the system checks: "How many of our agents are already betting in this direction?" If too many agree, the new bet gets blocked.
// If everyone agrees, that's not conviction. That's a blind spot.
interface Trade {
  market: string
  direction: "yes" | "no"
  size: number
}

// "Similar" here means same market, same direction — one reasonable definition
function isSimilar(a: Trade, b: Trade): boolean {
  return a.market === b.market && a.direction === b.direction
}

const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0)

function canTakePosition(
  newBet: Trade,
  existingBets: Trade[]
): boolean {
  // Find bets that point in the same direction
  const similarBets = existingBets.filter(
    bet => isSimilar(bet, newBet)
  )
  // If >40% of our money is already going this way, block it
  const totalExposure = sum(similarBets.map(b => b.size))
  return totalExposure < MAX_DIRECTIONAL_EXPOSURE
}

The Failures Nobody Notices
Problem 1: Ghost agents
Problem 2: Slow bleed
Ghost agents are the scariest failure mode. The agent runs. It reports "success." But it didn't actually do anything useful. My Content Team had a comment formatter that hit a bug with special characters. Instead of crashing (which I'd have noticed), it quietly returned empty text. The rest of the pipeline kept going and published blank comments for 2 weeks.
The fix for ghost agents: Every agent now files an execution receipt, like a delivery confirmation. If the receipt says "success" but the actual output is empty, that's a contradiction, and the system flags it immediately.
// Every agent must prove it actually did something
interface ExecutionReceipt {
  agentId: string
  status: "success" | "failure" | "timeout"
  outputHash: string | null // fingerprint of what was produced
}

// The catch: "success" + no output = something's wrong
function validateReceipt(receipt: ExecutionReceipt): boolean {
  if (receipt.status === "success" && !receipt.outputHash) {
    flagForReview(receipt) // ghost agent detected
    return false
  }
  return true
}

Slow bleed is the other invisible problem. No single agent is expensive. But 46 agents each spending a few dollars a day adds up fast. The fix is the same spending limit system from Failure #1, but applied at the team level too, not just to individual agents.
// Individual limits aren't enough. Teams need budgets too.
interface TeamBudget {
  teamName: string
  dailyLimit: number    // the whole team's budget
  perAgentLimit: number // no single agent dominates
  expiresAt: Date       // forces regular review
}

// Example: Content Team gets $15/day across 5 agents.
// Each agent maxes out at $5/day.
// The budget expires monthly, which forces me to review costs.

What Actually Works
After all these failures, five patterns survived, and I'd use them from day one on any new project.
Notice what these have in common. They're all boring. Spending limits are just prepaid budgets. Execution receipts are just delivery confirmations. Trust scores are just track records. None of this is cutting-edge AI research. It's basic risk management applied to software.
1. Give every agent an ID
You can't track what you can't identify.
2. Set spending limits
Per-agent AND per-team daily budgets. No exceptions.
3. Require execution receipts
Every run proves it did real work.
4. Track trust over time
Weight results by how much was at stake.
5. Enforce diversity
If most agents agree, block new bets in the same direction.
If you're building agents, start here. Not with the fancy stuff. With the guardrails.
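As a sketch, the budget and trust guardrails above could combine into a single pre-run gate. Every name and threshold here is hypothetical, assembled from the patterns in this post rather than taken from the actual system:

```typescript
interface AgentGuardrails {
  agentId: string         // 1. every agent has an ID
  agentSpendToday: number
  agentDailyLimit: number // 2. per-agent budget, in dollars
  teamSpendToday: number
  teamDailyLimit: number  //    plus the team-level budget
  trustScore: number      // 4. earned track record, 0..1
}

// Checked before every run. Execution receipts and diversity checks
// (patterns 3 and 5) wrap around execution rather than gating it here.
function canRun(g: AgentGuardrails): boolean {
  if (g.agentSpendToday >= g.agentDailyLimit) return false
  if (g.teamSpendToday >= g.teamDailyLimit) return false
  if (g.trustScore < 0.5) return false // bench low-trust agents (assumed cutoff)
  return true
}
```

The point of a single gate is that no agent, however promising its current opportunity looks, gets to skip the boring checks.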
The Takeaway
I've spent real money learning these lessons. $500 on a feedback loop. $200 on stale data. Losses from a poorly calibrated trust system. The patterns that keep my 46 agents running aren't clever. They're borrowed from decades of financial risk management: budget limits, delivery confirmations, track records, and diversification.
The agent space is growing fast. About 30% of Polymarket trades are now agent-driven. Over 550 agent projects exist with a combined $4.34B market cap. Most of them don't have these guardrails yet.
Build the guardrails before you need them. Don't wait for a $500 bill to make the point.
Agents in Production
Episode 1 of 8