It started with a football question.
I asked ChatGPT about the training camp situation at Real Madrid, which is all over my social media feeds. The response came back detailed, confident, and, above all, completely wrong. It told me Xabi Alonso was managing internal tensions in the squad — navigating player dynamics, shaping the dressing room. The only problem: Alonso had left the club months earlier. When I pushed back, the model corrected itself immediately. Right answer the second time. But only because I knew enough to challenge it.
Most people don't challenge it. They read the first answer, trust the confident framing, and move on.
That moment stuck with me. Not because it was catastrophic (it wasn't), but because it was a perfect, low-stakes illustration of something that matters a great deal when the stakes are higher.
I use ChatGPT as a sparring partner. Not a search engine, not a coding autocomplete, but a genuine second opinion when I'm working through an idea. Most of the time, it earns that role. And sometimes it reminds me why I shouldn't fully trust it.
Two things have been bothering me for a while. I've been sitting on this because I didn't want to write a generic "AI has limits" piece; those exist in abundance and mostly say nothing. But this cuts deeper than a product complaint. It touches on something that actually matters: whether tools like ChatGPT are narrowing the gap between institutional intelligence and everyday people, or quietly widening it in ways we don't immediately notice.
Why this matters
There's a version of AI that genuinely democratises access to quality thinking. A private investor using ChatGPT to stress-test a thesis could, theoretically, operate with a level of analytical rigour that was previously reserved for people with Bloomberg terminals and research teams. The founder of a small company could pressure-test a go-to-market strategy the way a consultant would. That potential is real. I've seen it work.
But two recurring failure modes are eating into that promise, and they're not random bugs. They're structural, and they mirror some of the most well-documented cognitive errors in human decision-making. Which makes them more dangerous, not less, because they feel like intelligence.
Why ChatGPT gets recent events wrong
The training data issue is the more obvious of the two, but it keeps surprising people. ChatGPT, even the paid versions, works from a training cutoff. After that date, it doesn't know what happened. It wasn't there.
The problem isn't the cutoff itself. That's a technical constraint and it's disclosed. The problem is that the model doesn't always flag its own blind spot. It answers with the same confident tone whether it's recalling something it was trained on thoroughly or reconstructing something it barely has data on.
Which brings me back to Real Madrid. The model didn't say "I'm not sure, my information might be outdated." It just told me about Xabi Alonso managing the squad like it was current fact. The tone was the same as it would be for anything else. No signal that it was working from stale data. No asterisk.
That's the structural problem. In a world where people are increasingly using AI to verify things they see on social media, a model that delivers guesses with the same confidence as facts is a meaningful failure.
ChatGPT recency bias in investing: why this is genuinely risky
The second issue is subtler and, to me, more concerning: recency bias.
Recency bias is a well-documented human cognitive pattern where we overweight recent events and assume current trends will continue. It's why investors pile into assets after a run-up and exit after a drawdown — the opposite of what the math suggests. Good analysts are trained to fight it. It's one of the reasons systematic, rules-based investing tends to outperform discretionary judgment over long horizons.
ChatGPT has this bias. And it has it in a specific, almost ironic way: because its training data is skewed toward what was written about recently, the model reflects whatever narrative was dominating the news cycle at the time of training. If the last six months of its data were full of coverage about a rate hike cycle, it will anchor to that. If a particular sector was getting hyped in the financial press, the model will subtly overweight that framing.
Ask it for a view on a macro environment or a company and it often does one of two things: it either extrapolates from the most recent narrative it has data on, or it hedges so aggressively that the answer is useless. Neither is what a good analyst would do.
This matters for private investors in particular. (Nothing here is financial advice, just thinking out loud about the tool itself, and you should absolutely form your own view before acting on anything.) The gap between institutional and retail investors has always partly been about access to dispassionate, systematic analysis. If AI is going to help close that gap, it needs to be the thing that resists headline-driven framing, not the thing that encodes it at scale. Right now, in my experience, it's closer to the latter.
I know you can build systems around this. Custom instructions, structured prompts, injecting your own context and constraints. I do some of this, and a stripped-down version is sketched below. But I shouldn't have to spend two hours engineering guardrails just to get a second opinion that isn't contaminated by whatever was trending six months ago. That's not a sparring partner. That's a liability dressed up as one.
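For the curious, this is roughly the shape of it: a minimal Python sketch using the official OpenAI client, where the guardrail wording and the model name are my own illustrative choices, not a recommendation.

```python
# Minimal sketch of the guardrail wrapper I mean; the system prompt
# and model name are illustrative assumptions, not a recommendation.
from openai import OpenAI

GUARDRAIL = (
    "Before answering: state your confidence (low/medium/high), "
    "say whether the topic may postdate your training data, and "
    "flag any claim that depends on events from the last twelve months."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def second_opinion(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; swap in whatever you use
        messages=[
            {"role": "system", "content": GUARDRAIL},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(second_opinion("What's the current situation at Real Madrid's training camp?"))
```

It's twenty lines, and the point is that those twenty lines shouldn't be necessary for the default experience to behave this way.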
What most people get wrong about AI reliability
The common response to both of these issues is: "Just fact-check it." And yes, obviously. But that misses the point.
The value of a tool like ChatGPT isn't just that it gives you information. It's that it gives you information at a speed and scale that changes how you think and work. The moment you have to manually verify every output, you've lost a big part of that value. And the model's confident delivery actively works against your instinct to check: it doesn't sound uncertain, so you don't treat it as uncertain.
We engineered something close to a superpower and then quietly installed the same cognitive bugs that make humans terrible at making decisions under pressure.
The irony is that these aren't hard problems in principle. Recency bias could be countered by training architectures that deliberately weight longer time horizons. Outdated information could be flagged more aggressively when the model's confidence should be low. Real-time retrieval exists and is improving. I've seen better behaviour in some of the newer browsing-enabled configurations, and I'll keep testing as these evolve. But the default experience still fails, in the ways I've described, for exactly the users who need it most: people who aren't prompting systematically and aren't in a position to know when the model is working from stale data.
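To make that first idea concrete, here is a toy illustration, entirely my own and nothing like how any lab actually trains: sample training documents with an age-based weight so that the last few months of coverage can't dominate purely by volume.

```python
# Toy illustration, mine alone: counteract the corpus's recency skew
# by giving older documents more relative sampling weight.
import random
from datetime import date

docs = [
    {"text": "rate-hike cycle coverage", "published": date(2024, 11, 1)},
    {"text": "hyped-sector think piece", "published": date(2024, 12, 1)},
    {"text": "long-horizon market analysis", "published": date(2019, 6, 1)},
]


def age_weight(doc: dict, today: date = date(2025, 1, 1), horizon_days: int = 3650) -> float:
    """Linearly boost weight with age, so recent narratives
    can't dominate purely because more was written about them."""
    age_days = (today - doc["published"]).days
    return 1.0 + age_days / horizon_days


weights = [age_weight(d) for d in docs]
print(random.choices(docs, weights=weights, k=1)[0]["text"])
```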
That's the gap worth worrying about.
The bigger race: whoever translates human will into product wins
Here's my actual long-term belief about where this is all going.
The LLM that most perfectly translates human will into product will win the race. Not the one with the most parameters. Not the one with the flashiest benchmark. The one that takes what you mean and turns it into what you need — at the speed of thought, without friction.
This is why vibe coding and Claude Code matter so much. Not just as features, but as a signal. Claude Code doesn't ask you to adapt to its syntax. It listens to what you're trying to build and builds it. It translates intent into product in a way that, even a year ago, felt like science fiction. That's a genuinely revolutionary step — and it's brought Anthropic much, much closer to OpenAI than most people expected. I think it will be remembered as the moment the race changed shape. Not "who has the best model" but "who best understands what the human actually wants."
The ChatGPT failure modes I've described (the stale confidence, the recency bias) are, at their core, failures of translation. The model produces something that sounds like what you wanted, but isn't actually responsive to what you needed. It's fluent without being useful. And in a world where the bar is shifting toward perfect intent-to-output translation, fluency alone won't be enough.
What to actually do
- Treat ChatGPT's confident tone as neutral, not as a signal of accuracy. If the topic is time-sensitive (markets, current events, recent signings), assume the information could be six to twelve months behind and verify before acting on it.
- Ask the model to declare its uncertainty. Something like: "Before you answer, tell me how confident you are and whether this could be affected by outdated training data." It won't always catch it, but it changes the dynamic. (A reusable version is folded into the sketch after this list.)
- For anything financial, use it for frameworks, not for facts. It's good at helping you think through a thesis structure, stress-test assumptions, or identify what you might be missing. It's not reliable for current valuations, recent earnings, or macro data. (And again that's my approach, not a recommendation for yours.)
- When something sounds confidently wrong, push back immediately. The model updates. But you have to be the one who knows enough to push. Which means you still need independent knowledge, AI doesn't replace that, it extends it.
- Build in a "what year does this assume" check. Especially for anything involving companies, geopolitics, sports, or regulatory environments. If the model's answer only makes sense in the world of a year ago, that's probably the world it's describing.
- Watch this space. Real-time retrieval and better uncertainty signalling are genuinely improving — I plan to write more about how the tooling is evolving as I test it. But the default product today still has these gaps.
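Pulling the two prompting ideas above into one place, here is a minimal template. The exact wording is my own and worth tuning to your workflow.

```python
# A reusable pre-flight prompt bundling the uncertainty check and the
# "what year does this assume" check. Wording is illustrative only.
PREFLIGHT = """Before you answer the question below:
1. State your confidence: low, medium, or high.
2. Say whether the answer could be affected by outdated training data,
   and roughly when your knowledge of this topic ends.
3. State what year your answer implicitly assumes.
Then answer.

Question: {question}"""


def with_preflight(question: str) -> str:
    return PREFLIGHT.format(question=question)


print(with_preflight("Who is currently managing Real Madrid?"))
```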
The potential is real. So is the gap between the potential and what ships by default. Knowing the difference is most of the work.
And just to close the loop on the Real Madrid rumours: I went back and asked a more specific follow-up about the rumoured conflicts inside the squad. The model confidently weighed in on Valverde's leadership role and a training-ground clash with Tchouaméni, and cited some local newspapers.
I mean... Valverde. Come on. I know you're the captain and everything, but seriously challenging Tchouaméni? That guy is a beast, but much respect for trying...