You are paying a rather remarkable sum of money to let a very confident guesser make your decisions. Here is why
that should concern you.
The sales pitch is hard to resist: “Our AI-powered platform will transform your operations, cut costs by 40%, and
unlock insights you never knew existed.” It sounds wonderful. It sounds like the future. And for most businesses,
it sits somewhere between misleading and actively dangerous. The risks AI poses to a business are real, and most
organisations are not thinking about them clearly.
I am not here to tell you that AI is useless. Clearly it is not. But the current wave of enthusiasm around large
language models has businesses making serious operational decisions without a reasonable understanding of what these
systems actually are.
You are building on quicksand
Large language models are probabilistic. They do not know things and they do not reason through
problems — they produce statistically likely sequences of text based on patterns absorbed during training. This is
not a technical footnote; this is the one and only thing that they do.
When you ask an LLM to analyse your sales data and recommend a pricing strategy, it is not performing analysis. It is
producing output that pattern-matches to what “a pricing strategy recommendation” tends to look like in its training
data.
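To make "statistically likely sequences" concrete, here is a minimal sketch of how a next token gets picked. It is a toy under stated assumptions, not how any particular vendor's model is implemented: the four candidate tokens and their scores in `TOY_LOGITS` are invented, and the helper names `softmax` and `sample_token` are hypothetical.

```python
import math
import random

# Hypothetical next-token scores (logits) after the prompt
# "Recommended price change:". The numbers are invented for illustration.
TOY_LOGITS = {"raise": 2.0, "hold": 1.6, "cut": 1.2, "bundle": 0.3}


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Print the distribution, most likely token first.
    for tok, p in sorted(softmax(TOY_LOGITS).items(), key=lambda kv: -kv[1]):
        print(f"{tok:>7}: {p:.2f}")
    # Same "prompt", ten draws: the answer is not guaranteed to repeat.
    print([sample_token(TOY_LOGITS) for _ in range(10)])
```

In this toy distribution the most likely token carries less than half of the probability mass, so repeated draws from the identical prompt can disagree. Nothing in the procedure checks whether "raise" is the right call for your business; it only checks which continuation scores highest.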
One might fairly object that humans do something similar — we too are taught, and we too fall back on cultural
defaults. True enough. But a human analyst has personal preferences, opinions shaped by experience, knowledge
absorbed through dozens of invisible learning channels: not only documents read, but conversations had, expressions
noticed while presenting last quarter’s numbers, the uneasy feeling from a client meeting that went sideways. These
bits of feedback get incorporated unconsciously.
An LLM has none of that, and the limitation is structural rather than a temporary growing pain.
Its output converges on the generic: the statistical average of everything it was trained
on. For low-stakes tasks such as drafting an email or brainstorming, this is perfectly fine. But the moment you
route business decisions through such a system, you have introduced a source of randomness that you cannot fully
control or predict — and the output you receive will be, almost by definition, the most average possible version of
whatever you asked for.
Yesterday’s data, tomorrow’s decisions
An LLM’s knowledge is frozen in time — its training data comes from the past, while the business decisions you are
asking it to inform are about the future. In ordinary times, this is manageable. The past and the future are
related, and a system that has absorbed yesterday’s patterns can say something useful about tomorrow.
The trouble is that this relationship weakens precisely when you need it most — when the business landscape becomes
unpredictable.
Consider the manufacturing business whose next aluminium shipment is due in six months. For the past decade, that
supply chain had been boringly stable; the factors influencing price were a more or less fixed set of variables. A
model trained on that decade has absorbed those patterns thoroughly, and in calm weather its forecast would be
perfectly serviceable. But the probability weight that ought to sit on “geopolitical disruption” was, until quite
recently, a marginal footnote — and for many models, not a footnote at all.
This is the compounding problem of probabilistic systems trained on historical data: they are most confident
precisely when conditions resemble the past, and they are least able to warn you when conditions do not. The world
does not send advance notice when the relationship between training and reality is about to break — and it is
precisely in those moments that the decisions you are making matter most.
The black box you cannot debug
Traditional software aims to be deterministic. When something goes wrong, you can trace the logic, find the flaw, and
fix it. The problem is knowable.
LLMs do not work this way. The same prompt with the same data can give you a correct answer today and a subtly wrong
one tomorrow. When it is wrong, you typically cannot explain why. The “reasoning” is distributed across
billions of parameters that no human can inspect. You debug by tweaking prompts, adding examples, and hoping it
behaves differently next time. “Hope” is not an engineering methodology.
For industries that care about auditability and compliance — finance, healthcare, legal, energy, government — this is
not merely inconvenient. It is a liability. “The computer said so” is not an answer a regulator will accept.
Hallucinations are not a bug — they are the product
A common misconception is that hallucinations — cases where the model confidently generates fabricated information —
are temporary, a flaw that the next version will resolve. They are not. Hallucination is inherent to how these
models produce text. Sometimes the most probable-sounding output is simply wrong.
The consequences are already documented. In February 2024, the British Columbia Civil Resolution Tribunal ruled
against Air Canada after its chatbot incorrectly told a passenger he could apply retroactively for a bereavement
discount. When the passenger tried, Air Canada refused, and then argued before the tribunal that its chatbot was “a
separate legal entity responsible for its own actions.” The tribunal disagreed.[1] In 2023, attorneys in Mata v.
Avianca submitted briefs containing at least six entirely fictitious case citations generated by ChatGPT, complete
with fabricated quotes and judicial opinions. They were sanctioned and fined $5,000.[2]
[1] Moffatt v. Air Canada, 2024 BCCRT 149. The tribunal ruled that companies are liable for AI chatbot
misrepresentations. Covered by the American Bar Association, February 2024.
[2] Mata v. Avianca, Inc., 22-cv-1461 (S.D.N.Y. 2023). Judge P. Kevin Castel sanctioned attorneys Steven Schwartz and
Peter LoDuca for submitting fabricated case citations generated by ChatGPT.
And here is where it becomes genuinely concerning. Because the output looks right, it tends to receive
less scrutiny. A human reviewer naturally spends less effort picking apart text that reads like competent
professional work. When a hallucination slips into a business decision, the damage is rarely immediately visible. A
subtly wrong legal interpretation, a marginally off financial projection — these things do not announce themselves.
The errors compound until somebody notices, and by then you have been operating on bad information for weeks.
Your data is not staying where you think it is
Unless you are on an enterprise-tier plan, the text you send to an LLM may be used to train future versions of the
model. Every prompt, every pasted document, every customer name or contract detail is, in principle, heading into a
training dataset you have no control over.
Cyberhaven’s 2026 AI Adoption & Risk Report found that 39.7% of all enterprise AI interactions involve sensitive
data, including prompts, copy-paste actions, and file uploads.[3] Cisco’s 2024 Data Privacy Benchmark Study found
that 48% of respondents admitted entering non-public company information into GenAI tools.[4] A GDPR fine that is a
rounding error for a multinational can end a 20-person company. Even if you are careful, you are one careless paste
away from a problem you cannot undo.
[3] Cyberhaven Labs, “2026 AI Adoption & Risk Report”, February 11, 2026. The study tracked real-time data lineage
across millions of enterprise employees.
[4] Cisco, “2024 Data Privacy Benchmark Study”, January 25, 2024. Survey of 2,600 security professionals across 12
countries.
The talent trap
Expertise starts to atrophy once you hand the work to an AI, and the consequences run deeper than your current team
becoming rusty.
In August 2025, The Lancet Gastroenterology & Hepatology published a study examining 1,443 colonoscopies
performed by experienced endoscopists. After routine AI assistance, adenoma detection rates in non-AI-assisted
procedures dropped from 28.4% to 22.4%, a relative decline of roughly 20%.[5] In May 2019, a cyberattack took down
Wolters Kluwer’s CCH tax software for nearly a week. CNBC reported the attack “left many in the accounting world
unable to work.” A sales representative emailed clients: “Many of you are awaiting guidance on what you should be
doing with your staff today and unfortunately I do not have a good answer for this.”[6]
[5] Budzyń et al., “Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy”, The Lancet
Gastroenterology & Hepatology, August 2025. Study of 1,443 colonoscopies across four Polish centres.
[6] CNBC, “Wolters Kluwer, one of the biggest accounting software companies, hit by malware attack”, May 8, 2019. CCH
products serve 100% of the top 100 U.S. accounting firms.
Now combine this with the opacity problem. With traditional software, the expertise needed to oversee the
system is the same expertise needed to do the work manually. With LLMs, overseeing the AI requires a
different kind of specialist: people who understand both your domain and the model’s behaviour. These
people are rare. Lightcast’s 2025 analysis of 1.3 billion job postings found that positions requiring AI skills
offer 28% higher salaries, nearly $18,000 more per year.[7] The larger companies are already outbidding everyone
else for them.
[7] Lightcast, “Beyond the Buzz: Developing the AI Skills Employers Actually Need”, July 23, 2025. Analysis of 1.3
billion job postings from 2024.
This is the trap. You deskill your existing team. You cannot hire the specialists needed to properly oversee the AI.
And if you try to reverse course, rebuilding lost expertise takes years.
But we can fix it
At this point, you may reasonably push back. The pro-AI camp has real counter-arguments, and the strongest are worth
acknowledging.
They will tell you that foundation models have commoditised, and the real value is what you build around them:
retrieval-augmented generation, fine-tuning on proprietary data, workflow redesign that properly embeds the model in
your operations. Fair point.
They will tell you that API costs per task have collapsed to pennies, and for high-volume standardised work —
customer service, drafting, summarisation — the unit economics genuinely work. Also fair.
They will tell you that well-architected RAG systems can substantially reduce hallucination rates by grounding
outputs in verifiable source documents. The evidence supports this: a 2025 study in JMIR Cancer found RAG
reduced hallucinations from roughly 40% to near zero for cancer-related queries.[8] Partly true, and worth being
careful about what “grounded” actually means.
[8] “Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information”, JMIR
Cancer, 2025. Study tested 62 cancer-related questions across six chatbot configurations.
RAG does not fix hallucinations. It tethers the model to your existing corpus, so its output becomes
statistically more likely to resemble your own documents. That is useful — but the tether is made of probability,
not logic. You have narrowed the distribution of possible wrong answers; you have not eliminated wrong answers. The
grounding, if anything, makes the failure mode harder to spot.
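A toy sketch of the retrieval half of RAG shows where the tether sits. Everything in it is an assumption made for illustration: the three-document `CORPUS` is invented, the word-overlap scoring stands in for the embedding search a production system would more likely use, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import re

# Hypothetical three-document corpus; the contents are invented.
CORPUS = {
    "refund-policy": "Bereavement fare refunds must be requested before travel.",
    "baggage-policy": "Each passenger may check two bags free of charge.",
    "loyalty-terms": "Loyalty points expire after eighteen months of inactivity.",
}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, corpus, k=1):
    """Rank documents by word overlap with the question; return the top k ids."""
    q = tokens(question)
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(q & tokens(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, corpus, k=1):
    """Prepend the retrieved passages to the question before it goes to the model."""
    ids = retrieve(question, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Can I request a bereavement refund after travel?", CORPUS))
```

Retrieval picks the right document here, and that is all it does. The source says "before travel" and the question asks about "after travel"; whether the answer respects that difference is still decided by the same probabilistic generation step, now with a citation attached to lend it authority.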
But notice what each of these counter-arguments is actually saying. None of them disputes that a naked LLM is a bad
tool for most business decisions. What they are saying is this: if you build the right architecture around
it, govern it properly, invest in the data layer, maintain audit trails, train your people to oversee it, and
continuously measure its error rates — then it works.
Which is another way of saying that the burden of making AI safe and useful falls entirely on the organisation
adopting it. The vendor sells you transformation; you are the one who has to build the thing that actually delivers
it.
There is an irony here. The same vendors who market their products as “PhD-level intelligence” quietly agree that you
need extensive customisation and dedicated oversight to make the thing behave. These two stories cannot both be
true. If the system were genuinely operating at PhD level, it would not need your engineers to hand-feed it context,
guard its outputs, and keep its knowledge base in sync with the real world.
The $600 billion question
All of which would be perfectly reasonable if decisions about AI were being made in a calm, rational environment.
They are not.
In June 2024, Sequoia Capital’s David Cahn calculated that the gap between AI infrastructure investment and actual AI
revenue had grown from $125 billion to $600 billion in nine months.[9] Goldman Sachs published a report titled “Gen
AI: Too Much Spend, Too Little Benefit?” noting that roughly $1 trillion in projected AI capex “has little to show
for it so far.”[10] MIT economist Daron Acemoglu, who shared the 2024 Nobel Prize in economics, estimates AI will
produce only a modest 0.5% productivity increase and roughly 0.9% GDP growth in total over the next decade.[11]
[9] David Cahn, “AI’s $600B Question”, Sequoia Capital, June 2024.
[10] Goldman Sachs, “Gen AI: Too Much Spend, Too Little Benefit?” June 2024.
[11] Daron Acemoglu, “The Simple Macroeconomics of AI”, NBER Working Paper 32487, 2024.
At a Yale CEO Summit in June 2025, 40% of the 150+ top executives present said AI hype had led to overinvestment and
a correction was imminent.[12] Goldman Sachs CEO David Solomon expects “a lot of capital deployed that doesn’t
deliver returns.” Even Sam Altman has warned that “people will overinvest and lose money.”
[12] Yale School of Management, “This Is How the AI Bubble Bursts”, CEO Summit report, October 2025.
Business owners read breathless headlines about productivity gains and fear being left behind. Boards ask “what is
our AI strategy?” in a tone that implies the only wrong answer is “we do not have one.” FOMO is a terrible basis for
a technology investment, and the louder the hype becomes, the more sceptical you should be about whether the
decision you are making is yours.
So what should you do?
None of the above means you should ignore AI. It means that every downside I have described might be worth accepting
— but only if you know why you are doing it.
Losing some expertise in a process that does not generate meaningful ROI? Probably fine. Accepting a measured
hallucination rate in a low-stakes content pipeline? Reasonable, provided you know the stakes. Using an LLM for
first-pass triage on something a human reviews anyway? Sensible. The problem is not using LLMs. The problem is using
them without understanding the tradeoffs — or worse, without even knowing there are tradeoffs to understand.
So start with the problem, not the technology. You might find that your needs are better addressed by an employee
training programme or another spreadsheet. Not as exciting as “AI-powered transformation” — but predictable,
auditable, and actually solving the problem in front of you.
If an LLM genuinely fits after honest evaluation, treat it like any other business risk. Build monitoring and human
oversight into the pipeline. Know what it costs you when the model gets it wrong, and decide consciously whether
that cost is acceptable.
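One way to make that decision conscious is to write the arithmetic down. The sketch below is a back-of-envelope model, not an accounting method: the function names are hypothetical, every input is a placeholder you would replace with your own measurements, and it assumes errors are independent and each missed error costs the same.

```python
def expected_error_cost(tasks, error_rate, catch_rate, cost_per_miss):
    """Expected monthly cost of model errors that slip past human review."""
    missed_errors = tasks * error_rate * (1.0 - catch_rate)
    return missed_errors * cost_per_miss


def net_monthly_value(tasks, saving_per_task, error_rate, catch_rate,
                      cost_per_miss, oversight_cost):
    """Gross savings minus expected error cost minus the cost of oversight."""
    gross_saving = tasks * saving_per_task
    error_cost = expected_error_cost(tasks, error_rate, catch_rate, cost_per_miss)
    return gross_saving - error_cost - oversight_cost


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders, not benchmarks.
    scenario = dict(
        tasks=1000,             # tasks handled per month
        saving_per_task=3.0,    # money saved per automated task
        error_rate=0.02,        # share of outputs that are wrong
        catch_rate=0.9,         # share of wrong outputs a reviewer catches
        cost_per_miss=500.0,    # cost of one error reaching production
        oversight_cost=1500.0,  # monthly cost of review and monitoring
    )
    print(f"Net monthly value: {net_monthly_value(**scenario):.2f}")
```

With these placeholder figures the pipeline comes out modestly positive. Lower the reviewer catch rate from 0.9 to 0.5 and leave everything else alone, and the same pipeline loses money. The point is not the specific numbers but that the answer hinges on quantities most adopters have never measured.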
And be deeply sceptical of anyone who tells you that AI is the solution before they have even begun to understand
your problem. The most expensive technology investment a business can make is the one that solves the wrong problem
with great confidence.