AI & LLMs in Finance: Beyond the Hype

Why Most AI Projects Modeling Financial Data Fail, and What Actually Works

Lessons from the Past

January 15, 2015. The Swiss National Bank removed the EUR/CHF floor. In minutes, the Swiss franc strengthened by roughly 30%. Several brokers failed because losses on leveraged client positions left them with obligations they could not cover. It was an event that many in markets should have prepared for but didn't.

Another such event was Brexit. The market had priced in some of the risk, but sterling was rallying right up until the results, because many expected, or simply hoped, that it wasn't going to happen. The finance world is full of such events.

The core lesson: just because the entire market believes something is true doesn't make it so. If AI companies are going to take a real crack at modeling this world, their models have to know, objectively, where the market has been and how it behaves during catastrophic events. And that is exactly where most AI projects will fail.

The Uncomfortable Truth About AI and Data in Finance

Currently, there is no shortage of AI builders and Claude Code champions trying to find the next holy grail. The big elephant in the room is data quality. Data providers championing MCP servers are not talking about data quality; they are talking about the features they have launched and the ease of use they provide. You will not see them talk about their data quality, because they don't aggregate it, don't control it, and haven't been around long enough to know that it's the key to a long-term, sustainable business.

The pattern is depressingly predictable in the AI space too. A team gets excited about transformer models or reinforcement learning. They download two years of data from a free API. They build something sophisticated. But they don't build something useful, because data granularity and accuracy are key, and the system is most likely to break on the days when even traders can't agree on where the high and low were for a given instrument.

Starting cheap is the most expensive decision you'll make.

The Failure Happens at the Seams, in the Fat Tails

Now back to our earlier examples: we know what happened, or do we? Traders still can't agree on the lows and highs of either event, but we were there. We lived it. We made sure we recorded it to the highest accuracy, every minute since 1990 and every millisecond since August 2016. We are one of the very few data providers in continuous business since the 1990s.

When you train a financial model on insufficient or inaccurate data, you get precisely the wrong predictions. The model doesn't know it's wrong. It can't. It thinks volatility is 5%, but it has ignored the bid and ask and looked only at the mid, and probably the wrong mid at that.

That's like designing a ship's hull by only calculating the average wave height of a calm sea. You'll get an answer. The answer will be: "A three-foot hull is completely safe." Technically true for 99% of the voyage. Catastrophically misleading when that one rogue wave inevitably hits.

Lessons from the Present

Now that we know the significance of historical data, let's look at how it translates to a live environment.

Scenario One: Oil News Break.

An OPEC production cut announcement hits the wire at 10:14 AM. Within 3 seconds, WTI Crude has moved. Within 8 seconds, USD/CAD has repriced. Within 15 seconds, the trade opportunity on CAD-denominated equities has largely been absorbed.

A system relying on REST API polling at 1-second intervals is already 3 to 5 ticks behind before it even begins processing. Furthermore, a system without historical context doesn't know whether this move is at the low end of OPEC surprise ranges — a classic "buy the news" setup — or at the high end, where mean reversion tends to follow within 20 minutes.

Without that granularity? Your model sees a number jump. It has no framework for what happens next because it's never observed the shape of these events. Just the before and after.
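
To make the latency gap concrete, here is a minimal sketch in Python that contrasts once-per-second REST polling with a push-based WebSocket subscription. The endpoint URLs, the subscription message format, and the API key placeholder are illustrative assumptions, not a documented API.

```python
import asyncio
import json
import time

import requests    # pip install requests
import websockets  # pip install websockets

# Hypothetical endpoints and message shapes -- placeholders, not a real spec.
REST_URL = "https://example.com/api/v1/quote?symbol=USDCAD&key=YOUR_API_KEY"
WS_URL = "wss://example.com/feed"


def poll_quotes(seconds: int = 10) -> None:
    """REST polling: every quote can be up to a full second stale on arrival."""
    for _ in range(seconds):
        quote = requests.get(REST_URL, timeout=2).json()
        print(f"[poll {time.time():.3f}] mid={quote.get('mid')}")
        time.sleep(1.0)  # the news can break and be absorbed inside this sleep


async def stream_quotes() -> None:
    """WebSocket streaming: each tick is pushed the moment it is published."""
    async with websockets.connect(WS_URL) as ws:
        await ws.send(json.dumps({"action": "subscribe", "symbols": ["USDCAD", "WTI"]}))
        async for message in ws:
            tick = json.loads(message)
            print(f"[tick {time.time():.3f}] {tick['symbol']} bid={tick['bid']} ask={tick['ask']}")


if __name__ == "__main__":
    asyncio.run(stream_quotes())
```

The specific library matters less than the transport: a push feed removes the polling interval from your reaction time, while the polling loop above cannot, by construction, see anything that happens between requests.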

Scenario Two: Asian Session Volatility.

It's 3 AM London time. USD/JPY is moving sharply on a Bank of Japan intervention rumour. Your model, trained on tick-level data spanning multiple years, has seen this pattern before. It recognises the velocity signature. It knows that BoJ intervention rumours in this vol regime produce a specific distribution of outcomes, and it knows the median reversal window.

A model trained only on minute closes sees none of this. By the time the move registers in its bars, and certainly by the London open, the opportunity, or the risk, has resolved.
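
The granularity point can be illustrated with a minimal, purely synthetic sketch: a tick series containing one sharp intraminute spike that reverts before the minute closes. The symbol, numbers, and spike size are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic tick series: a slow random walk plus one sharp burst that
# reverts within a few seconds, standing in for an intervention-rumour move.
rng = np.random.default_rng(0)
ticks = pd.date_range("2024-01-10 03:00:00", periods=60_000, freq="100ms")  # ~100 minutes
mid = 150.00 + np.cumsum(rng.normal(0, 0.0005, size=len(ticks)))
mid[30_000:30_050] += np.linspace(0, 0.80, 50)   # ~80-pip burst lasting five seconds
prices = pd.Series(mid, index=ticks)

# What a tick-trained model can see: the largest move over any 5-second window.
tick_velocity = prices.diff().rolling("5s").sum().abs().max()

# What a minute-close model sees: only close-to-close changes.
minute_closes = prices.resample("1min").last()
minute_velocity = minute_closes.diff().abs().max()

print(f"largest 5-second move in the tick data:  {tick_velocity:.3f}")
print(f"largest 1-minute close-to-close move:    {minute_velocity:.3f}")
```

Because the burst reverts before the bar closes, the minute-close series barely registers it; the velocity signature exists only at tick resolution.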

This isn't about having fancier models. Both scenarios use the same architecture. The difference is entirely in what the model has seen. The depth of its experience. The granularity of its memory.

Low latency plus credible historical data is achievable. It doesn't require the budget of a bank. What it requires is the right data infrastructure, built by people who understand what markets actually need — not what looks good in a pitch deck.

What Granularity and How Much History Are Essential to Train Your Model?

There's a reason experienced quant teams insist on multi-year datasets, and it's not conservatism. It's statistics.

Financial markets operate in regimes. Low volatility. High volatility. Trending. Mean-reverting. Risk-on. Risk-off. Each regime has its own statistical properties — its own distributions, correlations, and response functions. A model trained within a single regime doesn't generalise. It memorises.

Two years of data might capture one regime, maybe one and a half. That's not a training set. That's an anecdote.

Three to six years of granular data — minute-close at minimum, tick-level where possible — gives you something qualitatively different. You get:

Multiple volatility regimes. The post-2017 low-vol crawl. The 2020 COVID shock. The 2022 rate-hiking cycle. The 2023 banking stress. Each of these produced fundamentally different market dynamics. A model that's seen all of them develops what you might call statistical intuition — the ability to recognise which regime it's operating in and adjust accordingly.

Seasonal variation. Currency markets have real seasonal patterns. Quarter-end rebalancing flows. Japanese fiscal year dynamics. Summer liquidity thinning. December position squaring. These patterns are real, measurable, and exploitable — but only if your model has seen enough cycles to distinguish signal from noise. One year of seasonal data is one observation. That's not a pattern. That's a data point.

Structural breaks. The SNB floor removal. Brexit. The COVID liquidity crisis. The 2022 gilt market blowup. These events aren't anomalies to be filtered out — they're the most important observations in your dataset. They tell your model what happens when assumptions break. A model that's never seen a structural break will break structurally the first time it encounters one.

This is the part that teams consistently underestimate. They look at the volume of data in rows and think they have enough. But financial data isn't like natural language, where a billion tokens gives you broad coverage. Financial data is dominated by regime-specific behaviour. You need enough regimes, not just enough rows.
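
One way to make "enough regimes, not just enough rows" measurable is to label each day by its rolling realised volatility and count how many distinct regimes, and how many separate episodes of each, your sample actually contains. The sketch below does this with illustrative, hard-coded thresholds; in practice the buckets would be calibrated from a long reference history.

```python
import numpy as np
import pandas as pd


def regime_coverage(returns: pd.Series, window: int = 21,
                    low: float = 0.07, high: float = 0.15) -> pd.DataFrame:
    """Count days and distinct episodes per volatility regime.

    `low` and `high` are annualised-vol cutoffs chosen for illustration only.
    """
    ann_vol = returns.rolling(window).std() * np.sqrt(252)
    regime = pd.cut(ann_vol, bins=[0, low, high, np.inf],
                    labels=["low", "mid", "high"]).astype(object)
    episode = (regime != regime.shift()).cumsum()  # new episode on every switch
    frame = pd.DataFrame({"regime": regime, "episode": episode}).dropna()
    return frame.groupby("regime").agg(days=("regime", "size"),
                                       episodes=("episode", "nunique"))


# Illustration with synthetic daily returns: a two-year sample typically fills
# one or two buckets, while a multi-year sample should populate all three.
rng = np.random.default_rng(1)
rets = pd.Series(rng.normal(0, 0.006, size=500),
                 index=pd.bdate_range("2022-01-03", periods=500))
print(regime_coverage(rets))
```

If a regime bucket has zero days, or only a single short episode, the model has effectively never trained on that state of the world, no matter how many rows the dataset holds.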

What Should You Look Out for When Backtesting Your AI Model on Market Data?

Keep in mind these three basic tenets that have always held us in good stead:

Start with the data, not the model. Everybody has roughly the same model. Gemini, Claude, and the other big LLMs are shipping features faster than the startups and wiping out whatever has just been built, so what's your moat? In our view, the moat is the data your model trains on and the patterns you recognise that the big players can't provide. It's similar to how Amazon can sell cards, but boutique firms beat it every time because they have an eye for design.

Test for statistical significance, not backtest performance. A model that produces a Sharpe ratio of 3 on a two-year backtest is not a good model. It's an overfit model that hasn't been caught yet. Run your signals through proper statistical significance tests. Calculate p-values. Adjust for multiple comparisons. If you're testing a hundred features and one of them works at p < 0.05, you've found noise, not signal (the first sketch after these tenets makes this concrete). Experienced teams know this. Teams new to quant finance learn it the expensive way.

Respect regime shifts. Your model should know what it doesn't know. Build in regime detection. Train on multi-regime data. And most importantly, build kill switches for when the model encounters conditions that fall outside its training distribution (the second sketch below shows a minimal version). The models that survive are the ones that know when to say "I don't have a view", because their operators built that humility into the system.
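
To make the second tenet concrete, here is a minimal sketch, using only synthetic noise, of why "one feature in a hundred works at p < 0.05" is exactly what chance predicts, and how a family-wise correction such as Bonferroni removes those false positives. The feature count and sample length are arbitrary.

```python
import numpy as np
from scipy.stats import pearsonr

# Pure noise: 100 random "features" tested against random daily returns.
rng = np.random.default_rng(42)
n_days, n_features = 500, 100
returns = rng.normal(0, 0.01, n_days)
features = rng.normal(0, 1, (n_features, n_days))

# p-value of the correlation between each feature and the returns.
pvals = np.array([pearsonr(f, returns)[1] for f in features])

naive_hits = int((pvals < 0.05).sum())                    # ~5 expected by chance alone
bonferroni_hits = int((pvals < 0.05 / n_features).sum())  # family-wise corrected

print(f"features 'significant' at p < 0.05:       {naive_hits}")
print(f"features surviving Bonferroni correction: {bonferroni_hits}")
```

A backtest built on one of those naive hits would look like signal and trade like noise; the correction is what separates the two.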
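And for the third tenet, a minimal kill-switch sketch under stated assumptions: the model is only allowed to express a view while short-horizon realised volatility stays inside the percentile band observed in its training data. The class name, window length, and percentile bounds are illustrative choices, not a prescribed design.

```python
import numpy as np
import pandas as pd


class VolRegimeKillSwitch:
    """Refuse to emit a view when current volatility falls outside the
    range seen in training. Bounds and window are illustrative only."""

    def __init__(self, training_returns: pd.Series, window: int = 60,
                 lower_pct: float = 1.0, upper_pct: float = 99.0):
        train_vol = training_returns.rolling(window).std().dropna()
        self.window = window
        self.lo = np.percentile(train_vol, lower_pct)
        self.hi = np.percentile(train_vol, upper_pct)

    def allow(self, recent_returns: pd.Series) -> bool:
        """True if the model may trade; False means 'I don't have a view'."""
        current_vol = recent_returns.tail(self.window).std()
        return bool(self.lo <= current_vol <= self.hi)


# Usage sketch with synthetic minute returns: a calm training set, then a shock
# roughly fifteen times more volatile than anything the model has seen.
rng = np.random.default_rng(7)
calm = pd.Series(rng.normal(0, 0.0002, 50_000))
shock = pd.Series(rng.normal(0, 0.0030, 200))
switch = VolRegimeKillSwitch(calm)
print("trade allowed in calm conditions:", switch.allow(calm.tail(500)))
print("trade allowed during the shock:  ", switch.allow(shock))
```

The point is not the specific statistic; it is that the "no view" state is designed in from the start rather than bolted on after the first out-of-distribution loss.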


What "Future Ready" Actually Means

The financial industry is going to adopt AI. That's not a prediction — it's already happening. The question isn't whether, it's how. And the "how" will separate the teams that generate real insight from the teams that generate expensive noise.

Future ready doesn't mean having the latest model architecture. Model architectures change every six months. Today's cutting-edge transformer is next year's baseline.

The teams that will win the next decade of quantitative finance aren't the ones with the most sophisticated models. They're the ones with the most comprehensive, cleanest, most historically deep data foundations. Because when the next SNB moment comes — and it will — your models will either recognise it or be blindsided by it.


About TraderMade

TraderMade provides institutional-grade historical and real-time market data across FX, crypto, equities and stock index CFDs, gold, and silver — from a single provider, with a single data lineage.

What makes TraderMade the data foundation for AI in finance:

  • 50M+ tick quotes per FX pair since August 2016 — the density needed for real ML training
  • Minute-level OHLC from 2013 — 13 years of continuous data across FX, CFDs, gold, silver, and crypto
  • Curated historical data from 1990 — decades of market cycles, regime changes, and tail events in one dataset
  • Bid/ask history from 2017 — 9 years of spread and microstructure data
  • One company, one lineage — no stitched sources, no unexplained gaps
  • Sub-second WebSocket feeds — tick-by-tick delivery for real-time analysis, not REST polling
  • REST, WebSocket, and FIX — your choice of protocol, your architecture

This is the data foundation that turns AI from a pilot into a competitive edge. When your model has trained on decades of verified market history, it has a moat that's difficult to beat.
