Although I lack the maths to determine it numerically (depends on volatility etc.), it looks to me as though all six are overbetting and would be ruined in the long run. It would have been interesting to compare against a constant fraction portfolio that maintains 1/6 in each asset, as closely as possible while optimising for fees. (Or even better, Cover's universal portfolio, seeded with joint returns from the recent past.)
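On the overbetting point: for a simple even-odds bet the cutoff is easy to compute from expected log-growth. A toy sketch with a hypothetical win probability (nothing to do with the actual coin data):

```python
import math

def log_growth(p, f):
    """Expected log-growth per bet when wagering a fraction f of wealth
    on an even-odds bet that wins with probability p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.55             # hypothetical edge
f_kelly = 2 * p - 1  # Kelly-optimal fraction for even odds: 0.10

print(log_growth(p, f_kelly))  # positive: wealth compounds
print(log_growth(p, 0.50))     # negative: overbetting, ruin in the long run
```

Any fraction with negative expected log-growth loses almost surely as the number of bets grows, which is the sense in which overbetting guarantees long-run ruin even with a positive edge.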
I couldn't resist starting to look into it. With no costs and no leverage, the hourly rebalanced portfolio just barely outperforms 4/6 coins in the period: https://i.xkqr.org/cfportfolio-vs-6.png. I suspect costs would eat up many of the benefits of rebalancing at this timescale.
This is not too surprising, given the similarity of coin returns. The mean pairwise correlation is 0.8; the lowest is 0.68. Not particularly good for diversification returns. https://i.xkqr.org/coinscatter.png
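For anyone wanting to reproduce the gist without the real data, both calculations fit in a few lines. A sketch on synthetic returns (toy numbers, not the actual coin series):

```python
import math

def rebalanced_growth(returns, weights):
    """Wealth multiple of a portfolio rebalanced back to fixed weights
    every period; returns[t][i] is asset i's simple return in period t."""
    wealth = 1.0
    for period in returns:
        wealth *= sum(w * (1 + r) for w, r in zip(weights, period))
    return wealth

def pearson(x, y):
    """Plain Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Two toy assets that each end flat on their own, yet the rebalanced
# 50/50 mix gains by harvesting the volatility:
rets = [(0.5, -1/3), (-1/3, 0.5)] * 10
print(rebalanced_growth(rets, (0.5, 0.5)))  # > 1.0; buy-and-hold is 1.0
print(pearson([r[0] for r in rets], [r[1] for r in rets]))  # -1.0
```

The toy pair is perfectly anti-correlated, the best case for rebalancing; with pairwise correlations around 0.8, as in the scatter plot above, most of that bonus disappears.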
> difficulty executing against self-authored plans as state evolves
This is indeed also what I've found trying to make LLMs play text adventures. Even when given a fair bit of help in the prompt, they lose track of the overall goal and find some niche corner to explore very patiently, but ultimately fruitlessly.
You need domain knowledge to get this to work. Things like "we fed the model the market data" are actually non-obvious. There might be more than one way to pre-process the data, and what the model sees will greatly affect what actions it comes up with. You also have to think about corner cases: e.g. when DeepMind applied its agents to StarCraft (AlphaStar), they had to restrict the action rate, that kind of thing. Otherwise the model gets stuck in an imaginary money fountain.
But yeah, the AI thing hasn't passed the quant trading community by. A lot is going on, with AI trading teams being hired at various shops.
And if you're feeding (or "harnessing", as the blog post puts it) the model in a way where it reasons things like:
> RSI 7-period: 62.5 (neutral-bullish)
then it is no better than normal automated trading, where the program logic is something along the lines of "if RSI > 80, then exit". And looking at the reasoning trace, that is exactly what the model is doing.
> BTC breaking above consolidation zone with strong momentum. RSI at 62.5 shows room to run, MACD positive at 116.5, price well above EMA20. 4H timeframe showing recovery from oversold (RSI 45.4). Targeting retest of $110k-111k zone. Stop below $106,361 protects against false breakout.
My understanding is that technical trading using EMA/timeframes/RSI/MACD etc. is big in the crypto community. But to automate it you can simply write Python code.
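Indeed, the quoted rule is only a few lines. A minimal sketch using a simple-average RSI (the period and threshold here are illustrative, not anything the experiment actually used):

```python
def rsi(prices, period=14):
    """RSI over the last `period` price changes, using simple averages
    rather than Wilder's smoothed recursion, for brevity."""
    deltas = [b - a for a, b in zip(prices, prices[1:])][-period:]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def should_exit(prices, threshold=80):
    """The 'if RSI > 80 then exit' rule from the comment above."""
    return rsi(prices) > threshold

print(should_exit([100 + i for i in range(15)]))  # steady rise -> True
```

Everything in the quoted reasoning trace (RSI levels, MACD sign, price vs. EMA20) reduces to comparisons like this.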
I don't know if this is a good use of LLMs. Seems like overkill. A better use case might have been to see whether they can read sentiment from Twitter or something.
This kind of error just feels comical to me, and really makes it hard for me to believe that AGI is anywhere near. LLMs struggle to understand the order of datasets, even when explicitly told what it is. This is like showing a coin trick to a child, except perhaps even simpler.
A threshold in the single-digit-millisecond range allows rapid detection of price reversals (signaling the need to exit a position with the least loss) in even the most liquid real futures contracts (not counting rare "flash crash" events).
Seems to me that the outcome would be near-random because they are so poorly suited to the task. Which might manifest as:
> We also found that the models were highly sensitive to seemingly trivial prompt changes
But I also see an incredible growth curve in LLMs' improvement. Two years ago I wouldn't have expected LLMs to one-shot a web application or help me debug obscure bugs, and two years later I've been proven wrong.
I completely believe that trading is going to be saturated with AI traders in the future. And being able to predict and detect AI trading patterns is going to be important leverage for human traders, if they still exist.
Proves that LLMs are nowhere near AGI.
I don't think LLMs are anywhere close to "mastery" in chess or Go. Maybe a nitpick, but the point is that a NN created to be good at trading is likely to outperform LLMs at this task, the same way NNs created specifically to be good at board games vastly outperform LLMs at those games.
I've been following these for a while, and many of the trades taken by DeepSeek and Qwen were really solid.
In addition, I cannot imagine how the selection of securities was made. Is XRP seriously part of the proposed asset mix here?
It's hard not to look at this and view it as a marketing stunt. Nothing about the results is surprising, and the setup does not seem to make any sense to me to begin with.
I use LLMs a lot and I work in finance, and I don't see what an LLM contributes in this space.
Also, it looks like none of their data uses any kind of benchmark. It's purely "which model did better", which I don't think tells you much.