A Quantitative Look at AI Trading: Every Model's Profit Expectation Is Below 1, So How Far Is AI from Replacing Traders?

PANews

2025-11-24 06:43:22

Author: Frank, PANews

If you were given $10,000, which artificial intelligence would you choose to trust to manage it for you?

PANews previously reviewed the AI trading competition held by nof1.ai (related reading: Six Major AI “Traders” Ten-Day Showdown: A Public Class on Trends, Discipline, and Greed). That competition, however, ran during a single stretch of market conditions, so the trading capabilities of the various large models may not have been fully demonstrated. How well AI models actually predict under different conditions is still an open question. In addition, several AI companies have recently released new large models, so the ranking of model capabilities is being re-evaluated.

To find out, PANews organized an “AI Trader Championship.” The test aims to gauge the judgment and trade-planning abilities of large AI models in different scenarios: for example, which time frames they analyze best, and whether their prediction success rate improves when indicators are supplied as supplementary information.

We extended the timeline back to 2017 and randomly selected 100 real market slices from Binance BTC historical data to build three demanding test scenarios: “4-hour naked K” (candlesticks and volume only, no indicators), “15-minute short-term”, and “4-hour full indicators”. The six competitors represent the leading large models in China and the US today: Gemini-3-pro, Doubao-1.6-vision, DeepSeek V3.2, Grok 4.1, GPT-5.1, and Qwen3-max.

The test used 15-minute K-line data for the Binance BTC spot trading pair from August 2017 to the present, and 4-hour K-line data from 2021 to the present. For each time frame, 50 chart images were randomly generated, each spanning 100 K-lines. The 4-hour charts come in two versions: one showing only K-lines and trading volume, and one that adds indicators such as EMA, SMA, Bollinger Bands, MACD, and RSI. The 15-minute charts are all naked K charts (with trading volume). The price and indicator values behind each chart were also passed to the AI. All AI output results can be viewed here.
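The random selection described above can be sketched as follows. This is a minimal illustration, not PANews's actual procedure: the function name, the fixed seed, the example series length, and the choice to draw distinct start positions are all assumptions, since the article does not say how the slices were drawn or whether they could overlap.

```python
import random

def sample_windows(num_candles, window=100, count=50, seed=0):
    """Return `count` random (start, end) index pairs, `end` exclusive.

    Each pair covers `window` consecutive candles. Start positions are
    distinct, but the windows themselves may still overlap.
    """
    if num_candles < window:
        raise ValueError("not enough candles for a single window")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    last_start = num_candles - window
    k = min(count, last_start + 1)
    starts = sorted(rng.sample(range(last_start + 1), k))
    return [(s, s + window) for s in starts]

# Hypothetical series length, roughly a few years of 4-hour candles.
windows = sample_windows(num_candles=10_700)
print(len(windows), "windows; first:", windows[0])
```

Each pair can then be used to slice the candle series before rendering the chart and exporting the matching price values.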

4-hour indicator chart

4-hour pure candlestick chart

During testing, every model received exactly the same data and instructions. In that sense the test also measures the models' multimodal capabilities (DeepSeek offers only a text model, so it received the data as text, with no images).
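For readers unfamiliar with the indicators named above, here is a rough sketch of how three of them (SMA, EMA, and Bollinger Bands) are commonly computed from closing prices. The window lengths, the band width of two standard deviations, and the function names are assumptions for illustration; the article does not state the settings used in the test.

```python
from statistics import fmean, pstdev

def sma(closes, window):
    """Simple moving average of the last `window` closes."""
    return fmean(closes[-window:])

def ema(closes, span):
    """Exponential moving average, seeded with the first close."""
    alpha = 2.0 / (span + 1)
    value = closes[0]
    for price in closes[1:]:
        value = alpha * price + (1 - alpha) * value
    return value

def bollinger(closes, window=20, width=2.0):
    """Return (lower, middle, upper) bands over the last `window` closes."""
    recent = closes[-window:]
    mid = fmean(recent)
    spread = width * pstdev(recent)
    return mid - spread, mid, mid + spread

# Made-up closing prices, purely for demonstration.
closes = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1, 104.4, 105.0]
print(sma(closes, 5), ema(closes, 5), bollinger(closes, window=5))
```

Values like these, serialized as text, are the kind of numerical input a text-only model could receive in place of a chart image.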

Gemini 3: The Naked K King Held Back by “Indicators”

Gemini 3 is currently the most talked-about large model. Judging by media reviews and tests since its release on November 18, it may be the most capable multimodal model available. In this trading prediction test, however, its results were mid-range rather than the best. Across the three scenarios (4-hour naked K, 4-hour with indicators, 15-minute naked K), Gemini 3 performed best in the 4-hour naked K scenario, with a win rate of 39.58%, followed by the 15-minute naked K scenario at 34.04%. With indicators added on the same 4-hour time frame, its win rate dropped to 31%, the worst of the three.

This suggests that Gemini 3 reads pure candlestick patterns better, while added indicators tend to interfere. Without indicators it was also more willing to open positions: on pure candlestick charts it chose to enter in 95% of cases, compared with 71% once indicators were added. Notably, Gemini 3 was the only profitable model in the 4-hour pure candlestick scenario.

In the 15-minute scenario, Gemini 3 posted the best overall result, with a total profit of 15.34%, whereas in the indicator scenario it lost 21.18%. That profit, however, owes something to short-term luck. Taking the profit-loss ratio of each scenario into account, Gemini 3's profit expectation (win rate × profit-loss ratio) is below 1, which means it would lose money in the long run.

DeepSeek V3.2: A Rock-Steady “Super Short-Term Trading Bot”

DeepSeek had the strongest overall win-rate performance of the six models and was also the most stable. Across the three scenarios (4-hour naked K, 4-hour with indicators, 15-minute naked K), its win rates were 40%, 41.38%, and 42.86% respectively. Its predictive ability therefore stays fairly steady across time frames, with or without indicators.

DeepSeek's final profitability was nonetheless poor because of its low profit-loss ratio, which averaged only 1.25. A ratio this skewed toward early profit-taking shows that DeepSeek does not let profits run. As a result, its overall profit expectation is only about 0.5, so it too lacks long-term profit potential. DeepSeek was also fairly conservative about opening positions, with an overall opening ratio of only 58%.
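The profit expectation figure can be reproduced with a few lines of arithmetic. The sketch below uses the article's own definition (win rate × profit-loss ratio) with a round 40% win rate, which is an assumption standing in for DeepSeek's 40% to 43% range. For comparison only, it also computes a conventional per-trade expectancy measured in units of the amount risked; that second formula is not the article's, and under it break-even sits at win rate × ratio = 1 − win rate rather than at 1.

```python
def article_profit_expectation(win_rate, pl_ratio):
    """The article's metric: win rate multiplied by profit-loss ratio."""
    return win_rate * pl_ratio

def expectancy_per_unit_risk(win_rate, pl_ratio):
    """Conventional expectancy per trade, in units of the amount risked.

    A win earns `pl_ratio` units; a loss costs 1 unit.
    """
    return win_rate * pl_ratio - (1.0 - win_rate)

# Assumed inputs: 40% win rate, average profit-loss ratio of 1.25.
metric = article_profit_expectation(0.40, 1.25)
per_trade = expectancy_per_unit_risk(0.40, 1.25)
print(f"article metric: {metric:.2f}")              # 0.50
print(f"expectancy per unit risked: {per_trade:.2f}")  # -0.10
```

With these inputs both yardsticks point the same way: each trade loses a little on average.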

Doubao: The “All-Around MVP” of This Competition

Doubao-1.6-vision delivered the best overall result in this test. In the 4-hour scenario with indicators it achieved the highest win rate of the test, 50%, with a final return of 22.2%. In the 15-minute short-term scenario it also achieved an overall return of 8.2%. It is the only model that was profitable in both of these dimensions (short-term and 4-hour with indicators).

Moreover, Doubao-1.6-vision did not get these results by trading conservatively: its average opening ratio exceeded 92%, meaning it opened a position in the vast majority of scenarios. Its ability does rely heavily on indicator signals, though; its total profit differed by 38% depending on whether indicators were used. In addition, the profit-loss data show that Doubao-1.6-vision had a relatively high average profit-loss ratio in its two profitable scenarios, which is another reason for its strong overall performance.

Grok 4.1: “Radical Gambler” from xAI

Grok 4.1's overall style is bold but dependent on indicators, and it is willing to chase larger profits. Of the three scenarios, only the 4-hour period with indicators produced a passable win rate, 34.69%; the other two were extremely low: just 14.58% on 4-hour pure K-lines and 26.53% on the 15-minute period. Yet its average opening ratio was as high as 98%, meaning it was willing to open a position on almost every chart. In this respect, Grok 4.1 resembles a gambler who cannot hold back.

Grok 4.1's profit-loss ratio does tend to be high, averaging 2, the highest of all the models. Overall, though, entrusting your funds to Grok 4.1 would not be a wise choice.

GPT 5.1: The Extremely Cautious “Perma-Bear” Pessimist

GPT 5.1's trading style is the opposite of Grok 4.1's. It is extremely cautious and chose to wait in most cases. Out of 150 tests it opened positions only 52 times, an average opening ratio of only about 34%.
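The opening ratio is simply positions opened divided by charts shown. A quick check of the arithmetic, assuming all 150 tests count in the denominator (50 charts in each of the three scenarios):

```python
def opening_ratio(positions_opened, total_tests):
    """Fraction of test charts on which the model opened a position."""
    return positions_opened / total_tests

gpt_ratio = opening_ratio(52, 150)
print(f"GPT 5.1 opening ratio: {gpt_ratio:.3f}")  # about 0.347
```

That is a fraction of roughly 0.34 to 0.35, i.e. about a third of all charts, not a third of one percent.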

Even with such caution, GPT 5.1 did not achieve a better win rate: it reached only 35% in its best scenario. Comparing the 4-hour and 15-minute periods, GPT 5.1 is clearly weaker on the longer time frame; even with technical indicators added, its 4-hour win rate was only 27%. Only in the 15-minute period did it post a positive return, helped by a relatively high profit-loss ratio (2.02), for an overall result of 9.9%.

GPT 5.1 also shows a marked pessimistic streak and is very keen on short selling: over 70% of its orders were short positions.

Qwen 3: The “Risk-Averse” Player Who Trades Sparingly

Qwen 3 was the most cautious model of all, opening positions only 44 times across all tests, an opening ratio of only 29%. As with GPT, however, this extreme caution did not lead to a higher win rate. Its average win rate was only 34%, with its best performance coming in the 4-hour scenario with indicators.

Qwen 3's profit-loss ratio is also relatively high, at 1.96. It behaves like a risk-averse player that trades less often while letting profits run. In the 4-hour scenario with indicators, Qwen 3's profit expectation came closest to profitability at 0.95, the highest of all the models.

Data summary status

Summary:

These simulated AI trading runs suggest the following.

First, for the vast majority of models, charts with indicators are more reliable than pure candlestick charts. With indicators, the six models' average win rate reached 38%, versus only 30% without.

Second, AI may be better at short-term trading than at longer-term trading. In the 15-minute pure candlestick scenario, the six models' average win rate was 34%, higher than the 30% for the 4-hour period. Three of the six models were profitable (Gemini, GPT, Doubao), and the average profit-loss ratios were generally good.

Third, handing a position over entirely to AI is not advisable. In this test every AI model had a profit expectation below 1, which means that with such win rates and profit-loss ratios they would all end up losing in the long run; the only difference is how fast. (This is partly because the models were not specially trained and the indicators used were only simple, common ones.) If you want AI to trade for you, it will probably need a more elaborate training process and more backtesting data.
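The "speed of losses" point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a replay of the test: it takes a 40% win rate and a 1.25 profit-loss ratio (the figures quoted for DeepSeek), risks one unit per trade, treats trades as independent, and uses an arbitrary seed and trade count.

```python
import random

def simulate_pnl(win_rate, pl_ratio, trades, seed=0):
    """Cumulative profit in risk units over `trades` independent trades.

    A winning trade earns `pl_ratio` units; a losing trade costs 1 unit.
    """
    rng = random.Random(seed)  # seeded for a reproducible run
    pnl = 0.0
    for _ in range(trades):
        if rng.random() < win_rate:
            pnl += pl_ratio
        else:
            pnl -= 1.0
    return pnl

# Assumed inputs: 40% win rate, 1.25 profit-loss ratio, 10,000 trades.
result = simulate_pnl(0.40, 1.25, trades=10_000)
print(f"cumulative result after 10,000 trades: {result:.1f} units")
```

Over a long enough run the cumulative result drifts steadily negative, and a lower win rate or ratio only steepens the slope.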

As this showdown comes to an end and we look at the final account balances, the most important insight may not be “which model is the strongest” but “where the boundaries of AI trading lie.” Our conclusion is that today's AI probably cannot yet directly replace an excellent fund manager, but in some respects it has grown into a relatively mature trading assistant: some models are good at chart analysis, some at risk control, and others at analyzing data to achieve stable win rates. For all of people's growing expectations, having AI replace human traders remains a complex proposition.
