An algorithm is considered effective at a point in time if it remains profitable, as intended, when applied to its target market in the future. After the optimization process, we likely have an algorithm that is effective on the in-sample data set. However, because the algorithm is fine-tuned on that same data, there is a risk of overfitting. To determine its effectiveness, we must assess the algorithm on data that has never been used to build the model or to tune its parameters. This data set is called out-of-sample data. Note that it should be sufficiently representative for the performance statistics to be valid.
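As a minimal sketch of the idea, assuming a time-indexed pandas DataFrame of historical prices, the split below reserves everything after a chosen date as out-of-sample data (the date and names are illustrative, not from any specific system):

```python
import pandas as pd

def split_sample(prices: pd.DataFrame, split_date: str):
    """Split a time-indexed price history into in-sample data (used for
    building and tuning the algorithm) and out-of-sample data (reserved
    strictly for the final assessment)."""
    cutoff = pd.to_datetime(split_date)
    in_sample = prices[prices.index <= cutoff]
    out_of_sample = prices[prices.index > cutoff]
    return in_sample, out_of_sample

# Hypothetical usage: tune only on `train`; never look at `test`.
# train, test = split_sample(prices, "2020-12-31")
```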
Testing the Algorithm on Out-Of-Sample Data
To assess a trading algorithm, we need to carefully evaluate its performance on an out-of-sample data set and to find and explain any performance differences relative to the in-sample results. The assessment also helps us understand the advantages, risks, and behavior of the algorithm under different market conditions. This is key to monitoring the algorithm during real trading: from this baseline, we can detect anomalies to manage risk and find opportunities to improve the algorithm.
The effectiveness of a trading algorithm is evaluated through a comprehensive analysis of its transaction records, including net assets over time and statistics on buys, sells, wins, losses, and the periods when the net asset draws down. Among these metrics, the maximum drawdown is used to assess risk, while monthly returns are used to evaluate reward. The net asset curve provides an overview of both, and the detailed transaction statistics describe the algorithm's behavior.
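The two headline metrics can be computed directly from the net asset curve. Below is a minimal sketch assuming the curve is a pandas Series indexed by date (function names are our own):

```python
import pandas as pd

def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline of the net asset curve, as a
    positive fraction (0.25 means a 25% drop from the running peak)."""
    running_peak = equity.cummax()
    drawdown = (equity - running_peak) / running_peak
    return float(-drawdown.min())

def monthly_returns(equity: pd.Series) -> pd.Series:
    """Percentage change of the net asset value, month over month."""
    month_end = equity.resample("ME").last()  # use "M" on older pandas
    return month_end.pct_change().dropna()
```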
An algorithm is considered ineffective if its performance on out-of-sample data is abnormal compared to its in-sample performance. Examples include prolonged losing streaks, sharp drops, excessive deviation of daily returns, or a risk-reward profile that does not match the in-sample results. Investors should carefully research and identify the reasons for these anomalies before judging the algorithm's effectiveness. For example, it's advisable to pay attention to the market conditions in the data: a performance mismatch may stem from differences in market conditions rather than from the algorithm itself.
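As an illustration of such checks, the sketch below compares two of the symptoms mentioned, losing-streak length and daily-return deviation, between the two periods. The thresholds are illustrative assumptions, not prescriptions:

```python
import pandas as pd

def longest_losing_streak(daily_returns: pd.Series) -> int:
    """Length of the longest run of consecutive negative daily returns."""
    streak = best = 0
    for r in daily_returns:
        streak = streak + 1 if r < 0 else 0
        best = max(best, streak)
    return best

def looks_abnormal(in_sample: pd.Series, out_of_sample: pd.Series,
                   vol_ratio_limit: float = 2.0) -> bool:
    """Flag the out-of-sample period when its daily-return volatility or
    its losing streak clearly exceeds what was observed in-sample."""
    vol_blown = out_of_sample.std() > vol_ratio_limit * in_sample.std()
    streak_blown = (longest_losing_streak(out_of_sample)
                    > longest_losing_streak(in_sample))
    return vol_blown or streak_blown
```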
Figure 14 is a real-world example of the dissimilarity in risk and reward performance between in-sample and out-of-sample data.
A common way to detect the incompatibility of two trading profiles is to use the maximum drawdown (MDD) as a metric. The algorithm is considered ineffective and unusable when it undergoes a drawdown exceeding a predefined multiple of the in-sample MDD. This rule is often used to prevent serious losses once the algorithm is applied to the main account with large capital.
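A minimal sketch of this stop rule, assuming the live net asset curve is a pandas Series (the 1.5x multiple is a hypothetical choice, not a recommendation):

```python
import pandas as pd

def breaches_mdd_limit(live_equity: pd.Series,
                       in_sample_mdd: float,
                       multiple: float = 1.5) -> bool:
    """Kill switch: deem the algorithm ineffective once its live drawdown
    exceeds a predefined multiple of the in-sample MDD."""
    running_peak = live_equity.cummax()
    live_mdd = float(-((live_equity - running_peak) / running_peak).min())
    return live_mdd > multiple * in_sample_mdd
```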
Algorithm Testing Process
The testing process for a trading algorithm has 4 main stages.
Testing on Historical Out-Of-Sample Data
In this first stage, we test the algorithm on historical out-of-sample data. If the results show potential and are consistent with the in-sample performance, we can move on to the paper trading phase.
Paper Trading
In this stage, we test the algorithm on future data: the algorithm is simulated in a real-time trading environment. When an order is placed, the algorithm's developers cannot know in advance how it will affect the end result. The goal is to ensure that no one can anticipate the paper trading performance, since it unfolds in real time.
The main difference between the first two stages is that historical out-of-sample data may be reused across several algorithms. This creates the possibility of an algorithm overfitting that data set even though it was never trained or validated on it, a phenomenon called data depletion. Paper trading avoids this problem because the data entering the system is completely new.
If the algorithm still performs well in paper trading, we know that it is not overfit and can proceed to the next phase.
Small-Account Test
Paper trading simulates the whole process from placing an order to its completion. This simulation may not be completely accurate due to data limitations in the Vietnam market; for example, it's not possible to accurately simulate partial order matches in the derivatives market. The small-account test is designed to solve this problem: it tests the algorithm on future data using a real account with small capital.
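To see why fills are hard to simulate, consider the deliberately naive fill model below. Real partial matching depends on queue position and hidden liquidity that the available market data does not capture; all names here are hypothetical:

```python
def simulate_fill(order_qty: int, visible_qty: int) -> tuple[int, int]:
    """Naive paper-trading fill: assume we capture at most the visible
    quantity at the touch. Real partial matches depend on queue position
    and counterparty flow, which this model cannot see."""
    filled = min(order_qty, visible_qty)
    return filled, order_qty - filled  # (filled, remaining)
```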
Some performance difference between paper trading and the small-account test is unavoidable. Assuming paper trading simulates the securities company as faithfully as the market data allows, the remaining difference depends mainly on the algorithm itself. The goal of the small-account test is to quantify this difference.
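One simple way to quantify the gap, sketched below under the assumption that both accounts produce comparable daily net asset series, is to look at the distribution of daily return differences; what counts as a "noticeable" deviation remains a judgment call:

```python
import pandas as pd

def tracking_difference(paper_equity: pd.Series,
                        real_equity: pd.Series) -> pd.Series:
    """Daily return gap between the paper-trading account and the real
    small account over the same period. Its mean captures systematic
    bias (slippage, fees); its standard deviation captures noise."""
    paper_ret = paper_equity.pct_change()
    real_ret = real_equity.pct_change()
    return (real_ret - paper_ret).dropna()
```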
If the performance shows no noticeable deviation, the algorithm can proceed to the final stage and start making profits for investors: real trading on the main account with large capital, where close monitoring is necessary when the algorithm is first applied to the target market.
Live Monitoring During Real Trading
At this stage, investors can be confident that the algorithm is not overfit. Yet it can still become ineffective when the factors affecting stock prices change, so the algorithm may need regular retesting during real trading. When the algorithm's underlying assumption is no longer valid, it must be stopped or adjusted before it causes catastrophic losses.
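A minimal sketch of such a recurring check, assuming we recorded the mean and standard deviation of in-sample daily returns (the window size and threshold are illustrative assumptions):

```python
import pandas as pd

def needs_review(live_returns: pd.Series,
                 expected_mean: float,
                 expected_std: float,
                 window: int = 60,
                 z_limit: float = 3.0) -> bool:
    """Live-monitoring check: flag the algorithm for review when the
    rolling mean of its daily returns drifts more than z_limit standard
    errors below what the in-sample data led us to expect."""
    recent = live_returns.tail(window)
    if len(recent) < window:
        return False  # not enough live history yet
    std_err = expected_std / (window ** 0.5)
    return recent.mean() < expected_mean - z_limit * std_err
```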