To trade in a real environment, a trading algorithm must have a logical foundation. This distinguishes it from patterns that generate profits but cannot be interpreted, which can be found simply by exhaustively searching past data. Past data should be used only to apply financial theory to a particular market.
The optimization process begins with a hypothesis in the form of a preliminary algorithm, which a professional can propose based on financial knowledge or practical experience. This preliminary algorithm is then optimized on a sufficiently long historical data range to produce a complete, stable, and profitable algorithm. This range is called the in-sample data of the fine-tuned algorithm. It needs to reflect the market's volatility, and the algorithm needs to place enough trades within it for its performance to be statistically significant.
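One common way to judge whether there are "enough" trades is a one-sample t-statistic on per-trade returns. The sketch below is illustrative only: the function name `trade_t_statistic` is hypothetical, it assumes trades are independent with roughly normal returns, and a |t| of about 2 or more is only a rough rule of thumb for significance at the 5% level.

```python
import math
from statistics import mean, stdev


def trade_t_statistic(trade_returns):
    """t-statistic for the null hypothesis that the mean per-trade return is zero.

    Assumes independent trades with roughly normal returns (a simplification).
    """
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    s = stdev(trade_returns)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("trade returns have zero dispersion")
    # t = mean / standard error, where standard error = s / sqrt(n)
    return mean(trade_returns) / (s / math.sqrt(n))


# The same per-trade behaviour observed over more trades gives a larger t.
few = [0.01, 0.02, 0.03]
many = few * 4
print(trade_t_statistic(few), trade_t_statistic(many))
```

This shows why a short in-sample range is risky. Identical average behaviour can look like noise over a handful of trades and become statistically meaningful over many.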
A preliminary algorithm is refined by adding new rules and adjusting parameters until its efficiency in the target market is optimal. Efficiency is measured by an objective function, and the target market is represented by the in-sample data set.
Optimal efficiency means the right balance between risk and return, not the highest possible return. For this reason, the Sharpe ratio is often chosen as the objective function. Limiting the risk of loss also helps keep an algorithm from overfitting, which is a key issue in the algorithm development process.
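Below is a minimal sketch of the two risk-aware metrics used in this article, the Sharpe ratio and the maximum drawdown (MDD). It makes three assumptions: returns are daily and expressed as fractions, the risk-free rate defaults to zero, and annualization uses 252 trading periods per year. That last figure is a common convention, not one taken from the source. Both function names are illustrative.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return divided by its standard deviation."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_daily for r in daily_returns]
    sd = stdev(excess)
    if sd == 0:
        raise ValueError("returns have zero volatility")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline of an equity curve, returned as a negative fraction."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)               # highest equity seen so far
        worst = min(worst, value / peak - 1)  # decline from that peak
    return worst


print(max_drawdown([100, 120, 90, 130]))  # -0.25 (the fall from 120 to 90)
```

The Sharpe ratio penalizes volatile returns even when they are large on average. MDD complements it by capturing the worst loss an investor would have had to sit through.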
Overfitting occurs when an algorithm is fine-tuned so closely to a specific data sample that it achieves optimal results there, but its efficiency drops significantly when it runs on out-of-sample data. When searching for the optimal parameters, algorithm developers should keep two requirements in mind:
- Good performance on in-sample data;
- Equivalent performance on out-of-sample data and in real trading.
Limiting overfitting is therefore key when developing a trading algorithm. Although there are solid techniques for this, it is difficult to remove overfitting completely.
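The two requirements can be checked mechanically by scoring the same parameters on an earlier in-sample slice and a later out-of-sample slice. The sketch below is illustrative. `score_fn` stands for any hypothetical backtest-plus-objective routine, and the thresholds `min_score` and `max_drop` are made-up values, not recommendations from the source. The split is chronological because shuffling time-series data would leak future information into the in-sample set.

```python
def chronological_split(data, in_sample_fraction=0.7):
    """Split time-ordered data: earlier part in-sample, later part out-of-sample (no shuffling)."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be between 0 and 1")
    cut = int(len(data) * in_sample_fraction)
    return data[:cut], data[cut:]


def meets_requirements(score_fn, params, data, in_sample_fraction=0.7,
                       min_score=1.0, max_drop=0.3):
    """Check 1) good in-sample performance and 2) comparable out-of-sample performance.

    score_fn(params, data) is a hypothetical backtest returning an objective value
    such as a Sharpe ratio; min_score and max_drop are illustrative thresholds.
    """
    in_sample, out_of_sample = chronological_split(data, in_sample_fraction)
    s_in = score_fn(params, in_sample)
    s_out = score_fn(params, out_of_sample)
    good_in_sample = s_in >= min_score
    comparable_out = s_out >= s_in * (1.0 - max_drop)  # at most a max_drop relative decline
    return s_in, s_out, good_in_sample and comparable_out


def average_score(scale, data_slice):
    """Toy stand-in for a backtest: a scale parameter times the slice average."""
    return scale * sum(data_slice) / len(data_slice)


print(meets_requirements(average_score, 2.0, [1.0] * 10))  # (2.0, 2.0, True)
print(meets_requirements(average_score, 2.0, [2.0] * 5 + [0.5] * 5,
                         in_sample_fraction=0.5))           # (4.0, 1.0, False)
```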
Find an Optimally Fine-Tuned Algorithm
There are two steps to finding an optimally fine-tuned algorithm.
1. Add the Variational Law
When optimizing an algorithm, keep in mind that the market is dynamic. Instead of choosing fixed parameters, it is advisable to create rules that are functions of statistics describing market variation. This way, a trading algorithm can adjust as the market moves, which helps it stay stable across different market conditions.
For example, volatility is an important indicator of market conditions, and an algorithm may behave differently at different levels of volatility. The average difference between the daily high and low prices of the VN30 index over recent days can serve as a statistical signal of volatility. Using this signal, an algorithm can adapt itself to work well as market conditions change.
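Below is a minimal sketch of such a signal, assuming daily high and low prices are available as lists. Two parts of it are hypothetical illustrations of a rule driven by market variation, not the article's own formula: the window length, and the rule that scales a base threshold by the ratio of current to reference volatility.

```python
def average_range(highs, lows, window):
    """Mean daily (high - low) range over the most recent `window` days."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    if not 0 < window <= len(highs):
        raise ValueError("window must be between 1 and the number of days")
    recent = zip(highs[-window:], lows[-window:])
    return sum(h - l for h, l in recent) / window


def adaptive_threshold(base_threshold, current_range, reference_range):
    """Hypothetical rule: widen the threshold when recent ranges exceed a reference level."""
    if reference_range <= 0:
        raise ValueError("reference_range must be positive")
    return base_threshold * current_range / reference_range


highs = [1010.0, 1020.0, 1030.0]
lows = [990.0, 1000.0, 1000.0]
current = average_range(highs, lows, window=2)  # (20 + 30) / 2
print(current, adaptive_threshold(0.002, current, reference_range=20.0))
```

With this rule, the entry bar rises in volatile periods and falls in calm ones. A single fixed value therefore no longer has to suit every market regime.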
2. Parameter Search
After market statistics have been added to the algorithm, there may still be parameters that do not change frequently and whose values must be determined. The parameter space is the set of all possible combinations of these parameter values, and the optimization process should be designed to pick the optimal combination from it.
A simple approach is to run a backtest over past data for every parameter set and pick the best set according to an objective function, such as the Sharpe ratio or another criterion that balances risk and return. This method has two limitations, which are also the two main problems of the optimization process. First, the number of parameter sets can be very large or even infinite, so there may not be enough resources to try them all. Second, the best set found this way does not guarantee similar performance on future data.
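The naive approach and its size problem can be sketched as follows. The grid values are hypothetical. The text mentions only a few of the article's values (N = 15, 19, 20, 21 and α = 0.2%, 0.3%, 1.0%), and the grid below merely includes them. `objective` stands for a backtest scored by, for example, the Sharpe ratio.

```python
import math
from itertools import product


def parameter_space(grid):
    """Yield every combination of parameter values as a dict."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


def space_size(grid):
    """The number of combinations is the product of the number of values per parameter."""
    return math.prod(len(values) for values in grid.values())


def exhaustive_search(objective, grid):
    """Run `objective` (e.g. a backtest's Sharpe ratio) on every combination; keep the best."""
    return max(parameter_space(grid), key=objective)


# Hypothetical grid with 25 values of N and 7 values of alpha (as fractions, 0.002 = 0.2%).
grid = {
    "N": list(range(5, 30)),
    "alpha": [0.001, 0.002, 0.003, 0.004, 0.005, 0.007, 0.010],
}
print(space_size(grid))                                    # 175
print(space_size({**grid, "stop_loss": list(range(10))}))  # 1750, with a hypothetical third parameter
```

Each extra parameter multiplies the number of backtests. This is why exhaustive search quickly runs out of computational resources.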
Other methods have been proposed to limit the search space while still selecting a near-optimal parameter set, such as grid search, priority parameter search, and advanced search algorithms.
Grid search. This method is the closest to the naive approach above: every combination of preset parameter values is checked. To limit overfitting, the in-sample data is divided into a training set and a validation set, and the parameter set is reselected based on its performance on the validation set.
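The article does not specify how the validation set is used, so the reselection step below is one hypothetical implementation. Every combination is scored on the training set, the top `top_k` candidates are kept, and the final choice is the one that scores best on the validation set. `score_fn`, `top_k`, and this shortlist rule are all assumptions.

```python
from itertools import product


def grid_search(score_fn, grid, train, validation, top_k=5):
    """Grid search that shortlists on training data and reselects on validation data.

    score_fn(params, data) is a hypothetical backtest returning an objective value.
    """
    names = list(grid)
    candidates = [dict(zip(names, combo))
                  for combo in product(*(grid[name] for name in names))]
    # Rank every preset combination by its training-set score (best first).
    ranked = sorted(candidates, key=lambda p: score_fn(p, train), reverse=True)
    shortlist = ranked[:top_k]
    # Reselect among the shortlist using data the ranking never saw.
    return max(shortlist, key=lambda p: score_fn(p, validation))


def distance_score(params, target):
    """Toy stand-in for a backtest: closeness of x to a target playing the role of a data set."""
    return -abs(params["x"] - target)


grid = {"x": [1, 2, 3, 4, 5]}
print(grid_search(distance_score, grid, train=3, validation=4, top_k=1))
print(grid_search(distance_score, grid, train=3, validation=4, top_k=3))
```

With `top_k=1` the result is simply the training optimum. A larger shortlist lets the validation set override a choice that only looked best on the training data.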
For example, consider a regression algorithm that predicts how much the closing price of VN30F1M (the front-month VN30 index futures contract) will change compared with the previous day's close. It opens a position at ATO (the opening call auction) and closes it at ATC (the closing call auction). If it predicts a 3% decrease, for instance, it opens a short position at ATO and closes it at ATC, making a profit if the prediction is correct. This algorithm has two main parameters to identify:
- N: the number of previous days used as input data for the prediction.
- Alpha (α): the threshold for opening a position at ATO. If the magnitude of the predicted price change exceeds this threshold, the algorithm opens a position at ATO.
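The article does not say which regression model is used, so the sketch below substitutes a hypothetical stand-in. It fits a least-squares straight line to the last N closing prices and extrapolates one day ahead. The predicted percentage change then becomes a position through the threshold α: long if the forecast rise exceeds α, short if the forecast fall exceeds α, and no trade otherwise. The position is held from ATO to ATC. All function names are illustrative.

```python
def predict_change(closes, n):
    """Predicted next-day % change (as a fraction) from a linear fit to the last n closes."""
    if not 2 <= n <= len(closes):
        raise ValueError("need 2 <= n <= len(closes)")
    ys = closes[-n:]
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    forecast = y_mean + slope * (n - x_mean)  # fitted line evaluated at the next day, x = n
    return forecast / ys[-1] - 1              # change relative to the previous day's close


def signal(predicted_change, alpha):
    """+1 = long, -1 = short, 0 = no trade, using alpha as the entry threshold."""
    if predicted_change > alpha:
        return 1
    if predicted_change < -alpha:
        return -1
    return 0


def day_return(position, ato_price, atc_price):
    """Return of a position opened at ATO and closed at ATC on the same day."""
    return position * (atc_price - ato_price) / ato_price


closes = [1000.0, 995.0, 990.0, 985.0, 980.0]
change = predict_change(closes, n=5)    # steady decline, so the forecast change is negative
position = signal(change, alpha=0.002)  # -1, i.e. short
print(change, position, day_return(position, ato_price=978.0, atc_price=970.0))
```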
In this example, 25 values of N and 7 values of α are tested, on the assumption that these ranges cover the region of the parameter space likely to contain the optimal values. This gives 25 × 7 = 175 parameter sets. Tables 01 and 02 list the parameter values and their performance. Based on these results, the optimal parameters are N = 19 and α = 0.2%, which yield a 127.62% profit and a maximum drawdown (MDD) of -11.83%.
Priority parameter search. When there are too many parameter combinations to evaluate, grid search becomes impractical because of limited computational resources. To save time, the parameters with the greatest influence on the algorithm are prioritized: one parameter's value is changed at a time while the others are kept unchanged.
For example, we can fix α = 1.0% and vary N, which gives an optimal N of 15. We then fix N = 15 and search over the values of α, which gives an optimal α of 0.3%. The resulting set (N = 15, α = 0.3%) yields a 103.95% profit and a -16.20% MDD. This set is found by trying only 31 combinations (25 values of N, then the 6 remaining values of α) instead of the 175 required by grid search. The savings come at the cost of a weaker result than the grid-search optimum (N = 19, α = 0.2%).
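Below is a sketch of priority (one-at-a-time) search under stated assumptions. `objective` is a hypothetical backtest score, and the grid values are invented to match the sizes in the example (25 values of N, 7 of α). A cache ensures each combination is backtested only once. With a 25 × 7 grid and a starting α that lies in the grid, the search evaluates 25 + 7 - 1 = 31 distinct combinations, matching the count above. The toy objective is made up, so the optimum it finds is not the article's result.

```python
def priority_search(objective, grid, start, priority):
    """Optimize one parameter at a time, most influential first, holding the others fixed.

    objective(params) is a hypothetical backtest score (higher is better).
    Returns (best_params, best_score, number_of_distinct_backtests).
    """
    cache = {}

    def evaluate(params):
        key = tuple(sorted(params.items()))
        if key not in cache:  # backtest each combination only once
            cache[key] = objective(params)
        return cache[key]

    current = dict(start)
    for name in priority:
        # Try every value of this parameter while the others stay at their current values.
        current[name] = max(grid[name], key=lambda v: evaluate({**current, name: v}))
    return current, evaluate(current), len(cache)


grid = {"N": list(range(5, 30)),
        "alpha": [0.001, 0.002, 0.003, 0.004, 0.005, 0.007, 0.010]}


def toy_objective(p):
    # Invented smooth surface with its peak at N = 19, alpha = 0.2%.
    return -(p["N"] - 19) ** 2 - 10 * (p["alpha"] * 1000 - 2) ** 2


best, score, n_tests = priority_search(toy_objective, grid,
                                       start={"N": 15, "alpha": 0.010},
                                       priority=["N", "alpha"])
print(best, n_tests)  # n_tests is 31
```

Because each pass holds the other parameters fixed, the result depends on the starting values and the priority order. In the article's example, this shortcut landed on N = 15, α = 0.3% rather than on the grid-search optimum.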
Advanced search algorithms. Advanced search methods use the value of the objective function to decide which direction to move in the parameter space to reach the optimal set. As a result, only a much smaller part of the parameter space needs to be tested to find an approximately optimal parameter set.
One such method is hill climbing search. Starting from an initial parameter set, it moves in the direction that gradually increases efficiency and stops when no neighboring parameter set gives a higher result. For example, the search path in Table 01 is [N = 21, α = 0.3%] → [N = 20, α = 0.3%] → [N = 19, α = 0.3%] → [N = 19, α = 0.2%].
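Below is a minimal steepest-ascent hill-climbing sketch on a discrete (N, α) grid. Neighbours are the grid points one step away in N or in α. The search moves to the best neighbour while that improves the objective, and stops at a local optimum. The grid values and `toy_objective` are invented so that the path resembles the example above; they are not data from Table 01.

```python
def hill_climb(objective, n_values, alpha_values, start):
    """Steepest-ascent hill climbing over a 2-D grid of (N, alpha) values.

    objective(n, alpha) is a hypothetical backtest score (higher is better).
    Returns the visited path and the final score.
    """
    i, j = n_values.index(start[0]), alpha_values.index(start[1])
    score = objective(n_values[i], alpha_values[j])
    path = [(n_values[i], alpha_values[j])]
    while True:
        # Neighbours differ by one grid step in exactly one parameter.
        neighbours = [(i + di, j + dj)
                      for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= i + di < len(n_values) and 0 <= j + dj < len(alpha_values)]
        if not neighbours:
            return path, score
        scored = [(objective(n_values[a], alpha_values[b]), (a, b)) for a, b in neighbours]
        best_score, (bi, bj) = max(scored)
        if best_score <= score:
            return path, score  # local optimum: no neighbour improves the objective
        i, j, score = bi, bj, best_score
        path.append((n_values[i], alpha_values[j]))


n_values = list(range(5, 30))
alpha_values = [0.001, 0.002, 0.003, 0.004, 0.005, 0.007, 0.010]


def toy_objective(n, alpha):
    # Invented surface peaking at N = 19, alpha = 0.2%, more sensitive to N than to alpha.
    return -10 * (n - 19) ** 2 - (alpha * 1000 - 2) ** 2


path, score = hill_climb(toy_objective, n_values, alpha_values, start=(21, 0.003))
print(path)
```

Because it looks only at immediate neighbours, hill climbing can stop at a local optimum. Restarting from several different starting points is a common remedy.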