In this post I'll present the results from testing my strategy using the approach described in the first post. Right at the outset, I'll say that the results presented here are flawed in that they consider the aggregated performance of the four algorithms on each market. It occurred to me as I was compiling the results that it would make more sense to analyse each algorithm on each market separately. I'll do this in the future, but for now this is a decent starting point and hopefully illustrates how this approach works and that it has some value.

The strategy is simple enough - it attempts to exploit short-term serial correlation (or the lack thereof) in an asset's return series. There are four algos: Algo 1 assumes that short-term autocorrelation is persistent and buys or sells in the prevailing direction when the autocorrelation coefficient rises above a certain threshold. Algo 2 assumes that autocorrelation is not persistent and enters in the opposite direction when the coefficient rises above the threshold. Algos 3 and 4 apply the same logic to negative autocorrelation. The algos are switched on or off via an equity curve approach, as described in the manual. The correlation coefficient lag and lookback parameters are optimized.
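Roughly, the signal logic looks like this (a simplified Python sketch, not the production code - the fixed threshold, the column names, and my reading of algos 3 and 4 as triggering on significantly negative autocorrelation are illustrative assumptions; lag and lookback are the parameters that get optimized):

```python
import numpy as np
import pandas as pd

def autocorr_signals(prices: pd.Series, lag: int, lookback: int,
                     threshold: float = 0.2) -> pd.DataFrame:
    """Entry signals for the four algos: +1 = long, -1 = short, 0 = flat."""
    r = prices.pct_change()
    # Rolling autocorrelation: correlation of returns with their lagged values
    ac = r.rolling(lookback).corr(r.shift(lag))
    trend = np.sign(r)  # prevailing short-term direction

    sig = pd.DataFrame(0.0, index=prices.index,
                       columns=["algo1", "algo2", "algo3", "algo4"])
    pos = ac > threshold    # significant positive autocorrelation
    neg = ac < -threshold   # significant negative autocorrelation

    sig.loc[pos, "algo1"] = trend[pos]   # persistence holds: trade with it
    sig.loc[pos, "algo2"] = -trend[pos]  # persistence fails: fade it
    sig.loc[neg, "algo3"] = -trend[neg]  # reversion holds: fade the last move
    sig.loc[neg, "algo4"] = trend[neg]   # reversion fails: follow the last move
    return sig
```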

I tested it on a bunch of assets and picked the algo/asset/direction combinations that produced good results in the walk-forward optimization (WFO). I realise that this introduces a form of selection bias into the development process.

Which combinations were profitable due to the strategy's predictive power, and which were profitable simply due to the bias in the market during the test period? I used the approach described in the first post to try to answer that question.

The attached spreadsheet shows the results of the backtest selecting the combinations that performed well in the WFO (first tab). The equity curve looks great, as you would expect given the bias introduced into the development process. The spreadsheet also includes a few additional performance statistics that I look at, if anyone is interested.

The second tab shows the results from the random test. The test statistic selected is profit factor (gross profit divided by gross loss). This sheet includes a histogram and cumulative frequency chart of the profit factors obtained for each asset in the random test (10,000 iterations). Let's select a cumulative frequency limit of 90% for selecting assets. That is, for an asset to be included in our strategy, the profit factor generated in the backtest must be better than 90% of the profit factors generated in the random test. Anything less than this indicates that the results obtained were possibly due to the nature of the market during the simulation period (for example, an asset traded only on the long side will do well in a rising market, even if entries are random). The 90% value is arbitrary; the value selected would depend on the individual. Perhaps 95% is a better choice.
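For anyone who wants to replicate the idea, here's a rough Python sketch of the random test. It's not the exact machinery from the first post; the trade count, holding period, and synthetic return series are placeholders I've made up for illustration:

```python
import numpy as np

def profit_factor(pnl: np.ndarray) -> float:
    """Gross profit divided by gross loss."""
    gains = pnl[pnl > 0].sum()
    losses = -pnl[pnl < 0].sum()
    return gains / losses if losses > 0 else np.inf

def random_entry_test(returns: np.ndarray, n_trades: int, hold: int,
                      side: float = 1.0, n_iter: int = 10_000,
                      seed: int = 0) -> np.ndarray:
    """Distribution of profit factors from random-entry strategies.

    side is +1 for long-only or -1 for short-only, matching the direction
    the real strategy traded, so the test captures any directional bias
    in the market over the simulation period.
    """
    rng = np.random.default_rng(seed)
    pfs = np.empty(n_iter)
    for i in range(n_iter):
        entries = rng.integers(0, len(returns) - hold, size=n_trades)
        # P&L of each trade: direction times the return over the hold period
        pnl = np.array([side * returns[e:e + hold].sum() for e in entries])
        pfs[i] = profit_factor(pnl)
    return pfs

# Illustration on synthetic data (won't reproduce the spreadsheet numbers)
rng = np.random.default_rng(1)
returns = rng.normal(0.0002, 0.01, size=2500)  # stand-in return series
pfs = random_entry_test(returns, n_trades=200, hold=5, side=-1.0)
backtest_pf = 1.39  # e.g. the EUR/CHF figure discussed below
print(f"PF {backtest_pf} beats {(pfs < backtest_pf).mean():.0%} "
      f"of random strategies; 90% cutoff is {np.quantile(pfs, 0.90):.2f}")
```

Matching the random trades' direction to the one the strategy actually traded is what lets the test pick up the kind of long-only-in-a-rising-market bias mentioned above.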

Interestingly, some of the assets returned what seems like a very good profit factor, yet didn't beat an overwhelming majority of the random strategies. Take the results for EUR/CHF, for example, where the strategy returned a PF of 1.39. This beat only about 75% of the random strategies, which indicates a good chance that the strategy was not exploiting any tradeable edge; rather, it did well simply by being in the market in the right direction. Indeed, the strategy only selected this pair for trading short, and the results appear to be largely due to that directional bias.

Another question that occurred to me while compiling the results: could all this testing be deemed unnecessary if my strategy development process were more elegant, introducing as little bias as possible? Or is it appropriate to take a 'brute force' approach, testing as I have in this case and potentially introducing bias, then stripping away the flawed combinations using methods such as this? With the latter approach, you cast the net far and wide, so to speak, hoping to catch something of value and eliminating the rest. The former approach, to my mind, risks missing a strategy of value, but ensures that anything that shows promise shows real promise and not something ephemeral.

I hope this kicks off some healthy discussion. I'll run another test on individual asset/algorithm combinations overnight and post the results in due course.

Cheers

Edit: spreadsheet too big to upload, so attached instead is a screenshot of the results.

Attached Files: RandomTest.png