Thanks SFF,

that guy certainly seems to have good credentials. I will keep an eye out for this one.
The other books in this area that I am aware of are those by Robert Pardo. His earliest one is from 1992 (and it comes with very retro dot-matrix-printer performance tables), so he may well have been the pioneer! One impression I get from all these texts is that they all get a bit "rule of thumb" when it comes to the statistics. This may be a conscious choice regarding the target audience, or it may mean that this is as precise as we can get in this area (statistical interpretation of results often ends up being somewhat subjective) - which of these is the case I do not know. A textbook answer would be superb, but one may not exist. Anyway.

The reason I am a bit hung up on my Luxor problem is this. Here is a simple system for which I have a reasonably detailed set of benchmark results (from the J&T book), so it is natural to want to replicate them. Below are my results against the benchmark ones - I hope the copyright police won't execute me for the second picture.

[Image 1: equity curve produced by my Zorro script]
[Image 2: Luxor equity curve from the book]
I think we can agree that the difference between the two equity curves is non-negligible - the book's curve suggests further evaluation, while the curve generated by my script invites immediate binning. I see the following possible reasons for this:
1) There is an error in my script: this would be my preferred reason. It should be easy to correct my script above if it does not, in fact, describe an MA crossover strategy where we are always in the market (after the initial entry, which admittedly will not be according to system) and where entries are made on a one-pip break of the signal candle - see the first sketch after this list for how I read those rules.
2) There is a bug in either TradeStation (which J&T use) or Zorro. This is a facile way out and, given that both platforms have no doubt been tested exhaustively with substantially more complex systems, it is extremely improbable.
3) The difference is caused by a difference between data feeds: this is the potential reason I am worried about. Clearly, data from different sources will never completely match, meaning that every data point we collect for testing in fact represents an instance of a random variable subject to sampling error. We would hope and assume that the error introduced is not systematic (that, e.g., 'large' candles are on average two pips larger in one data feed than in the other) but corresponds to a white-noise (zero-mean, constant-variance) term, such that any data-induced performance divergences largely average out over a test set of sufficient length. To be honest, I have not made any particular effort to vet the Zorro data, as I assume its quality has been deemed sufficient for algo-testing purposes. If data differences really are the cause, then the dependence of system performance on input data is extreme - sufficient, in fact, to completely change our evaluation of a system (from interesting to pointless). If so, any effort to design a winning system seems meaningless to me unless we at the same time ensure it works across data feeds by, for instance, wrapping the testing logic in a further (resampling) loop - see the second sketch below.
4) The curve shown in the book was not actually generated by their printed code. Well.
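
To make possible reason 1) concrete, here is a minimal lite-C sketch of how I read the Luxor rules. To be clear, this is my own reconstruction, not the book's printed code, and the specifics are my assumptions: 3/30-bar SMAs on 30-minute GBP/USD bars, entry via a stop order one pip beyond the signal candle's extreme, and reversal on the opposite signal (with Zorro's default Hedge = 0, an opposite entry closes the open position, so we stay in the market after the first fill):

```c
// Sketch only - my reading of the rules, not the book's code.
function run()
{
  BarPeriod = 30;                      // assumption: 30-minute bars
  asset("GBP/USD");

  vars Price = series(price());
  vars Fast  = series(SMA(Price,3));   // assumption: 3/30 SMA periods
  vars Slow  = series(SMA(Price,30));

  if(crossOver(Fast,Slow)) {
    // buy stop one pip above the signal candle's high
    Entry = priceHigh() + 1*PIP - priceClose();
    enterLong();                       // Hedge = 0: also reverses a short
  }
  if(crossUnder(Fast,Slow)) {
    // sell stop one pip below the signal candle's low
    Entry = priceClose() - (priceLow() - 1*PIP);
    enterShort();                      // ... and this reverses a long
  }
}
```

One thing worth checking against the book: how long the pending stop order stays alive (EntryTime in Zorro) - I have left it at the default here, and a cancelled versus still-working stop could by itself account for missed entries.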
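And for possible reason 3), a sketch of the resampling loop I have in mind. Again, everything here is assumption: NumTotalCycles (check that your Zorro version supports it) repeats the complete simulation, each cycle perturbing every price by up to two pips of uniform noise (random() should be uniform in -1..+1) to mimic the sampling error between feeds; the stop-entry logic from the first sketch is omitted for brevity:

```c
// Sketch only: gauge data-feed sensitivity by re-running the test
// on noise-perturbed prices. If the equity curves of the cycles
// scatter as widely as mine does from the book's, the system is
// data-sensitive and belongs in the bin regardless of which feed
// is "right".
var NoisePips = 2;             // assumed feed disagreement of +/- 2 pips

function run()
{
  BarPeriod = 30;
  NumTotalCycles = 10;         // repeat the whole test 10 times
  asset("GBP/USD");

  // white-noise term added to every data point, as per reason 3)
  vars Price = series(price() + NoisePips*PIP*random());
  vars Fast  = series(SMA(Price,3));
  vars Slow  = series(SMA(Price,30));

  if(crossOver(Fast,Slow)) enterLong();
  if(crossUnder(Fast,Slow)) enterShort();
}
```

If ten such cycles produce ten broadly similar equity curves, the data feed is probably not the culprit; if they diverge wildly, reason 3) looks likely - and the system fails the robustness test either way.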

So this is why I am hoping possible reason 1) is the answer. There may, of course, be further possible explanations I have not thought of! Does anyone have an idea?
