Backtesting with T1 data and variable spread

Posted By: pcz

Backtesting with T1 data and variable spread - 05/26/16 14:00

Hello everyone!

From the beginning I needed to backtest with tick data collected from my own broker that has variable spread. However I wasn't able to find how to do that. It took me several days but I think that I got it right finally. I will write the steps I took so it can serve as a tutorial for other people attempting to do something similar. I also hope that this will eventually help to discover bugs as I don't have much experience in C programming.

1. Obtaining data
For this demonstration I will use Dukascopy tick data downloaded using StrategyQuant Tick Downloader. First I downloaded the data for each currency pair and then I exported them to .csv files, each line in this format: YYYY-MM-DD hh:mm:ss.fff,bid,ask,bidVol,askVol

2. Splitting data
T1 conversion is more complicated than M1 conversion because there is no maximum size of the data set and the physical memory may not be sufficient to read the input file at once. Because I didn't want to end up with an overly complicated Zorro conversion script, I decided to split the data and reverse the line order using a Python script and the Linux shell command 'tac' (reversed 'cat'). The script can be found here: split.py. To use it in Windows you will need something like Cygwin.
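For readers who cannot open the linked script, here is a minimal sketch of the split-and-reverse idea. It is not the original split.py. It assumes the CSV format from step 1, assumes the input is in chronological order, and keeps one year of lines in memory at a time (the original used 'tac' precisely to avoid memory limits, so treat this as an illustration only). The function name and arguments are hypothetical.

```python
import os
from itertools import groupby


def split_and_reverse(in_path, out_dir, symbol):
    """Split a chronological tick CSV by year and write each year newest-first."""
    written = []
    with open(in_path, "r", newline="") as src:
        rows = (line for line in src if line.strip())
        # Each line starts with YYYY-MM-DD, so the first four characters are the year.
        for year, group in groupby(rows, key=lambda line: line[:4]):
            lines = [line.rstrip("\r\n") for line in group]
            out_path = os.path.join(out_dir, f"{symbol}_{year}.csv")
            with open(out_path, "w", newline="") as dst:
                # Reverse the order so the newest tick comes first.
                for line in reversed(lines):
                    dst.write(line + "\n")
            written.append(out_path)
    return written
```

Calling split_and_reverse("EURUSD_tick.csv", ".", "EURUSD") would produce one file per year, named like EURUSD_2016.csv.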

3. Conversion
I had to modify the Zorro/Strategy/Convert.c script to accommodate T1 structs ( ConvertT1.c ). You need to set the 'base_path', 'smbl', 'start' and 'end' variables to use it. The input files should be in the aforementioned CSV format (produced by split.py) and stored in the base_path folder. Their names should be in this form: [SYMBOL]_[YEAR].csv. The important part is that if SPREAD is defined using a #define directive in the ConvertT1.c script, another .t1 file (besides the one with ask prices) is exported. Its name contains 's' at the end of the symbol's name and it contains spread values for each tick.
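To illustrate what ends up in the two files, here is a rough Python sketch of turning one CSV line into an ask record and a spread record. It is not ConvertT1.c. The record layout (an 8-byte DATE timestamp followed by a 4-byte float, with DATE counted in days since 1899-12-30) is my assumption about the T1 format and should be checked against the Zorro manual before relying on it. All function names are hypothetical.

```python
import struct
from datetime import datetime

OLE_EPOCH = datetime(1899, 12, 30)
# Assumed T1 layout: little-endian double (time) + float (value), 12 bytes.
T1_RECORD = struct.Struct("<df")


def to_ole_date(text):
    """Convert 'YYYY-MM-DD hh:mm:ss.fff' to days since 1899-12-30."""
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    delta = ts - OLE_EPOCH
    return delta.days + (delta.seconds + delta.microseconds / 1e6) / 86400.0


def pack_tick(csv_line):
    """Return (ask_record, spread_record) for one 'time,bid,ask,...' line."""
    stamp, bid, ask = csv_line.split(",")[:3]
    date = to_ole_date(stamp)
    ask_rec = T1_RECORD.pack(date, float(ask))
    spread_rec = T1_RECORD.pack(date, float(ask) - float(bid))
    return ask_rec, spread_rec
```

The ask record would go to the normal .t1 file and the spread record to the file with 's' appended to the symbol name.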

4. Checking the converted data
To make sure that the conversion has been done properly you can use this modified export script: ExportT1.c

5. Simulating variable spread
The final step is using both .t1 files during the backtest to simulate variable spread. One way to do so is to include both SYMBOL and SYMBOLs in the backtest. SYMBOLs is an artificial asset used only to get the spread value for the actual asset. When the program loops through SYMBOLs, save the spread value from the T1 struct to a global variable. When the program loops through the actual SYMBOL, assign the spread from the global variable to Zorro's Spread variable. And that's it.
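The bookkeeping behind this can be sketched outside Zorro. The Python below is only an illustration of the "remember the latest spread, apply it to the next price tick" logic, not Zorro code. It assumes both series are sorted by time and given as (time, value) pairs; the function name is hypothetical.

```python
import heapq


def attach_spread(price_ticks, spread_ticks):
    """Yield (time, ask, spread) using the latest spread seen at or before each ask tick."""
    # Tag 0 sorts before tag 1, so a spread tick is processed before
    # a price tick that carries the same timestamp.
    tagged_spread = ((t, 0, v) for t, v in spread_ticks)
    tagged_price = ((t, 1, v) for t, v in price_ticks)
    current_spread = None  # plays the role of the global variable
    for t, kind, value in heapq.merge(tagged_spread, tagged_price):
        if kind == 0:
            current_spread = value
        else:
            yield t, value, current_spread
```

In the actual backtest the last step corresponds to assigning the stored value to Spread while the real asset is selected.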

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 05/29/16 08:17

This is probably question for Zorro developers...

So...I am setting the spread as described above. However I'm using the tick() and priceClose() to assign it to a global variable first. I have TICKS flag set.

Code:

float spread;
function tick()
{
   // if current asset == artificial spread asset
   if(!strcmp(Asset, ast1s))
      spread = priceClose();
}

From Zorro documentation: In a TMF or tick function, priceClose() returns the last price quote, updated every tick when new price data becomes available.

The prices seem to be OK in general but the spread values are sometimes weird. I double-checked that the values I'm storing in my T1 conversion script are fine. Is there any kind of smoothing or something that could cause this? Here's what I get when exporting the T1 values. I aligned the corresponding times in the original and converted series and highlighted the discrepancies.

A link to download the spreadsheet: https://we.tl/0ikhO2QXAZ

Posted By: jcl

Re: Backtesting with T1 data and variable spread - 05/31/16 16:50

Zorro automatically fixes outliers. Probably that correction acts here since spread data always has lots of outliers. Set Detrend to NOPRICES for telling Zorro that this asset must not be fixed.

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 06/07/16 12:33

Thank you for the reply. One last question - how does the selection of the next T1 price work? I mean - obviously you can save more than 10 prices per second to the .t1 file. But only one price per every 100 ms is considered during the backtest. If I export the .t1 data to .csv file I see that the millisecond portion of selected price slowly changes, for example:

hh:mm:ss.x40
hh:mm:ss.x40
hh:mm:ss.x40
hh:mm:ss.x40
hh:mm:ss.x41
hh:mm:ss.x41
hh:mm:ss.x41
hh:mm:ss.x42
hh:mm:ss.x42
hh:mm:ss.x42
etc... (but much slower)

The reason I'm asking this is that I want to trim the data before I convert them to .t1 files so the resulting file is as small as possible. It's especially important with some brokers that have hundreds of prices per second. But there seems to be no clear split (like lets say Zorro would take the last price from each whole hundred ms: .100, .200, ...) - rather it seems that the split shifts slowly, one millisecond at a time. Is that correct?

Posted By: jcl

Re: Backtesting with T1 data and variable spread - 06/07/16 15:06

As to my knowledge, there is no limit to the number of ticks, only to the number of bars. You can have only 10 bars per second, but any number of ticks.

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 06/07/16 17:27

Originally Posted By: jcl

As to my knowledge, there is no limit to the number of ticks, only to the number of bars. You can have only 10 bars per second, but any number of ticks.

You are absolutely right. I didn't realize that even with TICKS flag set run() is called on every bar and not every tick. Now, when outputting the prices in tick() function, I see them all with the exact timestamps +/- 1 ms. Thank you!

Posted By: boatman

Re: Backtesting with T1 data and variable spread - 06/09/16 00:06

pcz, thank you for sharing this. Extremely useful!

Posted By: CaptainChezza

Re: Backtesting with T1 data and variable spread - 12/19/16 23:14

Hey guys,
is this currently the best way to import tick data for zorro, or have things changed?
If I test with FXCM on zorro it just uses 1m OHLC; if I subscribe and get Zorro S, do I have access to reliable tick data on FXCM, or is it a bit hit and miss?

Thanks

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 12/20/16 14:40

You can download FXCM T1 data on Zorro Download page. The way I described is meant only for sources for which there is no possibility to download the data using Zorro or for situations in which it is necessary to backtest with variable spread. The variable spread backtesting can be achieved also with the new T6 struct as it allows for storing additional data besides OHLC. Related links:

http://zorro-trader.com/download.php

http://zorro-project.com/manual/en/export.htm

Posted By: CaptainChezza

Re: Backtesting with T1 data and variable spread - 12/21/16 00:36

Is FXCM T1 data that reliable, or is it worth using duka.
Variable spread is handy though.

Further, I'm definitely a noob with this, but where in your py script do you include the csv file name?

Also, do you then just drag and drop the py script into the cygwin terminal?

Code:

$ /cygdrive/d/Trading/DukaData/QmYybyXg.py
/cygdrive/d/Trading/DukaData/QmYybyXg.py: line 1: from: command not found
/cygdrive/d/Trading/DukaData/QmYybyXg.py: line 2: import: command not found
/cygdrive/d/Trading/DukaData/QmYybyXg.py: line 3: $'r': command not found
/cygdrive/d/Trading/DukaData/QmYybyXg.py: line 4: syntax error near unexpected token `sys.argv'
'cygdrive/d/Trading/DukaData/QmYybyXg.py: line 4: `if len(sys.argv) != 2:

I also tried using the CSVtoT6 script as I noticed in that script there has information about converting trade station csv into t1.
So I put my csv file name into that line:

Code:

#ifdef TRADESTATION
#define TCK T1
string InName = "HistoryEURCHF_UTC_Ticks_Bid_2015.02.01_2016.12.19.csv";  // name of the CSV file
string OutName = "Historyticks.t1";

But that didn't work. Is it meant to?
I am using the normal zorro, not zorro s. Do I need to have zorro s for this to function properly?

If I wanted to test T1 data on EURCHF for example can I create a fxcm account, and download it from there?
Or what is a suitable way to pull it from a CSV file from dukascopy? (without worrying about variable spread at this point in time)

Do I simply create a back the front CSV file divided into years, into the format of:
Date, value

Or does your python script do that?

I do apologise for jumping off topic a little (since i'm now just talking about tick imports)
Thanks

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 12/22/16 15:11

Originally Posted By: CaptainChezza

Is FXCM T1 data that reliable, or is it worth using duka.

I don't know. I use Dukas because I'm used to them. Also because I know that the results obtained on their data correlate well with our broker's data. Last but not least they provide long enough history.

Originally Posted By: CaptainChezza

Further, I'm definitely a noob with this, but where in your py script do you include the csv file name?

I'm using new scripts now. I'll link them in the following post and explain how to use them. After that you can ask if you have any more questions.

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 12/22/16 16:17

The new scripts are intended for batch conversion so you can run it on multiple files exported using Tick Downloader with no additional user input. It's more convenient because sometimes the conversion of multiple symbols can take hours or even days. Again I'm using Cygwin to run it. You must have python installed. You can check by running 'which python' command. If it doesn't return a python path, something is wrong.

How to use it - the short version

1. Save conversion bash script and python script into a folder. Edit history_path and zorro_path variables in convert.sh file and set them accordingly. The first variable is the path to the folder containing .csv files with tick data (exported using Tick Downloader).
2. Save T1 data conversion and T6 data conversion scripts in Zorro/Strategy folder and modify the basePath variable. It should be set to the same path as history_path in convert.sh but in Windows path notation style.
3. Run ./convert.sh -p x, where x is the desired period of the converted data in minutes. -p 0 will produce T1 data, -p 1 will produce M1 T6 data with the spread stored in the fVal variable; higher periods don't work yet.

How to use it - the long version

The first file is a conversion bash script. It's important to edit it and set these two variables correctly: history_path and zorro_path. The first is the path of the folder containing the tick data .csv files exported using Tick Downloader (by default they have names like EURUSD_tick.csv etc.). It shouldn't contain any other files, otherwise unexpected things might happen. Folders are fine. The second variable is the path of the folder containing the Zorro executable.

In the same folder as the previously mentioned script you should put this python script used for data preprocessing. It will be called automatically.

Then there are two Zorro strategies, one used for T1 data conversion and one for T6 data conversion. They belong to Zorro/Strategy folder. You have to edit them and set basePath variable correctly (should be the same folder as the previously mentioned history_path but in Windows notation style).

You can run the bash script for example like this:

Code:

./convert.sh -p 0

The first parameter is the desired period of the converted data. If you set it to 0 the conversion script will produce tick data in T1 format. If #define SPREAD directive is defined in the strategy file, two files will be created - one for ask and one for bid prices.

If you set -p to 1 it will produce M1 data in T6 format. The spread for the opening price is stored in the fVal variable. Higher periods don't work yet.

You can use T1 export script to check the converted T1 prices. For T6 you can use the script included in Zorro.

The whole thing has been tested but not very thoroughly, it's an ongoing work. So if you find a bug please let me know.

How it works

The bash script loads list of files in history_path folder. Then it loops through them and calls python script with given parameters on each of them.

The python script reverses the line order of each file, splits them by year and for periods greater than 0 it converts the tick data to OHLC with opening price spread stored in the last column.
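As an illustration of the tick-to-OHLC step, here is a minimal sketch. It is not the linked script. It assumes chronological input as (datetime, bid, ask) tuples, builds bars from ask prices, and labels each bar with the start of its minute; all of these are my assumptions for the example, and the function name is hypothetical.

```python
from datetime import datetime


def ticks_to_m1(ticks):
    """Aggregate chronological (datetime, bid, ask) ticks into M1 bars."""
    bars = []
    current = None
    for ts, bid, ask in ticks:
        minute = ts.replace(second=0, microsecond=0)
        if current is None or current["time"] != minute:
            # First tick of a new minute: open a bar and keep its spread.
            current = {"time": minute, "open": ask, "high": ask,
                       "low": ask, "close": ask, "spread": ask - bid}
            bars.append(current)
        else:
            current["high"] = max(current["high"], ask)
            current["low"] = min(current["low"], ask)
            current["close"] = ask
    return bars
```

Each resulting bar could then be written as one CSV row with the spread in the last column.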

After that the bash script takes over again and loads the list of newly split files. It stores the first and the last available year together with each symbol. After that it calls Zorro with these arguments to convert the .csv files to .t1 or .t6 files depending on the chosen period.

Some final warnings

- Don't trust the scripts unless you check the converted data yourself
- Don't use special distributions of Python (e.g. Anaconda) in Cygwin for this task. It might complicate things.
- Don't use periods higher than 1
- Timestamps can be shifted using -s parameter but it works only for T6 data

EDIT: T6 Zorro script fixed (wrong variable names)

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 02/07/17 15:06

I simplified everything. I even tried to do the tick conversion using only Zorro but the implementation was slow so I'll stick with the combination of Python and Zorro. I've also tried two new approaches which do not use the 'tac' command. They are slower but if you want to run the script on Windows without using Cygwin, it might help.

The Bash script and Python script are now merged into one: convert.py (it made sense to make it like this from the beginning but I wasn't that proficient in Python)

The two Zorro scripts for T1 and T6 data were also merged: ConvertData.c

The usage is similar, you can find more info here - you can skip directly to the "Usage" section.

After the initial setup it's possible to convert all .csv data for various assets to T1 and T6 formats (containing spread) using just one command. But please note the assets have to be present in AssetsFix file.

Posted By: boatman

Re: Backtesting with T1 data and variable spread - 03/01/17 11:07

Thanks pcz, this looks really useful.

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 03/27/17 12:12

I've made some small modifications to the scripts and nothing changes regarding T1 data but I recommend re-creating all T6 data from scratch. There are two reasons for that:

The previous version of the conversion script stored opening tick spread in fVal variable while Zorro uses the previous bar's closing price for trade entry. So now the closing tick spread is stored instead.
The timestamps of T6 minute bars are now shifted by a minute. My understanding of the issue is that because FXCM's opening / closing prices are overlapping the candles' minute boundaries Zorro uses closing price of H:M:00 candle for entry / exit at the time H:M:00 while with T1 data it uses the previous candle (M-1). Therefore with (non-overlapping) data from other sources the trade entries / exits are up to 60 seconds late which is unnecessary and in certain situations it can create big differences in test results. For more information see this thread: Why are T1 and T6 results so different? (no SL/PT/Slippage...)

If you use the FXCM data downloaded from Zorro's website I recommend re-creating the T6 files as well either using Zorro's T1 data or using tick data downloaded from FXCM.

The good news is that with these changes you should be able to achieve exactly the same test results using T1 and T6 data if you enter / exit at the end of M1 (or higher period). If you use stop loss, profit targets or something like that the results might still differ.

Here's the modified Python script: convert.py
Plus the modified Zorro script: ConvertData.c

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/01/17 19:01

Hi pcz,

I am having some problems with the scripts. Specifically, the zorro script seems to be the problem. Tick files are successfully reversed by the python script, zorro is opened, the ConvertData script is called and then Cygwin just hangs.
I tried letting it run overnight but it did not complete.

Can we attach images here somehow so I can post a screenshot of cygwin?

Regardless, this is the output in the cygwin terminal:
$ python convert.py
Period: [0, 1], Time shift: 0, Price to use: ask
convert_all_files
E:/FxData_tickstory/AUDCHF_tick.csv
Reversing file lines...
E:/FxData_tickstory/AUDCHF_2017.csv
D:/Users/BobTewilliger/Zorro/Zorro.exe ConvertData -run -i 2017 -i 2017 -a AUD/CHF -d TCK
E:/FxData_tickstory/AUDCHF_2017.csv
D:/Users/BobTewilliger/Zorro/Zorro.exe ConvertData -run -i 2017 -i 2017 -a AUD/CHF

I'm running Windows 7, 6700k, 16 GB RAM (so it shouldn't be a memory issue). Python v3.6, most recent Cygwin (downloaded week beginning 26th June), Zorro v1.58. I've been testing the scripts on the past month's worth of data for efficiency reasons, though using a year or two's worth of data results in the same thing: it just hangs.

Cheers,
BobbyT

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/01/17 19:08

Please disregard the above message. The scripts have magically completed following another attempt (in which I did nothing different, so, weird).

A new problem has arisen though. There are no new t1/t6 files for the tested symbol in zorro/history. This is where ConvertData is meant to save the converted files, right?

Cheers,
BobbyT

PS: just ran a test again with the past month's worth of data on a single symbol. It seems to be hanging again. Am I right in thinking this shouldn't be taking more than 30 minutes to convert 1 month's worth of data for a single symbol?

Posted By: Smile

Re: Backtesting with T1 data and variable spread - 07/02/17 10:58

PCZ , hello.
when you finish your *.t1 and *s.t1 files, during backtesting, the sell / stop sell order,
which price will be trade? is it ask(in t1 file) - spread(in assets file) ?

the *s.t1 file, how to show its useful?

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/02/17 15:51

So, a little update.

After adding copious print statements to everything there seems to be an issue with the way the zorro script is called. I have no idea what it is as I'm in way over my head here (interfacing two unfamiliar languages/platforms).

But I can say that calling either convert.sh/ConvertT6.c or convert.py/ConvertData.c separately gets the job done (although the t6 files are saved to the tick directory and not zorro/history). At this stage I will be using the 2-part script process as the combined bash/python script locks cygwin from any further operations whereas convert.sh terminates properly and frees up the cygwin terminal.

Cheers,
BobbyT

Posted By: jcl

Re: Backtesting with T1 data and variable spread - 07/03/17 09:41

Check if Zorro is not started _before_ the convert.py script has finished. Otherwise the source file does not exist yet, or can not be accessed since another program is writing into it.
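A minimal sketch of that ordering in Python, not the actual convert.py: the file is written and closed first, and only then is the converter started. The command line mirrors the one visible in the terminal output posted above; the function names and the injectable runner argument are hypothetical and only there to make the idea testable.

```python
import subprocess


def build_zorro_command(zorro_exe, script, first_year, last_year, asset):
    # Same argument order as the command line shown in the terminal log above.
    return [zorro_exe, script, "-run", "-i", str(first_year),
            "-i", str(last_year), "-a", asset]


def write_then_convert(csv_path, lines, command, runner=subprocess.run):
    # The with-block flushes and closes the file before the next statement runs.
    with open(csv_path, "w", newline="") as out:
        out.writelines(line + "\n" for line in lines)
    # Only now start the converter; subprocess.run blocks until the process exits.
    return runner(command)
```

If the converter is started while the output file is still open for writing, it may see a missing or incomplete file.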

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/03/17 17:04

Hi JCL,

Thanks for the reply. I'm sorry but you will have to noobify that answer a little for me.

Cygwin does not return to the command line before Zorro opens. The above copy/paste from the Cygwin terminal is all that appears before Zorro starts. There are no messages printed in Zorro (I added a tonne of print statements to see where it's hanging and there is nothing returned).

Is there something I can add to the python or zorro script to ensure their operation lines up properly?

Cheers,
BobbyT

Edit: just had another look; neither the bash script nor the python script terminates properly. They both hang Cygwin. Maybe there's a dependency or plugin issue (?)

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 07/03/17 19:44

Originally Posted By: Smile

Not sure if I understand the question correctly but I'll try to reply anyway (hopefully on topic). In the latest version the name of the produced asset file is e.g. EURUSD_2016.t1 and the name of the dummy file is the same but beginning with an exclamation mark (e.g. !EURUSD_2016.t1). The first file contains ask prices (which are normally used by Zorro) and the second one contains bid prices. One way to use these files is described here: http://statsmage.com/zorro-on-steroids-iii-backtesting-with-variable-spread/ - the price used during backtesting is then the ask price, and the bid price is used only to calculate the spread.

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 07/03/17 20:02

BobbyT: The produced files are indeed stored in different folder than Zorro/History so you don't accidentally overwrite some important history files.

The line reversion and file splitting are slow but when this part is done the Zorro conversion part should be very fast. I've just tried with a fresh install of Zorro 1.59 and everything works fine for me. The question is - are you using the latest version of the conversion scripts? I know the thread is a mess and I'm sorry for that. I would edit the first post but it's probably too old to do so. You can find links to the latest version here: http://statsmage.com/zorro-on-steroids-ii-data-conversion/ in the "Implementation" section. It's these two:

https://gitlab.com/panaczech/ZorroToolbox/blob/master/converter/convert.py
https://gitlab.com/panaczech/ZorroToolbox/blob/master/converter/ConvertData.c

There's no need for the bash script anymore. I'll try to keep the files in the repository updated in the future as well.

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/04/17 00:36

Hi PCZ,

That makes sense about the file locations. I had just misunderstood where they were meant to be saved.

The python/zorro scripts were downloaded Friday night so unless they have been updated since then they should be the most recent ones.

Regardless of if I use the bash/zorro or python/zorro combination the following happens:

-The tick files are correctly converted to M1 files with the lines reversed
-Zorro is opened
-Then nothing, no messages in zorro, no further messages in Cygwin (beyond what I had previously posted)

CSV files can successfully be converted to t6 files via the zorro interface after Cygwin is forcefully closed.
So it seems that ConvertData.c (or ConvertT6.c) does its job and convert.py (or convert.sh) also does its job. The problem seems to be with the calling of ConvertData.c from convert.x (although as I mentioned, Zorro is successfully started by convert.x).

Could it be a windows thing? I'm running from Cygwin to try and avoid any Windows nonsense.

Cheers,
BobbyT

Posted By: pcz

Re: Backtesting with T1 data and variable spread - 07/04/17 09:52

BobbyT: In your example you tried to convert AUDCHF which is not in Zorro's asset list by default. Did you include it in the asset list file? When the asset is missing the conversion fails but the Cygwin shouldn't hang anyway.

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/04/17 15:45

Hi PCZ,

I had previously copied AssetsCurr to AssetsFix (minus USDMXN) and updated the broker information via the download script through FXCM. Zorro had also successfully downloaded AUDCHF data from FXCM following the updates to AssetsFix.

Dukas has more data available than FXCM though so that's why I'm here.

Cheers,
BobbyT

Posted By: BobbyT

Re: Backtesting with T1 data and variable spread - 07/07/17 14:22

Update: To anyone that is trying or had tried to get this to work, pcz is using Zorro S. Zorro S has a string of command line functions/options which are not available on Zorro F.

I will try and make a Zorro F compatible version over the next week. Stay tuned for any further updates.

Cheers,
BobbyT