Well technically the only reason I "mess with data" is because I know how lol (read: I would prefer not to, but having access to good data is a necessary evil to build a robot!)
Thanks for the Dukascopy info, I will definitely keep that in mind. I would prefer to try the FXCM route first since that data is officially licensed (in other words, the effort we invest in formatting FXCM data could be repurposed for the Zorro community I think, whereas maybe not with Dukascopy data due to licensing).
Now if someone wants to make an argument that XX-vendor's data is better/cleaner/superior to the FXCM data, then I'm definitely interested to hear it. From what I hear, there are holes in all data, so I think the best procedure might be to run analysis on each set of downloaded data and know up front how significant those holes are.
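To make the hole-checking idea concrete, here's a minimal sketch (my own toy example, not Zorro code): scan consecutive bar timestamps and flag any spacing wider than expected. Real FX history would also need weekend/holiday-aware logic, which this deliberately skips.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(minutes=1)):
    """Return (start, end) pairs where consecutive bars are too far apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps

# Toy minute bars with a 5-minute hole between minute 2 and minute 7:
ts = [datetime(2014, 1, 6, 0, 0) + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
print(find_gaps(ts))
```

Counting and sizing the gaps per pair would give a quick quality score before trusting any vendor's history.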
I've done some limited manipulation of tick data in the past, but I'm not ready to go there yet (I have plenty to play with already, being the Zorro n00b that I am). Some interesting research I've done before with synthesizing tick data involved grabbing segments of multiple price streams and sewing them together in a normalized fashion (e.g., take a week of EURUSD and stitch the next week of GBPUSD onto it, rescaled so the price levels line up). The intent with that research was partly to improve strategy robustness, and partly because the testing platform did not (at the time) support multiple currencies at once, I think.
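The stitching idea above could be sketched roughly like this (a hypothetical illustration with made-up toy prices, not my original code): each new segment is rescaled so it starts exactly where the previous one ended, which preserves percentage moves across the seam.

```python
import numpy as np

def stitch_segments(segments):
    """Chain price segments, rescaling each to start at the prior segment's end."""
    out = np.array(segments[0], dtype=float)
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=float)
        scale = out[-1] / seg[0]              # normalize levels at the seam
        out = np.concatenate([out, seg[1:] * scale])  # drop duplicate seam point
    return out

eur = [1.1000, 1.1050, 1.0980]   # one "week" of EURUSD closes (toy data)
gbp = [1.2700, 1.2650, 1.2750]   # next "week" of GBPUSD closes (toy data)
synthetic = stitch_segments([eur, gbp])
```

A multiplicative rescale (rather than an additive shift) keeps relative moves intact, which matters if the strategy trades on percentage returns.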
Anyway, I'm sure there are a million directions we could go... I'd rather start by getting a full library of (in my case) the 27 pairs under my belt. That in and of itself will keep me busy with Zorro for a while...