Make sure you re-Train after you add the additional data. Pressing Test without a fresh Train could definitely cause an issue.
Also, I am a big fan of oversampling, which might help in your situation. The timing of the trades (i.e., which bar you enter on) can vary between simulation and real life, so oversampling tests a variety of "starting points" and then uses the average. This helps confirm that it is your logic eking out the edge, and not just luck. The theory is that this gives you a more accurate simulation.
To try oversampling, add a line like this:
if(is(TESTMODE)) NumSampleCycles = 5; //oversampling on Test only, not Train
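If it helps to see the idea outside of Zorro, here is a rough sketch in plain C. It is not lite-C and not how NumSampleCycles works internally (Zorro shifts the bar timing rather than just the first entry bar); the toy strategy and the names toy_strategy_profit and oversampled_profit are made up for illustration. The same logic runs from several starting offsets and the results are averaged.

```c
#include <stddef.h>

/* Hypothetical toy strategy: starting at bar `offset`, enter every
   `period` bars and exit `hold` bars later; returns summed profit
   in price units. */
double toy_strategy_profit(const double *close, size_t n,
                           size_t offset, size_t period, size_t hold)
{
    double profit = 0.0;
    for (size_t entry = offset; entry + hold < n; entry += period)
        profit += close[entry + hold] - close[entry];
    return profit;
}

/* Oversampling sketch: rerun the same logic from `cycles` different
   start offsets (0 .. cycles-1) and average the results. */
double oversampled_profit(const double *close, size_t n,
                          size_t cycles, size_t period, size_t hold)
{
    if (cycles == 0) return 0.0;
    double sum = 0.0;
    for (size_t c = 0; c < cycles; ++c)
        sum += toy_strategy_profit(close, n, c, period, hold);
    return sum / (double)cycles;
}
```

For example, on a steadily rising series close[] = {1, 2, ..., 8} with period 2 and hold 1, offset 0 makes 4 and offset 1 makes 3, so oversampled_profit(close, 8, 2, 2, 1) returns 3.5. A result that only looks good at one particular offset is a hint that luck, not the logic, is doing the work.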
I too have noticed that varying NumWFOCycles can sometimes cause dramatic differences. I don't think there is a one-size-fits-all correct setting. If your logic requires more frequent parameter optimization, then you would want more WFO cycles. In the strategy I'm currently building, I am testing different numbers of WFO cycles to determine which slicing works best. I think it would be different for each logic, and trying to say philosophically that "it should be X" or "it should be Y" seems wrong. I would rather just let the data speak to me.
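To picture what changing the number of WFO cycles does to the slicing, here is a generic rolling walk-forward split in C. The layout is my own simplified assumption (equal test windows, a training window `ratio` times as long, rolling forward by one test window per cycle), not Zorro's exact split, and wfo_windows/WfoWindow are hypothetical names.

```c
#include <stddef.h>

/* One walk-forward cycle; ranges are half-open [start, end) bar indices. */
typedef struct {
    size_t train_start, train_end;
    size_t test_start, test_end;
} WfoWindow;

/* Hypothetical rolling split of `bars` bars into `cycles` cycles.
   Each cycle trains on ratio * test_len bars, then tests on the next
   test_len bars. Fills out[0 .. cycles-1] and returns test_len,
   or 0 if there are too few bars for this split. */
size_t wfo_windows(size_t bars, size_t cycles, size_t ratio, WfoWindow *out)
{
    if (cycles == 0 || bars < ratio + cycles) return 0;
    size_t test_len = bars / (ratio + cycles);
    size_t train_len = ratio * test_len;
    for (size_t c = 0; c < cycles; ++c) {
        out[c].train_start = c * test_len;          /* roll forward */
        out[c].train_end = out[c].train_start + train_len;
        out[c].test_start = out[c].train_end;       /* test follows train */
        out[c].test_end = out[c].test_start + test_len;
    }
    return test_len;
}
```

With 1000 bars and ratio 4, 5 cycles give 111-bar test windows, while 10 cycles give 71-bar windows. More cycles means the parameters are re-optimized more often, each time on a shorter slice, which is why the "best" count depends on how quickly your logic's optimal parameters drift.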
The error factor is a statistical measure that goes down as your number of trades increases.
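I'm not sure exactly how the error factor is computed, so treat this only as an analogy: the standard error of the mean trade result, s / sqrt(n), is a generic statistic that behaves the same way, shrinking roughly with the square root of the number of trades. Below is a minimal C sketch; standard_error and the newton_sqrt helper (used so the example doesn't need libm) are my own illustrative names.

```c
#include <stddef.h>

/* Helper: square root via Newton's method, starting at or above sqrt(v)
   and stopping once the iterate stops decreasing. */
static double newton_sqrt(double v)
{
    if (v <= 0.0) return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (;;) {
        double next = 0.5 * (r + v / r);
        if (next >= r) return r;
        r = next;
    }
}

/* Standard error of the mean trade result:
   sample standard deviation / sqrt(n). Returns 0 for fewer than 2 trades. */
double standard_error(const double *trades, size_t n)
{
    if (n < 2) return 0.0;
    double mean = 0.0;
    for (size_t i = 0; i < n; ++i) mean += trades[i];
    mean /= (double)n;

    double ss = 0.0; /* sum of squared deviations from the mean */
    for (size_t i = 0; i < n; ++i) {
        double d = trades[i] - mean;
        ss += d * d;
    }
    double sd = newton_sqrt(ss / (double)(n - 1));
    return sd / newton_sqrt((double)n);
}
```

With alternating +1/-1 trade results, the standard error is about 0.577 for 4 trades and about 0.258 for 16, so the same per-trade behavior over more trades gives a tighter estimate.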