Zorro, Neural, R, Caret, overfitting train results? #477063
05/14/19 13:22
laz (OP), Junior Member
Joined: Jan 2019 | Posts: 73 | Berlin

Hi Guys. I've been working on my "Zorro-R-Caret" framework for several weeks now and so far everything is going great.

I use Zorro's NEURAL functions to send the training data to R, where I use a modified "timeslice" mode of the caret package.

I use Zorro (1.96), R (3.5.3) and the Caret package (6.0-81).

The WFA settings for Zorro are:
DataSplit = 70;
NumWFOCycles = 5;

Since I still have one parameter to optimize I use:
NumTrainCycles = 2;

In the first cycle the models are fitted in R; in the second cycle Zorro determines the best value for the parameter.

Now I have noticed that my optimization produces only positive results. Currently I train 2 algos ("knn", "rpart1SE") on 3 assets (AUDUSD, EURUSD, GBPUSD), using data from 2016 to 2018. The optimized parameter is the stop loss / take profit ratio.

I also use a custom objective() that returns 0.0 if there are fewer than 200 trades.
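
A minimal sketch of such a trade-count filter, written in Python for illustration rather than in Zorro's lite-C. The function name, its arguments, and the choice of the profit factor as the value returned otherwise are assumptions, not the objective() actually used here:

```python
def objective(gross_win, gross_loss, num_wins, num_losses, min_trades=200):
    """Hypothetical objective: reject parameter steps with too few trades.

    Returns 0.0 when the step produced fewer than `min_trades` trades,
    otherwise the profit factor (gross win / gross loss).
    """
    if num_wins + num_losses < min_trades:
        return 0.0
    if gross_loss <= 0:
        # No losing trades at all: the profit factor is unbounded.
        return float("inf")
    return gross_win / gross_loss


# 103 wins + 39 losses = 142 trades, below the threshold, so the step is rejected.
print(objective(100.0, 50.0, 103, 39))
# 266 wins + 112 losses = 378 trades, so the profit factor is returned.
print(objective(100.0, 50.0, 266, 112))
```

This is consistent with the log lines that end in "=> 0.00": in each of them the two trade counts add up to fewer than 200.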

Since all results show a profit factor > 1.0, a few random examples are enough:

Quote:
End of lookback period
Loop [1][1] p1 step 1: 1.00 => 1.65 4116/2411
End of lookback period
Loop [1][1] p1 step 2: 1.25 => 2.07 2187/1236
End of lookback period
Loop [1][1] p1 step 3: 1.50 => 2.69 1035/523
End of lookback period
Loop [1][1] p1 step 4: 1.75 => 3.75 521/205
End of lookback period
Loop [1][1] p1 step 5: 2.00 => 3.67 266/112
End of lookback period
Loop [1][1] p1 step 6: 2.25 => 0.00 103/39
Selected p1[4] = 1.708 => 2.78

AUDUSD:knn: 1.708=> 4.182

End of lookback period
Loop [1][4] p1 step 1: 1.00 => 1.61 4070/2464
End of lookback period
Loop [1][4] p1 step 2: 1.25 => 2.60 2061/973
End of lookback period
Loop [1][4] p1 step 3: 1.50 => 4.12 1509/540
End of lookback period
Loop [1][4] p1 step 4: 1.75 => 5.83 1289/364
End of lookback period
Loop [1][4] p1 step 5: 2.00 => 7.53 1121/272
End of lookback period
Loop [1][4] p1 step 6: 2.25 => 7.96 977/240
End of lookback period
Loop [1][4] p1 step 7: 2.50 => 9.00 738/163
End of lookback period
Loop [1][4] p1 step 8: 2.75 => 8.37 447/117
End of lookback period
Loop [1][4] p1 step 9: 3.00 => 8.90 436/113
End of lookback period
Loop [1][4] p1 step 10: 3.25 => 9.10 286/75
End of lookback period
Loop [1][4] p1 step 11: 3.50 => 12.25 168/33
End of lookback period
Loop [1][4] p1 step 12: 3.75 => 0.00 150/26
Selected p1[9] = 3.02 => 9.21

AUDUSD:rpart1SE: 3.02=> 10.171

End of lookback period
Loop [3][1] p1 step 1: 1.00 => 1.45 3932/2605
End of lookback period
Loop [3][1] p1 step 2: 1.25 => 1.87 2025/1253
End of lookback period
Loop [3][1] p1 step 3: 1.50 => 2.56 912/478
End of lookback period
Loop [3][1] p1 step 4: 1.75 => 3.35 386/161
End of lookback period
Loop [3][1] p1 step 5: 2.00 => 0.00 144/45
Selected p1[3] = 1.474 => 2.06

GBPUSD:knn: 1.474=> 2.769

No matter what the parameter is, all results from all WFOCycles, all assets and all algos have a profit factor > 1.0.

Since I use a modified timeslice function in Caret, I suspect the problem arises when fitting the models in R.

To save computation time I "skip" some TimeSlice positions in R/Caret:
Quote:
[41] [1] "#########################################################################"
[41] [1] "caret.trn() | .mth knn | call train control..."
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | NROW(.x) 6547 | NCOL(.x) 94"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | NROW(.y) 6547 | NCOL(.y) 1"
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | fixed/rolling | wns 2182 | mxh 1091"
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | slices 3275 | selected 4"
[41] [1] "#########################################################################"
[41] [1] "---------------------------------- trn fold[1] | length 2182"
[41] # from 1 2 3 4 5 6
[41] ### to 2177 2178 2179 2180 2181 2182
[41] [1] "---------------------------------- tst fold[1] | length 1091"
[41] # from 2183 2184 2185 2186 2187 2188
[41] ### to 3268 3269 3270 3271 3272 3273
[41] [1] "---------------------------------- trn fold[2] | length 2182"
[41] # from 1092 1093 1094 1095 1096 1097
[41] ### to 3268 3269 3270 3271 3272 3273
[41] [1] "---------------------------------- tst fold[2] | length 1091"
[41] # from 3274 3275 3276 3277 3278 3279
[41] ### to 4359 4360 4361 4362 4363 4364
[41] [1] "---------------------------------- trn fold[3] | length 2182"
[41] # from 2183 2184 2185 2186 2187 2188
[41] ### to 4359 4360 4361 4362 4363 4364
[41] [1] "---------------------------------- tst fold[3] | length 1091"
[41] # from 4365 4366 4367 4368 4369 4370
[41] ### to 5450 5451 5452 5453 5454 5455
[41] [1] "---------------------------------- trn fold[4] | length 2182"
[41] # from 3274 3275 3276 3277 3278 3279
[41] ### to 5450 5451 5452 5453 5454 5455
[41] [1] "---------------------------------- tst fold[4] | length 1091"
[41] # from 5456 5457 5458 5459 5460 5461
[41] ### to 6541 6542 6543 6544 6545 6546
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | length(idx) 4 | length(idx[[1]]) 2182"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | length(ido) 4 | length(ido[[1]]) 1091"
[41] [1] "#########################################################################"

I would normally have to train 3275 slices (which takes weeks), so I only train on 4 overlapping slices.

When I start a TEST after training, the PF of the system drops below 1.2.

It is still profitable, but the results are much worse than in training / optimization.

I do train on all bars: not at every single step, but I use all the data.
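
The fold layout in the log above can be reproduced with a few lines of index arithmetic. This is a sketch in Python, not the modified caret code itself; the function name and the `step` argument are hypothetical. It assumes a fixed rolling window of 2182 rows (wns), a horizon of 1091 rows (mxh), and that only every 1091st slice is kept:

```python
def time_slices(n_rows, window, horizon, step):
    """Fixed-width rolling slices with 1-based, inclusive index ranges.

    Returns the total number of possible slices and the list of
    ((train_from, train_to), (test_from, test_to)) for the kept slices.
    """
    n_slices = n_rows - window - horizon + 1
    folds = []
    for start in range(1, n_slices + 1, step):
        train = (start, start + window - 1)
        test = (start + window, start + window + horizon - 1)
        folds.append((train, test))
    return n_slices, folds


n_slices, folds = time_slices(n_rows=6547, window=2182, horizon=1091, step=1091)
print("slices", n_slices, "| selected", len(folds))
for i, (train, test) in enumerate(folds, start=1):
    print(f"fold[{i}] trn {train[0]}-{train[1]} | tst {test[0]}-{test[1]}")
# slices 3275 | selected 4
# fold[1] trn 1-2182 | tst 2183-3273
# fold[2] trn 1092-3273 | tst 3274-4364
# fold[3] trn 2183-4364 | tst 4365-5455
# fold[4] trn 3274-5455 | tst 5456-6546
```

With these settings each training window shares half of its rows with the previous one, and every test fold except the last is reused as training data in the next fold.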

Where do you think the problem is? Am I training on too few slices?

Are the models overfitting in STEP 1 because of that? If so, how and why?

Why can't I see this overfitting (bad results) during STEP 2, the parameter optimization?

Many Thanks!

Last edited by laz; 05/14/19 14:54.
Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #477297
06/13/19 21:04
laz (OP), Junior Member

No ideas?

Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #477856
08/05/19 14:42
laz (OP), Junior Member

Problem solved: I use PCA to control the overfitting. ;)

Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #480274
05/29/20 11:40
onoff, Newbie
Joined: May 2020 | Posts: 9 | Ireland

I am no data scientist, but from what I have read on the subject, PCA can be a bad idea in many cases and regularization should be used instead. Do you have any experience with it?

