Zorro, Neural, R, Caret, overfitting train results?


Zorro, Neural, R, Caret, overfitting train results? #477063 05/14/19 13:22
laz (OP), Junior Member. Joined: Jan 2019. Posts: 73. berlin

Hi guys. I've been working on my "Zorro-R-Caret" framework for several weeks now, and so far everything is going great. I use the NEURAL functions of Zorro and send the training data to R; in R I use a modified "timeslice" mode inside the Caret package. I use Zorro (1.96), R (3.5.3) and the Caret package (6.0-81).

The WFA settings for Zorro are:
DataSplit = 70;
NumWFOCycles = 5;

Since I still have one parameter to optimize, I use:
NumTrainCycles = 2;

In the first cycle, the models are fitted in R; in the second cycle, the appropriate parameter is determined by Zorro.

Now I noticed that my optimization produces only positive results. Currently I train 2 algos ("knn", "rpart1SE") on 3 assets (AUDUSD, EURUSD, GBPUSD), and I use the data from 2016 to 2018. The optimized parameter is the stop loss / take profit ratio. I also use a custom objective() that returns 0.0 if there are fewer than 200 trades. Since all results produce a profit factor > 1.0, a few random examples are enough:

Quote:
End of lookback period Loop [1][1] p1 step 1: 1.00 => 1.65 4116/2411
End of lookback period Loop [1][1] p1 step 2: 1.25 => 2.07 2187/1236
End of lookback period Loop [1][1] p1 step 3: 1.50 => 2.69 1035/523
End of lookback period Loop [1][1] p1 step 4: 1.75 => 3.75 521/205
End of lookback period Loop [1][1] p1 step 5: 2.00 => 3.67 266/112
End of lookback period Loop [1][1] p1 step 6: 2.25 => 0.00 103/39
Selected p1[4] = 1.708 => 2.78
AUDUSD:knn: 1.708=> 4.182
End of lookback period Loop [1][4] p1 step 1: 1.00 => 1.61 4070/2464
End of lookback period Loop [1][4] p1 step 2: 1.25 => 2.60 2061/973
End of lookback period Loop [1][4] p1 step 3: 1.50 => 4.12 1509/540
End of lookback period Loop [1][4] p1 step 4: 1.75 => 5.83 1289/364
End of lookback period Loop [1][4] p1 step 5: 2.00 => 7.53 1121/272
End of lookback period Loop [1][4] p1 step 6: 2.25 => 7.96 977/240
End of lookback period Loop [1][4] p1 step 7: 2.50 => 9.00 738/163
End of lookback period Loop [1][4] p1 step 8: 2.75 => 8.37 447/117
End of lookback period Loop [1][4] p1 step 9: 3.00 => 8.90 436/113
End of lookback period Loop [1][4] p1 step 10: 3.25 => 9.10 286/75
End of lookback period Loop [1][4] p1 step 11: 3.50 => 12.25 168/33
End of lookback period Loop [1][4] p1 step 12: 3.75 => 0.00 150/26
Selected p1[9] = 3.02 => 9.21
AUDUSD:rpart1SE: 3.02=> 10.171
End of lookback period Loop [3][1] p1 step 1: 1.00 => 1.45 3932/2605
End of lookback period Loop [3][1] p1 step 2: 1.25 => 1.87 2025/1253
End of lookback period Loop [3][1] p1 step 3: 1.50 => 2.56 912/478
End of lookback period Loop [3][1] p1 step 4: 1.75 => 3.35 386/161
End of lookback period Loop [3][1] p1 step 5: 2.00 => 0.00 144/45
Selected p1[3] = 1.474 => 2.06
GBPUSD:knn: 1.474=> 2.769

No matter what the parameter is, all results from all WFO cycles, all assets and all algos have a profit factor > 1.0. Since I use a modified timeslice function in Caret, I suspect the problem arises when fitting the models in R. To save computation time I "skip" some timeslice positions in R/Caret:

Quote:
[41] [1] "#########################################################################"
[41] [1] "caret.trn() | .mth knn | call train control..."
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | NROW(.x) 6547 | NCOL(.x) 94"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | NROW(.y) 6547 | NCOL(.y) 1"
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | fixed/rolling | wns 2182 | mxh 1091"
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | slices 3275 | selected 4"
[41] [1] "#########################################################################"
[41] [1] "---------------------------------- trn fold[1] | length 2182"
[41] # from 1 2 3 4 5 6
[41] ### to 2177 2178 2179 2180 2181 2182
[41] [1] "---------------------------------- tst fold[1] | length 1091"
[41] # from 2183 2184 2185 2186 2187 2188
[41] ### to 3268 3269 3270 3271 3272 3273
[41] [1] "---------------------------------- trn fold[2] | length 2182"
[41] # from 1092 1093 1094 1095 1096 1097
[41] ### to 3268 3269 3270 3271 3272 3273
[41] [1] "---------------------------------- tst fold[2] | length 1091"
[41] # from 3274 3275 3276 3277 3278 3279
[41] ### to 4359 4360 4361 4362 4363 4364
[41] [1] "---------------------------------- trn fold[3] | length 2182"
[41] # from 2183 2184 2185 2186 2187 2188
[41] ### to 4359 4360 4361 4362 4363 4364
[41] [1] "---------------------------------- tst fold[3] | length 1091"
[41] # from 4365 4366 4367 4368 4369 4370
[41] ### to 5450 5451 5452 5453 5454 5455
[41] [1] "---------------------------------- trn fold[4] | length 2182"
[41] # from 3274 3275 3276 3277 3278 3279
[41] ### to 5450 5451 5452 5453 5454 5455
[41] [1] "---------------------------------- tst fold[4] | length 1091"
[41] # from 5456 5457 5458 5459 5460 5461
[41] ### to 6541 6542 6543 6544 6545 6546
[41] [1] "#########################################################################"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | length(idx) 4 | length(idx[[1]]) 2182"
[41] [1] "caret.tcl() | .mth knn | .tcm timeslice | length(ido) 4 | length(ido[[1]]) 1091"
[41] [1] "#########################################################################"

I would normally have to train 3275 slices (which would take weeks), so I only train on 4 overlapping slices. When I start a TEST after training, the PF of the system goes down to < 1.2. It is still profitable, but the results are much worse than in training / optimization. I do train on all bars: not at every single step, but I use all the data.

What do you think, where is the problem? Do I train too few slices? Are the models overfitting in STEP 1 because of that? If so, how and why? Why can't I see this overfitting (bad results) during STEP 2, the parameter optimization?

Many thanks!

Last edited by laz; 05/14/19 14:54.

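For readers following the slice log above, here is a minimal sketch of how the 3275 possible positions and the 4 selected folds could arise. It is not laz's code: the function name `time_slices` and the step size (one horizon, 1091 rows) are assumptions chosen because they reproduce the logged index ranges. It mirrors a rolling, fixed-size window, roughly what caret's `createTimeSlices` produces with `fixedWindow = TRUE` and a `skip` argument.

```python
def time_slices(n_rows, window, horizon, step=1):
    """Return 1-based inclusive (train, test) index ranges for a rolling,
    fixed-size training window followed directly by a test horizon."""
    slices = []
    start = 1
    # A slice is valid only if its test range still fits inside the data.
    while start + window + horizon - 1 <= n_rows:
        train = (start, start + window - 1)
        test = (start + window, start + window + horizon - 1)
        slices.append((train, test))
        start += step
    return slices


# Values taken from the log: NROW 6547, wns 2182, mxh 1091.
N_ROWS, WINDOW, HORIZON = 6547, 2182, 1091

all_slices = time_slices(N_ROWS, WINDOW, HORIZON)          # every position
kept = time_slices(N_ROWS, WINDOW, HORIZON, step=HORIZON)  # assumed step size

print(len(all_slices), len(kept))
for train, test in kept:
    print("trn", train, "tst", test)
```

With these inputs the sketch yields the same fold boundaries as the log (for example, training rows 1 to 2182 and test rows 2183 to 3273 for the first fold). Note that each training window overlaps the previous one by half, and each test range becomes part of the next training window.
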
Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #477297 06/13/19 21:04
laz (OP), Junior Member. Joined: Jan 2019. Posts: 73. berlin

No ideas?

Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #477856 08/05/19 14:42
laz (OP), Junior Member. Joined: Jan 2019. Posts: 73. berlin

Problem solved: I use PCA to control the overfitting.
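
The thread does not show how the PCA step was set up, so the following is only a sketch of the general idea under stated assumptions: the components are fitted on the training fold alone and then applied unchanged to the test fold, so no test information leaks into the preprocessing. All function names and the toy data are hypothetical; in caret the equivalent would normally be requested through the `preProcess` argument of `train` rather than written by hand.

```python
import math


def column_means(rows):
    """Mean of each feature column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix of the rows after centering with `means`."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    n, d = len(centered), len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of `cov` via power iteration (unit length)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, means, component):
    """Score each row on the component, centering with the TRAINING means."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component))
            for row in rows]


# Hypothetical toy data: two strongly correlated features.
train = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
test = [[5.0, 10.0]]

# Fit on the training fold only, then reuse means and component for the test fold.
means = column_means(train)
component = first_component(covariance(train, means))
train_scores = project(train, means, component)
test_scores = project(test, means, component)
```

Here two correlated inputs collapse into a single score per row; with 94 input columns, as in the log above, the same reduction leaves the model far fewer dimensions in which to fit noise.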

Re: Zorro, Neural, R, Caret, overfitting train results? [Re: laz] #480274 05/29/20 11:40
onoff, Newbie. Joined: May 2020. Posts: 9. Ireland

I am no data scientist, but from what I have read on the subject, PCA can be a bad idea in many cases and regularization should be used instead. Do you have any experience with it?
