Re: What algorithm is used for BALANCED?
[Re: jcl]
#478374
10/09/19 17:26
|
Joined: Oct 2018
Posts: 72
JamesHH
OP
Junior Member
|
|
"Any point" means that any sub-range of the samples is still balanced. But that begs the question of what a "sub-range" is It can't possibly be balanced on every interval. Random numbers or sliding windows are not used, the algorithm is just as simple as described above. Yes, I reverse-engineered the algorithm (or at least I reproduced BALANCED on two different datasets): At any point, it looks at the *initial segment* of the time series so far, including duplicates, and then if there is an imbalance greater than 1 and if repeating the current sample reduces that imbalance then it duplicates the current sample, but there is a limit of three duplicates for each sample. What was surprising is that my standard feed-forward NN was barely able to learn on data that was only slightly imbalanced, something like 52%-48%. Yet it learned so well after this simple balancing algorithm. Everything I have read so far about ML seems to focus on the case of extreme imbalances. Also, I looked up quite a few balancing techniques and none of them was anything like the one in Zorro.