Originally Posted by jcl
"Any point" means that any sub-range of the samples is still balanced.


But that raises the question of what a "sub-range" is. It can't possibly be balanced on every interval.

Quote
Random numbers or sliding windows are not used, the algorithm is just as simple as described above.


Yes, I reverse-engineered the algorithm (or at least I reproduced BALANCED on two different datasets). At any point, it looks at the *initial segment* of the time series so far, including duplicates. If there is an imbalance greater than 1, and if repeating the current sample reduces that imbalance, it duplicates the current sample, up to a limit of three duplicates per sample.
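
To make that concrete, here is a minimal Python sketch of the rule as I reconstructed it. The function name, the binary 0/1 labels, and the max_dup parameter are my own choices for illustration; Zorro's actual implementation may differ in the details:

```python
# Minimal sketch of the balancing rule as reconstructed above.
# Assumes binary labels (0/1); max_dup caps the extra copies per sample.
def balance(samples, labels, max_dup=3):
    out_x, out_y = [], []
    counts = [0, 0]  # class counts over everything emitted so far, duplicates included
    for x, y in zip(samples, labels):
        out_x.append(x)
        out_y.append(y)
        counts[y] += 1
        dups = 0
        # Duplicate the current sample while the imbalance is greater than 1
        # and duplicating actually reduces it, up to three copies per sample.
        while dups < max_dup and counts[1 - y] - counts[y] > 1:
            out_x.append(x)
            out_y.append(y)
            counts[y] += 1
            dups += 1
    return out_x, out_y
```

The key point is that the check runs over the entire emitted sequence so far, duplicates included, which is why every initial segment stays roughly balanced while arbitrary sub-ranges need not.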

What surprised me is that my standard feed-forward NN was barely able to learn on data that was only slightly imbalanced, something like 52%-48%, yet it learned very well after this simple balancing step. Everything I have read so far about ML seems to focus on the case of extreme imbalance. I also looked up quite a few balancing techniques, and none of them resembled the one in Zorro.