Yes, it does use mini-batch gradient descent, which I believe is by far the most common NN setup.
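For reference, this is roughly the loop I mean by mini-batch training, sketched in plain NumPy; `step_fn` is just a placeholder for whatever gradient update the framework actually performs, not Zorro's or any library's API:

```python
import numpy as np

def minibatch_epochs(X, y, step_fn, batch_size=32, epochs=10, seed=0):
    """One parameter update per mini-batch; step_fn(X_batch, y_batch)
    stands in for the framework's actual gradient step."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)              # fresh shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            step_fn(X[idx], y[idx])
```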

However, I was also shuffling before each epoch. So standard random upsampling (in my case, randomly drawing 651 extra samples from the minority class) is just as effective for balancing both locally and globally. And the OOS results were comparable whether I used standard upsampling or Zorro's balancing algorithm.
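To be concrete about what I mean by standard random upsampling, here is a rough sketch in Python/NumPy. The function and array names are just placeholders, and the 651 in my case was simply the gap between the two class counts:

```python
import numpy as np

def upsample_minority(X, y, minority_label=1, seed=0):
    """Balance classes by resampling the minority class with replacement
    until it matches the majority class count, then shuffle the result."""
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority_label)[0]
    maj_idx = np.where(y != minority_label)[0]
    n_extra = len(maj_idx) - len(min_idx)        # e.g. 651 extra draws in my data
    extra = rng.choice(min_idx, size=n_extra, replace=True)
    idx = rng.permutation(np.concatenate([maj_idx, min_idx, extra]))
    return X[idx], y[idx]
```

With the data reshuffled before every epoch anyway, the duplicated minority samples end up spread evenly across the mini-batches, which is why this balances both locally and globally.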

The strange thing is that training with Zorro's algorithm produced severe overfitting on the training set, while the usual upsampling method showed no sign of overfitting. With unshuffled data, the NN could reverse-engineer Zorro's algorithm just as I did, which is not desirable. I'm not sure what is going on with the shuffled data, but based on this experiment the usual upsampling seems to be the better choice.