If a NN did not work well with slightly imbalanced data, but worked better with the Zorro balance algorithm, then it probably uses mini-batch gradient descent for learning. Mini-batch requires that not only the whole set, but also small intervals of the samples are still more or less balanced.