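You may want to try predict_on_batch() instead of predict_proba(). I use Keras with TensorFlow from Python rather than R, but I ran into the same slow prediction times, and in my setup predict_on_batch() turned out to be more than an order of magnitude faster. Keep in mind that predict_on_batch() treats whatever you pass it as a single batch, so a very large input may have to be split into chunks to fit in memory.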
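
For illustration, here is a minimal Python sketch of how the call could be used. It is not the exact code from my setup. The helper names predict_in_batches and compare_timings, the toy model, the input shape, and the batch size are all made up for the example. I also use predict() as the baseline because, as far as I know, newer Keras releases no longer provide predict_proba().

```python
import time

import numpy as np


def predict_in_batches(model, x, batch_size=1024):
    """Call model.predict_on_batch on fixed-size slices of x and stack the results.

    Hypothetical helper: `model` is any object exposing predict_on_batch(),
    such as a Keras model.
    """
    outputs = []
    for start in range(0, len(x), batch_size):
        batch = x[start:start + batch_size]
        outputs.append(np.asarray(model.predict_on_batch(batch)))
    return np.concatenate(outputs, axis=0)


def compare_timings():
    """Rough timing comparison on a toy model (assumes TensorFlow is installed).

    The numbers depend heavily on model size, hardware, and library version.
    """
    import tensorflow as tf

    # Toy model, invented for this example.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    x = np.random.rand(256, 20).astype("float32")

    t0 = time.perf_counter()
    baseline = model.predict(x, verbose=0)
    t1 = time.perf_counter()
    batched = predict_in_batches(model, x, batch_size=256)
    t2 = time.perf_counter()

    print(f"predict(): {t1 - t0:.4f}s, predict_on_batch(): {t2 - t1:.4f}s")
    print("outputs match:", np.allclose(baseline, batched, atol=1e-5))
```

Calling compare_timings() prints both timings for your own machine. Whether you see the same kind of speedup will depend on your model and how many samples you predict at once.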