I used 4x16-bit floating point textures to render the world position and a 1x16-bit floating point texture for the depth maps. Maybe you can change them to 32-bit images (14444 and 14) and tell us how it went :D

But that is not the real point...

Both render pipelines do the same computations, but the deferred one does them once per screen pixel instead of once per polygon pixel.
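To make the per-screen-pixel versus per-polygon-pixel difference concrete, here is a rough toy count in Python. It is only a sketch under my own assumptions: polygons are stood in for by axis-aligned screen rectangles, there is no early depth rejection, and the function name `shading_invocations` is made up for this illustration.

```python
def shading_invocations(width, height, rects):
    """Return (forward, deferred) lighting-shader run counts.

    rects are (x0, y0, x1, y1) screen-space boxes standing in for polygons,
    with x1 and y1 exclusive. Assumption: no early depth rejection, so the
    forward path lights every polygon pixel it rasterizes.
    """
    coverage = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                coverage[y][x] += 1  # one more polygon pixel lands here

    forward = sum(map(sum, coverage))  # lighting runs once per polygon pixel
    deferred = width * height          # lighting runs once per screen pixel
    return forward, deferred


# Three full-screen layers on an 8x8 target versus one tiny polygon.
busy = shading_invocations(8, 8, [(0, 0, 8, 8)] * 3)
sparse = shading_invocations(8, 8, [(0, 0, 2, 2)])
print(busy, sparse)
```

With three overlapping full-screen layers the forward count is three times the deferred one, while with a single 2x2 polygon the deferred pass still lights all 64 screen pixels.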

You can complicate or simplify both pipelines and you will get nearly the same proportional performance. I did not look into Slin's PCF technique, but there is a faster one in the native PSSM shadows. I simply chose the slower one to emphasize the benefits of the deferred technique.

What I learned from this technique is that writing into a 4x32-bit bitmap for the deferred lighting takes time, and that time has to be smaller than the time taken to directly render the overlapping polygons. If you render a nearly empty scene, deferred techniques are slower.
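That trade-off can be sketched as a very simplified cost model. Everything in it is an assumption of mine rather than a measurement: costs are in arbitrary units per pixel, the deferred path pays a write into the render target for every polygon pixel plus one lighting pass over the whole screen, and the names `frame_costs` and `break_even_overdraw` are hypothetical.

```python
import math


def frame_costs(screen_pixels, polygon_pixels, light_cost, write_cost):
    """Return (forward, deferred) frame cost in arbitrary units."""
    # Forward: every rasterized polygon pixel runs the lighting computation.
    forward = polygon_pixels * light_cost
    # Deferred: every polygon pixel is written to the render target,
    # then lighting runs once per screen pixel.
    deferred = polygon_pixels * write_cost + screen_pixels * light_cost
    return forward, deferred


def break_even_overdraw(light_cost, write_cost):
    """Overdraw (polygon pixels / screen pixels) above which deferred wins."""
    if write_cost >= light_cost:
        return math.inf  # writing costs as much as lighting: never pays off
    return light_cost / (light_cost - write_cost)


print(frame_costs(1000, 3000, 10, 2))  # heavy overlap
print(frame_costs(1000, 100, 10, 2))   # nearly empty scene
print(break_even_overdraw(10, 2))
```

In this model the deferred path only wins once the overdraw exceeds light_cost / (light_cost - write_cost), so a more expensive render target write pushes the break-even point up, and a nearly empty scene never reaches it.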