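I just found out that calling tex2D inside an if or else block prevents the compiler from using dynamic branching. tex2D needs screen-space derivatives to pick a mip level, and those are not well defined inside divergent flow control, so the compiler flattens the branch instead. The solution is to use tex2Dlod, which takes the mip level explicitly and needs no derivatives. This should speed up vp_pssm.fx a lot.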
Just replace

    float fDepth = tex2D(sMap, vShadowTexCoord.xy + vOffset).x;

with

    float fDepth = tex2Dlod(sMap, float4(vShadowTexCoord.xy + vOffset, 0.0f, 0.0f)).x;
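For context, here is a rough sketch of how the replaced line might sit inside a branch. This is not the actual code from vp_pssm.fx: the function name ShadowTerm, the bInSplit parameter, and the depth comparison are made up for illustration, and it assumes a ps_3_0 target (tex2Dlod is not available below Shader Model 3) and a compiler that accepts the [branch] attribute.

```hlsl
// Illustrative sketch only; not taken from vp_pssm.fx.
// sMap is the shadow map sampler named in the post; everything else
// (ShadowTerm, bInSplit, the comparison direction) is an assumption.
sampler2D sMap;

float ShadowTerm(float4 vShadowTexCoord, float2 vOffset, bool bInSplit)
{
    float fShadow = 1.0f; // fully lit unless the branch below says otherwise

    // [branch] requests a real dynamic branch instead of a flattened one.
    [branch]
    if (bInSplit)
    {
        // The .w component selects the mip level explicitly (0 = top level),
        // so no derivatives are required inside the branch.
        float fDepth = tex2Dlod(sMap, float4(vShadowTexCoord.xy + vOffset, 0.0f, 0.0f)).x;

        // Assumed convention: the fragment is lit if it is not behind the stored depth.
        fShadow = (vShadowTexCoord.z <= fDepth) ? 1.0f : 0.0f;
    }

    return fShadow;
}
```

Since the lookup always reads mip level 0, this only matches the old tex2D result if the shadow map has no mip chain, which is the usual case for shadow maps.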