Great work indeed, the fps improvement is a nice plus.
Stencil shadow volumes mostly suffer in performance because of the heavy silhouette computation and the fill rate burned on invisible shadow volume geometry by the z-pass/z-fail technique.
Most of the time, this can be mitigated by using a low-polygon model of the occluder to compute the shadow volume.
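To make the cost of the silhouette computation a bit more concrete, here is a minimal CPU-side sketch in C++. It is only an illustration, not the method used in this thread: `Vec3`, `Edge` and `findSilhouetteEdges` are made-up names, and it assumes a closed mesh with outward-facing triangles whose edges share vertex indices. Every triangle and every edge gets visited for each light, which is why fewer polygons in the occluder pay off directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// An edge stored as (smaller index, larger index) so both triangles
// sharing it produce the same key.
using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical helper: returns the edges that separate a triangle facing
// the point light from one facing away. Assumes a closed mesh where each
// edge is shared by exactly two triangles through common vertex indices.
std::vector<Edge> findSilhouetteEdges(const std::vector<Vec3>& positions,
                                      const std::vector<std::uint32_t>& indices,
                                      Vec3 lightPos) {
    // Per edge: (number of lit triangles, number of unlit triangles).
    std::map<Edge, std::pair<int, int>> facing;

    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        const std::uint32_t tri[3] = {indices[t], indices[t + 1], indices[t + 2]};
        const Vec3 p0 = positions[tri[0]];
        const Vec3 n = cross(sub(positions[tri[1]], p0), sub(positions[tri[2]], p0));
        const bool lit = dot(n, sub(lightPos, p0)) > 0.0f;

        for (int e = 0; e < 3; ++e) {
            const std::uint32_t a = tri[e];
            const std::uint32_t b = tri[(e + 1) % 3];
            const Edge key = a < b ? Edge(a, b) : Edge(b, a);
            if (lit) ++facing[key].first; else ++facing[key].second;
        }
    }

    std::vector<Edge> silhouette;
    for (const auto& entry : facing) {
        if (entry.second.first == 1 && entry.second.second == 1) {
            silhouette.push_back(entry.first);
        }
    }
    return silhouette;
}
```

The edges found this way are the ones you would extrude away from the light to build the volume sides.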
An even better method is to use welded meshes, since a vertex in D3D is not just a position but also carries color and normal information. E.g. you would assume that a cube consists of 8 vertices, but because every corner needs a separate normal for each of the three faces it touches, the 8 turn into 24.
For low-poly models this is still acceptable, but for high-end models you can say: "Houston, we have a problem" ^^ I guess you took advantage of this technique, which is a smart thing to do judging by the framerates.
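For anyone who wants to see the 24 versus 8 in code, here is a rough C++ sketch of welding by position. All names (`Vertex`, `WeldedMesh`, `makeCube`, `weldByPosition`) are hypothetical and not from any D3D API, and it assumes duplicated corners have bit-identical positions; real exporters often need an epsilon comparison instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical render vertex: position plus a per-face normal.
struct Vertex { Vec3 pos; Vec3 normal; };

// Position-only mesh intended for shadow volume work.
struct WeldedMesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
};

// Builds a cube the way a renderer usually stores it:
// 6 faces x 4 corners = 24 vertices, 36 indices.
void makeCube(std::vector<Vertex>& verts, std::vector<std::uint32_t>& indices) {
    for (int axis = 0; axis < 3; ++axis) {
        for (int s = -1; s <= 1; s += 2) {
            float n[3] = {0.0f, 0.0f, 0.0f};
            n[axis] = static_cast<float>(s);
            const std::uint32_t base = static_cast<std::uint32_t>(verts.size());
            for (int a = -1; a <= 1; a += 2) {
                for (int b = -1; b <= 1; b += 2) {
                    float p[3];
                    p[axis] = static_cast<float>(s);
                    p[(axis + 1) % 3] = static_cast<float>(a);
                    p[(axis + 2) % 3] = static_cast<float>(b);
                    verts.push_back({{p[0], p[1], p[2]}, {n[0], n[1], n[2]}});
                }
            }
            // Two triangles per face; winding is ignored in this sketch.
            const std::uint32_t quad[6] = {0, 1, 2, 2, 1, 3};
            for (std::uint32_t q : quad) indices.push_back(base + q);
        }
    }
}

// Merges vertices that share the exact same position and remaps the indices.
WeldedMesh weldByPosition(const std::vector<Vertex>& verts,
                          const std::vector<std::uint32_t>& indices) {
    WeldedMesh out;
    std::map<std::array<float, 3>, std::uint32_t> seen;
    std::vector<std::uint32_t> remap(verts.size());

    for (std::size_t i = 0; i < verts.size(); ++i) {
        const std::array<float, 3> key = {verts[i].pos.x, verts[i].pos.y, verts[i].pos.z};
        auto it = seen.find(key);
        if (it == seen.end()) {
            const std::uint32_t newIndex = static_cast<std::uint32_t>(out.positions.size());
            it = seen.emplace(key, newIndex).first;
            out.positions.push_back(verts[i].pos);
        }
        remap[i] = it->second;
    }
    for (std::uint32_t idx : indices) out.indices.push_back(remap[idx]);
    return out;
}
```

Calling `makeCube` and then `weldByPosition` on the result takes the 24 render vertices back down to 8 positions while keeping the 36 indices, so neighbouring faces share edges again.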
About the artifacts, my idea about this is that you use conditional branching to perform the entire operation.
Thanks in advance
Frazzle