Hmm. I think if you're using shader model 2 or higher you can have values over 1, so I wonder: if you multiplied your return value by something like 128, clipped the excess, and then divided it back down, would you get what you need at a decent speed?
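To make the arithmetic concrete, here's a rough CPU-side sketch in Python, not shader code. I'm assuming "clipping the excess" means dropping the fractional part after scaling (what `floor()` would do in a shader); if it means clamping to a maximum instead, the middle step would be a clamp. The name `quantize` and the default `scale` are just made up for illustration.

```python
import math


def quantize(value, scale=128.0):
    """Scale up, drop the fractional part, then scale back down.

    Assumption: "clipping the excess" is read here as truncating the
    fraction with floor(). The result snaps to multiples of 1/scale.
    """
    scaled = value * scale          # e.g. 0.3 -> 38.4
    clipped = math.floor(scaled)    # e.g. 38.4 -> 38
    return clipped / scale          # e.g. 38 -> 0.296875


if __name__ == "__main__":
    for v in (0.3, 0.5, 0.999, 1.0):
        print(v, quantize(v))
```

Since 128 is a power of two, the multiply and divide don't add rounding error of their own, so the only change to the value comes from the truncation step.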