Slin wrote a shader that is so simple even I can understand it. And it's so fast and effective that the small test level I created runs at full frame rate at XGA resolution with it.
I will try to write a tutorial on using it in the near future. And maybe I will translate it into English, too.