First, it is not "bilinear" but "separable". Bilinear sampling refers to texture filtering, whereas separability means that a square kernel can be split into a vector and its transpose which, multiplied together, produce the original kernel again. That way, a convolution K * I can be replaced by the convolution v * (v^T * I), which requires less arithmetic.
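To make the separability point concrete, here is a minimal CPU-side sketch in plain Python (not shader code). The function names, the 3-tap vector and the tiny test image are all made up for illustration, and borders are simply clamped. It builds the square kernel as the outer product of v with itself and checks that one 2D pass gives the same result as a row pass followed by a column pass.

```python
def outer(v):
    """Square kernel K = v * v^T (outer product of v with itself)."""
    return [[a * b for b in v] for a in v]

def clamp(i, n):
    """Clamp an index to the valid range 0..n-1 (clamped borders)."""
    return min(max(i, 0), n - 1)

def filter_2d(image, kernel):
    """Apply a square, odd-sized kernel directly (n*n multiplies per pixel)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    return [[sum(kernel[j + r][i + r] * image[clamp(y + j, h)][clamp(x + i, w)]
                 for j in range(-r, r + 1) for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def filter_rows(image, v):
    """Horizontal 1D pass (n multiplies per pixel)."""
    h, w = len(image), len(image[0])
    r = len(v) // 2
    return [[sum(v[i + r] * image[y][clamp(x + i, w)] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def filter_cols(image, v):
    """Vertical 1D pass (n multiplies per pixel)."""
    h, w = len(image), len(image[0])
    r = len(v) // 2
    return [[sum(v[j + r] * image[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def max_abs_diff(a, b):
    """Largest absolute difference between two images of equal size."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical example data: a 3-tap binomial vector and a 3x4 image.
v = [0.25, 0.5, 0.25]
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 1.0, 2.0, 3.0]]

full = filter_2d(image, outer(v))                   # one 2D pass
separable = filter_cols(filter_rows(image, v), v)   # two 1D passes
print(max_abs_diff(full, separable))
```

For an n-tap vector the 2D pass needs n*n multiplies per pixel and the two 1D passes need 2*n, which is where the saving comes from.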
The problem you have is that your kernel always has an even number of taps. That way, you never sample the pixel the kernel is applied to. You could shift the kernel pivot so that you do sample the original pixel, but then the weighting of the surrounding pixels becomes lopsided: you weight the right (or left) neighbours more heavily. An odd NUM value is therefore the better choice.
Second, I don't get why you deliberately take bilinearly filtered texture samples (this time this is the correct term) by adding a fraction to the tex-coord:
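A small sketch of the even/odd issue, in plain Python and with hypothetical helper names (only NUM comes from your shader). It lists the per-tap pixel offsets for a kernel centred on the current pixel, and for an even kernel whose pivot has been shifted onto the pixel.

```python
def centred_offsets(num):
    """Offsets (in pixels) of `num` taps spread symmetrically around the pixel."""
    return [i - (num - 1) / 2 for i in range(num)]

def shifted_offsets(num):
    """Same taps, but with the pivot moved so that one tap hits the pixel itself."""
    return [i - num // 2 for i in range(num)]

print(centred_offsets(4))   # even NUM: no tap at offset 0
print(centred_offsets(5))   # odd NUM: the middle tap is the pixel itself
print(shifted_offsets(4))   # even NUM, shifted pivot: one more tap on the left
```

With NUM = 4 the centred taps land at -1.5, -0.5, 0.5 and 1.5, so none of them is the pixel itself. Shifting the pivot gives -2, -1, 0, 1, which hits the pixel but reaches further to one side. With NUM = 5 you get -2 to 2 and the problem goes away.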
... texcoord0 + float2(kernel[i].y ...
Why don't you simply use vecViewport to get the pixel size and sample "real" neighbour pixels?
Third, you are sampling in a tapped fashion, i.e. you skip pixels between samples. Since your NUM value stores the number of samples, you go a length of NUM to the left (or up) and then twice the i-counter to the right (or down). You should divide both values by 2 to get non-tapped sampling.
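As a rough sketch of the third point, assume the per-sample offset is computed as -NUM + 2 * i. That formula is only a reading of the description above, not a copy of the actual shader loop, and the function names are hypothetical. Comparing the spacing between neighbouring taps before and after halving both terms:

```python
def offsets_as_described(num):
    # ASSUMPTION: start NUM pixels to the left, then step right by 2 * i.
    return [-num + 2 * i for i in range(num)]

def offsets_halved(num):
    # Both terms divided by 2: start NUM / 2 to the left, then step right by i.
    return [-num / 2 + i for i in range(num)]

def spacing(offsets):
    """Set of distances between neighbouring taps."""
    return {b - a for a, b in zip(offsets, offsets[1:])}

print(offsets_as_described(6), spacing(offsets_as_described(6)))
print(offsets_halved(6), spacing(offsets_halved(6)))
```

Under that assumption the original taps are 2 pixels apart, so every other pixel is skipped, while the halved version steps 1 pixel at a time. Note that for an even NUM the halved offsets are still not symmetric around 0, which is the even-kernel issue again.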
Fourth, since the Gaussian bell is symmetric, it is entirely sufficient to store the weights of only one half of the bell. I don't know whether the extra arithmetic then generates more instructions, but I would store only half the kernel weights.
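A minimal sketch of the half-kernel idea, again in plain Python rather than shader code. The function names, the sigma value and the clamped borders are assumptions for illustration. Only the centre weight and one side of the bell are stored, and each stored side weight is applied to the left and the right neighbour together.

```python
import math

def gaussian_weights(num, sigma):
    """Normalized 1D Gaussian weights for an odd tap count `num`."""
    r = num // 2
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(w)
    return [x / total for x in w]

def half_weights(full):
    """Keep the centre weight plus one side of the bell."""
    return full[len(full) // 2:]

def blur_row_half(row, half):
    """Blur one row of pixels using only the half kernel (clamped borders)."""
    n = len(row)
    out = []
    for x in range(n):
        acc = half[0] * row[x]                      # centre tap
        for k in range(1, len(half)):
            left = row[max(x - k, 0)]
            right = row[min(x + k, n - 1)]
            acc += half[k] * (left + right)         # one weight, two samples
        out.append(acc)
    return out

weights = gaussian_weights(5, 1.0)   # hypothetical: 5 taps, sigma = 1
half = half_weights(weights)         # 3 stored weights instead of 5
print(len(weights), len(half))
print(blur_row_half([0.0, 0.0, 1.0, 0.0, 0.0], half))
```

The number of texture samples stays the same, only the weight table shrinks. Whether that pays off in instruction count on the GPU is something you would have to check in the compiled shader.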
Best regards,
-Christian
Last edited by HeelX; 06/06/11 20:49.