A short passage from an ATI paper:
Quote:

Dynamic branching
The ATI Radeon HD 2000 hardware has excellent dynamic branching performance for all three shader
stages. Using dynamic branching you can reduce the workload in e.g. a pixel shader by skipping past
instructions that don’t need to be executed. In a lighting situation, if the pixel is in shadow you don’t
need to compute the whole lighting equation but may return zero or the ambient value immediately.
Same thing if the pixel is beyond the light radius. It can be beneficial to rearrange your code so that you
can do some large scale culling early in the shader. For instance attenuation is usually cheap to compute,
so it makes sense to do that first in the shader. Depending on how your shader looks you may sometimes
see better performance if you stick to a small set of branches instead of several branches shortly after
each other. Thus it may be faster to multiply the attenuation with the shadow factor and check that value
against zero, rather than checking attenuation first and then shadow in a separate branch. This varies a
lot with the situation, so it’s recommended that you try both approaches and see which one comes out
faster in your application.
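
(Not part of the ATI text.) A rough sketch of the "cull early, one combined branch" idea, in Direct3D 10 style HLSL. Every name here (ShadeLight, shadowMap, shadowSampler, lightRadius and so on) is made up for illustration, and the falloff formula is just one simple choice, not anything the paper prescribes:

```hlsl
// Hypothetical resources; assumed to hold a precomputed shadow factor in [0, 1].
Texture2D    shadowMap;
SamplerState shadowSampler;

float3 ShadeLight(float3 worldPos, float3 normal, float2 shadowUV,
                  float3 lightPos, float lightRadius,
                  float3 lightColor, float3 ambient)
{
    float3 toLight = lightPos - worldPos;

    // Cheap test first: a quadratic falloff that reaches zero at the light radius.
    float atten = saturate(1.0 - dot(toLight, toLight) / (lightRadius * lightRadius));

    // Fold the shadow factor into the same value so one branch covers both cases.
    float shadow = shadowMap.SampleLevel(shadowSampler, shadowUV, 0).r;
    float visibility = atten * shadow;

    // [branch] asks the compiler for a real dynamic branch rather than flattening.
    [branch]
    if (visibility <= 0.0)
    {
        // Beyond the radius or fully shadowed: skip the lighting math.
        return ambient;
    }

    // Full lighting only for pixels that can actually receive light.
    float diffuse = saturate(dot(normalize(toLight), normal));
    return ambient + lightColor * diffuse * visibility;
}
```

The trade-off is that the shadow fetch now happens even for pixels outside the light radius. The alternative is to test atten on its own, return early, and only then sample the shadow map and test again. As the quote says, which one wins depends on the shader, so try both.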
One thing to keep in mind though is that branches need to be coherent to achieve top performance. If
pixels within the same thread take different branches the hardware will have to execute both sides of the
branch and just select the results for each pixel. So for branches that generally are not coherent you will
probably see a performance loss compared to code without branching. For the ATI Radeon HD 2000 you need a coherency of at least 64 pixels or vertices. Anything that varies in smaller units than that in
general should not use dynamic branching. For example you may have the following code:

Code:
float diffuse = dot(lightVec, normal);
if (diffuse > 0.0)
{
    // Compute lighting ...
}



If the normal is an interpolated normal across the surface this branch is fine. But if the normal comes out
of a high-frequency normal map this code may result in a performance loss. This is because normals
from a high-frequency normal map can typically vary a lot from pixel to pixel. As a result, in most cases
the hardware will not be able to skip past the code within the if-statement, so there is no performance
gain to be had, but you incur a small performance hit from doing the actual branch test and possible
additional register pressure.
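
(Again not from the ATI text.) For the normal-mapped case, one common way to avoid the incoherent branch is to clamp the dot product instead of testing it. A minimal sketch, assuming the lighting that would have sat inside the if-statement is cheap enough that skipping it was never going to pay off; DiffuseTerm and lightColor are hypothetical names:

```hlsl
// Hypothetical helper; names are illustrative only.
float3 DiffuseTerm(float3 lightVec, float3 normal, float3 lightColor)
{
    // Clamp instead of branching: back-facing pixels contribute zero,
    // and every pixel in the batch runs the same instruction stream.
    float diffuse = saturate(dot(lightVec, normal));
    return lightColor * diffuse;
}
```

If the lighting is expensive, it may still be worth branching, but on something that varies slowly across the screen (attenuation, for example) rather than on the per-pixel normal.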


You are better off googling for more information. If you find something interesting, let us know. We are as willing to learn as you. ;)