Nothing is wrong per se with how the shader is written, but there are almost always optimizations to be found. A few that apply here:

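- Generally speaking, hand-written ASM is faster than HLSL, because you are not relying on the compiler to produce a tight instruction sequence. Although I haven't looked at Doom 3's shaders, I would bet that a good portion of them are written in ASM. You can also mix ASM into an HLSL effect file to hand-optimize its costly parts, by writing the shader for that pass as an asm { ... }; block instead of compiling it from HLSL.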

- When a shader is used this heavily (and this one qualifies, since it is applied to the majority of surfaces), it is important to eliminate anything that still renders through the fixed-function pipeline (FFP) rather than a programmable shader. Switching the GPU from FFP state to programmable-shader state in the middle of a frame is very costly in GPU cycles; tests in my own engine have shown frame-rate differences of as much as 10-15%. This might need to be fixed within the engine itself, since I am not sure whether the engine still issues its built-in FFP commands even after you manage to remove every object that uses them from the scene.

- Co-issue instructions. I only took a quick glance through the shader, but I didn't notice any such optimizations. Co-issuing lets the pixel shader execute a color (.rgb) operation and an alpha (.a) operation in the same cycle. If you leave this up to the compiler and driver, you may or may not receive the benefit; if you apply the proper write masks yourself, you will get the performance boost.

- Optimize the number of passes. I am fairly sure you should be able to get this down to a two-pass shader, although I will admit that most of my shader programming has not been with 3DGS, so there may be a limitation I am missing that forces the current approach. Use the first pass to lay down depth only, so hidden fragments are rejected and overdraw is avoided, and the second pass to actually render the lighting effects.

- Doom 3 uses several rendering paths: each major chipset gets its own highly optimized path, including shaders tuned for that chipset. You can build similar features into 3DGS with a bit of work. One example is balancing texture fetches against ALU instructions on ATI cards: the last generation of ATI GPUs (Radeon 9500, 9700, 9800...) could process one texture fetch and one ALU instruction each clock cycle, so by tweaking the techniques you use you can keep both units busy and make full use of the GPU's processing power. NVIDIA cards have their own tricks and traps as well, which is why a shader path specific to each chipset is important if you want to get the most out of your shaders.
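To see why mixing FFP objects with shader objects hurts, it helps to count the state switches in a frame. This is a hypothetical C++ model, not 3DGS code: the Pipeline enum and both functions are illustrative names, and the model only counts transitions in submission order, whereas the real cost of each switch depends on the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one frame's draw list: each entry records whether the
// object is drawn through the fixed-function pipeline (FFP) or a shader.
enum class Pipeline { FixedFunction, Programmable };

// Count how many times the GPU must switch between FFP and shader state
// while walking the draw list in submission order.
std::size_t countPipelineSwitches(const std::vector<Pipeline>& drawList) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < drawList.size(); ++i) {
        if (drawList[i] != drawList[i - 1]) {
            ++switches;
        }
    }
    return switches;
}

// Drop every FFP draw, i.e. assume those objects were ported to shaders
// or removed from the scene, and return the remaining draw list.
std::vector<Pipeline> withoutFixedFunction(const std::vector<Pipeline>& drawList) {
    std::vector<Pipeline> result;
    for (Pipeline p : drawList) {
        if (p == Pipeline::Programmable) {
            result.push_back(p);
        }
    }
    return result;
}
```

For a draw list of {shader, FFP, shader, shader, FFP, shader}, countPipelineSwitches returns 4, and after withoutFixedFunction it returns 0. The model only covers objects you control; any built-in FFP commands the engine issues on its own would still cause switches.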
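For the co-issue point, a rough cost model shows what the write masks buy. This is a simplified, hypothetical C++ sketch of ps_1_x-style pairing, where an .rgb-only op followed by an independent .a-only op shares one instruction slot; the type and function names are illustrative, and real pairing rules and timings vary by chip.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical co-issue cost model: a color op writing only .rgb
// and an alpha op writing only .a can share one instruction slot, provided
// the alpha op does not need the color op's result.
enum class WriteMask { RGB, Alpha, RGBA };

struct Instr {
    WriteMask mask;
    bool readsPreviousResult;  // true if this op consumes the previous op's output
};

// Walk the program in order, pairing each .rgb op with an immediately
// following independent .a op, and count the slots used.
std::size_t countSlots(const std::vector<Instr>& program) {
    std::size_t slots = 0;
    std::size_t i = 0;
    while (i < program.size()) {
        const bool canPair =
            i + 1 < program.size() &&
            program[i].mask == WriteMask::RGB &&
            program[i + 1].mask == WriteMask::Alpha &&
            !program[i + 1].readsPreviousResult;
        slots += 1;
        i += canPair ? 2 : 1;
    }
    return slots;
}
```

Two independent color ops and two independent alpha ops take three slots in this model when grouped by type, and two when interleaved so that each .rgb op is followed by an .a op. In ps_1_x assembly, the alpha half of a pair is marked with a leading +, for example mul r0.rgb, t0, v0 followed by +mov r0.a, t1.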
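The two-pass idea (depth first, lighting second) can be illustrated with a toy fragment counter. This C++ sketch is a simplified model, not how 3DGS or the GPU is actually driven: it counts how often the expensive lighting shader would run for fragments in submission order, using a LESS depth test for a single pass versus a depth-only pass followed by an EQUAL test. The Fragment struct and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One rasterized fragment: which pixel it covers and its depth (smaller = closer).
struct Fragment {
    std::size_t pixel;
    float depth;
};

// Single pass: depth test LESS, and every fragment that passes runs the
// lighting shader, even if a closer fragment arrives later and covers it.
std::size_t shadedSinglePass(const std::vector<Fragment>& frags, std::size_t pixelCount) {
    std::vector<float> zbuf(pixelCount, std::numeric_limits<float>::infinity());
    std::size_t shaded = 0;
    for (const Fragment& f : frags) {
        if (f.depth < zbuf[f.pixel]) {
            zbuf[f.pixel] = f.depth;
            ++shaded;
        }
    }
    return shaded;
}

// Two passes: pass 1 writes depth only (no lighting); pass 2 tests with
// EQUAL so only the visible fragment at each pixel is lit.
std::size_t shadedTwoPass(const std::vector<Fragment>& frags, std::size_t pixelCount) {
    std::vector<float> zbuf(pixelCount, std::numeric_limits<float>::infinity());
    for (const Fragment& f : frags) {
        if (f.depth < zbuf[f.pixel]) {
            zbuf[f.pixel] = f.depth;  // depth-only pass
        }
    }
    std::size_t shaded = 0;
    for (const Fragment& f : frags) {
        if (f.depth == zbuf[f.pixel]) {
            ++shaded;  // lighting pass
        }
    }
    return shaded;
}
```

With three layers drawn back-to-front over one pixel (depths 0.9, 0.5, 0.1) and a single layer over another, the single pass lights 4 fragments and the two-pass version lights 2. The first pass is not free, since the geometry is processed a second time, but it runs no lighting math, so the trade pays off when the lighting shader is the expensive part.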
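The texture/ALU balance on those ATI cards can be reasoned about with a crude throughput estimate. This C++ sketch assumes, as described above, that one texture fetch and one ALU instruction issue per clock, so the estimated cost is whichever count is larger. The struct, the function names, and the specific cube-map-for-ALU trade are illustrative assumptions; real timings also depend on dependent texture reads, latency, and more.

```cpp
#include <algorithm>
#include <cassert>

// Simplified, hypothetical throughput model: one texture fetch and one ALU
// instruction can issue per clock in parallel, so a shader's cost is bounded
// by whichever unit has more work.
struct ShaderMix {
    int textureFetches;
    int aluInstructions;
};

int estimatedCycles(const ShaderMix& s) {
    assert(s.textureFetches >= 0 && s.aluInstructions >= 0);
    return std::max(s.textureFetches, s.aluInstructions);
}

// Trade one normalization-cube-map lookup for an ALU normalize
// (dp3 + rsq + mul = 3 ALU instructions).
ShaderMix replaceCubeMapNormalize(ShaderMix s) {
    assert(s.textureFetches > 0);
    s.textureFetches -= 1;
    s.aluInstructions += 3;
    return s;
}
```

A fetch-bound shader with 8 fetches and 2 ALU ops estimates at 8 cycles; replacing one cube-map lookup with an ALU normalize gives 7 fetches and 5 ALU ops, or 7 cycles. The same trade on an ALU-bound shader (4 fetches, 6 ALU ops) makes it worse at 9 cycles, which is why this kind of tweaking has to be done per chipset.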



Other issues can really only be addressed by properly profiling the scene. You may find that all the normal maps and other textures are creating a bottleneck in texture fetches; if so, look into alternative ways of storing that data. Some issues need to be addressed by Conitec within the engine itself, and others depend more on the end users of your shader (Nadester, for example, was using normal maps that were far too large). But the biggest difference between a game like Doom and a tech demo made with 3DGS is the time spent tweaking it.
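One widely used alternative for normal maps is to store only the x and y components (two channels) and rebuild z in the shader, since a unit normal satisfies x² + y² + z² = 1; this is the idea behind ATI's 3Dc format. The C++ sketch below shows the unpack and reconstruction math on the CPU for clarity. The function names are illustrative, and it assumes tangent-space normals (z ≥ 0) stored in 8-bit channels.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Normal {
    float x, y, z;
};

// Map an 8-bit channel value [0, 255] to [-1, 1].
float unpackChannel(std::uint8_t v) {
    return v / 127.5f - 1.0f;
}

// Map [-1, 1] to an 8-bit channel value, rounding to nearest.
std::uint8_t packChannel(float c) {
    assert(c >= -1.0f && c <= 1.0f);
    return static_cast<std::uint8_t>(std::lround((c + 1.0f) * 127.5f));
}

// Rebuild a unit tangent-space normal from its stored x and y.
// Assumes z >= 0, which holds for tangent-space normal maps.
Normal reconstructNormal(std::uint8_t r, std::uint8_t g) {
    const float x = unpackChannel(r);
    const float y = unpackChannel(g);
    const float zz = 1.0f - x * x - y * y;
    const float z = std::sqrt(zz > 0.0f ? zz : 0.0f);  // clamp quantization error
    return {x, y, z};
}
```

In a pixel shader the same reconstruction costs a few ALU instructions, in exchange for one fewer channel to store and fetch per texel.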

Writing the switches needed to identify the end user's hardware, and then testing the shaders on that hardware, is very time-consuming. Without that work, though, 3DGS will look like it is holding a candle in the wind when compared to Doom 3 or Half-Life 2.
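As a starting point for those switches, the render path can be chosen from the adapter's PCI vendor ID (in Direct3D 9 this is the VendorId field of D3DADAPTER_IDENTIFIER9, filled in by IDirect3D9::GetAdapterIdentifier). This C++ sketch only does the mapping; the enum and the technique names are hypothetical placeholders for whatever techniques your effect file defines.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical render paths; one per chipset family you choose to support.
enum class RenderPath { Ati, Nvidia, Generic };

// PCI vendor IDs: 0x1002 = ATI, 0x10DE = NVIDIA.
RenderPath selectRenderPath(std::uint32_t vendorId) {
    switch (vendorId) {
        case 0x1002: return RenderPath::Ati;
        case 0x10DE: return RenderPath::Nvidia;
        default:     return RenderPath::Generic;  // unknown hardware: safe fallback
    }
}

// Hypothetical technique names in the effect file for each path.
std::string techniqueName(RenderPath path) {
    switch (path) {
        case RenderPath::Ati:     return "lighting_ati";
        case RenderPath::Nvidia:  return "lighting_nv";
        case RenderPath::Generic: return "lighting_generic";
    }
    return "lighting_generic";
}
```

Vendor ID alone is coarse: a real implementation would also check the device ID and the reported shader caps, and should always fall back to a generic path on unknown hardware.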

