The increased speed is mostly due to the 8 additional 64-bit general-purpose registers (r8-r15) and the 8 additional 128-bit SSE registers (xmm8-xmm15), and because 64-bit integers can be handled directly in the standard registers. Of course, all of this depends on the compiler. The MMX/SSE registers (mmx/xmm... whatever) have always been 64/128 bits wide, and you still can't do floating-point operations in the standard registers afaik.
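To make the register point concrete, here is a minimal sketch in C. The function names `mul_add_u64` and `scale` are made up for illustration. The idea is to compile the same file once as 32-bit and once as 64-bit (with GCC, something like `gcc -O2 -S -m32` versus `gcc -O2 -S -m64`, assuming a multilib toolchain is installed) and compare the generated assembly. The exact output depends on the compiler and its flags, so treat the comments as what you would typically see, not a guarantee.

```c
#include <stdint.h>

/* 64-bit multiply-add.
 * x86-64 build: each operand fits in one general-purpose register, so this
 * usually compiles to a single multiply and a single add.
 * 32-bit x86 build: every 64-bit value is split across a register pair, so
 * the compiler has to emit several 32-bit multiplies plus add-with-carry. */
uint64_t mul_add_u64(uint64_t a, uint64_t b, uint64_t c)
{
    /* Unsigned arithmetic wraps modulo 2^64 on both targets. */
    return a * b + c;
}

/* Floating-point work never goes through the general-purpose registers.
 * x86-64 build: typically done in xmm registers (SSE2 is part of the baseline).
 * 32-bit x86 build: often done on the x87 stack unless SSE math is requested. */
double scale(double x, double factor)
{
    return x * factor;
}
```

Both functions return the same results on either target; only the instruction sequences and the registers used differ, which is where the speed difference comes from.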