Jason's article is not based on a realistic use case. The first comment below it confirms my experience: looping over a vector is more than 2x faster with floats than with doubles (and compilers and autovectorization have improved since).

I would trade a possible loss of precision for a 2x performance gain any time. Given how noisy financial time series are, the extra precision is not very meaningful (unlike in rocket trajectory calculations).

But the performance gain will not be limited to loops: Zorro's structures will be much more compact, leading to fewer cache misses and significantly better performance. In addition, many operations (division, sqrt, etc.) will be roughly 2x faster.
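
To make the compactness point concrete, here is a minimal sketch. The `Bar` record and the `barsPerCacheLine` helper are hypothetical names made up for illustration, not Zorro's actual layout, and the 64-byte cache line is an assumption that holds on most current x86 CPUs.

```cpp
#include <cstddef>

// Hypothetical price-bar record for illustration only;
// this is NOT Zorro's actual data layout.
template <typename Real>
struct Bar {
    Real open, high, low, close;
};

// How many bars fit into one cache line.
// The 64-byte default is an assumption (typical for x86).
template <typename Real>
constexpr std::size_t barsPerCacheLine(std::size_t lineBytes = 64) {
    return lineBytes / sizeof(Bar<Real>);
}
```

With the usual 4-byte float and 8-byte double, `Bar<float>` takes 16 bytes and `Bar<double>` takes 32, so twice as many float bars fit into each cache line that is fetched.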

With templating, most functions called from user C++ code can keep working without any change to the calling syntax.
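
A rough sketch of the templating idea, assuming a function that only needs a floating-point element type. `movingAverage` is a hypothetical example, not a function from the Zorro API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example function, NOT part of the Zorro API.
// Averages the last `period` elements of `data`.
// Returns 0 if `period` is 0 or there are fewer than `period` elements.
template <typename Real>
Real movingAverage(const std::vector<Real>& data, std::size_t period) {
    if (period == 0 || data.size() < period) return Real(0);
    Real sum = 0;
    for (std::size_t i = data.size() - period; i < data.size(); ++i) {
        sum += data[i];
    }
    return sum / static_cast<Real>(period);
}
```

The call looks the same for both precisions, e.g. `movingAverage(prices, 20)`; the compiler deduces `Real` from the vector's element type, so user code does not have to change when the underlying type does.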

How best to handle this for the lite-C API, I don't know.