Moving to floats from doubles?

Posted By: Zheka

Moving to floats from doubles? - 06/08/22 11:45

I would like to suggest changing Zorro's default 'var' type from double to float.

Most trading system calculations involve looping over N prior values to do some sort of summing/dividing/multiplying.

With floats, twice as many values fit into each cache line fetched into the CPU L1 cache, so there will be fewer cache misses. With C++ compiler auto-vectorization, loops over floats can also run nearly 2X faster than loops over doubles.
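To put the cache argument in numbers, here is a minimal sketch. It assumes 64-byte cache lines (typical on current x86 CPUs, but not guaranteed) and a 4-byte float; the names kCacheLineBytes, elems_per_cache_line and cache_lines_for are made up for this illustration.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines (typical on x86, not guaranteed everywhere).
constexpr std::size_t kCacheLineBytes = 64;

// Number of values of type T that fit into one cache line.
template <typename T>
constexpr std::size_t elems_per_cache_line() {
    return kCacheLineBytes / sizeof(T);
}

// Hypothetical helper: cache lines touched when reading n contiguous,
// line-aligned values of type T (rounded up).
template <typename T>
constexpr std::size_t cache_lines_for(std::size_t n) {
    return (n * sizeof(T) + kCacheLineBytes - 1) / kCacheLineBytes;
}
```

Under these assumptions a 1000-value lookback touches 63 cache lines as float and 125 as double. Whether that halving shows up as a 2X speedup depends on whether the loop is memory-bound in the first place.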

Division, sqrt, pow, exp, log, trigonometric functions can be 2x faster with floats; TA_Lib also has a version that consumes float arrays.

I did some measurements several years ago, and the speedup was quite significant.

I understand that type conversion for intermediate calculations can be costly, but this will be occasional and less common than looping over N prices/values.

Float max value is big enough for overflow to occur very rarely, if ever.

What do you think?

Posted By: AndrewAMD

Re: Moving to floats from doubles? - 06/08/22 13:20

Changing var from double to float has two other problems:
1) It would break all kinds of script code. (A big no-no.)
2) Precision loss.
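
A small sketch of what precision loss can look like in a running sum. It assumes IEEE 754 float and double, and is_absorbed is a made-up helper name. The value 67108864 (2^26) is chosen because neighbouring floats are 8 apart at that magnitude, so adding 3.3 rounds back to the same value; it is also the value at which the float sum stalls in the benchmark log posted later in this thread.

```cpp
#include <limits>

// Significand bits, including the implicit leading bit.
constexpr int kFloatDigits  = std::numeric_limits<float>::digits;   // 24 on IEEE 754
constexpr int kDoubleDigits = std::numeric_limits<double>::digits;  // 53 on IEEE 754

// Hypothetical helper: true if adding `step` to `sum` leaves `sum`
// unchanged in type T, i.e. the step is fully absorbed by rounding.
template <typename T>
constexpr bool is_absorbed(T sum, T step) {
    return sum + step == sum;
}
```

With these definitions, is_absorbed<float>(67108864.0f, 3.3f) is true while is_absorbed<double>(67108864.0, 3.3) is false. A float accumulator that reaches that magnitude silently stops growing.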

Maybe better to add float support as a standalone feature? Like fseries() for float, typedef float fvar, and typedef float* fvars, where the 'f' is for "fast"?

And then write the indicator functions to have both var and fvar overloads. This simplifies scripting.
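
A rough sketch of how such overloads could look in C++. Everything here is hypothetical: fvar, fvars and MeanOf are not existing Zorro names, and the var/vars typedefs are repeated only to keep the snippet self-contained (leave them out when zorro.h is included).

```cpp
#include <type_traits>

typedef double  var;    // repeated here for self-containment
typedef double* vars;
typedef float   fvar;   // hypothetical "fast" type
typedef float*  fvars;  // hypothetical

static_assert(!std::is_same<var, fvar>::value,
              "overloads need distinct types");

// Shared implementation: arithmetic mean of the first n elements.
template <typename T>
T mean_impl(const T* data, int n) {
    T sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];
    return n > 0 ? sum / n : T(0);
}

// Hypothetical indicator with one overload per type.
inline var  MeanOf(vars data, int n)  { return mean_impl<var>(data, n); }
inline fvar MeanOf(fvars data, int n) { return mean_impl<fvar>(data, n); }
```

A script would call MeanOf(Data, 20) in either case, and the compiler picks the overload from the argument type, so existing var-based code keeps compiling unchanged.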

Posted By: jcl

Re: Moving to floats from doubles? - 06/08/22 14:49

https://articles.emptycrate.com/2012/02/11/double_vs_float_which_is_better.html

Posted By: Zheka

Re: Moving to floats from doubles? - 06/08/22 18:42

Jason's article is not based on a realistic use case. The first comment under it confirms my experience: looping over a vector is 2X+ faster with floats than with doubles (and compilers and auto-vectorization have improved since).

I would accept a possible loss of precision in exchange for a 2X performance gain any time. Given the noisiness of financial time series, 'precision' is not that meaningful (unlike in rocket trajectory calculations).

But the performance gain would not be limited to loops: Zorro's structures would be much more compact, leading to fewer cache misses and significantly better performance. In addition, many operations (division, sqrt, etc.) would be 2x faster.

With templating, most functions called from user C++ code can work without changes to the syntax.

How best to handle this for the lite-C API, I don't know.

Posted By: AndrewAMD

Re: Moving to floats from doubles? - 06/08/22 20:51

Here's an actual *.cpp Zorro script. float is not any faster.

Code

#include <chrono>
#include <typeinfo>
#include <vector>
#include <zorro.h>

template<typename T> void sum_test1(int num_times, T value)
{
	T val = 0;

	std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < num_times; ++i)
	{
		val += value;
		if (!wait(-100000)) {
			printf("\nAborted!");
			return;
		}
	}
	auto dur = std::chrono::high_resolution_clock::now() - t1;
	auto dur_ms = std::chrono::duration_cast<std::chrono::milliseconds>(dur);
	printf("\nType name: %s", typeid(T).name());
	printf("\n... Size in bytes : %d", (int)sizeof(T));
	// %d expects an int, so pass the tick count rather than the duration object
	printf("\n... Summation time: %d ms", (int)dur_ms.count());
	printf("\n... Summed value: %0.8f", (double)val);
}

template<typename T> void vec_test2(size_t vecSize, int num_times, T value)
{
	std::vector<T> vec;
	vec.resize(vecSize);
	std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < num_times; ++i)
	{
		for (auto& val : vec) {
			val *= value;
			if (!wait(-100000)) {
				printf("\nAborted!");
				return;
			}
		}
	}
	auto dur = std::chrono::high_resolution_clock::now() - t1;
	auto dur_ms = std::chrono::duration_cast<std::chrono::milliseconds>(dur);
	printf("\nType name: %s", typeid(T).name());
	printf("\n... Size in bytes : %d", (int)sizeof(T));
	printf("\n... Vector lenth: %d", (int)vecSize);
	printf("\n... num_times: %d", num_times);
	// %d expects an int, so pass the tick count rather than the duration object
	printf("\n... Vector time: %d ms", (int)dur_ms.count());
}


DLLFUNC void run() {
	set(LOGFILE);
	sum_test1(100000000, 3.3f);
	sum_test1(100000000, 3.3f);
	sum_test1(100000000, 3.3f);
	sum_test1(100000000, 3.3);
	sum_test1(100000000, 3.3);
	sum_test1(100000000, 3.3);
	vec_test2(3000000,30, 3.3f);
	vec_test2(3000000,30, 3.3f);
	vec_test2(3000000,30, 3.3f);
	vec_test2(3000000, 30, 3.3);
	vec_test2(3000000, 30, 3.3);
	vec_test2(3000000, 30, 3.3);

	quit("!Done!");
}

/*  LOG OUTPUT:

Type name: float
... Size in bytes : 4
... Summation time: 2247 ms
... Summed value: 67108864.00000000
Type name: float
... Size in bytes : 4
... Summation time: 2121 ms
... Summed value: 67108864.00000000
Type name: float
... Size in bytes : 4
... Summation time: 2079 ms
... Summed value: 67108864.00000000
Type name: double
... Size in bytes : 8
... Summation time: 2133 ms
... Summed value: 330000000.62168288
Type name: double
... Size in bytes : 8
... Summation time: 2164 ms
... Summed value: 330000000.62168288
Type name: double
... Size in bytes : 8
... Summation time: 2100 ms
... Summed value: 330000000.62168288
Type name: float
... Size in bytes : 4
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1849 ms
Type name: float
... Size in bytes : 4
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1816 ms
Type name: float
... Size in bytes : 4
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1822 ms
Type name: double
... Size in bytes : 8
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1829 ms
Type name: double
... Size in bytes : 8
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1851 ms
Type name: double
... Size in bytes : 8
... Vector lenth: 3000000
... num_times: 30
... Vector time: 1879 ms
Done!

*/

Posted By: AndrewAMD

Re: Moving to floats from doubles? - 06/09/22 01:14

Interestingly, the speeds were slightly improved in 64-bit Zorro (about 14% faster overall). Nonetheless, there was no advantage to float over double.

Posted By: jcl

Re: Moving to floats from doubles? - 06/09/22 09:01

In the Zorro structs you can see that they use both float and double. They use double where speed matters, and float where memory footprint or file size matters. Internal calculations always use double.
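
As an illustration of that split, here is a sketch with float for storage and double for the calculation. CompactBar and mean_close are hypothetical names for this example, not actual Zorro definitions.

```cpp
// Hypothetical record: prices stored as float to keep memory and file
// size small, timestamp kept as double.
struct CompactBar {
    double time;
    float open, high, low, close;
};

// Mean close over n bars. Each float is widened to double before it is
// added, so the accumulation itself runs in double precision.
inline double mean_close(const CompactBar* bars, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += bars[i].close;  // float -> double
    return n > 0 ? sum / n : 0.0;
}
```

On common ABIs this record takes 24 bytes, against 40 if all five fields were double, while the arithmetic loses nothing beyond the rounding already present in the stored floats.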

For using external libraries such as ta-lib, there is no choice anyway. No serious finance library has floats nowadays. They all expect data in double arrays.