Thanks Jcl.

Genetic optimization: when you say that it's slower than the current method, what do you mean? If the optimization search domain is big enough, I would think the genetic algorithm is faster than the current method, isn't it?
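To illustrate the point about large search domains, here is a toy sketch (not Zorro's actual optimizer; the objective function and all parameters are made up) showing that a simple genetic algorithm touches only a small fraction of a 10,000-point domain, whereas an exhaustive sweep would have to evaluate every point:

```python
import random

def fitness(x):
    # toy objective with a single maximum at x = 700
    return -(x - 700) ** 2

def genetic_search(domain_size=10_000, pop=20, generations=30, seed=42):
    rng = random.Random(seed)
    population = [rng.randrange(domain_size) for _ in range(pop)]
    evaluations = 0
    for _ in range(generations):
        # selection: keep the best half (counting one evaluation
        # per individual per generation)
        scored = sorted(population, key=fitness, reverse=True)
        evaluations += len(population)
        parents = scored[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2            # crossover: average of two parents
            if rng.random() < 0.3:          # mutation: small random jitter
                child = max(0, min(domain_size - 1,
                                   child + rng.randrange(-50, 51)))
            children.append(child)
        population = parents + children     # elitism: parents survive
    best = max(population, key=fitness)
    return best, evaluations

best, evals = genetic_search()
# the GA uses pop * generations = 600 evaluations instead of 10,000
```

The trade-off is the one you'd expect: on a small domain the GA's overhead isn't worth it, but the number of evaluations it needs grows much more slowly than the domain size, which is why it can win when the search space is big.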

Multiple cores and nodes: I understand that you can use it only on specific parallelizable processes. But with WFO (which is an easy case, isolation-wise) you could also use parallelization to distribute the training process across different assets or algos (provided, obviously, that you write the code inside each asset/algo "while" loop so that it is truly independent of the other loops).
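A minimal sketch of that idea in Python (the asset names and the `train` function are placeholders, not real trading code): because each asset's training run touches only its own data, the runs can be farmed out to a worker pool and collected independently.

```python
from concurrent.futures import ThreadPoolExecutor

ASSETS = ["EUR/USD", "GBP/USD", "USD/JPY", "XAU/USD"]

def train(asset):
    # stand-in for one independent training run: it reads only its own
    # asset's data and shares no state with the other runs
    score = sum(ord(c) for c in asset) % 100  # dummy "result"
    return asset, score

# since the loops are independent, they can run concurrently;
# for real CPU-bound training you would swap in ProcessPoolExecutor
# to sidestep the GIL and use all cores
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(train, ASSETS))
```

The key requirement is exactly the one above: no loop may read or write state belonging to another loop, otherwise the results depend on scheduling order.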

CUDA: I'm not an expert and I could be wrong, but I see that in Python it is possible to use it for Monte Carlo simulation, matrix manipulation, etc. (https://www.quantstart.com/articles).
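For example, the classic Monte Carlo estimate of pi is the kind of embarrassingly parallel workload those articles have in mind. Here is a plain CPU version in Python; the same sampling loop is what GPU libraries such as CuPy or Numba's CUDA target would run over thousands of threads at once (this sketch itself uses no GPU):

```python
import random

def monte_carlo_pi(n, seed=0):
    """Estimate pi by sampling n random points in the unit square
    and counting how many fall inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4 * inside / n

est = monte_carlo_pi(100_000)
```

Each sample is independent of every other sample, which is exactly why this maps so well onto CUDA: there is no communication between threads until the final count is reduced.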