I believe it cannot train because your system makes no optimize call when it starts. The number of optimize calls must be the same on every bar, since the number of parameters is derived from it.
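
To illustrate the point, here is a minimal sketch in plain C. It does not use the platform's real API: `optimize_stub` is a hypothetical stand-in that only counts how often it is called, and `calls_conditional` and `calls_unconditional` are names made up for this example. The first function skips the call on the first bar, so the count changes between bars. The second makes the same calls on every bar.

```c
#include <stdio.h>

/* Hypothetical stand-in for the platform's optimize call (an assumption,
   not the real API): it counts invocations and returns the default value. */
static int optimize_calls = 0;

static double optimize_stub(double default_value, double min, double max)
{
    (void)min;
    (void)max;
    optimize_calls++;
    return default_value;
}

/* Problematic pattern: no optimize call on the first bar, one call later.
   Returns the number of optimize calls made for the given bar. */
static int calls_conditional(int bar)
{
    double period = 20.0;
    optimize_calls = 0;
    if (bar > 0)
        period = optimize_stub(20.0, 10.0, 50.0);
    (void)period;
    return optimize_calls;
}

/* Safer pattern: the same optimize calls run on every bar, including the
   first one. Returns the number of optimize calls made for the given bar. */
static int calls_unconditional(int bar)
{
    double period, threshold;
    optimize_calls = 0;
    period = optimize_stub(20.0, 10.0, 50.0);
    threshold = optimize_stub(1.0, 0.5, 3.0);
    (void)bar;
    (void)period;
    (void)threshold;
    return optimize_calls;
}
```

With the conditional version, a parameter count taken at the start would be zero even though a parameter is used later. Moving every optimize call out of conditional branches, so they all run from the first bar on, avoids that.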