TwinPulse Fractal Accord [Re: TipmyPip] #489118
01/24/26 03:55
TipmyPip Online OP
Member
Joined: Sep 2017
Posts: 192

TwinPulse Fractal Accord is a two-timeframe learning trader that blends fractal-style pattern sensing with cooperative self-tuning. On each bar it builds one price stream at a fast timeframe and another at a slower timeframe. From each stream it extracts three signals: a fractal roughness estimate, a linear trend slope, and a volatility of log returns. Those six signals feed a built-in classifier that produces separate long and short tendencies. The trade engine compares them, turns the balance into a trade direction, and sizes exposure using adaptive leverage controls. Trades are also managed with a time-based exit so positions do not linger past the learned holding duration.

Instead of fixing the configuration, the strategy treats key settings as a team of bandit agents. Each agent controls one knob and can pick from a small list of discrete settings called arms. Agents cover the fast timeframe, the slow offset, feature window lengths on both timeframes, leverage scaling, maximum leverage, the prediction threshold, and the holding duration. At the start of an episode the agents choose arms, the strategy trades with that joint configuration, and when it returns to flat it measures the episode result. The reward is normalized by holding time and clipped, so one unusually large outcome does not dominate learning.
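
As a quick illustration of that reward shaping, here is the episode reward in isolation; a minimal sketch that reuses REWARD_CLIP and HoldBars from the full listing below. episodeReward itself is a hypothetical helper name, the script does the same computation inline in run().

Code
// Hypothetical helper: episode reward = balance change per held bar, clipped.
// Mirrors the shaping done inline in the run() function of the listing below.
var episodeReward(var dBalance, int HoldBars)
{
	var r = dBalance / (1 + (var)HoldBars); // normalize by holding time
	if(r >  REWARD_CLIP) r =  REWARD_CLIP;  // clip so one outlier cannot dominate
	if(r < -REWARD_CLIP) r = -REWARD_CLIP;
	return r;
}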

Exploration is guided by an upper-confidence rule. Each arm gets a score made from its current value estimate plus an uncertainty bonus that is larger when the arm has fewer samples. This encourages careful exploration early and more consistent exploitation later. A small random exploration chance remains to keep coverage and to cope with regime changes, data quirks, and sudden shifts in market behavior.
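
Restated in one place, the selection score for arm a of parameter p is Q[p][a] + C_UCB * sqrt(ln(1 + TotCnt[p]) / (1 + Ncnt[p][a])). The sketch below is only an illustrative wrapper (ucbScore is not a name from the listing; the listing computes the bonus in ucbBonus()).

Code
// Illustrative restatement of the selection score: value estimate plus an
// optimism bonus that shrinks as the arm accumulates samples.
var ucbScore(var q, int totalPulls, int armPulls)
{
	return q + C_UCB * sqrt(log(1 + (var)totalPulls) / (1 + (var)armPulls));
}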

Communication between agents is implemented through coordination tables that store how well two choices perform together beyond their individual quality. The most important table links the fast-timeframe agent with the slow-offset agent. The slow-offset agent scores each candidate using its own estimate plus a gated pair term tied to the chosen fast arm. Symmetry is added by letting the fast-timeframe agent evaluate its candidates while anticipating the best-responding slow offset. After both are selected, a short negotiation step can reconsider the fast timeframe given the chosen offset, improving coherence between the two lenses.
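
The score the slow-offset agent applies can be read off the listing below; restated as one function for clarity (pairedScore_TF2d is an illustrative name), it adds the single-arm estimate, its UCB bonus, the gated pair term and a pair-level UCB bonus:

Code
// Illustrative restatement of the pair-aware score for slow-offset candidate b
// given fast arm a; all tables and helpers are the ones defined in the listing.
var pairedScore_TF2d(int a, int b)
{
	var own  = Q[P_TF2D][b] + ucbBonus(TotCnt[P_TF2D], Ncnt[P_TF2D][b], C_UCB);
	var pair = lambdaEff(Npair_TF[a][b]) * Qpair_TF[a][b]
		+ ucbBonus(TotPair_TF, Npair_TF[a][b], CP_UCB);
	return own + pair; // the fast agent maximizes this over b when anticipating
}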

The same pattern is applied to other dependencies. Feature window agents communicate so fractal windows and slope windows on each timeframe tend to match a compatible scale, improving stability of the feature estimates. Risk control agents communicate so leverage scaling aligns with the prediction threshold, and maximum leverage aligns with holding duration, which shapes exposure time.

Pair influence is gated: pair terms begin weak and grow only after enough joint trials, preventing noisy early episodes from locking in fragile coordination. Pair learning is residual: each pair table updates only the extra benefit or harm not explained by the two single-arm estimates, which reduces double counting and sharpens credit assignment.
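
A worked example of that residual update with ALPHA = 0.10: if the two single-arm estimates are 2.0 and 1.0, the pair cell currently holds 0.5, and a closed episode returns a normalized reward of 5.0, the prediction is 2.0 + 1.0 + 0.5 = 3.5, so only the unexplained 1.5 is credited to the pair and the cell moves by 0.10 * 1.5 = 0.15, to 0.65, while the single-arm estimates update independently as usual.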

Finally, a safety constraint blocks extreme combinations that would demand excessive warmup history or unstable feature computation, reducing wasted episodes and keeping training productive. Over many episodes the strategy becomes a small community of specialists that explore, share compatibility knowledge, and gradually converge on coordinated multi-timeframe and risk settings that support steadier signals and cleaner trade timing, even when conditions drift.

Code
// ============================================================================
// Fractal Learner (Strategy 3) - RL Bandits for Parameters + 2TF ML (RETURNS)
// File: Fractal_EURUSD_2TF_RL_Bandits_v2_COMM_UCB_SYM_RES_SAFE.c (Zorro / lite-C)
//
// ADDED FEATURES (as requested):
// 1) Symmetric TF communication (TF1 "listens" too) via best-response scoring.
// 2) UCB exploration bonus (single + pair) instead of plain epsilon-greedy.
// 3) Reward normalization (per HoldBars) + clipping.
// 4) Pair-gating: pair influence ramps up only after enough samples.
// 5) More dependencies communicated:
//      - LevScale <-> PredThr
//      - MaxLev   <-> HoldBars
// 6) Pair-table affects both directions (1-step coordinate descent negotiation).
// 7) Residual pair tables (prevents double counting of single Q):
//      QpairRes(a,b) updated with reward - (Q_A(a)+Q_B(b)+QpairRes(a,b))
// 8) Safe constraints to avoid nonsense combos (reduces wasted episodes)
//
// lite-C safe:
// - No ternary operator (uses ifelse())
// - Header uses multiple file_append calls
// - strf format string is ONE literal
// ============================================================================

#define NPAR 12
#define MAXARMS 64

// Exploration / learning
#define EPSILON 0.10
#define ALPHA   0.10

// Communication strength (base)
#define LAMBDA_PAIR 0.30

// UCB constants
#define C_UCB   0.40
#define CP_UCB  0.40

// Pair gating (minimum samples before full lambda)
#define NMIN_PAIR 25

// Reward normalization/clipping
#define REWARD_CLIP 1000.0

// Safe constraint limit (keeps TF2 + windows from exploding warmup)
#define SAFE_LIMIT 2500

#define P_INT 0
#define P_VAR 1

// Parameter indices (RL-controlled)
#define P_TF1        0
#define P_TF2D       1
#define P_FDLEN1     2
#define P_SLOPELEN1  3
#define P_VOLLEN1    4
#define P_FDLEN2     5
#define P_SLOPELEN2  6
#define P_VOLLEN2    7
#define P_LEVSCALE   8
#define P_MAXLEV     9
#define P_PREDTHR    10
#define P_HOLDBARS   11

// -----------------------------
// RL storage
// -----------------------------
string ParName[NPAR];

int ParType[NPAR] =
{
	P_INT,  // TF1
	P_INT,  // TF2d
	P_INT,  // FDLen1
	P_INT,  // SlopeLen1
	P_INT,  // VolLen1
	P_INT,  // FDLen2
	P_INT,  // SlopeLen2
	P_INT,  // VolLen2
	P_VAR,  // LevScale
	P_VAR,  // MaxLev
	P_VAR,  // PredThr
	P_INT   // HoldBars
};

var ParMin[NPAR];
var ParMax[NPAR];
var ParStep[NPAR];

var Q[NPAR][MAXARMS];
int Ncnt[NPAR][MAXARMS];
int ArmsCount[NPAR];
int CurArm[NPAR];

// Totals for UCB (avoid summing every time)
int TotCnt[NPAR];

// -----------------------------
// Pairwise "communication" RESIDUAL tables
// (all are QpairRes; keep Npair and TotPair for gating+UCB)
// -----------------------------

// TF1 arm × TF2d arm
var Qpair_TF[MAXARMS][MAXARMS];
int Npair_TF[MAXARMS][MAXARMS];
int TotPair_TF;

// FDLen1 arm × SlopeLen1 arm
var Qpair_FD1SL1[MAXARMS][MAXARMS];
int Npair_FD1SL1[MAXARMS][MAXARMS];
int TotPair_FD1SL1;

// FDLen2 arm × SlopeLen2 arm
var Qpair_FD2SL2[MAXARMS][MAXARMS];
int Npair_FD2SL2[MAXARMS][MAXARMS];
int TotPair_FD2SL2;

// LevScale arm × PredThr arm
var Qpair_LS_PT[MAXARMS][MAXARMS];
int Npair_LS_PT[MAXARMS][MAXARMS];
int TotPair_LS_PT;

// MaxLev arm × HoldBars arm
var Qpair_ML_HB[MAXARMS][MAXARMS];
int Npair_ML_HB[MAXARMS][MAXARMS];
int TotPair_ML_HB;

// -----------------------------
// Utility
// -----------------------------
int calcArms(var mn, var mx, var stp)
{
	if(stp <= 0) return 1;
	int n = (int)floor((mx - mn)/stp + 1.000001);
	if(n < 1) n = 1;
	if(n > MAXARMS) n = MAXARMS;
	return n;
}

var armValue(int p, int a)
{
	var v = ParMin[p] + (var)a * ParStep[p];
	if(v < ParMin[p]) v = ParMin[p];
	if(v > ParMax[p]) v = ParMax[p];
	if(ParType[p] == P_INT) v = (var)(int)(v + 0.5);
	return v;
}

void initParNames()
{
	ParName[P_TF1]       = "TF1";
	ParName[P_TF2D]      = "TF2d";
	ParName[P_FDLEN1]    = "FDLen1";
	ParName[P_SLOPELEN1] = "SlopeLen1";
	ParName[P_VOLLEN1]   = "VolLen1";
	ParName[P_FDLEN2]    = "FDLen2";
	ParName[P_SLOPELEN2] = "SlopeLen2";
	ParName[P_VOLLEN2]   = "VolLen2";
	ParName[P_LEVSCALE]  = "LevScale";
	ParName[P_MAXLEV]    = "MaxLev";
	ParName[P_PREDTHR]   = "PredThr";
	ParName[P_HOLDBARS]  = "HoldBars";
}

// UCB bonus: c * sqrt( ln(1+Tot) / (1+n) )
var ucbBonus(int tot, int n, var c)
{
	var lnT = log(1 + (var)tot);
	return c * sqrt(lnT / (1 + (var)n));
}

// Pair gating: lambda_eff = lambda * min(1, nPair/nMin)
var lambdaEff(int nPair)
{
	var frac = (var)nPair / (var)NMIN_PAIR;
	if(frac > 1) frac = 1;
	if(frac < 0) frac = 0;
	return LAMBDA_PAIR * frac;
}

// -----------------------------
// Single-arm selection with UCB + epsilon explore
// -----------------------------
int bestArm_UCB(int p)
{
	int a, best = 0;
	var bestScore = Q[p][0] + ucbBonus(TotCnt[p], Ncnt[p][0], C_UCB);

	for(a=1; a<ArmsCount[p]; a++)
	{
		var score = Q[p][a] + ucbBonus(TotCnt[p], Ncnt[p][a], C_UCB);
		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_UCB(int p)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[p]);
	return bestArm_UCB(p);
}

// Update single Q
void updateArm(int p, int a, var reward)
{
	Q[p][a] = Q[p][a] + ALPHA*(reward - Q[p][a]);
	Ncnt[p][a] += 1;
	TotCnt[p] += 1;
}

// -----------------------------
// Residual pair update: QpairRes += alpha * (r - (QA + QB + QpairRes))
// -----------------------------
void updatePairRes_TF(int a1, int a2, var reward)
{
	var pred = Q[P_TF1][a1] + Q[P_TF2D][a2] + Qpair_TF[a1][a2];
	Qpair_TF[a1][a2] = Qpair_TF[a1][a2] + ALPHA*(reward - pred);
	Npair_TF[a1][a2] += 1;
	TotPair_TF += 1;
}

void updatePairRes_FD1SL1(int a1, int a2, var reward)
{
	var pred = Q[P_FDLEN1][a1] + Q[P_SLOPELEN1][a2] + Qpair_FD1SL1[a1][a2];
	Qpair_FD1SL1[a1][a2] = Qpair_FD1SL1[a1][a2] + ALPHA*(reward - pred);
	Npair_FD1SL1[a1][a2] += 1;
	TotPair_FD1SL1 += 1;
}

void updatePairRes_FD2SL2(int a1, int a2, var reward)
{
	var pred = Q[P_FDLEN2][a1] + Q[P_SLOPELEN2][a2] + Qpair_FD2SL2[a1][a2];
	Qpair_FD2SL2[a1][a2] = Qpair_FD2SL2[a1][a2] + ALPHA*(reward - pred);
	Npair_FD2SL2[a1][a2] += 1;
	TotPair_FD2SL2 += 1;
}

void updatePairRes_LS_PT(int a1, int a2, var reward)
{
	var pred = Q[P_LEVSCALE][a1] + Q[P_PREDTHR][a2] + Qpair_LS_PT[a1][a2];
	Qpair_LS_PT[a1][a2] = Qpair_LS_PT[a1][a2] + ALPHA*(reward - pred);
	Npair_LS_PT[a1][a2] += 1;
	TotPair_LS_PT += 1;
}

void updatePairRes_ML_HB(int a1, int a2, var reward)
{
	var pred = Q[P_MAXLEV][a1] + Q[P_HOLDBARS][a2] + Qpair_ML_HB[a1][a2];
	Qpair_ML_HB[a1][a2] = Qpair_ML_HB[a1][a2] + ALPHA*(reward - pred);
	Npair_ML_HB[a1][a2] += 1;
	TotPair_ML_HB += 1;
}

// -----------------------------
// Pair-aware selection with UCB + gating + residual pair
// score(b) = Q_B(b)+UCB + lambda_eff*QpairRes(a,b) + pairUCB
// -----------------------------
int bestArm_TF2d_given_TF1(int armTF1)
{
	int a, best = 0;

	int n0 = Npair_TF[armTF1][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_TF[armTF1][0] + ucbBonus(TotPair_TF, n0, CP_UCB);

	var bestScore =
		Q[P_TF2D][0] + ucbBonus(TotCnt[P_TF2D], Ncnt[P_TF2D][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_TF2D]; a++)
	{
		int n = Npair_TF[armTF1][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_TF[armTF1][a] + ucbBonus(TotPair_TF, n, CP_UCB);

		var score =
			Q[P_TF2D][a] + ucbBonus(TotCnt[P_TF2D], Ncnt[P_TF2D][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_TF2d_given_TF1(int armTF1)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_TF2D]);
	return bestArm_TF2d_given_TF1(armTF1);
}

// TF1 symmetric best-response score:
// scoreTF1(a1) = Q_TF1(a1)+UCB + max_a2( lambda_eff*QpairRes(a1,a2) + pairUCB )
int bestArm_TF1_symmetric()
{
	int a1, best = 0;

	// compute bestScore for a1=0
	var bestPairTerm0 = 0;
	int a2;
	for(a2=0; a2<ArmsCount[P_TF2D]; a2++)
	{
		int n = Npair_TF[0][a2];
		var lam = lambdaEff(n);
		var term = lam*Qpair_TF[0][a2] + ucbBonus(TotPair_TF, n, CP_UCB);
		if(a2==0) bestPairTerm0 = term;
		else if(term > bestPairTerm0) bestPairTerm0 = term;
	}

	var bestScore =
		Q[P_TF1][0] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][0], C_UCB) + bestPairTerm0;

	for(a1=1; a1<ArmsCount[P_TF1]; a1++)
	{
		var bestPairTerm = 0;
		for(a2=0; a2<ArmsCount[P_TF2D]; a2++)
		{
			int n2 = Npair_TF[a1][a2];
			var lam2 = lambdaEff(n2);
			var term2 = lam2*Qpair_TF[a1][a2] + ucbBonus(TotPair_TF, n2, CP_UCB);
			if(a2==0) bestPairTerm = term2;
			else if(term2 > bestPairTerm) bestPairTerm = term2;
		}

		var score =
			Q[P_TF1][a1] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][a1], C_UCB) + bestPairTerm;

		if(score > bestScore)
		{
			bestScore = score;
			best = a1;
		}
	}
	return best;
}

int selectArm_TF1_symmetric()
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_TF1]);
	return bestArm_TF1_symmetric();
}

// Coordinate descent step: re-pick TF1 given TF2d
int bestArm_TF1_given_TF2d(int armTF2d)
{
	int a1, best = 0;

	int n0 = Npair_TF[0][armTF2d];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_TF[0][armTF2d] + ucbBonus(TotPair_TF, n0, CP_UCB);

	var bestScore =
		Q[P_TF1][0] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][0], C_UCB) + pairB0;

	for(a1=1; a1<ArmsCount[P_TF1]; a1++)
	{
		int n = Npair_TF[a1][armTF2d];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_TF[a1][armTF2d] + ucbBonus(TotPair_TF, n, CP_UCB);

		var score =
			Q[P_TF1][a1] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][a1], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a1;
		}
	}
	return best;
}

// Generic pair selectors for the other 4 pairs (FD1-SL1, FD2-SL2, LS-PT, ML-HB)

int bestArm_SlopeLen1_given_FDLen1(int armFD1)
{
	int a, best = 0;

	int n0 = Npair_FD1SL1[armFD1][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_FD1SL1[armFD1][0] + ucbBonus(TotPair_FD1SL1, n0, CP_UCB);

	var bestScore =
		Q[P_SLOPELEN1][0] + ucbBonus(TotCnt[P_SLOPELEN1], Ncnt[P_SLOPELEN1][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_SLOPELEN1]; a++)
	{
		int n = Npair_FD1SL1[armFD1][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_FD1SL1[armFD1][a] + ucbBonus(TotPair_FD1SL1, n, CP_UCB);

		var score =
			Q[P_SLOPELEN1][a] + ucbBonus(TotCnt[P_SLOPELEN1], Ncnt[P_SLOPELEN1][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_SlopeLen1_given_FDLen1(int armFD1)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_SLOPELEN1]);
	return bestArm_SlopeLen1_given_FDLen1(armFD1);
}

int bestArm_SlopeLen2_given_FDLen2(int armFD2)
{
	int a, best = 0;

	int n0 = Npair_FD2SL2[armFD2][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_FD2SL2[armFD2][0] + ucbBonus(TotPair_FD2SL2, n0, CP_UCB);

	var bestScore =
		Q[P_SLOPELEN2][0] + ucbBonus(TotCnt[P_SLOPELEN2], Ncnt[P_SLOPELEN2][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_SLOPELEN2]; a++)
	{
		int n = Npair_FD2SL2[armFD2][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_FD2SL2[armFD2][a] + ucbBonus(TotPair_FD2SL2, n, CP_UCB);

		var score =
			Q[P_SLOPELEN2][a] + ucbBonus(TotCnt[P_SLOPELEN2], Ncnt[P_SLOPELEN2][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_SlopeLen2_given_FDLen2(int armFD2)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_SLOPELEN2]);
	return bestArm_SlopeLen2_given_FDLen2(armFD2);
}

int bestArm_PredThr_given_LevScale(int armLS)
{
	int a, best = 0;

	int n0 = Npair_LS_PT[armLS][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_LS_PT[armLS][0] + ucbBonus(TotPair_LS_PT, n0, CP_UCB);

	var bestScore =
		Q[P_PREDTHR][0] + ucbBonus(TotCnt[P_PREDTHR], Ncnt[P_PREDTHR][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_PREDTHR]; a++)
	{
		int n = Npair_LS_PT[armLS][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_LS_PT[armLS][a] + ucbBonus(TotPair_LS_PT, n, CP_UCB);

		var score =
			Q[P_PREDTHR][a] + ucbBonus(TotCnt[P_PREDTHR], Ncnt[P_PREDTHR][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_PredThr_given_LevScale(int armLS)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_PREDTHR]);
	return bestArm_PredThr_given_LevScale(armLS);
}

int bestArm_HoldBars_given_MaxLev(int armML)
{
	int a, best = 0;

	int n0 = Npair_ML_HB[armML][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_ML_HB[armML][0] + ucbBonus(TotPair_ML_HB, n0, CP_UCB);

	var bestScore =
		Q[P_HOLDBARS][0] + ucbBonus(TotCnt[P_HOLDBARS], Ncnt[P_HOLDBARS][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_HOLDBARS]; a++)
	{
		int n = Npair_ML_HB[armML][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_ML_HB[armML][a] + ucbBonus(TotPair_ML_HB, n, CP_UCB);

		var score =
			Q[P_HOLDBARS][a] + ucbBonus(TotCnt[P_HOLDBARS], Ncnt[P_HOLDBARS][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_HoldBars_given_MaxLev(int armML)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_HOLDBARS]);
	return bestArm_HoldBars_given_MaxLev(armML);
}

// -----------------------------
// Init ranges + tables
// -----------------------------
void initParamsRL()
{
	// Ranges roughly matching your old optimize() ranges/steps
	ParMin[P_TF1]        = 1;    ParMax[P_TF1]        = 3;    ParStep[P_TF1]        = 1;
	ParMin[P_TF2D]       = 1;    ParMax[P_TF2D]       = 11;   ParStep[P_TF2D]       = 1;

	ParMin[P_FDLEN1]     = 20;   ParMax[P_FDLEN1]     = 220;  ParStep[P_FDLEN1]     = 5;
	ParMin[P_SLOPELEN1]  = 20;   ParMax[P_SLOPELEN1]  = 200;  ParStep[P_SLOPELEN1]  = 5;
	ParMin[P_VOLLEN1]    = 20;   ParMax[P_VOLLEN1]    = 200;  ParStep[P_VOLLEN1]    = 5;

	ParMin[P_FDLEN2]     = 10;   ParMax[P_FDLEN2]     = 160;  ParStep[P_FDLEN2]     = 5;
	ParMin[P_SLOPELEN2]  = 10;   ParMax[P_SLOPELEN2]  = 140;  ParStep[P_SLOPELEN2]  = 5;
	ParMin[P_VOLLEN2]    = 10;   ParMax[P_VOLLEN2]    = 140;  ParStep[P_VOLLEN2]    = 5;

	ParMin[P_LEVSCALE]   = 2;    ParMax[P_LEVSCALE]   = 30;   ParStep[P_LEVSCALE]   = 1;
	ParMin[P_MAXLEV]     = 0.1;  ParMax[P_MAXLEV]     = 1.0;  ParStep[P_MAXLEV]     = 0.1;
	ParMin[P_PREDTHR]    = 0.0;  ParMax[P_PREDTHR]    = 0.20; ParStep[P_PREDTHR]    = 0.01;
	ParMin[P_HOLDBARS]   = 1;    ParMax[P_HOLDBARS]   = 30;   ParStep[P_HOLDBARS]   = 1;

	int p, a;
	for(p=0; p<NPAR; p++)
	{
		ArmsCount[p] = calcArms(ParMin[p], ParMax[p], ParStep[p]);
		CurArm[p] = 0;
		TotCnt[p] = 0;
		for(a=0; a<ArmsCount[p]; a++)
		{
			Q[p][a] = 0;
			Ncnt[p][a] = 0;
		}
	}

	// Init pair tables
	int i,j;
	TotPair_TF = 0;
	TotPair_FD1SL1 = 0;
	TotPair_FD2SL2 = 0;
	TotPair_LS_PT = 0;
	TotPair_ML_HB = 0;

	for(i=0; i<MAXARMS; i++)
	{
		for(j=0; j<MAXARMS; j++)
		{
			Qpair_TF[i][j] = 0;      Npair_TF[i][j] = 0;
			Qpair_FD1SL1[i][j] = 0;  Npair_FD1SL1[i][j] = 0;
			Qpair_FD2SL2[i][j] = 0;  Npair_FD2SL2[i][j] = 0;
			Qpair_LS_PT[i][j] = 0;   Npair_LS_PT[i][j] = 0;
			Qpair_ML_HB[i][j] = 0;   Npair_ML_HB[i][j] = 0;
		}
	}
}

// -----------------------------
// Safe constraints helper
// -----------------------------
int safeOK(int TF1_arm, int TF2d_arm, int FDLen2_arm, int SlopeLen2_arm, int VolLen2_arm)
{
	int TF1v = (int)armValue(P_TF1, TF1_arm);
	int TF2dv = (int)armValue(P_TF2D, TF2d_arm);
	int TF2v = TF1v + TF2dv;
	if(TF2v > 12) TF2v = 12;

	int FD2v = (int)armValue(P_FDLEN2, FDLen2_arm);
	int SL2v = (int)armValue(P_SLOPELEN2, SlopeLen2_arm);
	int VL2v = (int)armValue(P_VOLLEN2, VolLen2_arm);

	int mx = FD2v;
	if(SL2v > mx) mx = SL2v;
	if(VL2v > mx) mx = VL2v;

	// TF2 * max(window2) <= SAFE_LIMIT
	if(TF2v * mx > SAFE_LIMIT)
		return 0;
	return 1;
}

// -----------------------------
// COMMUNICATING parameter pick order (symmetric + extra pairs + coord descent)
// -----------------------------
void pickParams()
{
	int p;

	// 1) TF1 symmetric pick (anticipate TF2d best response)
	CurArm[P_TF1] = selectArm_TF1_symmetric();

	// 2) TF2d conditioned on TF1 (pair-aware)
	CurArm[P_TF2D] = selectArm_TF2d_given_TF1(CurArm[P_TF1]);

	// 3) Coordinate descent: re-pick TF1 given TF2d (tiny negotiation step)
	if(random(1) >= EPSILON) // keep some stochasticity
		CurArm[P_TF1] = bestArm_TF1_given_TF2d(CurArm[P_TF2D]);

	// 4) FD/Slope pair TF1
	CurArm[P_FDLEN1]    = selectArm_UCB(P_FDLEN1);
	CurArm[P_SLOPELEN1] = selectArm_SlopeLen1_given_FDLen1(CurArm[P_FDLEN1]);

	// 5) FD/Slope pair TF2
	CurArm[P_FDLEN2]    = selectArm_UCB(P_FDLEN2);
	CurArm[P_SLOPELEN2] = selectArm_SlopeLen2_given_FDLen2(CurArm[P_FDLEN2]);

	// 6) Remaining independent params first (we'll coordinate the extra pairs after)
	for(p=0; p<NPAR; p++)
	{
		if(p == P_TF1) continue;
		if(p == P_TF2D) continue;
		if(p == P_FDLEN1) continue;
		if(p == P_SLOPELEN1) continue;
		if(p == P_FDLEN2) continue;
		if(p == P_SLOPELEN2) continue;
		CurArm[p] = selectArm_UCB(p);
	}

	// 7) Extra pairs communication:
	// LevScale -> PredThr conditioned
	CurArm[P_LEVSCALE] = selectArm_UCB(P_LEVSCALE);
	CurArm[P_PREDTHR]  = selectArm_PredThr_given_LevScale(CurArm[P_LEVSCALE]);

	// MaxLev -> HoldBars conditioned
	CurArm[P_MAXLEV]   = selectArm_UCB(P_MAXLEV);
	CurArm[P_HOLDBARS] = selectArm_HoldBars_given_MaxLev(CurArm[P_MAXLEV]);

	// 8) Safe constraints: if violated, resample the most relevant arms a few times
	int tries = 0;
	while(tries < 20)
	{
		if(safeOK(CurArm[P_TF1], CurArm[P_TF2D], CurArm[P_FDLEN2], CurArm[P_SLOPELEN2], CurArm[P_VOLLEN2]))
			break;

		// resample TF2d (conditioned) and TF2-side windows
		CurArm[P_TF2D] = selectArm_TF2d_given_TF1(CurArm[P_TF1]);

		CurArm[P_FDLEN2]    = selectArm_UCB(P_FDLEN2);
		CurArm[P_SLOPELEN2] = selectArm_SlopeLen2_given_FDLen2(CurArm[P_FDLEN2]);
		CurArm[P_VOLLEN2]   = selectArm_UCB(P_VOLLEN2);

		tries += 1;
	}
}

// -----------------------------
// Feature helpers (lite-C safe)
// -----------------------------
function fractalDimKatz(vars P, int N)
{
	if(N < 2) return 1.0;

	var L = 0;
	int i;
	for(i=0; i<N-1; i++)
		L += abs(P[i] - P[i+1]);

	var d = 0;
	for(i=1; i<N; i++)
	{
		var di = abs(P[i] - P[0]);
		if(di > d) d = di;
	}

	if(L <= 0 || d <= 0) return 1.0;

	var n  = (var)N;
	var fd = log(n) / (log(n) + log(d / L));
	return clamp(fd, 1.0, 2.0);
}

function linSlope(vars P, int N)
{
	if(N < 2) return 0;

	var sumT=0, sumP=0, sumTT=0, sumTP=0;
	int i;
	for(i=0; i<N; i++)
	{
		var t = (var)i;
		sumT  += t;
		sumP  += P[i];
		sumTT += t*t;
		sumTP += t*P[i];
	}

	var denom = (var)N*sumTT - sumT*sumT;
	if(abs(denom) < 1e-12) return 0;

	return ((var)N*sumTP - sumT*sumP) / denom;
}

function stdevReturns(vars R, int N)
{
	if(N < 2) return 0;

	var mean = 0;
	int i;
	for(i=0; i<N; i++) mean += R[i];
	mean /= (var)N;

	var v = 0;
	for(i=0; i<N; i++)
	{
		var d = R[i] - mean;
		v += d*d;
	}
	v /= (var)(N-1);

	return sqrt(max(0, v));
}

// ============================================================================
// RUN
// ============================================================================
function run()
{
	BarPeriod = 5;
	StartDate = 20100101;
	EndDate   = 0;

	set(PLOTNOW|RULES|LOGFILE);

	asset("EUR/USD");
	algo("FRACTAL2TF_EUR_RL_v2_COMM_UCB_SYM");

	var eps = 1e-12;
	DataSplit = 50;

	// LookBack must cover the MAX possible TF * MAX window.
	LookBack = 3000;

	// One-time init
	static int Inited = 0;
	static int PrevOpenTotal = 0;
	static var LastBalance = 0;
	static int Flip = 0;

	string LogFN = "Log\\FRACTAL2TF_EUR_RL_v2_COMM_UCB_SYM.csv";

	if(is(FIRSTINITRUN))
	{
		file_delete(LogFN);

		file_append(LogFN,"Date,Time,Mode,Bar,");
		file_append(LogFN,"TF1,TF2,TF2d,FDLen1,SlopeLen1,VolLen1,FDLen2,SlopeLen2,VolLen2,");
		file_append(LogFN,"LevScale,MaxLev,PredThr,HoldBars,");
		file_append(LogFN,"FD1,Slope1,Vol1,FD2,Slope2,Vol2,");
		file_append(LogFN,"PredL,PredS,Pred,Lev,RewardNorm\n");

		Inited = 0;
		PrevOpenTotal = 0;
		LastBalance = 0;
		Flip = 0;
	}

	if(!Inited)
	{
		initParNames();
		initParamsRL();
		pickParams();

		LastBalance = Balance;
		PrevOpenTotal = NumOpenTotal;

		Inited = 1;
	}

	// Convert chosen arms -> parameter values (current episode)
	int TF1 = (int)armValue(P_TF1, CurArm[P_TF1]);
	int TF2d = (int)armValue(P_TF2D, CurArm[P_TF2D]);
	int TF2 = TF1 + TF2d;
	if(TF2 > 12) TF2 = 12;

	int FDLen1    = (int)armValue(P_FDLEN1,    CurArm[P_FDLEN1]);
	int SlopeLen1 = (int)armValue(P_SLOPELEN1, CurArm[P_SLOPELEN1]);
	int VolLen1   = (int)armValue(P_VOLLEN1,   CurArm[P_VOLLEN1]);

	int FDLen2    = (int)armValue(P_FDLEN2,    CurArm[P_FDLEN2]);
	int SlopeLen2 = (int)armValue(P_SLOPELEN2, CurArm[P_SLOPELEN2]);
	int VolLen2   = (int)armValue(P_VOLLEN2,   CurArm[P_VOLLEN2]);

	var LevScale  = armValue(P_LEVSCALE, CurArm[P_LEVSCALE]);
	var MaxLev    = armValue(P_MAXLEV,   CurArm[P_MAXLEV]);
	var PredThr   = armValue(P_PREDTHR,  CurArm[P_PREDTHR]);
	int HoldBars  = (int)armValue(P_HOLDBARS, CurArm[P_HOLDBARS]);

	// Build series (2 TF)
	TimeFrame = TF1;
	vars P1 = series(priceClose());
	vars R1 = series(log(max(eps,P1[0]) / max(eps,P1[1])));

	vars FD1S    = series(0);
	vars Slope1S = series(0);
	vars Vol1S   = series(0);

	TimeFrame = TF2;
	vars P2 = series(priceClose());
	vars R2 = series(log(max(eps,P2[0]) / max(eps,P2[1])));

	vars FD2S    = series(0);
	vars Slope2S = series(0);
	vars Vol2S   = series(0);

	TimeFrame = 1;

	// Warmup gate based on current episode params
	int Need1 = max(max(FDLen1, SlopeLen1), VolLen1) + 5;
	int Need2 = max(max(FDLen2, SlopeLen2), VolLen2) + 5;
	int WarmupBars = max(TF1*Need1, TF2*Need2) + 10;

	if(Bar < WarmupBars)
		return;

	// Do NOT block TRAIN during LOOKBACK
	if(is(LOOKBACK) && !Train)
		return;

	// Compute features
	TimeFrame = TF1;
	FD1S[0]    = fractalDimKatz(P1, FDLen1);
	Slope1S[0] = linSlope(P1, SlopeLen1);
	Vol1S[0]   = stdevReturns(R1, VolLen1);

	TimeFrame = TF2;
	FD2S[0]    = fractalDimKatz(P2, FDLen2);
	Slope2S[0] = linSlope(P2, SlopeLen2);
	Vol2S[0]   = stdevReturns(R2, VolLen2);

	TimeFrame = 1;

	var Sig[6];
	Sig[0] = FD1S[0];
	Sig[1] = Slope1S[0];
	Sig[2] = Vol1S[0];
	Sig[3] = FD2S[0];
	Sig[4] = Slope2S[0];
	Sig[5] = Vol2S[0];

	// Trading logic
	int MethodBase = PERCEPTRON + FUZZY + BALANCED;
	int MethodRet  = MethodBase + RETURNS;

	var PredL=0, PredS=0, Pred=0, Lev=0;

	// time-based exit
	if(NumOpenTotal > 0)
		for(open_trades)
			if(TradeIsOpen && TradeBars >= HoldBars)
				exitTrade(ThisTrade);

	if(Train)
	{
		// Forced alternating trades so ML always gets samples
		if(NumOpenTotal == 0)
		{
			Flip = 1 - Flip;
			LastBalance = Balance;

			if(Flip)
			{
				adviseLong(MethodRet, 0, Sig, 6);
				Lots = 1; enterLong();
			}
			else
			{
				adviseShort(MethodRet, 0, Sig, 6);
				Lots = 1; enterShort();
			}
		}
	}
	else
	{
		PredL = adviseLong(MethodBase, 0, Sig, 6);
		PredS = adviseShort(MethodBase, 0, Sig, 6);

		// Bootstrap if model has no signal yet
		if(NumOpenTotal == 0 && PredL == 0 && PredS == 0)
		{
			LastBalance = Balance;
			var s = Sig[1] + Sig[4];
			if(s > 0) { Lots=1; enterLong(); }
			else if(s < 0) { Lots=1; enterShort(); }
			else
			{
				if(random(1) < 0.5) { Lots=1; enterLong(); }
				else                { Lots=1; enterShort(); }
			}
		}
		else
		{
			Pred = PredL - PredS;
			Lev  = clamp(Pred * LevScale, -MaxLev, MaxLev);

			if(Lev > PredThr)       { exitShort(); Lots=1; enterLong();  }
			else if(Lev < -PredThr) { exitLong();  Lots=1; enterShort(); }
			else                    { exitLong();  exitShort(); }
		}
	}

	// RL reward + update (episode ends when we go from having positions to flat)
	var RewardNorm = 0;

	if(PrevOpenTotal > 0 && NumOpenTotal == 0)
	{
		var dBal = Balance - LastBalance;

		// reward normalization: per (1+HoldBars)
		RewardNorm = dBal / (1 + (var)HoldBars);

		// clip reward
		if(RewardNorm >  REWARD_CLIP) RewardNorm =  REWARD_CLIP;
		if(RewardNorm < -REWARD_CLIP) RewardNorm = -REWARD_CLIP;

		// --- Update singles (always, even if RewardNorm==0, for counts/UCB) ---
		int p;
		for(p=0; p<NPAR; p++)
			updateArm(p, CurArm[p], RewardNorm);

		// --- Update residual pair tables (COMMUNICATION LEARNING) ---
		updatePairRes_TF(CurArm[P_TF1], CurArm[P_TF2D], RewardNorm);
		updatePairRes_FD1SL1(CurArm[P_FDLEN1], CurArm[P_SLOPELEN1], RewardNorm);
		updatePairRes_FD2SL2(CurArm[P_FDLEN2], CurArm[P_SLOPELEN2], RewardNorm);
		updatePairRes_LS_PT(CurArm[P_LEVSCALE], CurArm[P_PREDTHR], RewardNorm);
		updatePairRes_ML_HB(CurArm[P_MAXLEV], CurArm[P_HOLDBARS], RewardNorm);

		// Next episode parameters
		pickParams();
		LastBalance = Balance;
	}

	PrevOpenTotal = NumOpenTotal;

	// Logging
	string ModeStr = "Trade";
	if(Train) ModeStr = "Train";
	else if(Test) ModeStr = "Test";

	file_append(LogFN, strf("%04i-%02i-%02i,%02i:%02i,%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%.6f,%.3f,%.6f,%d,%.6f,%.8f,%.8f,%.6f,%.8f,%.8f,%.8f,%.8f,%.8f,%.6f,%.6f\n",
		year(0),month(0),day(0), hour(0),minute(0),
		ModeStr, Bar,
		TF1, TF2, TF2d,
		FDLen1, SlopeLen1, VolLen1,
		FDLen2, SlopeLen2, VolLen2,
		LevScale, MaxLev, PredThr, HoldBars,
		Sig[0], Sig[1], Sig[2], Sig[3], Sig[4], Sig[5],
		PredL, PredS, Pred, Lev, RewardNorm
	));

	// Plots
	plot("FD_TF1",    Sig[0], NEW, 0);
	plot("FD_TF2",    Sig[3], 0, 0);
	plot("Slope_TF1", Sig[1], 0, 0);
	plot("Slope_TF2", Sig[4], 0, 0);
	plot("Vol_TF1",   Sig[2], 0, 0);
	plot("Vol_TF2",   Sig[5], 0, 0);
	plot("Pred",      Pred, 0, 0);
	plot("Lev",       Lev, 0, 0);
	plot("RewardN",   RewardNorm, 0, 0);
}

ScaleWeave Sentinel [Re: TipmyPip] #489119
01/24/26 04:10
TipmyPip Online OP
Member
Joined: Sep 2017
Posts: 192

This sentinel is a two-pace fractal learner that tunes itself while it trades. It observes the market through a fast lens and a slow lens, then teaches those lenses to cooperate instead of compete. A bandit controller selects values for timeframes, feature windows, and risk controls. It balances exploration with exploitation by adding an optimism boost for options sampled less often. It records how combinations behave, not just single settings, so coordination improves over time.

Coordination begins with how the slow lens is represented. The strategy does not learn synergy on the offset used to reach the slow lens; it learns synergy on the derived slow timeframe that is actually used to build the slow price series. The fast timeframe and the offset are combined into that final slow timeframe, and cooperation feedback is stored against the pair formed from the fast choice and the final slow choice. If different offsets create the same slow view, they reinforce the same cooperation signal, which keeps learning focused on the true slow scale and makes credit assignment cleaner.
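
With the ranges in the listing below (TF1 from 1 to 3, offset from 1 to 11, TF_MAX = 12), several offsets can collapse onto the same derived slow timeframe, which is exactly why the table is keyed on TF2 rather than on the offset. A minimal sketch of that mapping, mirroring tf2ValueFromArms() and tf2IndexFromValue() (TF1v and TF2dv stand for the already-resolved parameter values):

Code
// Derived slow timeframe and its table index, as in the listing below.
// Example: with TF1 = 3 the offsets 9, 10 and 11 all clamp to TF2 = 12,
// so they reinforce the same Qpair_TF12 cell.
int TF2v = TF1v + TF2dv;          // fast timeframe + chosen offset
if(TF2v > TF_MAX) TF2v = TF_MAX;  // clamp to the supported maximum
if(TF2v < 1) TF2v = 1;
int tf2Idx = TF2v - 1;            // 0..TF_MAX-1 index into the pair tables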

The fast lens can also revise itself after the slow lens is chosen. This is a negotiation step: the fast side re-evaluates candidates given the slow decision and the cooperation memories, and it may switch to a better-matching fast setting. The revision is applied only some of the time, which preserves randomness and prevents early lock-in. This back-and-forth approximates a joint search without enumerating every full combination.

A second communication channel ties the slow timeframe to the slow side window sizes. Window lengths determine how much history the slow lens needs before its features become stable, and they also determine how long warmup lasts. Very large windows combined with very slow sampling can cause long delays, fewer learning episodes, and brittle behavior. Instead of rejecting and resampling until a safe combination appears, the strategy applies a safety cost during selection. When a candidate pairing implies excessive history demand, its score is reduced, steering choices toward feasible regions without hard bans.
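
A minimal sketch of how that soft cost enters a candidate's score, reusing safetyPenalty() and ETA_SAFE from the listing below; softSafeScore is a hypothetical wrapper, the listing applies the subtraction inline in each selector.

Code
// Hypothetical wrapper: subtract the soft safety cost from a candidate score.
// safetyPenalty() is zero while TF2 * W2max stays within SAFE_LIMIT and then
// grows linearly, so infeasible pairings are discouraged rather than banned.
var softSafeScore(var baseScore, int TF2v, int W2max)
{
	return baseScore - ETA_SAFE * safetyPenalty(TF2v, W2max);
}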

To make this interaction explicit, the strategy keeps a dedicated memory table keyed by the slow timeframe and a binned summary of the largest slow window. The summary is the maximum among the slow fractal window, the slow slope window, and the slow volatility window. The table learns which pairings are productive from realized trade outcomes, capturing patterns such as slow sampling working best with modest windows, or moderate sampling supporting broader context without starving the learner of episodes.

Coordination is also improved at the feature level. Alongside the core fractal, slope, and volatility features from both lenses, the strategy adds three alignment tags. One tag describes trend agreement, reporting whether the fast and slow slopes point the same way, point opposite ways, or show no clear direction. A second tag describes volatility agreement, reporting whether the slow view is calmer or noisier relative to the fast view. A third tag describes structural contrast, reporting whether the slow fractal texture looks smoother or more intricate than the fast one.
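
The listing in this post is cut off before the run() body that assembles the feature vector, so the exact construction of the three tags is not shown. The sketch below is one plausible reading of the description, using the sign3() helper the listing does define and the names Aslope, Asigma and AFD from its CSV header; treat the formulas themselves as an assumption.

Code
// Hypothetical sketch: one plausible way to append the alignment tags to the
// signal vector (Sig[0..5] = FD1, Slope1, Vol1, FD2, Slope2, Vol2).
void addAlignmentTags(var* Sig)
{
	Sig[6] = sign3(Sig[1]) * sign3(Sig[4]); // Aslope: +1 agree, -1 oppose, 0 no direction
	Sig[7] = sign3(Sig[5] - Sig[2]);        // Asigma: slow lens noisier (+1) or calmer (-1)
	Sig[8] = sign3(Sig[3] - Sig[0]);        // AFD: slow texture rougher (+1) or smoother (-1)
}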

A further channel links the slow timeframe to the holding horizon. Rewards are normalized by holding duration, and timeframe affects signal frequency, so the best slow sampling pace depends on how long positions are kept. The strategy therefore learns a cooperation table between the slow timeframe and the chosen holding setting. This reduces rhythm failures such as trading too often with long holds, or trading too slowly when the hold setting is short.

Cooperation memories are stored as residual effects. Each single setting has its own running value estimate, and each cooperation table stores only the extra contribution beyond those single values. Pair influence ramps up gradually as joint evidence accumulates, so early noise does not dominate. This keeps the communication signal honest and avoids double counting. Pair memories are updated only when a cycle closes, so the controller learns from outcomes. Rewards are normalized by holding time and clipped to prevent spikes from dominating. This keeps learning stable during training and testing.

During trading, the inner learner converts the feature snapshot into long and short preferences. Position intensity is scaled by a leverage factor, capped by a maximum exposure, and filtered by a threshold so weak signals stay flat. Exits are both signal driven and time driven, creating clear episodes that the controller can score. Over many episodes, ScaleWeave Sentinel becomes a stable coordinator that matches timeframes, windows, and holding rhythm while keeping behavior inside practical limits.
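
The signal-to-position mapping described here matches the v2 listing earlier in the thread; since the v3 run() body is truncated in this post, here is that mapping restated as a small sketch (managePosition is a hypothetical wrapper name, not part of the script):

Code
// Hypothetical wrapper around the signal-to-position logic from the v2 listing:
// net preference, scaled and capped into a leverage value, then a threshold
// keeps weak signals flat.
void managePosition(var PredL, var PredS, var LevScale, var MaxLev, var PredThr)
{
	var Pred = PredL - PredS;                        // net long/short preference
	var Lev  = clamp(Pred * LevScale, -MaxLev, MaxLev);
	if(Lev > PredThr)       { exitShort(); Lots = 1; enterLong();  }
	else if(Lev < -PredThr) { exitLong();  Lots = 1; enterShort(); }
	else                    { exitLong();  exitShort(); }           // weak signal: stay flat
}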

Code
// ============================================================================
// Fractal Learner (Strategy 3) - RL Bandits for Parameters + 2TF ML (RETURNS)
// File: Fractal_EURUSD_2TF_RL_Bandits_v3_TF2_W2_ALIGN_TF2HB_SOFTSAFE.c
//
// Adds requested improvements:
// 1) Communicate on derived TF2 (not TF2d): residual table Qpair_TF12(TF1_arm, TF2_idx)
// 2) Learn TF2 <-> W2max interaction: Qpair_TF2_W2(TF2_idx, W2bin)
// 3) Alignment meta-features added to ML: Aslope, Asigma, AFD  (Sig size 9)
// 4) Learn TF2 <-> HoldBars: Qpair_TF2_HB(TF2_idx, HoldBars_arm)
// 5) Replace rejection sampling with soft safety penalty in selection scoring
//
// lite-C safe:
// - No ternary operator (uses ifelse())
// - Header uses multiple file_append calls
// - strf format string is ONE literal
// ============================================================================

#define NPAR 12
#define MAXARMS 64

// Exploration / learning
#define EPSILON 0.10
#define ALPHA   0.10

// Communication strength (base)
#define LAMBDA_PAIR 0.30

// UCB constants
#define C_UCB   0.40
#define CP_UCB  0.40

// Pair gating (minimum samples before full lambda)
#define NMIN_PAIR 25

// Reward normalization/clipping
#define REWARD_CLIP 1000.0

// Safe constraint limit (keeps TF2 + windows from exploding warmup)
#define SAFE_LIMIT 2500

// Soft safety penalty weight (score -= ETA_SAFE * pi)
#define ETA_SAFE  2.0

// TF limits
#define TF_MAX 12

// W2 binning (based on your window2 ranges: 10..160 step 5)
#define W2_MIN  10
#define W2_MAX  160
#define W2_STEP 5
#define W2_BINS 31   // (160-10)/5 + 1 = 31

#define P_INT 0
#define P_VAR 1

// Parameter indices (RL-controlled)
#define P_TF1        0
#define P_TF2D       1
#define P_FDLEN1     2
#define P_SLOPELEN1  3
#define P_VOLLEN1    4
#define P_FDLEN2     5
#define P_SLOPELEN2  6
#define P_VOLLEN2    7
#define P_LEVSCALE   8
#define P_MAXLEV     9
#define P_PREDTHR    10
#define P_HOLDBARS   11

// -----------------------------
// RL storage
// -----------------------------
string ParName[NPAR];

int ParType[NPAR] =
{
	P_INT,  // TF1
	P_INT,  // TF2d
	P_INT,  // FDLen1
	P_INT,  // SlopeLen1
	P_INT,  // VolLen1
	P_INT,  // FDLen2
	P_INT,  // SlopeLen2
	P_INT,  // VolLen2
	P_VAR,  // LevScale
	P_VAR,  // MaxLev
	P_VAR,  // PredThr
	P_INT   // HoldBars
};

var ParMin[NPAR];
var ParMax[NPAR];
var ParStep[NPAR];

var Q[NPAR][MAXARMS];
int Ncnt[NPAR][MAXARMS];
int ArmsCount[NPAR];
int CurArm[NPAR];

// Totals for UCB (avoid summing every time)
int TotCnt[NPAR];

// -----------------------------
// Pairwise "communication" RESIDUAL tables
// -----------------------------

// (A) Derived TF2 communication: TF1_arm × TF2_idx(0..11)
var Qpair_TF12[MAXARMS][TF_MAX];
int Npair_TF12[MAXARMS][TF_MAX];
int TotPair_TF12;

// (B) TF2 <-> W2max bin: TF2_idx(0..11) × W2bin(0..30)
var Qpair_TF2_W2[TF_MAX][W2_BINS];
int Npair_TF2_W2[TF_MAX][W2_BINS];
int TotPair_TF2_W2;

// (C) Existing FDLen1 <-> SlopeLen1
var Qpair_FD1SL1[MAXARMS][MAXARMS];
int Npair_FD1SL1[MAXARMS][MAXARMS];
int TotPair_FD1SL1;

// (D) Existing FDLen2 <-> SlopeLen2
var Qpair_FD2SL2[MAXARMS][MAXARMS];
int Npair_FD2SL2[MAXARMS][MAXARMS];
int TotPair_FD2SL2;

// (E) Existing LevScale <-> PredThr
var Qpair_LS_PT[MAXARMS][MAXARMS];
int Npair_LS_PT[MAXARMS][MAXARMS];
int TotPair_LS_PT;

// (F) Existing MaxLev <-> HoldBars
var Qpair_ML_HB[MAXARMS][MAXARMS];
int Npair_ML_HB[MAXARMS][MAXARMS];
int TotPair_ML_HB;

// (G) NEW: TF2 <-> HoldBars_arm (approx TF2-HB factor)
var Qpair_TF2_HB[TF_MAX][MAXARMS];
int Npair_TF2_HB[TF_MAX][MAXARMS];
int TotPair_TF2_HB;

// -----------------------------
// Utility
// -----------------------------
int calcArms(var mn, var mx, var stp)
{
	if(stp <= 0) return 1;
	int n = (int)floor((mx - mn)/stp + 1.000001);
	if(n < 1) n = 1;
	if(n > MAXARMS) n = MAXARMS;
	return n;
}

var armValue(int p, int a)
{
	var v = ParMin[p] + (var)a * ParStep[p];
	if(v < ParMin[p]) v = ParMin[p];
	if(v > ParMax[p]) v = ParMax[p];
	if(ParType[p] == P_INT) v = (var)(int)(v + 0.5);
	return v;
}

void initParNames()
{
	ParName[P_TF1]       = "TF1";
	ParName[P_TF2D]      = "TF2d";
	ParName[P_FDLEN1]    = "FDLen1";
	ParName[P_SLOPELEN1] = "SlopeLen1";
	ParName[P_VOLLEN1]   = "VolLen1";
	ParName[P_FDLEN2]    = "FDLen2";
	ParName[P_SLOPELEN2] = "SlopeLen2";
	ParName[P_VOLLEN2]   = "VolLen2";
	ParName[P_LEVSCALE]  = "LevScale";
	ParName[P_MAXLEV]    = "MaxLev";
	ParName[P_PREDTHR]   = "PredThr";
	ParName[P_HOLDBARS]  = "HoldBars";
}

// UCB bonus: c * sqrt( ln(1+Tot) / (1+n) )
var ucbBonus(int tot, int n, var c)
{
	var lnT = log(1 + (var)tot);
	return c * sqrt(lnT / (1 + (var)n));
}

// Pair gating: lambda_eff = lambda * min(1, nPair/nMin)
var lambdaEff(int nPair)
{
	var frac = (var)nPair / (var)NMIN_PAIR;
	if(frac > 1) frac = 1;
	if(frac < 0) frac = 0;
	return LAMBDA_PAIR * frac;
}

// Derived TF2 value from TF1_arm and TF2d_arm
int tf2ValueFromArms(int armTF1, int armTF2d)
{
	int TF1v  = (int)armValue(P_TF1, armTF1);
	int TF2dv = (int)armValue(P_TF2D, armTF2d);
	int TF2v = TF1v + TF2dv;
	if(TF2v > TF_MAX) TF2v = TF_MAX;
	if(TF2v < 1) TF2v = 1;
	return TF2v;
}

// TF2 index 0..11
int tf2IndexFromValue(int TF2v)
{
	int idx = TF2v - 1;
	if(idx < 0) idx = 0;
	if(idx >= TF_MAX) idx = TF_MAX - 1;
	return idx;
}

// W2 bin index 0..30 from a window value
int w2BinFromValue(int w)
{
	if(w < W2_MIN) w = W2_MIN;
	if(w > W2_MAX) w = W2_MAX;
	return (int)((w - W2_MIN) / W2_STEP);
}

// Soft safety penalty pi(theta) = max(0, (TF2*W2max - SAFE_LIMIT)/SAFE_LIMIT)
var safetyPenalty(int TF2v, int W2maxv)
{
	var x = (var)(TF2v * W2maxv - SAFE_LIMIT);
	if(x <= 0) return 0;
	return x / (var)SAFE_LIMIT;
}

// sign helper: -1,0,1
var sign3(var x)
{
	if(x > 0) return 1;
	if(x < 0) return -1;
	return 0;
}

// -----------------------------
// Single-arm selection with UCB + epsilon explore
// -----------------------------
int bestArm_UCB(int p)
{
	int a, best = 0;
	var bestScore = Q[p][0] + ucbBonus(TotCnt[p], Ncnt[p][0], C_UCB);

	for(a=1; a<ArmsCount[p]; a++)
	{
		var score = Q[p][a] + ucbBonus(TotCnt[p], Ncnt[p][a], C_UCB);
		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_UCB(int p)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[p]);
	return bestArm_UCB(p);
}

// Update single Q
void updateArm(int p, int a, var reward)
{
	Q[p][a] = Q[p][a] + ALPHA*(reward - Q[p][a]);
	Ncnt[p][a] += 1;
	TotCnt[p] += 1;
}

// -----------------------------
// Residual pair updates
// -----------------------------

// (1) TF1 × TF2_idx residual:
// pred = Q_TF1(a1) + Qpair(a1,tf2)
void updatePairRes_TF12(int aTF1, int tf2Idx, var reward)
{
	var pred = Q[P_TF1][aTF1] + Qpair_TF12[aTF1][tf2Idx];
	Qpair_TF12[aTF1][tf2Idx] = Qpair_TF12[aTF1][tf2Idx] + ALPHA*(reward - pred);
	Npair_TF12[aTF1][tf2Idx] += 1;
	TotPair_TF12 += 1;
}

// (2) TF2_idx × W2bin (standalone interaction; no clean singles for derived vars)
// pred = Qpair(tf2,w2)
void updatePair_TF2_W2(int tf2Idx, int w2bin, var reward)
{
	var pred = Qpair_TF2_W2[tf2Idx][w2bin];
	Qpair_TF2_W2[tf2Idx][w2bin] = Qpair_TF2_W2[tf2Idx][w2bin] + ALPHA*(reward - pred);
	Npair_TF2_W2[tf2Idx][w2bin] += 1;
	TotPair_TF2_W2 += 1;
}

// (3) Existing residuals
void updatePairRes_FD1SL1(int a1, int a2, var reward)
{
	var pred = Q[P_FDLEN1][a1] + Q[P_SLOPELEN1][a2] + Qpair_FD1SL1[a1][a2];
	Qpair_FD1SL1[a1][a2] = Qpair_FD1SL1[a1][a2] + ALPHA*(reward - pred);
	Npair_FD1SL1[a1][a2] += 1;
	TotPair_FD1SL1 += 1;
}

void updatePairRes_FD2SL2(int a1, int a2, var reward)
{
	var pred = Q[P_FDLEN2][a1] + Q[P_SLOPELEN2][a2] + Qpair_FD2SL2[a1][a2];
	Qpair_FD2SL2[a1][a2] = Qpair_FD2SL2[a1][a2] + ALPHA*(reward - pred);
	Npair_FD2SL2[a1][a2] += 1;
	TotPair_FD2SL2 += 1;
}

void updatePairRes_LS_PT(int a1, int a2, var reward)
{
	var pred = Q[P_LEVSCALE][a1] + Q[P_PREDTHR][a2] + Qpair_LS_PT[a1][a2];
	Qpair_LS_PT[a1][a2] = Qpair_LS_PT[a1][a2] + ALPHA*(reward - pred);
	Npair_LS_PT[a1][a2] += 1;
	TotPair_LS_PT += 1;
}

void updatePairRes_ML_HB(int a1, int a2, var reward)
{
	var pred = Q[P_MAXLEV][a1] + Q[P_HOLDBARS][a2] + Qpair_ML_HB[a1][a2];
	Qpair_ML_HB[a1][a2] = Qpair_ML_HB[a1][a2] + ALPHA*(reward - pred);
	Npair_ML_HB[a1][a2] += 1;
	TotPair_ML_HB += 1;
}

// (4) NEW TF2_idx × HoldBars_arm residual:
// pred = Q_HB(aHB) + Qpair(tf2,aHB)
void updatePairRes_TF2_HB(int tf2Idx, int aHB, var reward)
{
	var pred = Q[P_HOLDBARS][aHB] + Qpair_TF2_HB[tf2Idx][aHB];
	Qpair_TF2_HB[tf2Idx][aHB] = Qpair_TF2_HB[tf2Idx][aHB] + ALPHA*(reward - pred);
	Npair_TF2_HB[tf2Idx][aHB] += 1;
	TotPair_TF2_HB += 1;
}

// -----------------------------
// Pair-aware selection pieces
// -----------------------------

// --- TF2d selection conditioned on TF1, but "listens" via derived TF2 index ---
// score = UCB_TF2d + lambda_eff*Qpair_TF12(TF1,TF2idx) + pairUCB(TF12)
int bestArm_TF2d_given_TF1_TF2(int armTF1)
{
	int a2d, best = 0;

	int TF2v0 = tf2ValueFromArms(armTF1, 0);
	int tf2Idx0 = tf2IndexFromValue(TF2v0);
	int n0 = Npair_TF12[armTF1][tf2Idx0];
	var lam0 = lambdaEff(n0);

	var pairB0 = lam0*Qpair_TF12[armTF1][tf2Idx0] + ucbBonus(TotPair_TF12, n0, CP_UCB);

	var bestScore =
		Q[P_TF2D][0] + ucbBonus(TotCnt[P_TF2D], Ncnt[P_TF2D][0], C_UCB) + pairB0;

	for(a2d=1; a2d<ArmsCount[P_TF2D]; a2d++)
	{
		int TF2v = tf2ValueFromArms(armTF1, a2d);
		int tf2Idx = tf2IndexFromValue(TF2v);

		int n = Npair_TF12[armTF1][tf2Idx];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_TF12[armTF1][tf2Idx] + ucbBonus(TotPair_TF12, n, CP_UCB);

		var score =
			Q[P_TF2D][a2d] + ucbBonus(TotCnt[P_TF2D], Ncnt[P_TF2D][a2d], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a2d;
		}
	}
	return best;
}

int selectArm_TF2d_given_TF1_TF2(int armTF1)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_TF2D]);
	return bestArm_TF2d_given_TF1_TF2(armTF1);
}

// --- TF1 symmetric: anticipates best TF2d response (via TF2 index) ---
int bestArm_TF1_symmetric_TF2()
{
	int a1, best = 0;

	// a1=0 baseline
	var bestPairTerm0 = 0;
	int a2d;
	for(a2d=0; a2d<ArmsCount[P_TF2D]; a2d++)
	{
		int TF2v = tf2ValueFromArms(0, a2d);
		int tf2Idx = tf2IndexFromValue(TF2v);

		int n = Npair_TF12[0][tf2Idx];
		var lam = lambdaEff(n);
		var term = lam*Qpair_TF12[0][tf2Idx] + ucbBonus(TotPair_TF12, n, CP_UCB);

		if(a2d==0) bestPairTerm0 = term;
		else if(term > bestPairTerm0) bestPairTerm0 = term;
	}

	var bestScore =
		Q[P_TF1][0] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][0], C_UCB) + bestPairTerm0;

	for(a1=1; a1<ArmsCount[P_TF1]; a1++)
	{
		var bestPairTerm = 0;

		for(a2d=0; a2d<ArmsCount[P_TF2D]; a2d++)
		{
			int TF2v2 = tf2ValueFromArms(a1, a2d);
			int tf2Idx2 = tf2IndexFromValue(TF2v2);

			int n2 = Npair_TF12[a1][tf2Idx2];
			var lam2 = lambdaEff(n2);
			var term2 = lam2*Qpair_TF12[a1][tf2Idx2] + ucbBonus(TotPair_TF12, n2, CP_UCB);

			if(a2d==0) bestPairTerm = term2;
			else if(term2 > bestPairTerm) bestPairTerm = term2;
		}

		var score =
			Q[P_TF1][a1] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][a1], C_UCB) + bestPairTerm;

		if(score > bestScore)
		{
			bestScore = score;
			best = a1;
		}
	}
	return best;
}

int selectArm_TF1_symmetric_TF2()
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_TF1]);
	return bestArm_TF1_symmetric_TF2();
}

// Coordinate descent refinement: TF1 given TF2d, using derived TF2 index
int bestArm_TF1_given_TF2d_TF2(int armTF2d)
{
	int a1, best = 0;

	int TF2v0 = tf2ValueFromArms(0, armTF2d);
	int tf2Idx0 = tf2IndexFromValue(TF2v0);
	int n0 = Npair_TF12[0][tf2Idx0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_TF12[0][tf2Idx0] + ucbBonus(TotPair_TF12, n0, CP_UCB);

	var bestScore =
		Q[P_TF1][0] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][0], C_UCB) + pairB0;

	for(a1=1; a1<ArmsCount[P_TF1]; a1++)
	{
		int TF2v = tf2ValueFromArms(a1, armTF2d);
		int tf2Idx = tf2IndexFromValue(TF2v);

		int n = Npair_TF12[a1][tf2Idx];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_TF12[a1][tf2Idx] + ucbBonus(TotPair_TF12, n, CP_UCB);

		var score =
			Q[P_TF1][a1] + ucbBonus(TotCnt[P_TF1], Ncnt[P_TF1][a1], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a1;
		}
	}
	return best;
}

// --- TF2 <-> W2max influence + soft safety applied during window2 selection ---

// Score contribution from TF2-W2 table:
var tf2w2Term(int tf2Idx, int w2bin)
{
	int n = Npair_TF2_W2[tf2Idx][w2bin];
	var lam = lambdaEff(n);
	return lam*Qpair_TF2_W2[tf2Idx][w2bin] + ucbBonus(TotPair_TF2_W2, n, CP_UCB);
}

// FDLen2 choice conditioned on TF2 (uses TF2-W2 term + penalty)
int bestArm_FDLen2_given_TF2(int tf2Idx, int TF2v)
{
	int a, best = 0;

	int v0 = (int)armValue(P_FDLEN2, 0);
	int w2bin0 = w2BinFromValue(v0);
	var pen0 = safetyPenalty(TF2v, v0);
	var bestScore =
		Q[P_FDLEN2][0] + ucbBonus(TotCnt[P_FDLEN2], Ncnt[P_FDLEN2][0], C_UCB)
		+ tf2w2Term(tf2Idx, w2bin0)
		- ETA_SAFE*pen0;

	for(a=1; a<ArmsCount[P_FDLEN2]; a++)
	{
		int v = (int)armValue(P_FDLEN2, a);
		int w2bin = w2BinFromValue(v);
		var pen = safetyPenalty(TF2v, v);

		var score =
			Q[P_FDLEN2][a] + ucbBonus(TotCnt[P_FDLEN2], Ncnt[P_FDLEN2][a], C_UCB)
			+ tf2w2Term(tf2Idx, w2bin)
			- ETA_SAFE*pen;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_FDLen2_given_TF2(int tf2Idx, int TF2v)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_FDLEN2]);
	return bestArm_FDLen2_given_TF2(tf2Idx, TF2v);
}

// SlopeLen2 conditioned on FDLen2 AND TF2 (FD2-SL2 pair + TF2-W2 + penalty)
int bestArm_SlopeLen2_given_FD2_TF2(int armFD2, int tf2Idx, int TF2v)
{
	int a, best = 0;

	int vFD = (int)armValue(P_FDLEN2, armFD2);

	// a=0 baseline
	int v0 = (int)armValue(P_SLOPELEN2, 0);
	int mx0 = vFD; if(v0 > mx0) mx0 = v0;
	int w2bin0 = w2BinFromValue(mx0);
	var pen0 = safetyPenalty(TF2v, mx0);

	// FD2-SL2 pair term
	int nP0 = Npair_FD2SL2[armFD2][0];
	var lamP0 = lambdaEff(nP0);
	var fdslTerm0 = lamP0*Qpair_FD2SL2[armFD2][0] + ucbBonus(TotPair_FD2SL2, nP0, CP_UCB);

	var bestScore =
		Q[P_SLOPELEN2][0] + ucbBonus(TotCnt[P_SLOPELEN2], Ncnt[P_SLOPELEN2][0], C_UCB)
		+ fdslTerm0
		+ tf2w2Term(tf2Idx, w2bin0)
		- ETA_SAFE*pen0;

	for(a=1; a<ArmsCount[P_SLOPELEN2]; a++)
	{
		int v = (int)armValue(P_SLOPELEN2, a);
		int mx = vFD; if(v > mx) mx = v;
		int w2bin = w2BinFromValue(mx);
		var pen = safetyPenalty(TF2v, mx);

		int nP = Npair_FD2SL2[armFD2][a];
		var lamP = lambdaEff(nP);
		var fdslTerm = lamP*Qpair_FD2SL2[armFD2][a] + ucbBonus(TotPair_FD2SL2, nP, CP_UCB);

		var score =
			Q[P_SLOPELEN2][a] + ucbBonus(TotCnt[P_SLOPELEN2], Ncnt[P_SLOPELEN2][a], C_UCB)
			+ fdslTerm
			+ tf2w2Term(tf2Idx, w2bin)
			- ETA_SAFE*pen;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_SlopeLen2_given_FD2_TF2(int armFD2, int tf2Idx, int TF2v)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_SLOPELEN2]);
	return bestArm_SlopeLen2_given_FD2_TF2(armFD2, tf2Idx, TF2v);
}

// VolLen2 conditioned on current W2max and TF2 (TF2-W2 + penalty)
int bestArm_VolLen2_given_TF2(int curMax, int tf2Idx, int TF2v)
{
	int a, best = 0;

	int v0 = (int)armValue(P_VOLLEN2, 0);
	int mx0 = curMax; if(v0 > mx0) mx0 = v0;
	int w2bin0 = w2BinFromValue(mx0);
	var pen0 = safetyPenalty(TF2v, mx0);

	var bestScore =
		Q[P_VOLLEN2][0] + ucbBonus(TotCnt[P_VOLLEN2], Ncnt[P_VOLLEN2][0], C_UCB)
		+ tf2w2Term(tf2Idx, w2bin0)
		- ETA_SAFE*pen0;

	for(a=1; a<ArmsCount[P_VOLLEN2]; a++)
	{
		int v = (int)armValue(P_VOLLEN2, a);
		int mx = curMax; if(v > mx) mx = v;
		int w2bin = w2BinFromValue(mx);
		var pen = safetyPenalty(TF2v, mx);

		var score =
			Q[P_VOLLEN2][a] + ucbBonus(TotCnt[P_VOLLEN2], Ncnt[P_VOLLEN2][a], C_UCB)
			+ tf2w2Term(tf2Idx, w2bin)
			- ETA_SAFE*pen;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_VolLen2_given_TF2(int curMax, int tf2Idx, int TF2v)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_VOLLEN2]);
	return bestArm_VolLen2_given_TF2(curMax, tf2Idx, TF2v);
}

// --- Existing FDLen1 <-> SlopeLen1 (unchanged) ---
int bestArm_SlopeLen1_given_FDLen1(int armFD1)
{
	int a, best = 0;

	int n0 = Npair_FD1SL1[armFD1][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_FD1SL1[armFD1][0] + ucbBonus(TotPair_FD1SL1, n0, CP_UCB);

	var bestScore =
		Q[P_SLOPELEN1][0] + ucbBonus(TotCnt[P_SLOPELEN1], Ncnt[P_SLOPELEN1][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_SLOPELEN1]; a++)
	{
		int n = Npair_FD1SL1[armFD1][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_FD1SL1[armFD1][a] + ucbBonus(TotPair_FD1SL1, n, CP_UCB);

		var score =
			Q[P_SLOPELEN1][a] + ucbBonus(TotCnt[P_SLOPELEN1], Ncnt[P_SLOPELEN1][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_SlopeLen1_given_FDLen1(int armFD1)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_SLOPELEN1]);
	return bestArm_SlopeLen1_given_FDLen1(armFD1);
}

// --- LevScale -> PredThr (unchanged) ---
int bestArm_PredThr_given_LevScale(int armLS)
{
	int a, best = 0;

	int n0 = Npair_LS_PT[armLS][0];
	var lam0 = lambdaEff(n0);
	var pairB0 = lam0*Qpair_LS_PT[armLS][0] + ucbBonus(TotPair_LS_PT, n0, CP_UCB);

	var bestScore =
		Q[P_PREDTHR][0] + ucbBonus(TotCnt[P_PREDTHR], Ncnt[P_PREDTHR][0], C_UCB) + pairB0;

	for(a=1; a<ArmsCount[P_PREDTHR]; a++)
	{
		int n = Npair_LS_PT[armLS][a];
		var lam = lambdaEff(n);
		var pairB = lam*Qpair_LS_PT[armLS][a] + ucbBonus(TotPair_LS_PT, n, CP_UCB);

		var score =
			Q[P_PREDTHR][a] + ucbBonus(TotCnt[P_PREDTHR], Ncnt[P_PREDTHR][a], C_UCB) + pairB;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_PredThr_given_LevScale(int armLS)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_PREDTHR]);
	return bestArm_PredThr_given_LevScale(armLS);
}

// --- HoldBars selection conditioned on MaxLev AND TF2_idx (adds TF2-HB communication) ---
int bestArm_HoldBars_given_MaxLev_TF2(int armML, int tf2Idx)
{
	int a, best = 0;

	// ML-HB term for a=0
	int n0 = Npair_ML_HB[armML][0];
	var lam0 = lambdaEff(n0);
	var mlhb0 = lam0*Qpair_ML_HB[armML][0] + ucbBonus(TotPair_ML_HB, n0, CP_UCB);

	// TF2-HB term for a=0
	int nT0 = Npair_TF2_HB[tf2Idx][0];
	var lamT0 = lambdaEff(nT0);
	var tf2hb0 = lamT0*Qpair_TF2_HB[tf2Idx][0] + ucbBonus(TotPair_TF2_HB, nT0, CP_UCB);

	var bestScore =
		Q[P_HOLDBARS][0] + ucbBonus(TotCnt[P_HOLDBARS], Ncnt[P_HOLDBARS][0], C_UCB)
		+ mlhb0 + tf2hb0;

	for(a=1; a<ArmsCount[P_HOLDBARS]; a++)
	{
		int n1 = Npair_ML_HB[armML][a];
		var lam1 = lambdaEff(n1);
		var mlhb = lam1*Qpair_ML_HB[armML][a] + ucbBonus(TotPair_ML_HB, n1, CP_UCB);

		int nT = Npair_TF2_HB[tf2Idx][a];
		var lamT = lambdaEff(nT);
		var tf2hb = lamT*Qpair_TF2_HB[tf2Idx][a] + ucbBonus(TotPair_TF2_HB, nT, CP_UCB);

		var score =
			Q[P_HOLDBARS][a] + ucbBonus(TotCnt[P_HOLDBARS], Ncnt[P_HOLDBARS][a], C_UCB)
			+ mlhb + tf2hb;

		if(score > bestScore)
		{
			bestScore = score;
			best = a;
		}
	}
	return best;
}

int selectArm_HoldBars_given_MaxLev_TF2(int armML, int tf2Idx)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[P_HOLDBARS]);
	return bestArm_HoldBars_given_MaxLev_TF2(armML, tf2Idx);
}

// -----------------------------
// Init ranges + tables
// -----------------------------
void initParamsRL()
{
	// Ranges roughly matching your old optimize() ranges/steps
	ParMin[P_TF1]        = 1;    ParMax[P_TF1]        = 3;    ParStep[P_TF1]        = 1;
	ParMin[P_TF2D]       = 1;    ParMax[P_TF2D]       = 11;   ParStep[P_TF2D]       = 1;

	ParMin[P_FDLEN1]     = 20;   ParMax[P_FDLEN1]     = 220;  ParStep[P_FDLEN1]     = 5;
	ParMin[P_SLOPELEN1]  = 20;   ParMax[P_SLOPELEN1]  = 200;  ParStep[P_SLOPELEN1]  = 5;
	ParMin[P_VOLLEN1]    = 20;   ParMax[P_VOLLEN1]    = 200;  ParStep[P_VOLLEN1]    = 5;

	ParMin[P_FDLEN2]     = 10;   ParMax[P_FDLEN2]     = 160;  ParStep[P_FDLEN2]     = 5;
	ParMin[P_SLOPELEN2]  = 10;   ParMax[P_SLOPELEN2]  = 140;  ParStep[P_SLOPELEN2]  = 5;
	ParMin[P_VOLLEN2]    = 10;   ParMax[P_VOLLEN2]    = 140;  ParStep[P_VOLLEN2]    = 5;

	ParMin[P_LEVSCALE]   = 2;    ParMax[P_LEVSCALE]   = 30;   ParStep[P_LEVSCALE]   = 1;
	ParMin[P_MAXLEV]     = 0.1;  ParMax[P_MAXLEV]     = 1.0;  ParStep[P_MAXLEV]     = 0.1;
	ParMin[P_PREDTHR]    = 0.0;  ParMax[P_PREDTHR]    = 0.20; ParStep[P_PREDTHR]    = 0.01;
	ParMin[P_HOLDBARS]   = 1;    ParMax[P_HOLDBARS]   = 30;   ParStep[P_HOLDBARS]   = 1;

	int p, a;
	for(p=0; p<NPAR; p++)
	{
		ArmsCount[p] = calcArms(ParMin[p], ParMax[p], ParStep[p]);
		CurArm[p] = 0;
		TotCnt[p] = 0;
		for(a=0; a<ArmsCount[p]; a++)
		{
			Q[p][a] = 0;
			Ncnt[p][a] = 0;
		}
	}

	// Init pair tables
	int i,j;

	TotPair_TF12 = 0;
	TotPair_TF2_W2 = 0;
	TotPair_FD1SL1 = 0;
	TotPair_FD2SL2 = 0;
	TotPair_LS_PT = 0;
	TotPair_ML_HB = 0;
	TotPair_TF2_HB = 0;

	// TF12 table
	for(i=0; i<MAXARMS; i++)
	{
		for(j=0; j<TF_MAX; j++)
		{
			Qpair_TF12[i][j] = 0;
			Npair_TF12[i][j] = 0;
		}
	}

	// TF2-W2 table
	for(i=0; i<TF_MAX; i++)
	{
		for(j=0; j<W2_BINS; j++)
		{
			Qpair_TF2_W2[i][j] = 0;
			Npair_TF2_W2[i][j] = 0;
		}
	}

	// Large MAXARMS×MAXARMS tables
	for(i=0; i<MAXARMS; i++)
	{
		for(j=0; j<MAXARMS; j++)
		{
			Qpair_FD1SL1[i][j] = 0;  Npair_FD1SL1[i][j] = 0;
			Qpair_FD2SL2[i][j] = 0;  Npair_FD2SL2[i][j] = 0;
			Qpair_LS_PT[i][j]  = 0;  Npair_LS_PT[i][j]  = 0;
			Qpair_ML_HB[i][j]  = 0;  Npair_ML_HB[i][j]  = 0;
		}
	}

	// TF2-HB table: TF_MAX × MAXARMS
	for(i=0; i<TF_MAX; i++)
	{
		for(j=0; j<MAXARMS; j++)
		{
			Qpair_TF2_HB[i][j] = 0;
			Npair_TF2_HB[i][j] = 0;
		}
	}
}

// -----------------------------
// Parameter pick order (TF2-aware, TF2-W2 aware, TF2-HB aware, soft safety)
// -----------------------------
void pickParams()
{
	int p;

	// 1) TF1 symmetric pick (anticipate TF2d best response) using derived TF2 table
	CurArm[P_TF1] = selectArm_TF1_symmetric_TF2();

	// 2) TF2d conditioned on TF1 using derived TF2 communication
	CurArm[P_TF2D] = selectArm_TF2d_given_TF1_TF2(CurArm[P_TF1]);

	// 3) Coordinate descent: re-pick TF1 given TF2d
	if(random(1) >= EPSILON)
		CurArm[P_TF1] = bestArm_TF1_given_TF2d_TF2(CurArm[P_TF2D]);

	// Derived TF2 for the episode (used by TF2-side window selection and TF2-HB)
	int TF2v = tf2ValueFromArms(CurArm[P_TF1], CurArm[P_TF2D]);
	int tf2Idx = tf2IndexFromValue(TF2v);

	// 4) FD/Slope pair TF1
	CurArm[P_FDLEN1]    = selectArm_UCB(P_FDLEN1);
	CurArm[P_SLOPELEN1] = selectArm_SlopeLen1_given_FDLen1(CurArm[P_FDLEN1]);

	// 5) TF2-side windows are TF2-aware (TF2-W2) + FD2-SL2 communication + soft safety
	CurArm[P_FDLEN2]    = selectArm_FDLen2_given_TF2(tf2Idx, TF2v);
	CurArm[P_SLOPELEN2] = selectArm_SlopeLen2_given_FD2_TF2(CurArm[P_FDLEN2], tf2Idx, TF2v);

	// current max after FD2+SL2
	int vFD2 = (int)armValue(P_FDLEN2, CurArm[P_FDLEN2]);
	int vSL2 = (int)armValue(P_SLOPELEN2, CurArm[P_SLOPELEN2]);
	int curMax = vFD2; if(vSL2 > curMax) curMax = vSL2;

	CurArm[P_VOLLEN2]   = selectArm_VolLen2_given_TF2(curMax, tf2Idx, TF2v);

	// 6) Remaining independent params (except the paired ones we do later)
	for(p=0; p<NPAR; p++)
	{
		if(p == P_TF1) continue;
		if(p == P_TF2D) continue;
		if(p == P_FDLEN1) continue;
		if(p == P_SLOPELEN1) continue;
		if(p == P_FDLEN2) continue;
		if(p == P_SLOPELEN2) continue;
		if(p == P_VOLLEN2) continue;
		CurArm[p] = selectArm_UCB(p);
	}

	// 7) LevScale -> PredThr conditioned
	CurArm[P_LEVSCALE] = selectArm_UCB(P_LEVSCALE);
	CurArm[P_PREDTHR]  = selectArm_PredThr_given_LevScale(CurArm[P_LEVSCALE]);

	// 8) MaxLev -> HoldBars conditioned AND TF2-HB communication
	CurArm[P_MAXLEV]   = selectArm_UCB(P_MAXLEV);
	CurArm[P_HOLDBARS] = selectArm_HoldBars_given_MaxLev_TF2(CurArm[P_MAXLEV], tf2Idx);
}

// -----------------------------
// Feature helpers (lite-C safe)
// -----------------------------
function fractalDimKatz(vars P, int N)
{
	if(N < 2) return 1.0;

	var L = 0;
	int i;
	for(i=0; i<N-1; i++)
		L += abs(P[i] - P[i+1]);

	var d = 0;
	for(i=1; i<N; i++)
	{
		var di = abs(P[i] - P[0]);
		if(di > d) d = di;
	}

	if(L <= 0 || d <= 0) return 1.0;

	var n  = (var)N;
	var fd = log(n) / (log(n) + log(d / L));
	return clamp(fd, 1.0, 2.0);
}

function linSlope(vars P, int N)
{
	if(N < 2) return 0;

	var sumT=0, sumP=0, sumTT=0, sumTP=0;
	int i;
	for(i=0; i<N; i++)
	{
		var t = (var)i;
		sumT  += t;
		sumP  += P[i];
		sumTT += t*t;
		sumTP += t*P[i];
	}

	var denom = (var)N*sumTT - sumT*sumT;
	if(abs(denom) < 1e-12) return 0;

	return ((var)N*sumTP - sumT*sumP) / denom;
}

function stdevReturns(vars R, int N)
{
	if(N < 2) return 0;

	var mean = 0;
	int i;
	for(i=0; i<N; i++) mean += R[i];
	mean /= (var)N;

	var v = 0;
	for(i=0; i<N; i++)
	{
		var d = R[i] - mean;
		v += d*d;
	}
	v /= (var)(N-1);

	return sqrt(max(0, v));
}

// ============================================================================
// RUN
// ============================================================================
function run()
{
	BarPeriod = 15;
	StartDate = 20100101;
	EndDate   = 0;

	set(PLOTNOW|RULES|LOGFILE);

	asset("EUR/USD");
	algo("FRACTAL2TF");

	var eps = 1e-12;
	DataSplit = 50;

	LookBack = 3000;

	// One-time init
	static int Inited = 0;
	static int PrevOpenTotal = 0;
	static var LastBalance = 0;
	static int Flip = 0;

	string LogFN = "Log\\FRACTAL2TF.csv";

	if(is(FIRSTINITRUN))
	{
		file_delete(LogFN);

		file_append(LogFN,"Date,Time,Mode,Bar,");
		file_append(LogFN,"TF1,TF2,TF2d,FDLen1,SlopeLen1,VolLen1,FDLen2,SlopeLen2,VolLen2,");
		file_append(LogFN,"LevScale,MaxLev,PredThr,HoldBars,");
		file_append(LogFN,"FD1,Slope1,Vol1,FD2,Slope2,Vol2,Aslope,Asigma,AFD,");
		file_append(LogFN,"PredL,PredS,Pred,Lev,RewardNorm,TF2idx,W2max,W2bin\n");

		Inited = 0;
		PrevOpenTotal = 0;
		LastBalance = 0;
		Flip = 0;
	}

	if(!Inited)
	{
		initParNames();
		initParamsRL();
		pickParams();

		LastBalance = Balance;
		PrevOpenTotal = NumOpenTotal;

		Inited = 1;
	}

	// Convert chosen arms -> parameter values (current episode)
	int TF1 = (int)armValue(P_TF1, CurArm[P_TF1]);
	int TF2d = (int)armValue(P_TF2D, CurArm[P_TF2D]);
	int TF2 = TF1 + TF2d;
	if(TF2 > TF_MAX) TF2 = TF_MAX;

	int FDLen1    = (int)armValue(P_FDLEN1,    CurArm[P_FDLEN1]);
	int SlopeLen1 = (int)armValue(P_SLOPELEN1, CurArm[P_SLOPELEN1]);
	int VolLen1   = (int)armValue(P_VOLLEN1,   CurArm[P_VOLLEN1]);

	int FDLen2    = (int)armValue(P_FDLEN2,    CurArm[P_FDLEN2]);
	int SlopeLen2 = (int)armValue(P_SLOPELEN2, CurArm[P_SLOPELEN2]);
	int VolLen2   = (int)armValue(P_VOLLEN2,   CurArm[P_VOLLEN2]);

	var LevScale  = armValue(P_LEVSCALE, CurArm[P_LEVSCALE]);
	var MaxLev    = armValue(P_MAXLEV,   CurArm[P_MAXLEV]);
	var PredThr   = armValue(P_PREDTHR,  CurArm[P_PREDTHR]);
	int HoldBars  = (int)armValue(P_HOLDBARS, CurArm[P_HOLDBARS]);

	// Derived indices for logging + pair updates
	int tf2Idx = tf2IndexFromValue(TF2);

	int W2max = FDLen2;
	if(SlopeLen2 > W2max) W2max = SlopeLen2;
	if(VolLen2 > W2max) W2max = VolLen2;
	int W2bin = w2BinFromValue(W2max);

	// Build series (2 TF)
	TimeFrame = TF1;
	vars P1 = series(priceClose());
	vars R1 = series(log(max(eps,P1[0]) / max(eps,P1[1])));

	vars FD1S    = series(0);
	vars Slope1S = series(0);
	vars Vol1S   = series(0);

	TimeFrame = TF2;
	vars P2 = series(priceClose());
	vars R2 = series(log(max(eps,P2[0]) / max(eps,P2[1])));

	vars FD2S    = series(0);
	vars Slope2S = series(0);
	vars Vol2S   = series(0);

	TimeFrame = 1;

	// Warmup gate based on current episode params
	int Need1 = max(max(FDLen1, SlopeLen1), VolLen1) + 5;
	int Need2 = max(max(FDLen2, SlopeLen2), VolLen2) + 5;
	int WarmupBars = max(TF1*Need1, TF2*Need2) + 10;

	if(Bar < WarmupBars)
		return;

	// Do NOT block TRAIN during LOOKBACK
	if(is(LOOKBACK) && !Train)
		return;

	// Compute features
	TimeFrame = TF1;
	FD1S[0]    = fractalDimKatz(P1, FDLen1);
	Slope1S[0] = linSlope(P1, SlopeLen1);
	Vol1S[0]   = stdevReturns(R1, VolLen1);

	TimeFrame = TF2;
	FD2S[0]    = fractalDimKatz(P2, FDLen2);
	Slope2S[0] = linSlope(P2, SlopeLen2);
	Vol2S[0]   = stdevReturns(R2, VolLen2);

	TimeFrame = 1;

	// Alignment meta-features (Improvement 3)
	var Aslope = sign3(Slope1S[0]) * sign3(Slope2S[0]);          // -1,0,1
	var Asigma = Vol2S[0] / (Vol1S[0] + eps);                    // ratio
	var AFD    = FD2S[0] - FD1S[0];                              // contrast

	// Feature vector for ML: now 9 features
	var Sig[9];
	Sig[0] = FD1S[0];
	Sig[1] = Slope1S[0];
	Sig[2] = Vol1S[0];
	Sig[3] = FD2S[0];
	Sig[4] = Slope2S[0];
	Sig[5] = Vol2S[0];
	Sig[6] = Aslope;
	Sig[7] = Asigma;
	Sig[8] = AFD;

	// Trading logic
	int MethodBase = PERCEPTRON + FUZZY + BALANCED;
	int MethodRet  = MethodBase + RETURNS;

	var PredL=0, PredS=0, Pred=0, Lev=0;

	// time-based exit
	if(NumOpenTotal > 0)
		for(open_trades)
			if(TradeIsOpen && TradeBars >= HoldBars)
				exitTrade(ThisTrade);

	if(Train)
	{
		// Forced alternating trades so ML always gets samples
		if(NumOpenTotal == 0)
		{
			Flip = 1 - Flip;
			LastBalance = Balance;

			if(Flip)
			{
				adviseLong(MethodRet, 0, Sig, 9);
				Lots = 1; enterLong();
			}
			else
			{
				adviseShort(MethodRet, 0, Sig, 9);
				Lots = 1; enterShort();
			}
		}
	}
	else
	{
		PredL = adviseLong(MethodBase, 0, Sig, 9);
		PredS = adviseShort(MethodBase, 0, Sig, 9);

		// Bootstrap if model has no signal yet
		if(NumOpenTotal == 0 && PredL == 0 && PredS == 0)
		{
			LastBalance = Balance;
			var s = Sig[1] + Sig[4];
			if(s > 0) { Lots=1; enterLong(); }
			else if(s < 0) { Lots=1; enterShort(); }
			else
			{
				if(random(1) < 0.5) { Lots=1; enterLong(); }
				else                { Lots=1; enterShort(); }
			}
		}
		else
		{
			Pred = PredL - PredS;
			Lev  = clamp(Pred * LevScale, -MaxLev, MaxLev);

			if(Lev > PredThr)       { exitShort(); Lots=1; enterLong();  }
			else if(Lev < -PredThr) { exitLong();  Lots=1; enterShort(); }
			else                    { exitLong();  exitShort(); }
		}
	}

	// RL reward + update (episode ends when we go from having positions to flat)
	var RewardNorm = 0;

	if(PrevOpenTotal > 0 && NumOpenTotal == 0)
	{
		var dBal = Balance - LastBalance;

		// reward normalization: per (1+HoldBars)
		RewardNorm = dBal / (1 + (var)HoldBars);

		// clip reward
		if(RewardNorm >  REWARD_CLIP) RewardNorm =  REWARD_CLIP;
		if(RewardNorm < -REWARD_CLIP) RewardNorm = -REWARD_CLIP;

		// Update singles (always)
		int p;
		for(p=0; p<NPAR; p++)
			updateArm(p, CurArm[p], RewardNorm);

		// Derived TF2 index (based on CURRENT arms)
		int tf2v_now = tf2ValueFromArms(CurArm[P_TF1], CurArm[P_TF2D]);
		int tf2Idx_now = tf2IndexFromValue(tf2v_now);

		// W2max bin (based on CURRENT arms)
		int FD2v = (int)armValue(P_FDLEN2, CurArm[P_FDLEN2]);
		int SL2v = (int)armValue(P_SLOPELEN2, CurArm[P_SLOPELEN2]);
		int VL2v = (int)armValue(P_VOLLEN2, CurArm[P_VOLLEN2]);
		int w2mx = FD2v;
		if(SL2v > w2mx) w2mx = SL2v;
		if(VL2v > w2mx) w2mx = VL2v;
		int w2bin_now = w2BinFromValue(w2mx);

		// Update derived communication tables
		updatePairRes_TF12(CurArm[P_TF1], tf2Idx_now, RewardNorm);
		updatePair_TF2_W2(tf2Idx_now, w2bin_now, RewardNorm);
		updatePairRes_TF2_HB(tf2Idx_now, CurArm[P_HOLDBARS], RewardNorm);

		// Update existing residual pairs
		updatePairRes_FD1SL1(CurArm[P_FDLEN1], CurArm[P_SLOPELEN1], RewardNorm);
		updatePairRes_FD2SL2(CurArm[P_FDLEN2], CurArm[P_SLOPELEN2], RewardNorm);
		updatePairRes_LS_PT(CurArm[P_LEVSCALE], CurArm[P_PREDTHR], RewardNorm);
		updatePairRes_ML_HB(CurArm[P_MAXLEV], CurArm[P_HOLDBARS], RewardNorm);

		// Next episode parameters
		pickParams();
		LastBalance = Balance;
	}

	PrevOpenTotal = NumOpenTotal;

	// Logging
	string ModeStr = "Trade";
	if(Train) ModeStr = "Train";
	else if(Test) ModeStr = "Test";

	file_append(LogFN, strf("%04i-%02i-%02i,%02i:%02i,%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%.6f,%.3f,%.6f,%d,%.6f,%.8f,%.8f,%.6f,%.8f,%.8f,%.8f,%.6f,%.6f,%.6f,%.6f,%.6f,%d,%d,%d\n",
		year(0),month(0),day(0), hour(0),minute(0),
		ModeStr, Bar,
		TF1, TF2, TF2d,
		FDLen1, SlopeLen1, VolLen1,
		FDLen2, SlopeLen2, VolLen2,
		LevScale, MaxLev, PredThr, HoldBars,
		Sig[0], Sig[1], Sig[2], Sig[3], Sig[4], Sig[5],
		Sig[6], Sig[7], Sig[8],
		PredL, PredS, Pred, Lev, RewardNorm,
		tf2Idx, W2max, W2bin
	));

	// Plots
	plot("FD_TF1",    Sig[0], NEW, 0);
	plot("FD_TF2",    Sig[3], 0, 0);
	plot("Slope_TF1", Sig[1], 0, 0);
	plot("Slope_TF2", Sig[4], 0, 0);
	plot("Vol_TF1",   Sig[2], 0, 0);
	plot("Vol_TF2",   Sig[5], 0, 0);
	plot("Aslope",    Sig[6], 0, 0);
	plot("Asigma",    Sig[7], 0, 0);
	plot("AFD",       Sig[8], 0, 0);
	plot("Pred",      Pred, 0, 0);
	plot("Lev",       Lev, 0, 0);
	plot("RewardN",   RewardNorm, 0, 0);
}

Runge–Kutta methods [Re: TipmyPip] #489121
01/24/26 07:26
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member

Stage-relabeling invariance for Runge–Kutta methods

In normal Runge–Kutta / numerical methods language, the idea is usually described like this:

1) Runge–Kutta stage reordering does not change the method

An explicit Runge–Kutta method works by doing several internal “sub-steps” (called stages) inside one time step.
The theorem says:

If you relabel or reorder those stages (stage 1 becomes stage 2, stage 2 becomes stage 1, etc.)

and you also rearrange the method’s coefficient tables to match that new ordering, then the method still produces the same final result for the step.

So even though the internal bookkeeping changes, the actual computed step result does not change (except for tiny rounding differences).

This is commonly called stage permutation invariance, or invariance under stage relabeling.
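
In formulas, one explicit RK step is (a sketch restating the setup above):

$$k_i = f\Big(t + c_i\,h,\; y + h\sum_{j<i} a_{ij}\,k_j\Big), \qquad y_{\text{next}} = y + h\sum_i b_i\,k_i.$$

The claim is that relabeling the stage index while permuting the entries of A, b, and c in the same way leaves y_next unchanged.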

2) This still stays true when you chain methods

Your code doesn’t just run one RK method; it chains two of them: run method 2 for one step, then run method 1 for one step.

The theorem also says:

If you replace method 1 with an equivalent “stage-reordered version” and/or replace method 2 with an equivalent “stage-reordered version”
then the chained result is still the same.

This is often described as:

Composition is compatible with stage reordering, or more formally: composition is well-defined under the equivalence.

If you want one “research-style umbrella description” (still readable):

Reordering stages does not change RK results, and it also does not change the results of chaining RK methods.

Short introduction to theorem_382A (in terms of your code)

Your code is testing this exact claim:

You build two normal RK methods:

Heun RK2 (2 stages)

Classical RK4 (4 stages)

You build reordered versions of the same methods by shuffling their stage order, while also shuffling the tables correctly.

You run both versions on the same ODE step and compare:

Original chain: RK4 step then Heun step

Reordered chain: permuted RK4 step then permuted Heun step

The theorem predicts:

The two final answers should match (tiny floating-point rounding differences are okay)

So the code shows you can reorder internal stages for implementation reasons (like memory layout or simpler storage) without changing what the method computes.

How demo_theorem_382A_5cases_main.c shows this, step by step
Step 1 — Build the original RK methods

Your code creates two well-known methods:

make_heun_rk2() fills M1_s, M1_A, M1_b, M1_c
This represents Heun’s RK2.

make_rk4() fills M2_s, M2_A, M2_b, M2_c
This represents classical RK4.

Each method is stored in the standard RK “tableau” parts:

s = how many stages the method has

c[] = where each stage evaluates time inside the step

b[] = how the final result mixes the stage derivatives

A = how later stages use earlier stage derivatives

Important detail in your implementation:
You store only the lower triangle of A (Atri), because in an explicit RK method a stage never depends on itself or on later stages.

That’s why you use:

Aidx(i,j), getA(), setA() for triangular indexing.
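
As a quick illustration (a standalone sketch, not part of the demo file), this is how the packed lower-triangle index works out for a 4-stage explicit method; the formula is the same one Aidx() uses in the listing below:

Code
// prints the packed lower-triangle index for every pair (i,j) with i>j, s = 4 stages
// expected output: (1,0)->0 (2,0)->1 (2,1)->2 (3,0)->3 (3,1)->4 (3,2)->5
function main()
{
	int i,j;
	for(i=1; i<4; i++)
		for(j=0; j<i; j++)
			printf("\n(%d,%d) -> %d", i, j, (i*(i-1))/2 + j);
	quit("");
}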

Step 2 — Build equivalent “stage-reordered” versions

Inside buildAll() you define permutations:

p1 reorders Heun’s stages

p2 reorders RK4’s stages

Then permute_method_into(...) produces:

M1B_* from M1_*

M2B_* from M2_*

What that function does (in plain words):

It rearranges b[] and c[] so stage i in the new method corresponds to the correct original stage.

It also rearranges the stored A entries so that “stage i depends on stage j” is preserved under the new labels.

So the permuted versions are not “different methods.”
They are the same methods written with a different stage numbering.

That is exactly what theorem_382A is about.
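
In symbols, the remapping that permute_method_into() performs for a permutation p is:

$$b'_i = b_{p[i]},\qquad c'_i = c_{p[i]},\qquad A'_{ij} = A_{p[i],\,p[j]} \quad (i > j),$$

with all other entries left at zero, matching the packed explicit-RK storage.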

Step 3 — Define composition exactly as the theorem describes

The theorem talks about chaining methods.

Your code implements chaining here:

var compose_step(...)
{
var y_mid = rk_step(method2, ...);
var y_out = rk_step(method1, ..., y_mid, ...);
return y_out;
}


So composition means:

Apply method 2 for one step -> get intermediate result

Apply method 1 for one step -> get final result

Then your comparison is:

yA = composition using original methods (M1 and M2)

yB = composition using permuted methods (M1B and M2B)

The theorem predicts: yA and yB should match.

Step 4 — Test it on 5 different ODE examples

run_case(k, ...) sets G_CASE = k, which changes what f_ode(t,y) returns.

You test five different behaviors: simple decay, simple growth, faster decay (stiffer), nonlinear growth with saturation (logistic), and a forcing term that depends on time.

This shows the theorem isn’t only working for one “easy” equation — it works across different ODE styles.

Step 5 — What the printed output means

Each test prints the yA result (original chain), the yB result (permuted chain), and diff = how far apart they are.
If theorem_382A is correct (and your permutation is applied correctly), diff should be extremely small (often near machine precision).

Code
// demo_theorem_5cases.c (Zorro lite-C)
// Demonstrates theorem_382A: stage-permuted RK methods are equivalent.

#define MAXS 4
#define TRI_A 6

static var G_k[MAXS];

// ---- RK tableaus (Heun2 and RK4) + permuted versions ----
static int   M1_s = 0; static float M1_A[TRI_A]; static float M1_b[MAXS]; static float M1_c[MAXS];
static int   M2_s = 0; static float M2_A[TRI_A]; static float M2_b[MAXS]; static float M2_c[MAXS];
static int   M1B_s = 0; static float M1B_A[TRI_A]; static float M1B_b[MAXS]; static float M1B_c[MAXS];
static int   M2B_s = 0; static float M2B_A[TRI_A]; static float M2B_b[MAXS]; static float M2B_c[MAXS];

// ---- selects which ODE example to run ----
static int G_CASE = 0;

int Aidx(int i, int j)
{
  return (i*(i-1))/2 + j;
}

var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// --------- 5 different ODEs (scalar) ---------
// Case 1: y' = -y                      (stable linear)
// Case 2: y' =  y                      (unstable linear)
// Case 3: y' = -10y                    (stiffer linear)
// Case 4: y' = y*(1-y)                 (logistic nonlinear)
// Case 5: y' = -2y + sin(t)            (time-dependent forcing)
var f_ode(var t, var y)
{
  if(G_CASE == 1) return -y;
  if(G_CASE == 2) return  y;
  if(G_CASE == 3) return -10*y;
  if(G_CASE == 4) return y*(1-y);
  // case 5
  return -2*y + sin(t);
}

var rk_step(int s, float* Atri, float* b, float* c, var t, var y, var h)
{
  int i,j;

  if(s < 1) return y;
  if(s > MAXS) s = MAXS;

  for(i=0;i<s;i++) G_k[i] = 0;

  for(i=0;i<s;i++)
  {
    var yi = y;
    for(j=0;j<i;j++)
      yi += h * getA(Atri,i,j) * G_k[j];

    G_k[i] = f_ode(t + (var)c[i]*h, yi);
  }

  var y1 = y;
  for(i=0;i<s;i++)
    y1 += h * (var)b[i] * G_k[i];

  return y1;
}

var compose_step(int s1,float* A1,float* b1,float* c1,
                 int s2,float* A2,float* b2,float* c2,
                 var t, var y, var h)
{
  var y_mid = rk_step(s2,A2,b2,c2,t,y,h);
  var y_out = rk_step(s1,A1,b1,c1,t,y_mid,h);
  return y_out;
}

void permute_method_into(int s, float* A, float* b, float* c,
                         int* p,
                         int* sOut, float* Aout, float* bout, float* cout)
{
  int i,j;

  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(i=0;i<TRI_A;i++) Aout[i] = 0;
  for(i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  for(i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  for(i=0;i<*sOut;i++)
    for(j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}

void make_heun_rk2()
{
  zeroTableau(&M1_s, M1_A, M1_b, M1_c);
  M1_s = 2;
  M1_c[0] = 0.0; M1_c[1] = 1.0;
  setA(M1_A, 1, 0, 1.0);
  M1_b[0] = 0.5; M1_b[1] = 0.5;
}

void make_rk4()
{
  zeroTableau(&M2_s, M2_A, M2_b, M2_c);
  M2_s = 4;

  M2_c[0]=0.0; M2_c[1]=0.5; M2_c[2]=0.5; M2_c[3]=1.0;
  setA(M2_A, 1, 0, 0.5);
  setA(M2_A, 2, 1, 0.5);
  setA(M2_A, 3, 2, 1.0);

  M2_b[0]=(float)(1./6.);
  M2_b[1]=(float)(1./3.);
  M2_b[2]=(float)(1./3.);
  M2_b[3]=(float)(1./6.);
}

void buildAll()
{
  make_heun_rk2();
  make_rk4();

  // example permutations (you can change these freely; theorem_382A says results stay the same)
  int p1[MAXS]; p1[0]=1; p1[1]=0; p1[2]=0; p1[3]=0;
  int p2[MAXS]; p2[0]=0; p2[1]=2; p2[2]=1; p2[3]=3;

  permute_method_into(M1_s, M1_A, M1_b, M1_c, p1, &M1B_s, M1B_A, M1B_b, M1B_c);
  permute_method_into(M2_s, M2_A, M2_b, M2_c, p2, &M2B_s, M2B_A, M2B_b, M2B_c);
}

void run_case(int k, var t, var y0, var h)
{
  G_CASE = k;

  var yA = compose_step(M1_s,  M1_A,  M1_b,  M1_c,
                        M2_s,  M2_A,  M2_b,  M2_c,
                        t, y0, h);

  var yB = compose_step(M1B_s, M1B_A, M1B_b, M1B_c,
                        M2B_s, M2B_A, M2B_b, M2B_c,
                        t, y0, h);

  printf("\nCase %d: yA=%.17g  yB=%.17g  |diff|=%.3e", k, yA, yB, abs(yA-yB));
}

function main()
{
  buildAll();

  // 5 different examples demonstrating theorem_382A invariance under stage permutation
  run_case(1, 0.0, 1.0, 0.1);   // stable linear
  run_case(2, 0.0, 1.0, 0.1);   // unstable linear
  run_case(3, 0.0, 1.0, 0.05);  // stiffer linear (smaller h)
  run_case(4, 0.0, 0.2, 0.1);   // nonlinear logistic
  run_case(5, 1.0, 1.0, 0.1);   // time-dependent forcing (t!=0)

  printf("\n");
  quit("done");
}

Last edited by TipmyPip; 01/28/26 10:59.
RK-382A Neural ODE [Re: TipmyPip] #489128
01/27/26 06:01
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
RK-382A Neural ODE Composition Invariance Demo

This code is a small “bridge” between two ideas:

Neural ODE (the model being solved)
The system has a hidden state called h that changes over time.
The rule for how it changes is a neural-style function called f_theta.
So at any moment, you plug the current time and the current state into f_theta, and it tells you the “instantaneous change” of h.
In this demo, f_theta is a tiny hard-coded neural network:

it mixes time and state using fixed weights,

passes that through a safe activation (softsign),

then outputs the change rate.
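
Written out, the hard-coded network in the listing below computes

$$\frac{dh}{dt} = f_\theta(t,h) = 0.9\,\mathrm{softsign}(1.2\,h + 0.7\,t - 0.3) + 0.1,\qquad \mathrm{softsign}(z) = \frac{z}{1+|z|}.$$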

Runge–Kutta method (the numerical engine)
Because you usually can’t solve this kind of time-evolution exactly, you approximate it by stepping forward in small time increments.
A Runge–Kutta method does this by computing several stage evaluations of f_theta inside one step.
Each stage is like “peek at the slope using a specific intermediate time and intermediate state.”
Then it combines those stage slopes into the next state value.

What Theorem 382A is demonstrating here

Runge–Kutta methods have “internal stages.” You can relabel the order of stages as long as you also relabel all the method’s coefficients consistently (the A, b, and c tables).
When you do that correctly, you have not changed the actual numerical method—only the labeling of its internal work.

So Theorem 382A, in practical terms, says:

If you swap or reorder stages and you remap every coefficient consistently, the method’s output for a step stays the same (up to tiny floating-point rounding differences).

What the code does, in plain symbolic steps

It defines a tiny Neural ODE: “state h evolves according to f_theta(time, h).”

It builds two Runge–Kutta solvers: one is RK4 (four stages), one is Heun (two stages).

It creates permuted versions of both solvers by reordering their stages and remapping their coefficients.

It then performs a two-step composition: First it advances one step using RK4.

Then it advances one step using Heun, starting from the later time (important because f_theta depends on time).

It runs that same two-step procedure again, but using the permuted-but-equivalent RK4 and Heun.

Finally it prints both results and the absolute difference.

Code
// demo_neuralODE_theorem_382A_main.c (Zorro lite-C)
// Minimal Neural ODE example + Theorem 382A demo:
// - dh/dt = f_theta(t,h) where f_theta is a tiny "neural net" (hard-coded weights)
// - Integrate with composition: RK4 step then Heun RK2 step
// - Build stage-permuted equivalents of both methods
// - Show composed result is unchanged (Theorem 382A)
// Runs in main() (no bars/history). No malloc. No typedef unsigned.

#define MAXS 4
#define TRI_A 6   // for MAXS=4: 4*3/2 = 6

static var G_k[MAXS];

// ---- RK tableaus: Heun2 (M1) and RK4 (M2) + permuted versions (M1B, M2B)
static int   M1_s = 0;  static float M1_A[TRI_A];  static float M1_b[MAXS];  static float M1_c[MAXS];
static int   M2_s = 0;  static float M2_A[TRI_A];  static float M2_b[MAXS];  static float M2_c[MAXS];
static int   M1B_s = 0; static float M1B_A[TRI_A]; static float M1B_b[MAXS]; static float M1B_c[MAXS];
static int   M2B_s = 0; static float M2B_A[TRI_A]; static float M2B_b[MAXS]; static float M2B_c[MAXS];

// ----------------- Triangular indexing (explicit RK uses only i>j) -----------------
int Aidx(int i, int j)
{
  return (i*(i-1))/2 + j;
}

var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// ----------------- "Neural network" for the Neural ODE -----------------
// Hidden state: h(t) (scalar for simplicity)
// dh/dt = f_theta(t,h)
//
// We avoid fancy ops and use a safe activation: softsign(z) = z / (1 + abs(z))
var softsign(var z)
{
  return z / (1.0 + abs(z));
}

// Hard-coded weights (this stands in for a trained NN)
var f_theta(var t, var h)
{
  // "features" could be more inputs; here we just use t and h
  // layer1: z = w_h*h + w_t*t + b
  var z = 1.20*h + 0.70*t - 0.30;

  // activation
  var a = softsign(z);

  // output layer: dh/dt = w*a + b2
  return 0.90*a + 0.10;
}

// ----------------- One explicit RK step for Neural ODE -----------------
var rk_step(int s, float* Atri, float* b, float* c, var t, var y, var h)
{
  int i,j;

  if(s < 1) return y;
  if(s > MAXS) s = MAXS;

  for(i=0;i<s;i++) G_k[i] = 0;

  for(i=0;i<s;i++)
  {
    var yi = y;
    for(j=0;j<i;j++)
      yi += h * getA(Atri,i,j) * G_k[j];

    G_k[i] = f_theta(t + (var)c[i]*h, yi);
  }

  var y1 = y;
  for(i=0;i<s;i++)
    y1 += h * (var)b[i] * G_k[i];

  return y1;
}

// ----------------- Composition: do method2 step, then method1 step -----------------
// IMPORTANT for time-dependent ODE: the second step starts at t+h.
var compose_step(int s1,float* A1,float* b1,float* c1,
                 int s2,float* A2,float* b2,float* c2,
                 var t, var y, var h)
{
  var y_mid = rk_step(s2,A2,b2,c2,t,   y,     h);
  var y_out = rk_step(s1,A1,b1,c1,t+h, y_mid, h);
  return y_out;
}

// ----------------- Stage permutation (build equivalent tableau) -----------------
void permute_method_into(int s, float* A, float* b, float* c,
                         int* p,
                         int* sOut, float* Aout, float* bout, float* cout)
{
  int i,j;

  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(i=0;i<TRI_A;i++) Aout[i] = 0;
  for(i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  // permute b,c
  for(i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  // permute A (lower triangle only)
  for(i=0;i<*sOut;i++)
    for(j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}

// ----------------- Build Heun RK2 and classic RK4 -----------------
void make_heun_rk2()
{
  zeroTableau(&M1_s, M1_A, M1_b, M1_c);
  M1_s = 2;
  M1_c[0] = 0.0; M1_c[1] = 1.0;
  setA(M1_A, 1, 0, 1.0);
  M1_b[0] = 0.5; M1_b[1] = 0.5;
}

void make_rk4()
{
  zeroTableau(&M2_s, M2_A, M2_b, M2_c);
  M2_s = 4;

  M2_c[0]=0.0; M2_c[1]=0.5; M2_c[2]=0.5; M2_c[3]=1.0;
  setA(M2_A, 1, 0, 0.5);
  setA(M2_A, 2, 1, 0.5);
  setA(M2_A, 3, 2, 1.0);

  M2_b[0]=(float)(1./6.);
  M2_b[1]=(float)(1./3.);
  M2_b[2]=(float)(1./3.);
  M2_b[3]=(float)(1./6.);
}

void buildAll()
{
  make_heun_rk2();
  make_rk4();

  // Example stage permutations:
  // Heun has 2 stages: swap them (1,0)
  int p1[MAXS]; p1[0]=1; p1[1]=0; p1[2]=0; p1[3]=0;

  // RK4 has 4 stages: swap middle two (0,2,1,3)
  int p2[MAXS]; p2[0]=0; p2[1]=2; p2[2]=1; p2[3]=3;

  permute_method_into(M1_s, M1_A, M1_b, M1_c, p1, &M1B_s, M1B_A, M1B_b, M1B_c);
  permute_method_into(M2_s, M2_A, M2_b, M2_c, p2, &M2B_s, M2B_A, M2B_b, M2B_c);
}

// ----------------- Entry point -----------------
function main()
{
  buildAll();

  // Neural ODE initial condition (hidden state)
  var t0 = 0.0;
  var h0 = 0.25;

  // One composed "macro step": RK4 step of size dt, then Heun step of size dt
  // Total time advanced = 2*dt because we advance t by dt between the two steps.
  var dt = 0.10;

  // Original composition
  var hA = compose_step(M1_s,  M1_A,  M1_b,  M1_c,
                        M2_s,  M2_A,  M2_b,  M2_c,
                        t0, h0, dt);

  // Permuted-but-equivalent composition (Theorem 382A says this should match)
  var hB = compose_step(M1B_s, M1B_A, M1B_b, M1B_c,
                        M2B_s, M2B_A, M2B_b, M2B_c,
                        t0, h0, dt);

  printf("\nNeural ODE demo: dh/dt = f_theta(t,h)");
  printf("\nStart: t=%.3f  h=%.17g", t0, h0);
  printf("\nStep:  RK4(dt) then Heun(dt), dt=%.3f (total time +%.3f)", dt, 2*dt);

  printf("\n\nhA = compose(original RK4, original Heun)   = %.17g", hA);
  printf("\nhB = compose(permuted RK4, permuted Heun)   = %.17g", hB);
  printf("\nabs diff = %.3e\n", abs(hA - hB));

  quit("done");
}

Attached Files
NeuralODE01.zip (12 downloads)
The Stage Shuffle Cache Saver [Re: TipmyPip] #489131
01/28/26 09:08
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
The following code is a demonstration tale about stage shuffling. It shows that you can reorder the internal steps of an explicit solver, rebuild its coefficient tables to match the new order, and still land on the same final state after one composed update. At the same time, the new order can be cheaper because it improves the reuse of an expensive time based feature.

The program defines a tiny time dependent neural style rule that drives a single hidden variable forward. The rule mixes the current hidden value with a transformed view of the current time, pushes the mixture through a safe softsign activation, and returns a rate of change. The time transform uses the sine function, treated here as the costly part. A deliberately small cache stores only the most recent time and the most recent sine value, and a counter records how many real sine evaluations were required.

Two explicit methods are built using compact Runge–Kutta tables. The first is a two stage Heun step used as a finishing pass. The second is a custom four stage step designed to include a repeated stage time. In the original stage order, two stages share the same time offset within the step, but they are separated by another stage that uses a different time. With a one entry cache, that separation forces the cache to forget the repeated time and compute sine again later.

To keep memory small, the dependency coefficients are stored in packed triangular form. Helper functions translate a stage pair into a packed index, read a coefficient, and write a coefficient. Another helper clears a tableau so the build routines start from a clean slate.

The core routine that advances the state performs one explicit step: it loops over stages, forms each stage state from earlier stage derivatives and dependency weights, evaluates the neural rule at the stage time, then combines all stage derivatives with the output weights to form the step result. A composition routine chains two such steps, advancing time between them because the rule depends on time.

The stage permutation routine builds an equivalent tableau from an original tableau and a permutation map. It permutes the stage time list and output weights, and it relabels the dependency weights consistently so each new stage still refers to the correct original stage values. In this demo, the Heun method is left in its natural order, while the custom four stage method is permuted by swapping two stages so the two equal time evaluations become adjacent.

The entry point runs the composed update twice. First it resets the cache and counter, applies the original methods, and records the output and sine count. Then it resets again, applies the permuted methods, and records the output and sine count. It prints a correctness check that the two outputs match closely, and it prints an advantage check showing fewer sine calls after the shuffle. The story is simple: the reveal stays the same, but the backstage choreography becomes smarter. In real Zorro strategies, the expensive piece could be an indicator update, a custom feature lookup, or a broker query; arranging repeated inputs next to each other can reduce overhead without changing trading logic.
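
For a concrete sense of the saving, here is a rough hand count for one composed update with dt = 0.10 starting at t = 0, assuming the one-entry cache and the stage times defined in the listing below (the program's printed counters are the ground truth):

$$\text{original: } M2 \text{ at } t+\{0,\,0.5,\,1.0,\,0.5\}h,\ \text{Heun at } (t+h)+\{0,\,1\}h \;\Rightarrow\; 6 \text{ sine evaluations},$$
$$\text{permuted: } M2 \text{ at } t+\{0,\,0.5,\,0.5,\,1.0\}h,\ \text{Heun at } (t+h)+\{0,\,1\}h \;\Rightarrow\; 4 \text{ sine evaluations},$$

since the repeated 0.5h stage and the first Heun stage, which now lands on the same time as M2's last stage, both hit the cache.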

Code
// demo_neuralODE_theorem_382A_advantage_main.c (Zorro lite-C)
//
// What this program demonstrates
// ------------------------------
// We integrate a simple *time-dependent* Neural ODE:
//
//     dh/dt = f_theta(t, h)
//
// using a composed integrator: first a 4-stage explicit RK method (M2),
// then a 2-stage Heun RK2 method (M1).
//
// Theorem 382A (as used here) says:
// If you *relabel / reorder* the internal RK stages and you permute the
// RK coefficient tables (A,b,c) *consistently*, you obtain an *equivalent*
// RK method that produces the same one-step result. This remains true
// when chaining methods (composition).
//
// The "advantage" shown here:
// The ODE right-hand-side f_theta() uses sin(t) (treated as expensive).
// M2 evaluates f_theta twice at the same stage time (t + 0.5*dt), but those
// two equal-time stages are NOT adjacent in the original ordering. We keep a
// very small cache that remembers only the *last* sin(t) computed.
// By permuting M2’s stages so the equal-time stages become adjacent, the cache
// hits more often -> fewer sin() computations, while the final result is unchanged.

#define MAXS 4
#define TRI_A 6   // number of stored A-entries for an explicit 4-stage RK (lower triangle only)

static var  G_k[MAXS]; // stage derivatives k[i] for one RK step

// ---- Method M1: Heun RK2 (2 stages) ----
// Stored as a classic RK tableau (A,b,c) in compact triangular form for A.
static int   M1_s = 0;         // number of stages
static float M1_A[TRI_A];      // A coefficients (lower triangle packed)
static float M1_b[MAXS];       // output weights
static float M1_c[MAXS];       // stage time offsets (as fraction of step size)

// ---- Method M2: custom 4-stage explicit RK with a repeated stage time ----
// Stage evaluation times (c values) are: 0, 0.5, 1.0, 0.5
// Stages 1 and 3 both evaluate at t + 0.5*dt, but they are separated by stage 2.
// With a "remember only last t" cache, this separation causes cache misses.
// We'll build a permuted version where the two 0.5 stages are adjacent.
static int   M2_s = 0;
static float M2_A[TRI_A];
static float M2_b[MAXS];
static float M2_c[MAXS];

// ---- Permuted (“B”) versions of the same methods ----
// These are produced by reordering stages and permuting A,b,c consistently.
// They should yield the same numerical one-step result as the originals.
static int   M1B_s = 0;
static float M1B_A[TRI_A];
static float M1B_b[MAXS];
static float M1B_c[MAXS];

static int   M2B_s = 0;
static float M2B_A[TRI_A];
static float M2B_b[MAXS];
static float M2B_c[MAXS];

// ---------------------------------------------------------------------------
// Compact storage for explicit RK tableaus
// ---------------------------------------------------------------------------
// For explicit RK, only entries with i>j are used (lower triangle).
// We store those coefficients in a 1D array using triangular indexing.

int Aidx(int i, int j)
{
  // maps (i,j) with i>j into packed array position
  return (i*(i-1))/2 + j;
}

var getA(float* Atri, int i, int j)
{
  // explicit RK has A[i,j] = 0 for i<=j
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

void setA(float* Atri, int i, int j, var v)
{
  // only store lower-triangular entries
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  // reset an RK tableau to all zeros
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// ---------------------------------------------------------------------------
// Neural ODE right-hand side + cost accounting
// ---------------------------------------------------------------------------
// We count how many *actual* sin() computations happen to show the optimization.
// sin_cached() is intentionally tiny: it remembers only the last (t, sin(t)) pair.
// Therefore, two equal-time evaluations only benefit if they occur consecutively.

static int G_sinCalls = 0;   // number of times we really call sin()
static var G_lastSinT = 1e99;
static var G_lastSinV = 0;

var sin_cached(var t)
{
  // if the time matches the previous call (within tolerance), reuse the last value
  if(abs(t - G_lastSinT) < 1e-12)
    return G_lastSinV;

  // otherwise compute and remember
  G_lastSinT = t;
  G_lastSinV = sin(t);
  G_sinCalls += 1;
  return G_lastSinV;
}

// smooth bounded activation used inside the toy neural function
var softsign(var z)
{
  return z / (1.0 + abs(z));
}

// Neural ODE: dh/dt = f_theta(t,h)
// Uses sin_cached(t) so stage ordering can reduce sin() calls.
var f_theta(var t, var h)
{
  // treat this as a small "neural net" with hard-coded weights.
  // sin(t) acts as an expensive feature transform.
  var s = sin_cached(t);

  var z = 1.20*h + 0.70*s - 0.30;
  var a = softsign(z);
  return 0.90*a + 0.10;
}

// ---------------------------------------------------------------------------
// One explicit RK step: y -> y1 over step size h
// ---------------------------------------------------------------------------
// Computes stages k[i] = f(t + c[i]*h, y + h * sum_j A[i,j]*k[j])
// then combines them with weights b[i].

var rk_step(int s, float* Atri, float* b, float* c, var t, var y, var h)
{
  int i,j;

  if(s < 1) return y;
  if(s > MAXS) s = MAXS;

  // clear stage array
  for(i=0;i<s;i++) G_k[i] = 0;

  // stage loop
  for(i=0;i<s;i++)
  {
    // build the stage state yi from previous stages
    var yi = y;
    for(j=0;j<i;j++)
      yi += h * getA(Atri,i,j) * G_k[j];

    // evaluate derivative at stage time
    G_k[i] = f_theta(t + (var)c[i]*h, yi);
  }

  // combine stages into output
  var y1 = y;
  for(i=0;i<s;i++)
    y1 += h * (var)b[i] * G_k[i];

  return y1;
}

// ---------------------------------------------------------------------------
// Method composition: do M2 step first, then M1 step
// ---------------------------------------------------------------------------
// Because f_theta depends on time, the second step must start at (t + h).
// This function advances the solution by two substeps of size h in time.

var compose_step(int s1,float* A1,float* b1,float* c1,
                 int s2,float* A2,float* b2,float* c2,
                 var t, var y, var h)
{
  var y_mid = rk_step(s2,A2,b2,c2,t,   y,     h);  // first substep at time t
  var y_out = rk_step(s1,A1,b1,c1,t+h, y_mid, h);  // second substep at time t+h
  return y_out;
}

// ---------------------------------------------------------------------------
// Build a stage-permuted (equivalent) RK tableau
// ---------------------------------------------------------------------------
// Input: original (A,b,c) with s stages and a permutation array p[].
// Output: (Aout,bout,cout) where stage i of the new method corresponds to stage p[i]
// of the old method. This is the “stage relabeling” used for Theorem 382A.

void permute_method_into(int s, float* A, float* b, float* c,
                         int* p,
                         int* sOut, float* Aout, float* bout, float* cout)
{
  int i,j;

  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  // clear outputs
  for(i=0;i<TRI_A;i++) Aout[i] = 0;
  for(i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  // permute b and c (stage weights and stage times)
  for(i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  // permute A (stage-to-stage dependencies), lower triangle only
  for(i=0;i<*sOut;i++)
    for(j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}

// ---------------------------------------------------------------------------
// Define the two base RK methods (M1, M2)
// ---------------------------------------------------------------------------

void make_heun_rk2()
{
  // Heun / explicit trapezoid method (2 stages)
  zeroTableau(&M1_s, M1_A, M1_b, M1_c);
  M1_s = 2;

  M1_c[0] = 0.0; // stage 0 at t
  M1_c[1] = 1.0; // stage 1 at t+h

  setA(M1_A, 1, 0, 1.0); // y1 stage uses k0

  M1_b[0] = 0.5;         // average of k0 and k1
  M1_b[1] = 0.5;
}

void make_custom4_repeatTime()
{
  // A simple 4-stage explicit method that evaluates two stages at the same time (0.5)
  // but places them apart in the stage sequence to create cache misses.
  zeroTableau(&M2_s, M2_A, M2_b, M2_c);
  M2_s = 4;

  // stage times (fractions of step size)
  M2_c[0] = 0.0;
  M2_c[1] = 0.5;  // repeated time #1
  M2_c[2] = 1.0;
  M2_c[3] = 0.5;  // repeated time #2 (same as stage 1)

  // explicit dependencies between stages
  setA(M2_A, 1, 0, 0.5); // stage1 uses stage0
  setA(M2_A, 2, 1, 1.0); // stage2 uses stage1
  setA(M2_A, 3, 0, 0.5); // stage3 uses stage0 (so we can move it relative to stage2)

  // simple equal weights (this is a demo method, not a named high-order scheme)
  M2_b[0] = 0.25;
  M2_b[1] = 0.25;
  M2_b[2] = 0.25;
  M2_b[3] = 0.25;
}

void buildAll()
{
  // Build original methods
  make_heun_rk2();
  make_custom4_repeatTime();

  // Stage permutations used to construct equivalent “B” methods:
  //
  // - M1 (Heun) is left unchanged (identity permutation).
  // - M2 swaps stages 2 and 3 so the two 0.5-time stages become adjacent:
  //     old: 0, 1(0.5), 2(1.0), 3(0.5)
  //     new: 0, 1(0.5), 3(0.5), 2(1.0)
  //
  // This improves the hit rate of the tiny sin(t) cache.

  int p1[MAXS]; p1[0]=0; p1[1]=1; p1[2]=0; p1[3]=0;
  int p2[MAXS]; p2[0]=0; p2[1]=1; p2[2]=3; p2[3]=2;

  permute_method_into(M1_s, M1_A, M1_b, M1_c, p1, &M1B_s, M1B_A, M1B_b, M1B_c);
  permute_method_into(M2_s, M2_A, M2_b, M2_c, p2, &M2B_s, M2B_A, M2B_b, M2B_c);
}

// reset sin-cache state and call counter (for fair “before vs after” cost comparison)
void reset_cost_counters()
{
  G_sinCalls = 0;
  G_lastSinT = 1e99;
  G_lastSinV = 0;
}

// ---------------------------------------------------------------------------
// Program entry point
// ---------------------------------------------------------------------------
// 1) Build original and permuted RK methods.
// 2) Run the composed integrator once with original methods; record sin() count.
// 3) Run again with permuted-but-equivalent methods; record sin() count.
// 4) Print:
//    - output equality (correctness / equivalence)
//    - sin() calls saved (cost advantage)

function main()
{
  buildAll();

  var t0 = 0.0;   // start time
  var h0 = 0.25;  // initial hidden state
  var dt = 0.10;  // step size used by each substep of the composition

  // ---- Original composition (M2 then M1) ----
  reset_cost_counters();
  var outA = compose_step(M1_s, M1_A, M1_b, M1_c,
                          M2_s, M2_A, M2_b, M2_c,
                          t0, h0, dt);
  int sinA = G_sinCalls;

  // ---- Permuted composition (M2B then M1B) ----
  reset_cost_counters();
  var outB = compose_step(M1B_s, M1B_A, M1B_b, M1B_c,
                          M2B_s, M2B_A, M2B_b, M2B_c,
                          t0, h0, dt);
  int sinB = G_sinCalls;

  // ---- Report results: same output, fewer sin() calls ----
  printf("\nNeural ODE: dh/dt = f_theta(t,h) with sin_cached(t) inside f_theta");
  printf("\nStart: t=%.3f  h=%.17g  dt=%.3f", t0, h0, dt);

  printf("\n\n(1) Correctness: permuting stages changes nothing in the final output");
  printf("\n    outA (original stages) = %.17g", outA);
  printf("\n    outB (permuted stages) = %.17g", outB);
  printf("\n    abs diff               = %.3e", abs(outA - outB));

  printf("\n\n(2) Advantage: permuting stages reduces repeated expensive sin(t) work");
  printf("\n    sin() calls original   = %d", sinA);
  printf("\n    sin() calls permuted   = %d", sinB);
  printf("\n    sin() calls saved      = %d", (sinA - sinB));
  printf("\n");
  quit("done");
}

Last edited by TipmyPip; 01/28/26 09:09.
The Next Stage. [Re: TipmyPip] #489132
01/28/26 09:48
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
The slow, manual style of trading-strategy development is becoming less useful in a world where automated environments evolve faster than any human-led iteration cycle. While traditional workflows once made sense, they now struggle to keep up with systems that can test, adapt, and optimize continuously. As these environments increasingly remove human bottlenecks from research and execution, spending months refining old-style techniques becomes a poor use of effort. For that reason, I’m changing my approach: away from incremental tuning of legacy methods and toward more sophisticated, automation-native development that aligns with how modern markets and tools actually move.

The Clockwork Storm Weaver [Re: TipmyPip] #489140
2 hours ago
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
This program is a small world in a bottle, built to watch a hidden choir of states move through time. At the start, a seed is chosen like a signature ring, so the same story can be replayed or a new one can be born. A private dice maker lives inside the file, separate from the host environment, turning that seed into a steady stream of chance. From that stream come whispers of randomness that later become fog, static, and sudden shocks.

The world has two faces, called regimes, like day and night. A quiet gatekeeper decides when the face changes, using the current stress of the outside wind as a hint. The outside wind is the exogenous input, a trio of slow waves that bend and drift. This wind does not merely push; it also reshapes the rules of motion. The rules, named theta in the code, are like mood settings that alter damping, coupling strength, and stiffness. When the mood shifts, certain lanes of the hidden choir become harder to move.
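
Concretely, the gatekeeper in update_regime() (further down in the listing) flips the regime with probability

$$P(\text{flip in } dt) = 1 - e^{-r\,dt},\qquad r_{0\to1} = 0.15\,(1 + 0.8\,s),\quad r_{1\to0} = 0.10\,(1 + 0.5\,s),\quad s = |u_0| + 0.5\,|u_1|.$$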

The core motion is computed by a staged ritual, a stepper that samples the drift field several times before committing to the next position. Each stage prepares a temporary portrait of the current state, passes it through a softsign mask to keep extremes tame, then blends every voice with every other voice through a dense web of sines. This web is the symbolic neural fabric: it is not trained here, but it behaves like a synthetic network that couples many channels into one flowing response.

To walk safely, the traveler uses an adaptive stride. It tries a full stride and also two half strides, then compares the outcomes. If the difference is small enough, the stride is accepted; if not, the stride is trimmed and tried again. Memory pressure is treated like weather. When memory rises beyond a budget, the traveler becomes less picky about perfection, limits the largest stride, and tightens patience for endless retrials.
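
A sketch of the stride test described above (the header comments in the listing spell out the same idea; the exact grow/shrink factors for dt live in the part of the file not shown here):

$$y_{\text{full}} = \Phi_{dt}(y),\qquad y_{\text{half}} = \Phi_{dt/2}\big(\Phi_{dt/2}(y)\big),\qquad \text{err} = \lVert y_{\text{half}} - y_{\text{full}}\rVert,$$

with the step accepted if err stays below the tolerance (G_Tol), and otherwise dt shrunk and retried, subject to DT_MIN, G_DtMax, and the reject budgets.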

After each accepted drift step, the story becomes hybrid. A layer of diffusion is painted onto every coordinate, scaled by the wind, by the current face of the world, and by the magnitude of the state itself. Then, with a rare roll of the internal dice, a jump may strike: a heavy tailed shock clipped to protect the narrative from blowing apart. These jumps touch only a random subset of coordinates, making the path jagged in places.
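
In rough symbols, each coordinate then receives a hybrid update of the form below; the diffusion scaling sits in the omitted part of the listing, so treat the sigma term as a placeholder for an Euler-style noise increment, while the jump numbers are taken from apply_jump_if_needed():

$$y_i \leftarrow y_i + \sigma_i(t, u, \text{regime}, |y_i|)\,\sqrt{dt}\,\xi_i,\qquad \xi_i \sim \mathcal{N}(0,1),$$

and with probability $1 - e^{-\lambda\,dt}$, where $\lambda = 0.20\,(1 + 0.7\,|u_0|)$ (times 1.25 in regime 1), a heavy-tailed shock $J = \mathrm{clip}(\tan(\pi(U-0.5)),\,\pm 6)$ scaled by the jump magnitude is added to each coordinate with probability 0.25.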

The world is only partly visible. An observer takes a handful of channels, mixes them, smooths them, and adds a small measurement hiss. The simulation watches both the hidden truth and the noisy report, and it will stop early if either grows unrealistically large or if the drift grows quiet for long enough to be called steady.

Finally, three travelers are sent through the same world. One uses the reference ritual. One uses a carefully permuted ritual that should be equivalent. One uses a careless permutation that is subtly wrong. Their destinations are compared, and the printout reveals whether the refactor preserved meaning or merely rearranged the words.

It is a tale of structure, randomness, and bookkeeping.

Code
// demo_neuralODE_complex_382A_adaptive_main.c (Zorro lite-C)
//
// What this program demonstrates
// ------------------------------
// This is a self-contained simulation of a high-dimensional (N=48) nonlinear
// *hybrid* dynamical system:
//
//   - Deterministic drift:     y' = f_theta(t, y, u(t))
//   - Stochastic diffusion:    additive/multiplicative noise (Euler-style increment)
//   - Random jump shocks:      Poisson-triggered heavy-tailed jumps
//   - Regime switching:        2-state Markov regime (0/1) with state-dependent rates
//   - Partial observability:   produce noisy observed channels x_obs from hidden state y
//
// Numerical method
// ----------------
// The drift (ODE part) is integrated by an explicit Runge–Kutta method with
// adaptive step size using step-doubling:
//
//   full step  : y(t+dt)   from one RK step of size dt
//   half steps : y(t+dt)   from two RK steps of size dt/2
//   error      : ||half - full||  -> accept/reject and adapt dt
//
// If accepted, we then apply (in this order):
//   1) update regime state (Markov transition)
//   2) diffusion increment (noise)
//   3) jump event (rare, heavy-tailed)
//   4) produce observations x_hat and noisy x_obs
//
// What "382A" is about in this file
// --------------------------------
// We build one 4-stage explicit RK tableau. Then we create:
//
//   A) Reference tableau (baseline)
//   B) "Safe" stage refactor: a stage permutation mapped into an equivalent tableau
//      (via permutation of the Butcher coefficients). This should produce identical
//      results up to numerical tolerance.
//   C) "Buggy" stage refactor: a naive permutation that changes the evaluation order
//      without correctly transforming the tableau. This generally changes results.
//
// The program integrates the same initial condition with A/B/C and prints the
// RMS differences vs the reference.
//
// Fixed seeding note
// ------------------
// Zorro has its own RNG (seed(), random()). This simulation DOES NOT use it.
// Instead it uses a custom LCG-based RNG (G_Rng via rng_u01/rng_norm).
// Therefore:
//   - We must seed our custom RNG explicitly (rng_seed(seed)).
//   - We generate a time-varying integer seed in main() and pass it into the
//     integrator. This avoids repeated identical runs when started quickly.

#define MAXS 4
#define TRI_A 6

// -------------------- Dimensions --------------------
// N: hidden state dimension (the "neural" state we integrate)
#define N         48
// K_SIZE: flattened storage for stage derivatives, sized for MAXS stages
#define K_SIZE    192   // MUST be literal: MAXS*N = 4*48 = 192

// u(t): exogenous forcing input (small dimension)
#define U_DIM     3
// partial observation vector size (we observe only a subset of hidden channels)
#define NOBS      12

// -------------------- Base integration settings --------------------
// Final time horizon for integration
#define T_END     20.0
// Starting step size guess for adaptive integration
#define DT_INIT   0.20
// Hard floor and ceiling for the step size
#define DT_MIN    1e-6
#define DT_MAX    0.50
// Base error tolerance for step accept/reject
#define TOL       1e-6

// -------------------- Memory shedding (pressure response) --------------------
// If the process memory rises above this budget, loosen accuracy + reduce dtMax.
// This is a defensive mechanism for long runs / limited environments.
#define MEM_BUDGET_MB 2000
static int G_ShedTriggered = 0;

// Runtime-adjustable knobs (can be modified when shedding triggers)
static var G_Tol   = TOL;
static var G_DtMax = DT_MAX;

// Budgets to stop pathological runs (NOT a strict max-step cap)
// - MaxRejects: how many rejected steps we tolerate before aborting
// - DtMinHitMax: how often we hit dt == DT_MIN before aborting
static int G_MaxRejects     = 2000000;
static int G_DtMinHitMax    = 500000;

// Dynamics-driven termination conditions
// - "steady" means drift magnitude stays tiny for many accepted steps
static var G_SteadyEps      = 1e-7;     // RMS(||y_dot||) threshold
static int G_SteadyNeed     = 250;      // consecutive accepted steps below threshold
// - "blow-up" means state or observed channels become numerically huge
static var G_StateLimitRMS  = 1e6;      // stop if RMS(||y||) explodes
static var G_ObsLimitRMS    = 1e6;      // stop if RMS(||x_obs||) explodes

// -------------------- Stochastic / hybrid parameters --------------------
// Baseline diffusion scale
static var G_SigmaBase = 0.02;
// Baseline jump (shock) intensity and magnitude
static var G_JumpLambdaBase = 0.20;
static var G_JumpMagBase = 0.20;
// Regime switching base rates: 0->1 and 1->0
static var G_RegRate01 = 0.15;
static var G_RegRate10 = 0.10;

// -------------------- Optional memory-tight types --------------------
// Use float for large buffers to reduce memory bandwidth and footprint.
#define TIGHT_MEM 1
#ifdef TIGHT_MEM
typedef float fvar;   // 4 bytes
#else
typedef var fvar;     // 8 bytes (Zorro var)
#endif

// -------------------- Global working buffers (no malloc) --------------------
// RK stage derivatives K_i for each stage i (flattened by stage and component)
static fvar G_K[K_SIZE];
// activation buffer for nonlinear transform of stage state
static fvar G_act[N];

// Stage state input to drift f_theta(t, y)
// and drift output (dy/dt) for the current evaluation
static var  G_stage[N];
static var  G_dh[N];

// Integrator work buffers for step-doubling
static var  G_ycur[N];   // current accepted state
static var  G_full[N];   // one full dt step
static var  G_half[N];   // two half steps combined
static var  G_mid[N];    // intermediate state after first half step

// Initial / final state buffers for the three methods compared
static var  Y0[N];
static var  Yref[N];
static var  Ysafe[N];
static var  Ybug[N];

// Partial observability: predicted clean observation + noisy measurement
static var  G_xhat[NOBS];
static var  G_xobs[NOBS];

// Exogenous input u(t)
static var  G_u[U_DIM];

// -------------------- Custom RNG (deterministic, local) --------------------
// We use our own RNG so results do not depend on Zorro's global RNG state.
static unsigned int G_Rng = 1;

// Seed the custom RNG. Seed must be non-zero.
void rng_seed(unsigned int s)
{
  if(s == 0) s = 1;
  G_Rng = s;
}

// Uniform random in (0,1). Uses a classic LCG update.
var rng_u01()
{
  G_Rng = 1664525u * G_Rng + 1013904223u;
  unsigned int x = (G_Rng >> 8) & 0x00FFFFFFu;
  return ((var)x + 1.0) / 16777217.0; // strictly (0,1)
}

// Standard normal N(0,1) using Box–Muller transform
var rng_norm()
{
  var u1 = rng_u01();
  var u2 = rng_u01();
  var r = sqrt(-2.0 * log(u1));
  var th = 6.283185307179586 * u2;
  return r * cos(th);
}

// -------------------- Utility: memory and shedding --------------------
// Return current process memory usage in MB (best-effort).
int memMB()
{
  var m = memory(0);
  if(!(m > 0)) return 0;
  return (int)(m/(1024*1024));
}

// If memory exceeds a budget, reduce work/pressure by:
//   - loosening tolerance (less strict accuracy)
//   - lowering max dt (more stable steps in volatile regions)
//   - tightening reject/dt-min budgets (avoid endless loops)
void shed_if_needed()
{
  if(G_ShedTriggered) return;

  int mb = memMB();
  if(mb >= MEM_BUDGET_MB)
  {
    G_ShedTriggered = 1;

    // accept larger local error to reduce computation
    G_Tol *= 10.0;

    // cap maximum dt to reduce instability after shedding
    G_DtMax *= 0.5;
    if(G_DtMax < DT_MIN) G_DtMax = DT_MIN;

    // reduce budgets to fail faster if the run is unhealthy
    G_MaxRejects  = (int)(G_MaxRejects/2);
    if(G_MaxRejects < 10000) G_MaxRejects = 10000;

    G_DtMinHitMax = (int)(G_DtMinHitMax/2);
    if(G_DtMinHitMax < 10000) G_DtMinHitMax = 10000;
  }
}

// -------------------- RK tableau helpers (explicit lower triangle storage) --------------------
// We store only the strictly lower triangular part of A (since explicit RK has A[i][j]=0 for j>=i).
// For MAXS=4, the number of stored entries is TRI_A = 4*3/2 = 6.

// Map (i,j) with i>j to index into the packed triangular array
int Aidx(int i, int j)
{
  return (i*(i-1))/2 + j;
}

// Read A(i,j) from packed storage (returns 0 for i<=j)
var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

// Write A(i,j) into packed storage (ignored for i<=j)
void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

// Reset a tableau to zero (s, A, b, c)
void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// -------------------- Nonlinearity used throughout --------------------
// softsign bounds values to (-1,1) and prevents runaway growth vs tanh/sigmoid costs.
var softsign(var z)
{
  return z / (1.0 + abs(z));
}

// -------------------- Exogenous input u(t) --------------------
// u(t) is a small external driving signal that affects both drift and stochastic terms.
// Regime changes can also offset u0 to create different behavior.
static int G_Regime = 0; // 0 or 1

void eval_u(var t)
{
  var u0 = sin(0.11*t);
  u0 *= 0.8;
  if(G_Regime) u0 += 0.35;

  var u1 = cos(0.37*t);
  u1 *= 0.6;

  var u2 = sin(2.30*t);
  u2 *= 0.3;

  *(G_u + 0) = u0;
  *(G_u + 1) = u1;
  *(G_u + 2) = u2;
}

// -------------------- Time-varying drift parameters theta(t) --------------------
// These parameters modulate the drift field over time and with u(t):
//   damp  : linear damping coefficient
//   amp   : forcing amplitude
//   scale : coupling scale in the interaction sum
//   stiff : stiffening factor for certain coordinates (multiscale behavior)
void theta_t(var t, var* dampOut, var* ampOut, var* scaleOut, var* stiffOut)
{
  // baseline values
  var damp  = -0.40;
  var amp   =  0.35;
  var scale =  0.06;

  // slow temporal drift in parameters
  var drift = sin(0.03*t);
  drift *= 0.25;

  // use exogenous input channels to modulate parameters
  var u0 = 0 + *(G_u + 0);
  var u1 = 0 + *(G_u + 1);

  // regime-dependent scaling
  if(G_Regime)
  {
    amp   *= 1.15;
    scale *= 1.10;
    damp  *= 1.05;
  }

  // combine time drift + inputs into parameter modulation
  amp   *= (1.0 + 0.30*drift + 0.15*u0);
  scale *= (1.0 + 0.20*drift + 0.10*u1);
  damp  *= (1.0 + 0.15*drift - 0.10*u0);

  // stiff factor applied to selected coordinates (creates fast/slow modes)
  var stiff = 2.8;

  *(dampOut)  = damp;
  *(ampOut)   = amp;
  *(scaleOut) = scale;
  *(stiffOut) = stiff;
}

// -------------------- Regime switching (2-state Markov chain) --------------------
// The regime flips 0<->1 with probabilities derived from rates r01, r10.
// Rates increase under "stress", which depends on |u0| and |u1|.
void update_regime(var t, var dt)
{
  var u0 = 0 + *(G_u + 0);
  var u1 = 0 + *(G_u + 1);

  var stress = abs(u0) + 0.5*abs(u1);

  var r01 = G_RegRate01 * (1.0 + 0.8*stress);
  var r10 = G_RegRate10 * (1.0 + 0.5*stress);

  var p;
  if(G_Regime == 0)
  {
    p = 1.0 - exp(-r01*dt);
    if(rng_u01() < p) G_Regime = 1;
  }
  else
  {
    p = 1.0 - exp(-r10*dt);
    if(rng_u01() < p) G_Regime = 0;
  }
}
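
// Hedged side note (not called by the integrator): for constant rates the
// 2-state chain has stationary probability of regime 1 equal to r01/(r01+r10);
// with the baseline rates 0.15 and 0.10 that is 0.6. Useful for sanity checks.
var regime1_stationary(var r01, var r10)
{
  if(r01 + r10 <= 0) return 0;
  return r01 / (r01 + r10);
}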

// -------------------- Jump shocks (Poisson events) --------------------
// With probability p ~ 1-exp(-lambda*dt), we apply a rare jump.
// Jump size uses tan(pi*(u-0.5)) which is heavy-tailed; we clamp extremes.
void apply_jump_if_needed(var t, var dt, var* y)
{
  var u0 = 0 + *(G_u + 0);
  var lam = G_JumpLambdaBase * (1.0 + 0.7*abs(u0));
  if(G_Regime) lam *= 1.25;

  var p = 1.0 - exp(-lam*dt);
  if(rng_u01() >= p) return;

  // heavy-tailed variate (Cauchy-like); clamp to keep numerics finite
  var uu = rng_u01() - 0.5;
  var j = tan(3.141592653589793 * uu);

  if(j >  6.0) j =  6.0;
  if(j < -6.0) j = -6.0;

  // jump magnitude depends on regime and |u0|
  var mag = G_JumpMagBase;
  mag *= (1.0 + 0.5*abs(u0));
  if(G_Regime) mag *= 1.2;

  // apply jump to a random subset of coordinates
  int k;
  for(k=0;k<N;k++)
  {
    if(rng_u01() < 0.25)
    {
      var delta = mag * j;
      var kk = (k+1);
      delta *= (1.0 + 0.01*kk);
      *(y + k) += delta;
    }
  }
}

// -------------------- Partial observability model --------------------
// Map hidden state h (N-dim) to a smaller observed vector (NOBS-dim),
// then add small Gaussian measurement noise.
// Produces:
//   G_xhat: "clean" observation prediction
//   G_xobs: noisy measurement (what an observer would see)
void observe_from_hidden(var t, var* h)
{
  int k;
  for(k=0;k<NOBS;k++)
  {
    // pick two hidden channels per observation
    int j = k;
    int j2 = k + NOBS;

    var a = 0 + *(h + j);
    var b = 0 + *(h + j2);

    // simple linear mix + a small deterministic wobble
    var z = a + 0.30*b;
    z += 0.05*sin(0.17*t + 0.3*(k+1));

    // bound the observation through softsign
    var xhat = z / (1.0 + abs(z));
    *(G_xhat + k) = xhat;

    // measurement noise
    var eta = 0.01 * rng_norm();
    *(G_xobs + k) = xhat + eta;
  }
}
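
// Optional diagnostic (illustrative only): RMS of the measurement residual
// G_xobs - G_xhat after the latest observation. With the 0.01 noise scale above
// this should hover near 0.01; much larger values would hint at a bookkeeping bug.
var obs_residual_rms()
{
  int k;
  var s = 0;
  for(k=0;k<NOBS;k++)
  {
    var d = *(G_xobs + k) - *(G_xhat + k);
    s += d*d;
  }
  return sqrt(s / (var)NOBS);
}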

// -------------------- Drift vector field f_theta(t, y, u) --------------------
// This computes G_dh = dy/dt for the current "stage state" stored in G_stage.
// Internally it:
//   - updates u(t)
//   - computes time-varying parameters theta(t)
//   - computes activation = softsign(stage + bias + injected input)
//   - computes a dense nonlinear coupling sum across all neurons (i over j)
//   - adds external forcing, damping, and a weak self-nonlinearity
void f_theta_vec(var t)
{
  int i,j;

  // refresh exogenous inputs at this time
  eval_u(t);

  // compute time-varying parameters
  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  // activation pass (store to float buffer for speed/memory)
  for(j=0;j<N;j++)
  {
    var jj = (j+1);

    var bj = sin(0.17*jj);
    bj *= 0.03;

    var u0 = 0 + *(G_u + 0);
    var inj = u0;
    inj *= 0.02;
    inj *= sin(0.07*jj + 0.19*t);

    var hj = 0 + *(G_stage + j);
    var in = hj + bj + inj;

    *(G_act + j) = (fvar)softsign(in);
  }

  // drift assembly per coordinate i
  for(i=0;i<N;i++)
  {
    var sum = 0;
    var ii = (i+1);

    // dense coupling term (nonlinear "neural" interaction)
    for(j=0;j<N;j++)
    {
      var jj = (j+1);

      var term1 = 0.013;
      term1 *= ii;
      term1 *= jj;

      var term2 = 0.11;
      term2 *= ii;

      var term3 = 0.07;
      term3 *= jj;

      var ang = term1 + term2 - term3;

      var w = sin(ang);

      var ww = w;
      ww *= scale;

      var aj = (var)(*(G_act + j));
      aj = 0 + aj;

      var prod = ww;
      prod *= aj;

      sum += prod;
    }

    // smooth forcing term + fast oscillation + exogenous mix
    var phi = 0.05;
    phi *= ii;

    var forcing = sin(t + phi);
    forcing *= amp;

    var fast = sin(20.0*t + 0.10*ii);
    fast *= 0.08;

    var u1 = 0 + *(G_u + 1);
    var u2 = 0 + *(G_u + 2);

    var exf = u1;
    exf *= 0.10;
    exf += 0.05*u2*sin(0.9*t);

    forcing += fast + exf;

    // damping and weak self-nonlinearity
    var hi = 0 + *(G_stage + i);

    var tmp = hi;
    tmp *= 1.8;
    var self = softsign(tmp);
    self *= 0.15;

    // stiffen every 8th coordinate to create multiscale dynamics
    var di = damp;
    if((i & 7) == 0)
      di *= stiff;

    var ghi = hi;
    ghi *= di;

    *(G_dh + i) = sum + ghi + forcing + self;
  }
}

// -------------------- Explicit RK step for drift only --------------------
// Given an explicit RK tableau (s, A, b, c), advance the ODE drift part:
//   yout = y + h * sum_i b_i * k_i
// where k_i = f(t + c_i*h, y + h * sum_{j<i} A_{i,j} k_j)
void rk_step_vec(int s, float* Atri, float* b, float* c,
                 var t, var* y, var h, var* yout)
{
  int i,j,k;

  // trivial fallback (should not happen in normal use)
  if(s < 1)
  {
    for(k=0;k<N;k++) *(yout + k) = *(y + k);
    return;
  }
  if(s > MAXS) s = MAXS;

  // clear stage derivative storage
  for(i=0;i<s;i++)
    for(k=0;k<N;k++)
      G_K[i*N + k] = (fvar)0;

  // stage loop
  for(i=0;i<s;i++)
  {
    // compute stage state: y_stage = y + h * sum_{j<i} A(i,j)*k_j
    for(k=0;k<N;k++)
    {
      var acc = 0 + *(y + k);

      for(j=0;j<i;j++)
      {
        var kij = (var)G_K[j*N + k];
        kij = 0 + kij;

        var aij = getA(Atri,i,j);

        var term = kij;
        term *= aij;
        term *= h;

        acc += term;
      }

      *(G_stage + k) = acc;
    }

    // evaluate drift at t + c_i*h
    var ci = (var)c[i];
    var dt = ci;
    dt *= h;

    f_theta_vec(t + dt);

    // store k_i = f(...)
    for(k=0;k<N;k++)
      G_K[i*N + k] = (fvar)(*(G_dh + k));
  }

  // combine stages into final yout
  for(k=0;k<N;k++)
  {
    var acc = 0 + *(y + k);

    for(i=0;i<s;i++)
    {
      var ki = (var)G_K[i*N + k];
      ki = 0 + ki;

      var bi = (var)b[i];

      var term = ki;
      term *= bi;
      term *= h;

      acc += term;
    }

    *(yout + k) = acc;
  }
}

// -------------------- Norm utilities --------------------
// RMS difference between two N-dim vectors
var err_norm(var* a, var* b)
{
  int k;
  var s = 0;
  for(k=0;k<N;k++)
  {
    var d = *(a + k) - *(b + k);
    s += d*d;
  }
  return sqrt(s / (var)N);
}

// RMS norm of a vector of length n
var rms_norm(var* a, int n)
{
  int k;
  var s = 0;
  for(k=0;k<n;k++)
  {
    var x = *(a + k);
    s += x*x;
  }
  return sqrt(s / (var)n);
}

// -------------------- Stage permutation -> equivalent tableau --------------------
// Given a method with (A,b,c) and a permutation p[] of stages,
// construct a new method (Aout,bout,cout) that corresponds to that permutation.
//
// Intuition:
//   - Reordering stages changes the evaluation sequence.
//   - Permuting the Butcher coefficients consistently (b, c, and the rows and
//     columns of A) yields an equivalent method - the same one-step map under
//     exact arithmetic - provided the permuted A stays strictly lower
//     triangular, i.e. no stage is moved ahead of a stage it depends on.
//
// This is the "safe refactor" path used for method B in main().
void permute_method_into(int s, float* A, float* b, float* c,
                         int* p,
                         int* sOut, float* Aout, float* bout, float* cout)
{
  int i,j;

  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(i=0;i<TRI_A;i++) Aout[i] = 0;
  for(i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  // permute b and c
  for(i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  // permute A lower-triangular entries consistently
  for(i=0;i<*sOut;i++)
    for(j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}
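
// Hedged self-check (assumption: equality holds only in exact arithmetic, so
// float round-off leaves a tiny residual). Takes one drift-only RK step with a
// method and with its permuted version from the same state and returns the RMS
// difference; a value near machine precision indicates a safe refactor.
static var G_permA[N];
static var G_permB[N];

var permutation_step_diff(int s, float* A, float* b, float* c,
                          float* AB, float* bB, float* cB,
                          var t, var* y, var h)
{
  rk_step_vec(s, A,  b,  c,  t, y, h, G_permA);
  rk_step_vec(s, AB, bB, cB, t, y, h, G_permB);
  return err_norm(G_permA, G_permB);
}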

// -------------------- Base 4-stage explicit RK with "swappable" middle stages --------------------
// This is a custom 4-stage explicit RK tableau (not classical RK4).
// The middle stages (1 and 2) do not reference each other (A(2,1)=0), so
// swapping them together with their b/c entries and rows/columns of A leaves
// the one-step map unchanged; the file then tests whether the swap is done
// correctly (B) or incorrectly (C).
void make_swappable_rk4(int* s, float* Atri, float* b, float* c)
{
  zeroTableau(s, Atri, b, c);
  *s = 4;

  c[0] = 0.0;
  c[1] = 0.30;
  c[2] = 0.80;
  c[3] = 1.00;

  setA(Atri, 1, 0, 0.30);
  setA(Atri, 2, 0, 0.80);

  setA(Atri, 3, 1, 0.50);
  setA(Atri, 3, 2, 0.50);

  b[0] = 0.10;
  b[1] = 0.20;
  b[2] = 0.30;
  b[3] = 0.40;
}
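
// Hedged alternative (illustrative, not used below): the classical RK4 tableau
// in the same packed storage. Note its middle stages are NOT swappable, because
// stage 2 depends on stage 1 (A(2,1) != 0).
void make_classic_rk4(int* s, float* Atri, float* b, float* c)
{
  zeroTableau(s, Atri, b, c);
  *s = 4;

  c[0] = 0.0;
  c[1] = 0.5;
  c[2] = 0.5;
  c[3] = 1.0;

  setA(Atri, 1, 0, 0.5);
  setA(Atri, 2, 1, 0.5);
  setA(Atri, 3, 2, 1.0);

  b[0] = 1.0/6.0;
  b[1] = 1.0/3.0;
  b[2] = 1.0/3.0;
  b[3] = 1.0/6.0;
}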

// -------------------- Diffusion increment (SDE part) --------------------
// After accepting a drift step, apply a stochastic increment:
//   y <- y + sigma(t,y) * sqrt(dt) * N(0,1)
// Here sigma depends on u(t), regime, and |y_k| (mild state-dependent scaling).
void apply_diffusion(var t, var dt, var* y)
{
  eval_u(t);

  var u0 = 0 + *(G_u + 0);
  var u2 = 0 + *(G_u + 2);

  // diffusion scale depends on inputs and regime
  var sigma = G_SigmaBase;
  sigma *= (1.0 + 0.6*abs(u0) + 0.3*abs(u2));
  if(G_Regime) sigma *= 1.25;

  var sdt = sqrt(dt);

  int k;
  for(k=0;k<N;k++)
  {
    var hk = 0 + *(y + k);

    // mild multiplicative factor increasing with |y_k|
    var ahk = abs(hk);
    ahk *= 0.15;
    var gmul = 1.0 + ahk;
    gmul = 0 + gmul;

    var xi = rng_norm();

    var inc = sigma;
    inc = inc * gmul;
    inc = inc * sdt;
    inc = inc * xi;

    *(y + k) += inc;
  }
}

// -------------------- Adaptive integrator for hybrid SDE/ODE --------------------
// Integrates from t0 to tEnd starting at y0.
//
// Core loop:
//   - Choose dt (bounded by DT_MIN and G_DtMax, clipped to end time)
//   - Compute drift step with RK: full step and two half steps
//   - Estimate error e = RMS(half - full)
//   - If accepted:
//       * commit y <- half
//       * update regime
//       * apply diffusion noise
//       * apply jumps
//       * produce observations
//       * check blow-up and steady-state termination
//       * update dt via PI-like scaling (here simple power law)
//     else:
//       * reject and halve dt
//
// Outputs:
//   yOut: final state
//   stepsOut / rejectsOut: accepted/rejected counts
//   dtMinOut: minimum dt observed during the run
//   stopReasonOut: why we terminated (see printed table in main)
void integrate_adaptive_sde(int s, float* A, float* b, float* c,
                            var t0, var* y0, var tEnd,
                            unsigned int seed,
                            var* yOut,
                            int* stepsOut, int* rejectsOut,
                            var* dtMinOut,
                            int* stopReasonOut)
{
  // seed the custom RNG and reset regime
  rng_seed(seed);
  G_Regime = 0;

  int k;
  var t = t0;
  var dt = DT_INIT;
  var dtMinSeen = 1e9;

  // initialize current state
  for(k=0;k<N;k++)
    *(G_ycur + k) = *(y0 + k);

  int steps = 0;
  int rejects = 0;
  int iters = 0;
  int dtMinHits = 0;
  int steadyCount = 0;

  int stopReason = 0;

  while(t < tEnd)
  {
    // occasionally check memory and relax constraints if needed
    if((iters & 255) == 0) shed_if_needed();
    iters++;

    // enforce dt bounds and do not step past tEnd
    if(dt < DT_MIN) dt = DT_MIN;
    if(dt > G_DtMax) dt = G_DtMax;
    if(t + dt > tEnd) dt = tEnd - t;

    // track smallest dt seen and how often we hit dt_min
    if(dt < dtMinSeen) dtMinSeen = dt;
    if(dt <= DT_MIN) dtMinHits++;

    // if we are stuck at dt_min too long, abort
    if(dtMinHits > G_DtMinHitMax)
    {
      stopReason = 4;
      break;
    }

    // drift integration: one full step
    rk_step_vec(s, A, b, c, t, G_ycur, dt, G_full);

    // drift integration: two half steps (step-doubling)
    var halfdt = dt;
    halfdt *= 0.5;

    rk_step_vec(s, A, b, c, t,        G_ycur, halfdt, G_mid);
    rk_step_vec(s, A, b, c, t+halfdt, G_mid,  halfdt, G_half);

    // error estimate from step-doubling
    var e = err_norm(G_half, G_full);

    // accept if error is within tolerance or we cannot reduce dt further
    if(e <= G_Tol || dt <= DT_MIN)
    {
      // commit state using the more accurate half-step result
      for(k=0;k<N;k++)
        *(G_ycur + k) = *(G_half + k);

      // hybrid / stochastic effects applied after acceptance
      eval_u(t);
      update_regime(t, dt);

      apply_diffusion(t, dt, G_ycur);
      apply_jump_if_needed(t, dt, G_ycur);

      // update observations at the end of this accepted step
      observe_from_hidden(t + dt, G_ycur);

      // blow-up guards (state or observations)
      var yr = rms_norm(G_ycur, N);
      var xr = rms_norm(G_xobs, NOBS);
      if(yr > G_StateLimitRMS || xr > G_ObsLimitRMS)
      {
        stopReason = 2;
        t += dt;
        steps++;
        break;
      }

      // steady-state detection using drift magnitude at new state
      for(k=0;k<N;k++) *(G_stage + k) = *(G_ycur + k);
      f_theta_vec(t + dt);
      var dr = rms_norm(G_dh, N);

      if(dr < G_SteadyEps) steadyCount++;
      else steadyCount = 0;

      // advance time and increment accepted step count
      t += dt;
      steps++;

      // terminate if we've stayed near-steady for long enough
      if(steadyCount >= G_SteadyNeed)
      {
        stopReason = 1;
        break;
      }

      // adapt dt upward/downward based on error (bounded multiplier)
      if(e < 1e-16)
      {
        dt *= 2.0;
      }
      else
      {
        var fac = pow(G_Tol / e, 0.2); // exponent ~ 1/(order+1)
        fac *= 0.9;                   // safety factor
        if(fac < 0.5) fac = 0.5;
        if(fac > 2.0) fac = 2.0;
        dt *= fac;
      }
    }
    else
    {
      // reject step and reduce dt
      rejects++;
      if(rejects > G_MaxRejects)
      {
        stopReason = 3;
        break;
      }
      dt *= 0.5;
    }
  }

  // return final state + run statistics
  for(k=0;k<N;k++)
    *(yOut + k) = *(G_ycur + k);

  *stepsOut = steps;
  *rejectsOut = rejects;
  *dtMinOut = dtMinSeen;
  *stopReasonOut = stopReason;
}
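
// Hedged sketch of the accept-branch step-size rule used above, pulled out for
// clarity: new_dt = dt * clamp(0.9*(tol/e)^(1/5), 0.5, 2.0). The 1/5 exponent
// assumes an effective order-4 one-step error model (1/(order+1)).
var next_dt(var dt, var e, var tol)
{
  if(e < 1e-16) return dt * 2.0;
  var fac = 0.9 * pow(tol / e, 0.2);
  if(fac < 0.5) fac = 0.5;
  if(fac > 2.0) fac = 2.0;
  return dt * fac;
}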

// -------------------- Entry point / experiment harness --------------------
// Builds three RK methods (reference, safe-permuted, buggy-permuted),
// integrates each with the same initial state and RNG seed, then reports:
//
//   - accepted/rejected steps
//   - minimum dt reached
//   - stop reason code
//   - RMS difference from reference output
//
// This lets you validate that a stage-refactor is mathematically equivalent (B)
// or detect a wrong refactor (C).
function main()
{
  // reset runtime knobs and budgets
  G_ShedTriggered = 0;
  G_Tol   = TOL;
  G_DtMax = DT_MAX;

  G_MaxRejects  = 2000000;
  G_DtMinHitMax = 500000;

  G_SteadyEps     = 1e-7;
  G_SteadyNeed    = 250;
  G_StateLimitRMS = 1e6;
  G_ObsLimitRMS   = 1e6;

  // reset stochastic/hybrid baselines
  G_SigmaBase       = 0.02;
  G_JumpLambdaBase  = 0.20;
  G_JumpMagBase     = 0.20;
  G_RegRate01       = 0.15;
  G_RegRate10       = 0.10;

  // A) reference RK tableau
  int   S = 0;
  float A[TRI_A];
  float b[MAXS];
  float c[MAXS];
  make_swappable_rk4(&S, A, b, c);

  // B) safe permutation: properly permute (A,b,c) into an equivalent method
  int   SB = 0;
  float AB[TRI_A];
  float bB[MAXS];
  float cB[MAXS];

  int p[MAXS];
  p[0]=0; p[1]=2; p[2]=1; p[3]=3;
  permute_method_into(S, A, b, c, p, &SB, AB, bB, cB);

  // C) buggy permutation: permute b/c but keep A unchanged (not equivalent in general)
  int   SW = S;
  float AW[TRI_A];
  float bW[MAXS];
  float cW[MAXS];

  int i;
  for(i=0;i<TRI_A;i++) AW[i] = A[i];
  for(i=0;i<MAXS;i++)  { bW[i]=0; cW[i]=0; }
  for(i=0;i<SW;i++)
  {
    int pi = p[i];
    bW[i] = b[pi];
    cW[i] = c[pi];
  }

  // initial condition for the hidden state y(0)
  int k;
  for(k=0;k<N;k++)
    *(Y0 + k) = 0.2*sin(0.31*(k+1)) + 0.15*cos(0.17*(k+1))
             + 0.05*sin(0.07*(k+1)*(k+1));

  var t0 = 0.0;

  // ----------- Seed generation for the custom RNG -----------
  // We build a seed from current wall-clock time with millisecond mixing so that
  // multiple launches close together still get different seeds.
  var now = wdate(NOW);                      // days since epoch with fractional day
  unsigned int MySeed = (unsigned int)utm(now); // seconds since 1970

  // mix in milliseconds extracted from fractional day
  var frac = now - floor(now);
  var ms = frac * 86400000.0; // 24*60*60*1000
  MySeed = 1664525u * MySeed + (unsigned int)ms + 1013904223u;

  if(MySeed == 0) MySeed = 1;
  printf("\nMySeed=%u", MySeed);
  // ---------------------------------------------------------

  // run stats for the three methods
  int stepsRef=0,  rejRef=0;   var dtMinRef=0;  int stopRef=0;
  int stepsSafe=0, rejSafe=0;  var dtMinSafe=0; int stopSafe=0;
  int stepsBug=0,  rejBug=0;   var dtMinBug=0;  int stopBug=0;

  // integrate using identical initial state + identical RNG seed
  integrate_adaptive_sde(S,  A,  b,  c,  t0, Y0, T_END, MySeed, Yref,
                         &stepsRef,  &rejRef,  &dtMinRef,  &stopRef);

  integrate_adaptive_sde(SB, AB, bB, cB, t0, Y0, T_END, MySeed, Ysafe,
                         &stepsSafe, &rejSafe, &dtMinSafe, &stopSafe);

  integrate_adaptive_sde(SW, AW, bW, cW, t0, Y0, T_END, MySeed, Ybug,
                         &stepsBug,  &rejBug,  &dtMinBug,  &stopBug);

  // compare final states to reference
  var diffSafe = err_norm(Ysafe, Yref);
  var diffBug  = err_norm(Ybug,  Yref);

  // summary prints
  printf("\nNeural SDE / Hybrid ODE integration (N=%d) to T=%g  baseTol=%g", N, T_END, TOL);
  printf("\nAdds: u(t), theta(t), diffusion noise, regime switching, jumps, partial observation, multiscale/stiff modes.");
  printf("\nRuntime knobs end-state: G_Tol=%g  G_DtMax=%g  shed=%d  mem=%d MB",
         G_Tol, G_DtMax, G_ShedTriggered, memMB());

  printf("\n\nA) Reference tableau:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsRef, rejRef, dtMinRef, stopRef);

  printf("\nB) Theorem-382A safe stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsSafe, rejSafe, dtMinSafe, stopSafe);
  printf("\n   diff to reference (RMS) = %.3e", diffSafe);

  printf("\nC) Buggy stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsBug, rejBug, dtMinBug, stopBug);
  printf("\n   diff to reference (RMS) = %.3e", diffBug);

  printf("\n\nStop codes: 0=tEnd, 1=steady-state, 2=event(blow-up), 3=rejectBudget, 4=dtMinBudget\n");

  quit("done");
}

The Clockwork Storm Weaver (64Bit) [Re: TipmyPip] #489141
1 hour ago
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
Code
// demo_neuralODE_complex_382A_adaptive_main.cpp (Zorro64 C++)

#include <zorro.h>   // required for C++ strategies in Zorro64
#include <math.h>

#define MAXS 4
#define TRI_A 6

// -------------------- Dimensions --------------------
#define N         48
#define K_SIZE    192   // MUST be literal: MAXS*N = 4*48 = 192

#define U_DIM     3     // exogenous input dimension u(t)
#define NOBS      12    // number of observed channels (partial observability)

// -------------------- Base "difficulty knobs" --------------------
#define T_END     20.0
#define DT_INIT   0.20
#define DT_MIN    1e-6
#define DT_MAX    0.50
#define TOL       1e-6

// -------------------- Shedding (memory-pressure) --------------------
#define MEM_BUDGET_MB 2000
static int G_ShedTriggered = 0;

// runtime knobs (can be adjusted by shedding)
static var G_Tol   = TOL;
static var G_DtMax = DT_MAX;

// safety budgets (NOT a max-steps cap)
static int G_MaxRejects     = 2000000;
static int G_DtMinHitMax    = 500000;

// dynamics-driven termination knobs
static var G_SteadyEps      = 1e-7;
static int G_SteadyNeed     = 250;
static var G_StateLimitRMS  = 1e6;
static var G_ObsLimitRMS    = 1e6;

// -------------------- SDE / hybrid knobs --------------------
static var G_SigmaBase      = 0.02;
static var G_JumpLambdaBase = 0.20;
static var G_JumpMagBase    = 0.20;
static var G_RegRate01      = 0.15;
static var G_RegRate10      = 0.10;

// -------------------- TIGHT_MEM types --------------------
#define TIGHT_MEM 1
#ifdef TIGHT_MEM
typedef float fvar;   // 4B
#else
typedef var fvar;     // 8B
#endif

// -------------------- Global working buffers (no malloc) --------------------
// Objects with static storage duration are zero-initialized in C++ anyway;
// we still initialize all arrays explicitly to make the intent obvious and
// to match the lite-C version.
static fvar G_K[K_SIZE]     = { 0 };
static fvar G_act[N]        = { 0 };

static var  G_stage[N]      = { 0 };
static var  G_dh[N]         = { 0 };

// integrator work
static var  G_ycur[N]       = { 0 };
static var  G_full[N]       = { 0 };
static var  G_half[N]       = { 0 };
static var  G_mid[N]        = { 0 };

// initial/final buffers
static var  Y0[N]           = { 0 };
static var  Yref[N]         = { 0 };
static var  Ysafe[N]        = { 0 };
static var  Ybug[N]         = { 0 };

// partial observability buffers
static var  G_xhat[NOBS]    = { 0 };
static var  G_xobs[NOBS]    = { 0 };

// exogenous input u(t)
static var  G_u[U_DIM]      = { 0 };

// -------------------- RNG (deterministic, local; no Zorro data) --------------------
static unsigned int G_Rng = 1;

static void rng_seed(unsigned int s)
{
  if(s == 0) s = 1;
  G_Rng = s;
}

static var rng_u01()
{
  G_Rng = 1664525u * G_Rng + 1013904223u;
  unsigned int x = (G_Rng >> 8) & 0x00FFFFFFu;
  return ((var)x + 1.0) / 16777217.0; // (0,1)
}

static var rng_norm()
{
  var u1 = rng_u01();
  var u2 = rng_u01();
  var r  = sqrt(-2.0 * log(u1));
  var th = 6.283185307179586 * u2;
  return r * cos(th);
}

// -------------------- Utility --------------------
static int memMB()
{
  var m = memory(0);
  if(!(m > 0)) return 0;
  return (int)(m/(1024*1024));
}

static void shed_if_needed()
{
  if(G_ShedTriggered) return;

  int mb = memMB();
  if(mb >= MEM_BUDGET_MB)
  {
    G_ShedTriggered = 1;

    G_Tol *= 10.0;

    G_DtMax *= 0.5;
    if(G_DtMax < DT_MIN) G_DtMax = DT_MIN;

    G_MaxRejects  = (int)(G_MaxRejects/2);
    if(G_MaxRejects < 10000) G_MaxRejects = 10000;

    G_DtMinHitMax = (int)(G_DtMinHitMax/2);
    if(G_DtMinHitMax < 10000) G_DtMinHitMax = 10000;
  }
}

// -------------------- Triangular indexing for explicit A (lower triangle only) --------------------
static int Aidx(int i, int j)
{
  return (i*(i-1))/2 + j;
}

static var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

static void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

static void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// -------------------- Activation --------------------
static var softsign(var z)
{
  return z / (1.0 + fabs(z));
}

// -------------------- Exogenous input u(t) --------------------
static int G_Regime = 0; // 0 or 1

static void eval_u(var t)
{
  var u0 = sin(0.11*t);
  u0 *= 0.8;
  if(G_Regime) u0 += 0.35;

  var u1 = cos(0.37*t);
  u1 *= 0.6;

  var u2 = sin(2.30*t);
  u2 *= 0.3;

  G_u[0] = u0;
  G_u[1] = u1;
  G_u[2] = u2;
}

// -------------------- Time-varying parameters theta(t) --------------------
static void theta_t(var t, var* dampOut, var* ampOut, var* scaleOut, var* stiffOut)
{
  var damp  = -0.40;
  var amp   =  0.35;
  var scale =  0.06;

  var drift = sin(0.03*t);
  drift *= 0.25;

  var u0 = G_u[0];
  var u1 = G_u[1];

  if(G_Regime)
  {
    amp   *= 1.15;
    scale *= 1.10;
    damp  *= 1.05;
  }

  amp   *= (1.0 + 0.30*drift + 0.15*u0);
  scale *= (1.0 + 0.20*drift + 0.10*u1);
  damp  *= (1.0 + 0.15*drift - 0.10*u0);

  var stiff = 2.8;

  *dampOut  = damp;
  *ampOut   = amp;
  *scaleOut = scale;
  *stiffOut = stiff;
}

// -------------------- Hybrid regime switching (Markov) --------------------
static void update_regime(var t, var dt)
{
  var u0 = G_u[0];
  var u1 = G_u[1];

  var stress = fabs(u0) + 0.5*fabs(u1);

  var r01 = G_RegRate01 * (1.0 + 0.8*stress);
  var r10 = G_RegRate10 * (1.0 + 0.5*stress);

  var p;
  if(G_Regime == 0)
  {
    p = 1.0 - exp(-r01*dt);
    if(rng_u01() < p) G_Regime = 1;
  }
  else
  {
    p = 1.0 - exp(-r10*dt);
    if(rng_u01() < p) G_Regime = 0;
  }
}

// -------------------- Poisson jumps (shock events) --------------------
static void apply_jump_if_needed(var t, var dt, var* y)
{
  var u0  = G_u[0];
  var lam = G_JumpLambdaBase * (1.0 + 0.7*fabs(u0));
  if(G_Regime) lam *= 1.25;

  var p = 1.0 - exp(-lam*dt);
  if(rng_u01() >= p) return;

  var uu = rng_u01() - 0.5;
  var j  = tan(3.141592653589793 * uu);

  if(j >  6.0) j =  6.0;
  if(j < -6.0) j = -6.0;

  var mag = G_JumpMagBase;
  mag *= (1.0 + 0.5*fabs(u0));
  if(G_Regime) mag *= 1.2;

  int k;
  for(k=0;k<N;k++)
  {
    if(rng_u01() < 0.25)
    {
      var delta = mag * j;
      var kk = (k+1);
      delta *= (1.0 + 0.01*kk);
      y[k] += delta;
    }
  }
}
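
// Hedged helper (illustrative only): the jump size above is a clamped standard
// Cauchy variate, tan(pi*(U-1/2)) truncated to [-6,6]; exposed here so its
// heavy-tail behavior can be inspected in isolation.
static var clamped_cauchy()
{
  var j = tan(3.141592653589793 * (rng_u01() - 0.5));
  if(j >  6.0) j =  6.0;
  if(j < -6.0) j = -6.0;
  return j;
}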

// -------------------- Partial observability --------------------
static void observe_from_hidden(var t, var* h)
{
  int k;
  for(k=0;k<NOBS;k++)
  {
    int j  = k;
    int j2 = k + NOBS;

    var a = h[j];
    var b = h[j2];

    var z = a + 0.30*b;
    z += 0.05*sin(0.17*t + 0.3*(k+1));

    var xhat = z / (1.0 + fabs(z));
    G_xhat[k] = xhat;

    var eta = 0.01 * rng_norm();
    G_xobs[k] = xhat + eta;
  }
}

// -------------------- Drift vector field f_theta(t,y,u) --------------------
static void f_theta_vec(var t)
{
  int i,j;

  eval_u(t);

  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  for(j=0;j<N;j++)
  {
    var jj = (j+1);

    var bj = sin(0.17*jj);
    bj *= 0.03;

    var u0 = G_u[0];
    var inj = u0;
    inj *= 0.02;
    inj *= sin(0.07*jj + 0.19*t);

    var hj = G_stage[j];
    var in = hj + bj + inj;

    G_act[j] = (fvar)softsign(in);
  }

  for(i=0;i<N;i++)
  {
    var sum = 0;
    var ii = (i+1);

    for(j=0;j<N;j++)
    {
      var jj = (j+1);

      var term1 = 0.013;
      term1 *= ii;
      term1 *= jj;

      var term2 = 0.11;
      term2 *= ii;

      var term3 = 0.07;
      term3 *= jj;

      var ang = term1 + term2 - term3;

      var w  = sin(ang);
      var ww = w * scale;

      var aj = (var)G_act[j];

      sum += ww * aj;
    }

    var phi = 0.05 * ii;

    var forcing = sin(t + phi) * amp;

    var fast = sin(20.0*t + 0.10*ii);
    fast *= 0.08;

    var u1 = G_u[1];
    var u2 = G_u[2];

    var exf = u1 * 0.10;
    exf += 0.05*u2*sin(0.9*t);

    forcing += fast + exf;

    var hi = G_stage[i];

    var tmp = hi * 1.8;
    var self = softsign(tmp) * 0.15;

    var di = damp;
    if((i & 7) == 0)
      di *= stiff;

    var ghi = hi * di;

    G_dh[i] = sum + ghi + forcing + self;
  }
}

// -------------------- Vector RK step (explicit) for DRIFT ONLY --------------------
static void rk_step_vec(int s, float* Atri, float* b, float* c,
                        var t, var* y, var h, var* yout)
{
  int i,j,k;

  if(s < 1)
  {
    for(k=0;k<N;k++) yout[k] = y[k];
    return;
  }
  if(s > MAXS) s = MAXS;

  for(i=0;i<s;i++)
    for(k=0;k<N;k++)
      G_K[i*N + k] = (fvar)0;

  for(i=0;i<s;i++)
  {
    for(k=0;k<N;k++)
    {
      var acc = y[k];

      for(j=0;j<i;j++)
      {
        var kij = (var)G_K[j*N + k];
        var aij = getA(Atri,i,j);
        acc += kij * aij * h;
      }

      G_stage[k] = acc;
    }

    var dt = ((var)c[i]) * h;

    f_theta_vec(t + dt);

    for(k=0;k<N;k++)
      G_K[i*N + k] = (fvar)G_dh[k];
  }

  for(k=0;k<N;k++)
  {
    var acc = y[k];

    for(i=0;i<s;i++)
    {
      var ki = (var)G_K[i*N + k];
      var bi = (var)b[i];
      acc += ki * bi * h;
    }

    yout[k] = acc;
  }
}

// -------------------- Norms --------------------
static var err_norm(var* a, var* b)
{
  int k;
  var s = 0;
  for(k=0;k<N;k++)
  {
    var d = a[k] - b[k];
    s += d*d;
  }
  return sqrt(s / (var)N);
}

static var rms_norm(var* a, int n)
{
  int k;
  var s = 0;
  for(k=0;k<n;k++)
  {
    var x = a[k];
    s += x*x;
  }
  return sqrt(s / (var)n);
}

// -------------------- Stage permutation -> equivalent tableau --------------------
static void permute_method_into(int s, float* A, float* b, float* c,
                                int* p,
                                int* sOut, float* Aout, float* bout, float* cout)
{
  int i,j;

  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(i=0;i<TRI_A;i++) Aout[i] = 0;
  for(i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  for(i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  for(i=0;i<*sOut;i++)
    for(j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}

// -------------------- A 4-stage explicit RK with swappable middle stages --------------------
static void make_swappable_rk4(int* s, float* Atri, float* b, float* c)
{
  zeroTableau(s, Atri, b, c);
  *s = 4;

  c[0] = 0.0f;
  c[1] = 0.30f;
  c[2] = 0.80f;
  c[3] = 1.00f;

  setA(Atri, 1, 0, 0.30);
  setA(Atri, 2, 0, 0.80);

  setA(Atri, 3, 1, 0.50);
  setA(Atri, 3, 2, 0.50);

  b[0] = 0.10f;
  b[1] = 0.20f;
  b[2] = 0.30f;
  b[3] = 0.40f;
}

// -------------------- Diffusion (SDE) increment --------------------
static void apply_diffusion(var t, var dt, var* y)
{
  eval_u(t);

  var u0 = G_u[0];
  var u2 = G_u[2];

  var sigma = G_SigmaBase;
  sigma *= (1.0 + 0.6*fabs(u0) + 0.3*fabs(u2));
  if(G_Regime) sigma *= 1.25;

  var sdt = sqrt(dt);

  int k;
  for(k=0;k<N;k++)
  {
    var hk = y[k];

    var gmul = 1.0 + (fabs(hk) * 0.15);

    var xi = rng_norm();

    y[k] += sigma * gmul * sdt * xi;
  }
}
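
// Hedged note (assumption): applying the noise after an accepted drift step is
// an Euler-Maruyama style splitting; each coordinate's increment has standard
// deviation sigma*(1 + 0.15*|y_k|)*sqrt(dt). Helper for inspection only.
static var diffusion_stddev(var sigma, var yk, var dt)
{
  return sigma * (1.0 + 0.15*fabs(yk)) * sqrt(dt);
}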

// -------------------- Adaptive integrator --------------------
static void integrate_adaptive_sde(int s, float* A, float* b, float* c,
                                   var t0, var* y0, var tEnd,
                                   unsigned int seed,
                                   var* yOut,
                                   int* stepsOut, int* rejectsOut,
                                   var* dtMinOut,
                                   int* stopReasonOut)
{
  rng_seed(seed);
  G_Regime = 0;

  int k;
  var t = t0;
  var dt = DT_INIT;
  var dtMinSeen = 1e9;

  for(k=0;k<N;k++)
    G_ycur[k] = y0[k];

  int steps = 0;
  int rejects = 0;
  int iters = 0;
  int dtMinHits = 0;
  int steadyCount = 0;

  int stopReason = 0;

  while(t < tEnd)
  {
    if((iters & 255) == 0) shed_if_needed();
    iters++;

    if(dt < DT_MIN) dt = DT_MIN;
    if(dt > G_DtMax) dt = G_DtMax;
    if(t + dt > tEnd) dt = tEnd - t;

    if(dt < dtMinSeen) dtMinSeen = dt;
    if(dt <= DT_MIN) dtMinHits++;

    if(dtMinHits > G_DtMinHitMax)
    {
      stopReason = 4;
      break;
    }

    rk_step_vec(s, A, b, c, t, G_ycur, dt, G_full);

    var halfdt = dt * 0.5;

    rk_step_vec(s, A, b, c, t,        G_ycur, halfdt, G_mid);
    rk_step_vec(s, A, b, c, t+halfdt, G_mid,  halfdt, G_half);

    var e = err_norm(G_half, G_full);

    if(e <= G_Tol || dt <= DT_MIN)
    {
      for(k=0;k<N;k++)
        G_ycur[k] = G_half[k];

      eval_u(t);
      update_regime(t, dt);

      apply_diffusion(t, dt, G_ycur);
      apply_jump_if_needed(t, dt, G_ycur);

      observe_from_hidden(t + dt, G_ycur);

      var yr = rms_norm(G_ycur, N);
      var xr = rms_norm(G_xobs, NOBS);
      if(yr > G_StateLimitRMS || xr > G_ObsLimitRMS)
      {
        stopReason = 2;
        t += dt;
        steps++;
        break;
      }

      for(k=0;k<N;k++) G_stage[k] = G_ycur[k];
      f_theta_vec(t + dt);
      var dr = rms_norm(G_dh, N);

      if(dr < G_SteadyEps) steadyCount++;
      else steadyCount = 0;

      t += dt;
      steps++;

      if(steadyCount >= G_SteadyNeed)
      {
        stopReason = 1;
        break;
      }

      if(e < 1e-16)
      {
        dt *= 2.0;
      }
      else
      {
        var fac = pow(G_Tol / e, 0.2) * 0.9;
        if(fac < 0.5) fac = 0.5;
        if(fac > 2.0) fac = 2.0;
        dt *= fac;
      }
    }
    else
    {
      rejects++;
      if(rejects > G_MaxRejects)
      {
        stopReason = 3;
        break;
      }
      dt *= 0.5;
    }
  }

  for(k=0;k<N;k++)
    yOut[k] = G_ycur[k];

  *stepsOut = steps;
  *rejectsOut = rejects;
  *dtMinOut = dtMinSeen;
  *stopReasonOut = stopReason;
}
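
// Hedged accounting note (informational only): each accepted step above costs
// 3*s drift evaluations (full step + two half steps) plus one more for the
// steady-state check; each rejected attempt costs 3*s.
static int drift_evals_per_accepted_step(int s)
{
  return 3*s + 1;
}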

// -------------------- Entry point (runs once) --------------------
DLLFUNC int main()
{
  // reset knobs
  G_ShedTriggered = 0;
  G_Tol   = TOL;
  G_DtMax = DT_MAX;

  G_MaxRejects  = 2000000;
  G_DtMinHitMax = 500000;

  G_SteadyEps     = 1e-7;
  G_SteadyNeed    = 250;
  G_StateLimitRMS = 1e6;
  G_ObsLimitRMS   = 1e6;

  G_SigmaBase       = 0.02;
  G_JumpLambdaBase  = 0.20;
  G_JumpMagBase     = 0.20;
  G_RegRate01       = 0.15;
  G_RegRate10       = 0.10;

  int   S = 0;
  float A[TRI_A];
  float b[MAXS];
  float c[MAXS];
  make_swappable_rk4(&S, A, b, c);

  int   SB = 0;
  float AB[TRI_A];
  float bB[MAXS];
  float cB[MAXS];

  int p[MAXS];
  p[0]=0; p[1]=2; p[2]=1; p[3]=3;
  permute_method_into(S, A, b, c, p, &SB, AB, bB, cB);

  int   SW = S;
  float AW[TRI_A];
  float bW[MAXS];
  float cW[MAXS];

  int i;
  for(i=0;i<TRI_A;i++) AW[i] = A[i];
  for(i=0;i<MAXS;i++)  { bW[i]=0; cW[i]=0; }
  for(i=0;i<SW;i++)
  {
    int pi = p[i];
    bW[i] = b[pi];
    cW[i] = c[pi];
  }

  int k;
  for(k=0;k<N;k++)
    Y0[k] = 0.2*sin(0.31*(k+1)) + 0.15*cos(0.17*(k+1))
          + 0.05*sin(0.07*(k+1)*(k+1));

  var t0 = 0.0;

  // ----------- seed for custom RNG -----------
  var now = wdate(NOW);
  unsigned int MySeed = (unsigned int)utm(now);

  var frac = now - floor(now);
  var ms = frac * 86400000.0;
  MySeed = 1664525u * MySeed + (unsigned int)ms + 1013904223u;

  if(MySeed == 0) MySeed = 1;
  printf("\nMySeed=%u", MySeed);
  // ------------------------------------------

  int stepsRef=0,  rejRef=0;   var dtMinRef=0;  int stopRef=0;
  int stepsSafe=0, rejSafe=0;  var dtMinSafe=0; int stopSafe=0;
  int stepsBug=0,  rejBug=0;   var dtMinBug=0;  int stopBug=0;

  integrate_adaptive_sde(S,  A,  b,  c,  t0, Y0, T_END, MySeed, Yref,
                         &stepsRef,  &rejRef,  &dtMinRef,  &stopRef);

  integrate_adaptive_sde(SB, AB, bB, cB, t0, Y0, T_END, MySeed, Ysafe,
                         &stepsSafe, &rejSafe, &dtMinSafe, &stopSafe);

  integrate_adaptive_sde(SW, AW, bW, cW, t0, Y0, T_END, MySeed, Ybug,
                         &stepsBug,  &rejBug,  &dtMinBug,  &stopBug);

  var diffSafe = err_norm(Ysafe, Yref);
  var diffBug  = err_norm(Ybug,  Yref);

  printf("\nNeural SDE / Hybrid ODE integration (N=%d) to T=%g  baseTol=%g", N, T_END, TOL);
  printf("\nAdds: u(t), theta(t), diffusion noise, regime switching, jumps, partial observation, multiscale/stiff modes.");
  printf("\nRuntime knobs end-state: G_Tol=%g  G_DtMax=%g  shed=%d  mem=%d MB",
         G_Tol, G_DtMax, G_ShedTriggered, memMB());

  printf("\n\nA) Reference tableau:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsRef, rejRef, dtMinRef, stopRef);

  printf("\nB) Theorem-382A safe stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsSafe, rejSafe, dtMinSafe, stopSafe);
  printf("\n   diff to reference (RMS) = %.3e", diffSafe);

  printf("\nC) Buggy stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsBug, rejBug, dtMinBug, stopBug);
  printf("\n   diff to reference (RMS) = %.3e", diffBug);

  printf("\n\nStop codes: 0=tEnd, 1=steady-state, 2=event(blow-up), 3=rejectBudget, 4=dtMinBudget\n");

  return quit("done");
}

The Clockwork Storm Weaver (OpenCL) [Re: TipmyPip] #489142
52 minutes ago
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member
Code
// demo_neuralODE_complex_382A_adaptive_main_opencl_nosdk.cpp (Zorro64 OpenCL C++)
//
// Headerless OpenCL dynamic loader (no CL/cl.h, no OpenCL.lib).
// Prints numerical outputs proving GPU drift is computed, and reports GPU usage.
// If OpenCL fails at any point -> CPU fallback.

#include <zorro.h>
#include <math.h>
#include <windows.h>

#define USE_OPENCL 1
#define CL_DEBUG  1   // <-- set to 0 if you want quiet mode

#define MAXS 4
#define TRI_A 6

// -------------------- Dimensions --------------------
#define N         48
#define K_SIZE    192   // MUST be literal: MAXS*N = 4*48 = 192

#define U_DIM     3
#define NOBS      12

// -------------------- Base knobs --------------------
#define T_END     20.0
#define DT_INIT   0.20
#define DT_MIN    1e-6
#define DT_MAX    0.50
#define TOL       1e-6

// -------------------- Shedding --------------------
#define MEM_BUDGET_MB 2000
static int G_ShedTriggered = 0;

// runtime knobs
static var G_Tol   = TOL;
static var G_DtMax = DT_MAX;

// budgets
static int G_MaxRejects  = 2000000;
static int G_DtMinHitMax = 500000;

// termination knobs
static var G_SteadyEps     = 1e-7;
static int G_SteadyNeed    = 250;
static var G_StateLimitRMS = 1e6;
static var G_ObsLimitRMS   = 1e6;

// SDE / hybrid knobs
static var G_SigmaBase      = 0.02;
static var G_JumpLambdaBase = 0.20;
static var G_JumpMagBase    = 0.20;
static var G_RegRate01      = 0.15;
static var G_RegRate10      = 0.10;

// -------------------- TIGHT_MEM types --------------------
#define TIGHT_MEM 1
#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef var fvar;
#endif

// -------------------- Global buffers --------------------
static fvar G_K[K_SIZE]  = { 0 };
static var  G_stage[N]   = { 0 };
static var  G_dh[N]      = { 0 };

static var  G_ycur[N]    = { 0 };
static var  G_full[N]    = { 0 };
static var  G_half[N]    = { 0 };
static var  G_mid[N]     = { 0 };

static var  Y0[N]        = { 0 };
static var  Yref[N]      = { 0 };
static var  Ysafe[N]     = { 0 };
static var  Ybug[N]      = { 0 };

static var  G_xhat[NOBS] = { 0 };
static var  G_xobs[NOBS] = { 0 };

static var  G_u[U_DIM]   = { 0 };

// -------------------- RNG --------------------
static unsigned int G_Rng = 1;

static void rng_seed(unsigned int s)
{
  if(s == 0) s = 1;
  G_Rng = s;
}

static var rng_u01()
{
  G_Rng = 1664525u * G_Rng + 1013904223u;
  unsigned int x = (G_Rng >> 8) & 0x00FFFFFFu;
  return ((var)x + 1.0) / 16777217.0;
}

static var rng_norm()
{
  var u1 = rng_u01();
  var u2 = rng_u01();
  var r  = sqrt(-2.0 * log(u1));
  var th = 6.283185307179586 * u2;
  return r * cos(th);
}

// -------------------- Utility --------------------
static int memMB()
{
  var m = memory(0);
  if(!(m > 0)) return 0;
  return (int)(m/(1024*1024));
}

static void shed_if_needed()
{
  if(G_ShedTriggered) return;

  int mb = memMB();
  if(mb >= MEM_BUDGET_MB)
  {
    G_ShedTriggered = 1;
    G_Tol *= 10.0;

    G_DtMax *= 0.5;
    if(G_DtMax < DT_MIN) G_DtMax = DT_MIN;

    G_MaxRejects = (int)(G_MaxRejects/2);
    if(G_MaxRejects < 10000) G_MaxRejects = 10000;

    G_DtMinHitMax = (int)(G_DtMinHitMax/2);
    if(G_DtMinHitMax < 10000) G_DtMinHitMax = 10000;
  }
}

// -------------------- Triangular indexing --------------------
static int Aidx(int i, int j) { return (i*(i-1))/2 + j; }

static var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

static void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

static void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// -------------------- Activation --------------------
static var softsign(var z) { return z / (1.0 + fabs(z)); }

// -------------------- Exogenous input u(t) --------------------
static int G_Regime = 0;

static void eval_u(var t)
{
  var u0 = sin(0.11*t) * 0.8;
  if(G_Regime) u0 += 0.35;

  var u1 = cos(0.37*t) * 0.6;
  var u2 = sin(2.30*t) * 0.3;

  G_u[0] = u0;
  G_u[1] = u1;
  G_u[2] = u2;
}

// -------------------- theta(t) --------------------
static void theta_t(var t, var* dampOut, var* ampOut, var* scaleOut, var* stiffOut)
{
  var damp  = -0.40;
  var amp   =  0.35;
  var scale =  0.06;

  var drift = sin(0.03*t) * 0.25;

  var u0 = G_u[0];
  var u1 = G_u[1];

  if(G_Regime)
  {
    amp   *= 1.15;
    scale *= 1.10;
    damp  *= 1.05;
  }

  amp   *= (1.0 + 0.30*drift + 0.15*u0);
  scale *= (1.0 + 0.20*drift + 0.10*u1);
  damp  *= (1.0 + 0.15*drift - 0.10*u0);

  *dampOut  = damp;
  *ampOut   = amp;
  *scaleOut = scale;
  *stiffOut = 2.8;
}

// -------------------- Regime switching --------------------
static void update_regime(var t, var dt)
{
  var u0 = G_u[0];
  var u1 = G_u[1];

  var stress = fabs(u0) + 0.5*fabs(u1);
  var r01 = G_RegRate01 * (1.0 + 0.8*stress);
  var r10 = G_RegRate10 * (1.0 + 0.5*stress);

  var p;
  if(G_Regime == 0)
  {
    p = 1.0 - exp(-r01*dt);
    if(rng_u01() < p) G_Regime = 1;
  }
  else
  {
    p = 1.0 - exp(-r10*dt);
    if(rng_u01() < p) G_Regime = 0;
  }
}

// -------------------- Jumps --------------------
static void apply_jump_if_needed(var t, var dt, var* y)
{
  var u0  = G_u[0];
  var lam = G_JumpLambdaBase * (1.0 + 0.7*fabs(u0));
  if(G_Regime) lam *= 1.25;

  var p = 1.0 - exp(-lam*dt);
  if(rng_u01() >= p) return;

  var uu = rng_u01() - 0.5;
  var j  = tan(3.141592653589793 * uu);
  if(j >  6.0) j =  6.0;
  if(j < -6.0) j = -6.0;

  var mag = G_JumpMagBase * (1.0 + 0.5*fabs(u0));
  if(G_Regime) mag *= 1.2;

  for(int k=0;k<N;k++)
  {
    if(rng_u01() < 0.25)
    {
      var delta = mag * j;
      delta *= (1.0 + 0.01*(k+1));
      y[k] += delta;
    }
  }
}

// -------------------- Observations --------------------
static void observe_from_hidden(var t, var* h)
{
  for(int k=0;k<NOBS;k++)
  {
    var a = h[k];
    var b = h[k + NOBS];

    var z = a + 0.30*b + 0.05*sin(0.17*t + 0.3*(k+1));
    var xhat = z / (1.0 + fabs(z));
    G_xhat[k] = xhat;

    G_xobs[k] = xhat + 0.01*rng_norm();
  }
}

// ===========================================================
//           OpenCL (headerless dynamic loader)
// ===========================================================
#if USE_OPENCL

typedef int                 cl_int;
typedef unsigned int        cl_uint;
typedef unsigned long long  cl_ulong;
typedef cl_ulong            cl_bitfield;   // must be 64-bit to match the OpenCL driver ABI
typedef cl_bitfield         cl_device_type;
typedef cl_bitfield         cl_mem_flags;
typedef cl_uint             cl_bool;

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;

#define CL_SUCCESS              0
#define CL_TRUE                 1

#define CL_DEVICE_TYPE_ALL      0xFFFFFFFFULL   // all device types ((1<<0) is CL_DEVICE_TYPE_DEFAULT)
#define CL_DEVICE_TYPE_GPU      (1ULL << 2)

#define CL_MEM_READ_ONLY        (1ULL << 2)
#define CL_MEM_WRITE_ONLY       (1ULL << 1)

#define CL_PROGRAM_BUILD_LOG    0x1183

typedef cl_int   (__stdcall *PFN_clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
typedef cl_int   (__stdcall *PFN_clGetDeviceIDs)(cl_platform_id, cl_device_type, cl_uint, cl_device_id*, cl_uint*);
typedef cl_context (__stdcall *PFN_clCreateContext)(const void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
typedef cl_command_queue (__stdcall *PFN_clCreateCommandQueue)(cl_context, cl_device_id, cl_bitfield, cl_int*);
typedef cl_program (__stdcall *PFN_clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
typedef cl_int   (__stdcall *PFN_clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
typedef cl_int   (__stdcall *PFN_clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
typedef cl_kernel(__stdcall *PFN_clCreateKernel)(cl_program, const char*, cl_int*);
typedef cl_mem   (__stdcall *PFN_clCreateBuffer)(cl_context, cl_mem_flags, size_t, void*, cl_int*);
typedef cl_int   (__stdcall *PFN_clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
typedef cl_int   (__stdcall *PFN_clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
typedef cl_int   (__stdcall *PFN_clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
typedef cl_int   (__stdcall *PFN_clFinish)(cl_command_queue);
typedef cl_int   (__stdcall *PFN_clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);

typedef cl_int   (__stdcall *PFN_clReleaseMemObject)(cl_mem);
typedef cl_int   (__stdcall *PFN_clReleaseKernel)(cl_kernel);
typedef cl_int   (__stdcall *PFN_clReleaseProgram)(cl_program);
typedef cl_int   (__stdcall *PFN_clReleaseCommandQueue)(cl_command_queue);
typedef cl_int   (__stdcall *PFN_clReleaseContext)(cl_context);

static HMODULE gCL_Dll = 0;
static PFN_clGetPlatformIDs         p_clGetPlatformIDs = 0;
static PFN_clGetDeviceIDs           p_clGetDeviceIDs = 0;
static PFN_clCreateContext          p_clCreateContext = 0;
static PFN_clCreateCommandQueue     p_clCreateCommandQueue = 0;
static PFN_clCreateProgramWithSource p_clCreateProgramWithSource = 0;
static PFN_clBuildProgram           p_clBuildProgram = 0;
static PFN_clGetProgramBuildInfo    p_clGetProgramBuildInfo = 0;
static PFN_clCreateKernel           p_clCreateKernel = 0;
static PFN_clCreateBuffer           p_clCreateBuffer = 0;
static PFN_clSetKernelArg           p_clSetKernelArg = 0;
static PFN_clEnqueueWriteBuffer     p_clEnqueueWriteBuffer = 0;
static PFN_clEnqueueNDRangeKernel   p_clEnqueueNDRangeKernel = 0;
static PFN_clFinish                 p_clFinish = 0;
static PFN_clEnqueueReadBuffer      p_clEnqueueReadBuffer = 0;

static PFN_clReleaseMemObject       p_clReleaseMemObject = 0;
static PFN_clReleaseKernel          p_clReleaseKernel = 0;
static PFN_clReleaseProgram         p_clReleaseProgram = 0;
static PFN_clReleaseCommandQueue    p_clReleaseCommandQueue = 0;
static PFN_clReleaseContext         p_clReleaseContext = 0;

static int gCL_Ready = 0;
static int gCL_UsedCalls = 0;
static int gCL_FailedCalls = 0;
static cl_int gCL_LastErr = 0;

static cl_platform_id   gCL_Platform = 0;
static cl_device_id     gCL_Device   = 0;
static cl_context       gCL_Context  = 0;
static cl_command_queue gCL_Queue    = 0;
static cl_program       gCL_Program  = 0;
static cl_kernel        gCL_K_DH     = 0;

static cl_mem gCL_B_Stage = 0;
static cl_mem gCL_B_DH    = 0;

static const char* gCL_Source =
"__kernel void dh_kernel(__global const float* stage, __global float* dh,           \n"
"  float t, float damp, float amp, float scale, float stiff, int regime,           \n"
"  float u0, float u1, float u2)                                                   \n"
"{                                                                                \n"
"  int i = get_global_id(0);                                                       \n"
"  if(i >= 48) return;                                                             \n"
"  float ii = (float)(i+1);                                                        \n"
"  float sum = 0.0f;                                                               \n"
"  for(int j=0;j<48;j++)                                                           \n"
"  {                                                                              \n"
"    float jj = (float)(j+1);                                                      \n"
"    float bj = sin(0.17f*jj) * 0.03f;                                              \n"
"    float inj = u0 * 0.02f * sin(0.07f*jj + 0.19f*t);                              \n"
"    float hj = stage[j];                                                          \n"
"    float in = hj + bj + inj;                                                     \n"
"    float aj = in / (1.0f + fabs(in));                                            \n"
"    float ang = (0.013f*ii*jj) + (0.11f*ii) - (0.07f*jj);                          \n"
"    float ww = sin(ang) * scale;                                                  \n"
"    sum += ww * aj;                                                               \n"
"  }                                                                              \n"
"  float forcing = sin(t + 0.05f*ii) * amp;                                        \n"
"  forcing += sin(20.0f*t + 0.10f*ii) * 0.08f;                                     \n"
"  forcing += (u1 * 0.10f) + (0.05f*u2*sin(0.9f*t));                               \n"
"  float hi = stage[i];                                                            \n"
"  float self = ( (hi*1.8f) / (1.0f + fabs(hi*1.8f)) ) * 0.15f;                    \n"
"  float di = damp;                                                                \n"
"  if((i & 7) == 0) di *= stiff;                                                   \n"
"  float ghi = hi * di;                                                            \n"
"  dh[i] = sum + ghi + forcing + self;                                             \n"
"}                                                                                \n";

static void cl_release_all()
{
  if(gCL_B_DH && p_clReleaseMemObject)     { p_clReleaseMemObject(gCL_B_DH);     gCL_B_DH = 0; }
  if(gCL_B_Stage && p_clReleaseMemObject)  { p_clReleaseMemObject(gCL_B_Stage);  gCL_B_Stage = 0; }
  if(gCL_K_DH && p_clReleaseKernel)        { p_clReleaseKernel(gCL_K_DH);        gCL_K_DH = 0; }
  if(gCL_Program && p_clReleaseProgram)    { p_clReleaseProgram(gCL_Program);    gCL_Program = 0; }
  if(gCL_Queue && p_clReleaseCommandQueue) { p_clReleaseCommandQueue(gCL_Queue); gCL_Queue = 0; }
  if(gCL_Context && p_clReleaseContext)    { p_clReleaseContext(gCL_Context);    gCL_Context = 0; }

  gCL_Device = 0;
  gCL_Platform = 0;
  gCL_Ready = 0;

  if(gCL_Dll)
  {
    FreeLibrary(gCL_Dll);
    gCL_Dll = 0;
  }
}

static FARPROC cl_sym(const char* name)
{
  if(!gCL_Dll) return 0;
  return GetProcAddress(gCL_Dll, name);
}

static int cl_load()
{
  gCL_Dll = LoadLibraryA("OpenCL.dll");
  if(!gCL_Dll) return 0;

  p_clGetPlatformIDs = (PFN_clGetPlatformIDs)cl_sym("clGetPlatformIDs");
  p_clGetDeviceIDs = (PFN_clGetDeviceIDs)cl_sym("clGetDeviceIDs");
  p_clCreateContext = (PFN_clCreateContext)cl_sym("clCreateContext");
  p_clCreateCommandQueue = (PFN_clCreateCommandQueue)cl_sym("clCreateCommandQueue");
  p_clCreateProgramWithSource = (PFN_clCreateProgramWithSource)cl_sym("clCreateProgramWithSource");
  p_clBuildProgram = (PFN_clBuildProgram)cl_sym("clBuildProgram");
  p_clGetProgramBuildInfo = (PFN_clGetProgramBuildInfo)cl_sym("clGetProgramBuildInfo");
  p_clCreateKernel = (PFN_clCreateKernel)cl_sym("clCreateKernel");
  p_clCreateBuffer = (PFN_clCreateBuffer)cl_sym("clCreateBuffer");
  p_clSetKernelArg = (PFN_clSetKernelArg)cl_sym("clSetKernelArg");
  p_clEnqueueWriteBuffer = (PFN_clEnqueueWriteBuffer)cl_sym("clEnqueueWriteBuffer");
  p_clEnqueueNDRangeKernel = (PFN_clEnqueueNDRangeKernel)cl_sym("clEnqueueNDRangeKernel");
  p_clFinish = (PFN_clFinish)cl_sym("clFinish");
  p_clEnqueueReadBuffer = (PFN_clEnqueueReadBuffer)cl_sym("clEnqueueReadBuffer");

  p_clReleaseMemObject = (PFN_clReleaseMemObject)cl_sym("clReleaseMemObject");
  p_clReleaseKernel = (PFN_clReleaseKernel)cl_sym("clReleaseKernel");
  p_clReleaseProgram = (PFN_clReleaseProgram)cl_sym("clReleaseProgram");
  p_clReleaseCommandQueue = (PFN_clReleaseCommandQueue)cl_sym("clReleaseCommandQueue");
  p_clReleaseContext = (PFN_clReleaseContext)cl_sym("clReleaseContext");

  if(!p_clGetPlatformIDs || !p_clGetDeviceIDs || !p_clCreateContext || !p_clCreateCommandQueue ||
     !p_clCreateProgramWithSource || !p_clBuildProgram || !p_clCreateKernel || !p_clCreateBuffer ||
     !p_clSetKernelArg || !p_clEnqueueWriteBuffer || !p_clEnqueueNDRangeKernel || !p_clFinish ||
     !p_clEnqueueReadBuffer)
  {
    cl_release_all();
    return 0;
  }
  return 1;
}

static int cl_init()
{
  if(!cl_load()) return 0;

  cl_int err = CL_SUCCESS;

  cl_uint nPlatforms = 0;
  err = p_clGetPlatformIDs(0, 0, &nPlatforms);
  if(err != CL_SUCCESS || nPlatforms == 0) { cl_release_all(); return 0; }

  cl_platform_id platforms[8];
  if(nPlatforms > 8) nPlatforms = 8;
  err = p_clGetPlatformIDs(nPlatforms, platforms, &nPlatforms);
  if(err != CL_SUCCESS) { cl_release_all(); return 0; }

  cl_platform_id chosenP = 0;
  cl_device_id chosenD = 0;

  // choose GPU device if possible
  for(cl_uint p=0; p<nPlatforms; p++)
  {
    cl_uint nDev = 0;
    if(p_clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, 0, &nDev) == CL_SUCCESS && nDev > 0)
    {
      cl_device_id devs[8];
      if(nDev > 8) nDev = 8;
      if(p_clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, nDev, devs, &nDev) == CL_SUCCESS)
      {
        chosenP = platforms[p];
        chosenD = devs[0];
        break;
      }
    }
  }

  // fallback: any device
  if(!chosenD)
  {
    for(cl_uint p=0; p<nPlatforms; p++)
    {
      cl_uint nDev = 0;
      if(p_clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, 0, &nDev) == CL_SUCCESS && nDev > 0)
      {
        cl_device_id devs[8];
        if(nDev > 8) nDev = 8;
        if(p_clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, nDev, devs, &nDev) == CL_SUCCESS)
        {
          chosenP = platforms[p];
          chosenD = devs[0];
          break;
        }
      }
    }
  }

  if(!chosenD) { cl_release_all(); return 0; }

  gCL_Platform = chosenP;
  gCL_Device   = chosenD;

  gCL_Context = p_clCreateContext(0, 1, &gCL_Device, 0, 0, &err);
  if(err != CL_SUCCESS || !gCL_Context) { cl_release_all(); return 0; }

  gCL_Queue = p_clCreateCommandQueue(gCL_Context, gCL_Device, 0, &err);
  if(err != CL_SUCCESS || !gCL_Queue) { cl_release_all(); return 0; }

  gCL_Program = p_clCreateProgramWithSource(gCL_Context, 1, &gCL_Source, 0, &err);
  if(err != CL_SUCCESS || !gCL_Program) { cl_release_all(); return 0; }

  // NOTE: you can optionally pass build options here (some drivers like it):
  // const char* opts = "-cl-fast-relaxed-math";
  // err = p_clBuildProgram(gCL_Program, 1, &gCL_Device, opts, 0, 0);
  err = p_clBuildProgram(gCL_Program, 1, &gCL_Device, 0, 0, 0);
  if(err != CL_SUCCESS)
  {
    if(p_clGetProgramBuildInfo)
    {
      char logbuf[4096] = {0};
      size_t logsz = 0;
      p_clGetProgramBuildInfo(gCL_Program, gCL_Device, CL_PROGRAM_BUILD_LOG, sizeof(logbuf), logbuf, &logsz);
      printf("\nOpenCL build failed: %s", logbuf);
    }
    cl_release_all();
    return 0;
  }

  gCL_K_DH = p_clCreateKernel(gCL_Program, "dh_kernel", &err);
  if(err != CL_SUCCESS || !gCL_K_DH) { cl_release_all(); return 0; }

  gCL_B_Stage = p_clCreateBuffer(gCL_Context, CL_MEM_READ_ONLY,  sizeof(float)*N, 0, &err);
  if(err != CL_SUCCESS || !gCL_B_Stage) { cl_release_all(); return 0; }

  gCL_B_DH = p_clCreateBuffer(gCL_Context, CL_MEM_WRITE_ONLY, sizeof(float)*N, 0, &err);
  if(err != CL_SUCCESS || !gCL_B_DH) { cl_release_all(); return 0; }

  gCL_Ready = 1;
  gCL_UsedCalls = 0;
  gCL_FailedCalls = 0;
  gCL_LastErr = 0;
  return 1;
}

static void cl_fail(cl_int err)
{
  gCL_LastErr = err;
  gCL_FailedCalls++;
  gCL_Ready = 0;
#if CL_DEBUG
  printf("\n[OpenCL] ERROR %d -> fallback to CPU", (int)err);
#endif
}

// GPU drift eval
static void f_theta_vec_gpu(var t)
{
  eval_u(t);

  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  float stageF[N];
  float dhF[N] = { 0 };

  for(int k=0;k<N;k++) stageF[k] = (float)G_stage[k];

  cl_int err = p_clEnqueueWriteBuffer(gCL_Queue, gCL_B_Stage, CL_TRUE, 0, sizeof(float)*N, stageF, 0, 0, 0);
  if(err != CL_SUCCESS) { cl_fail(err); return; }

  int arg = 0;
  float tf     = (float)t;
  float dampf  = (float)damp;
  float ampf   = (float)amp;
  float scalef = (float)scale;
  float stifff = (float)stiff;
  int regime   = G_Regime;
  float u0 = (float)G_u[0];
  float u1 = (float)G_u[1];
  float u2 = (float)G_u[2];

  err  = p_clSetKernelArg(gCL_K_DH, arg++, sizeof(cl_mem), &gCL_B_Stage);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(cl_mem), &gCL_B_DH);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &tf);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &dampf);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &ampf);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &scalef);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &stifff);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(int),    &regime);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &u0);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &u1);
  err |= p_clSetKernelArg(gCL_K_DH, arg++, sizeof(float),  &u2);

  if(err != CL_SUCCESS) { cl_fail(err); return; }

  size_t global = (size_t)N;
  err = p_clEnqueueNDRangeKernel(gCL_Queue, gCL_K_DH, 1, 0, &global, 0, 0, 0, 0);
  if(err != CL_SUCCESS) { cl_fail(err); return; }

  err = p_clFinish(gCL_Queue);
  if(err != CL_SUCCESS) { cl_fail(err); return; }

  err = p_clEnqueueReadBuffer(gCL_Queue, gCL_B_DH, CL_TRUE, 0, sizeof(float)*N, dhF, 0, 0, 0);
  if(err != CL_SUCCESS) { cl_fail(err); return; }

  for(int k=0;k<N;k++) G_dh[k] = (var)dhF[k];

  gCL_UsedCalls++;

#if CL_DEBUG
  // print only the first GPU call to prove it produces numbers
  static int printed = 0;
  if(!printed)
  {
    printed = 1;
    printf("\n[OpenCL] dh sample (t=%g): dh[0]=%.6g dh[1]=%.6g dh[2]=%.6g dh[3]=%.6g",
      t, (double)G_dh[0], (double)G_dh[1], (double)G_dh[2], (double)G_dh[3]);
  }
#endif
}
#endif // USE_OPENCL

// ===========================================================
//                    CPU drift eval
// ===========================================================
static void f_theta_vec_cpu(var t)
{
  eval_u(t);

  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  for(int i=0;i<N;i++)
  {
    var sum = 0;
    var ii = (i+1);

    for(int j=0;j<N;j++)
    {
      var jj = (j+1);

      var bj  = sin(0.17*jj) * 0.03;
      var inj = G_u[0] * 0.02 * sin(0.07*jj + 0.19*t);

      var hj = G_stage[j];
      var in = hj + bj + inj;
      var aj = softsign(in);

      var ang = (0.013*ii*jj) + (0.11*ii) - (0.07*jj);
      var ww  = sin(ang) * scale;

      sum += ww * aj;
    }

    var forcing = sin(t + 0.05*ii) * amp;
    forcing += sin(20.0*t + 0.10*ii) * 0.08;
    forcing += (G_u[1] * 0.10) + (0.05*G_u[2]*sin(0.9*t));

    var hi   = G_stage[i];
    var self = softsign(hi*1.8) * 0.15;

    var di = damp;
    if((i & 7) == 0) di *= stiff;

    var ghi = hi * di;

    G_dh[i] = sum + ghi + forcing + self;
  }
}

static void f_theta_vec(var t)
{
#if USE_OPENCL
  if(gCL_Ready)
  {
    f_theta_vec_gpu(t);
    if(gCL_Ready) return;
  }
#endif
  f_theta_vec_cpu(t);
}

// -------------------- Vector RK step --------------------
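// Explicit Runge-Kutta step over the full N-dimensional state:
// stage derivatives K_i = f(t + c_i*h, y + h*sum_{j<i} a_ij*K_j) are built from the
// lower-triangular tableau Atri, then combined as yout = y + h*sum_i b_i*K_i.
// The current stage state is handed to f_theta_vec() through the global G_stage buffer.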
static void rk_step_vec(int s, float* Atri, float* b, float* c,
                        var t, var* y, var h, var* yout)
{
  if(s < 1)
  {
    for(int k=0;k<N;k++) yout[k] = y[k];
    return;
  }
  if(s > MAXS) s = MAXS;

  for(int i=0;i<s;i++)
    for(int k=0;k<N;k++)
      G_K[i*N + k] = (fvar)0;

  for(int i=0;i<s;i++)
  {
    for(int k=0;k<N;k++)
    {
      var acc = y[k];
      for(int j=0;j<i;j++)
        acc += ((var)G_K[j*N + k]) * getA(Atri,i,j) * h;

      G_stage[k] = acc;
    }

    var dt = ((var)c[i]) * h;

    f_theta_vec(t + dt);

    for(int k=0;k<N;k++)
      G_K[i*N + k] = (fvar)G_dh[k];
  }

  for(int k=0;k<N;k++)
  {
    var acc = y[k];
    for(int i=0;i<s;i++)
      acc += ((var)G_K[i*N + k]) * (var)b[i] * h;

    yout[k] = acc;
  }
}

static var err_norm(var* a, var* b)
{
  var s = 0;
  for(int k=0;k<N;k++) { var d = a[k] - b[k]; s += d*d; }
  return sqrt(s / (var)N);
}

static var rms_norm(var* a, int n)
{
  var s = 0;
  for(int k=0;k<n;k++) { var x = a[k]; s += x*x; }
  return sqrt(s / (var)n);
}

static void permute_method_into(int s, float* A, float* b, float* c,
                                int* p,
                                int* sOut, float* Aout, float* bout, float* cout)
{
  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(int i=0;i<TRI_A;i++) Aout[i] = 0;
  for(int i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  for(int i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  for(int i=0;i<*sOut;i++)
    for(int j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}
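
// A deliberately "swappable" 4-stage explicit tableau (not the classical RK4
// coefficients): each c_i equals its A-row sum, the b weights sum to 1, and the
// weights are all distinct, which makes the stage-permutation comparison in main()
// meaningful.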

static void make_swappable_rk4(int* s, float* Atri, float* b, float* c)
{
  zeroTableau(s, Atri, b, c);
  *s = 4;

  c[0] = 0.0f;
  c[1] = 0.30f;
  c[2] = 0.80f;
  c[3] = 1.00f;

  setA(Atri, 1, 0, 0.30);
  setA(Atri, 2, 0, 0.80);
  setA(Atri, 3, 1, 0.50);
  setA(Atri, 3, 2, 0.50);

  b[0] = 0.10f;
  b[1] = 0.20f;
  b[2] = 0.30f;
  b[3] = 0.40f;
}

static void apply_diffusion(var t, var dt, var* y)
{
  eval_u(t);

  var u0 = G_u[0];
  var u2 = G_u[2];

  var sigma = G_SigmaBase * (1.0 + 0.6*fabs(u0) + 0.3*fabs(u2));
  if(G_Regime) sigma *= 1.25;

  var sdt = sqrt(dt);

  for(int k=0;k<N;k++)
  {
    var gmul = 1.0 + (fabs(y[k]) * 0.15);
    y[k] += sigma * gmul * sdt * rng_norm();
  }
}
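
// Adaptive SDE integrator using step doubling: one full RK step (G_full) is compared
// with two half steps (G_mid -> G_half). If their RMS difference is within G_Tol the
// half-step result is accepted, diffusion and jump noise are added, and dt is rescaled
// by 0.9*(G_Tol/err)^0.2 clamped to [0.5, 2.0]; otherwise dt is halved and the step
// is retried. The loop stops at tEnd, on steady state, on state/observation blow-up,
// or when the reject / dt-min budgets are exhausted (see the stop codes in main()).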

static void integrate_adaptive_sde(int s, float* A, float* b, float* c,
                                   var t0, var* y0, var tEnd,
                                   unsigned int seed,
                                   var* yOut,
                                   int* stepsOut, int* rejectsOut,
                                   var* dtMinOut,
                                   int* stopReasonOut)
{
  rng_seed(seed);
  G_Regime = 0;

  var t = t0;
  var dt = DT_INIT;
  var dtMinSeen = 1e9;

  for(int k=0;k<N;k++) G_ycur[k] = y0[k];

  int steps = 0, rejects = 0, iters = 0, dtMinHits = 0, steadyCount = 0;
  int stopReason = 0;

  while(t < tEnd)
  {
    if((iters & 255) == 0) shed_if_needed();
    iters++;

    if(dt < DT_MIN) dt = DT_MIN;
    if(dt > G_DtMax) dt = G_DtMax;
    if(t + dt > tEnd) dt = tEnd - t;

    if(dt < dtMinSeen) dtMinSeen = dt;
    if(dt <= DT_MIN) dtMinHits++;

    if(dtMinHits > G_DtMinHitMax) { stopReason = 4; break; }

    rk_step_vec(s, A, b, c, t, G_ycur, dt, G_full);

    var halfdt = dt * 0.5;
    rk_step_vec(s, A, b, c, t,        G_ycur, halfdt, G_mid);
    rk_step_vec(s, A, b, c, t+halfdt, G_mid,  halfdt, G_half);

    var e = err_norm(G_half, G_full);

    if(e <= G_Tol || dt <= DT_MIN)
    {
      for(int k=0;k<N;k++) G_ycur[k] = G_half[k];

      eval_u(t);
      update_regime(t, dt);

      apply_diffusion(t, dt, G_ycur);
      apply_jump_if_needed(t, dt, G_ycur);

      observe_from_hidden(t + dt, G_ycur);

      var yr = rms_norm(G_ycur, N);
      var xr = rms_norm(G_xobs, NOBS);
      if(yr > G_StateLimitRMS || xr > G_ObsLimitRMS)
      {
        stopReason = 2;
        t += dt; steps++;
        break;
      }

      for(int k=0;k<N;k++) G_stage[k] = G_ycur[k];
      f_theta_vec(t + dt);
      var dr = rms_norm(G_dh, N);

      if(dr < G_SteadyEps) steadyCount++; else steadyCount = 0;

      t += dt; steps++;
      if(steadyCount >= G_SteadyNeed) { stopReason = 1; break; }

      if(e < 1e-16) dt *= 2.0;
      else
      {
        var fac = pow(G_Tol / e, 0.2) * 0.9;
        if(fac < 0.5) fac = 0.5;
        if(fac > 2.0) fac = 2.0;
        dt *= fac;
      }
    }
    else
    {
      rejects++;
      if(rejects > G_MaxRejects) { stopReason = 3; break; }
      dt *= 0.5;
    }
  }

  for(int k=0;k<N;k++) yOut[k] = G_ycur[k];
  *stepsOut = steps;
  *rejectsOut = rejects;
  *dtMinOut = dtMinSeen;
  *stopReasonOut = stopReason;
}

// ===========================================================
//        One-time GPU vs CPU drift self-test (prints numbers)
// ===========================================================
static void drift_selftest()
{
  // stage = Y0 snapshot at t = 0.5 (arbitrary)
  for(int k=0;k<N;k++) G_stage[k] = Y0[k];

  var ttest = 0.5;

  // CPU
  var dh_cpu[N];
  f_theta_vec_cpu(ttest);
  for(int k=0;k<N;k++) dh_cpu[k] = G_dh[k];

#if USE_OPENCL
  if(gCL_Ready)
  {
    // GPU
    f_theta_vec_gpu(ttest);
    var dh_gpu[N];
    for(int k=0;k<N;k++) dh_gpu[k] = G_dh[k];

    // RMS diff
    var s = 0;
    for(int k=0;k<N;k++) { var d = dh_gpu[k] - dh_cpu[k]; s += d*d; }
    var rms = sqrt(s / (var)N);

    printf("\n[SelfTest] dh CPU vs GPU @t=%g: RMS diff = %.3e", ttest, (double)rms);
    printf("\n[SelfTest] CPU dh[0..3]=%.6g %.6g %.6g %.6g",
      (double)dh_cpu[0], (double)dh_cpu[1], (double)dh_cpu[2], (double)dh_cpu[3]);
    printf("\n[SelfTest] GPU dh[0..3]=%.6g %.6g %.6g %.6g",
      (double)dh_gpu[0], (double)dh_gpu[1], (double)dh_gpu[2], (double)dh_gpu[3]);
  }
  else
  {
    printf("\n[SelfTest] OpenCL not ready -> skipped GPU compare");
  }
#else
  printf("\n[SelfTest] USE_OPENCL=0 -> CPU only");
#endif
}

// -------------------- Entry point --------------------
DLLFUNC int main()
{
#if USE_OPENCL
  if(cl_init())
    printf("\nOpenCL: enabled (dynamic loader)");
  else
    printf("\nOpenCL: not available -> CPU fallback");
#endif

  // reset knobs
  G_ShedTriggered = 0;
  G_Tol   = TOL;
  G_DtMax = DT_MAX;

  G_MaxRejects  = 2000000;
  G_DtMinHitMax = 500000;

  G_SteadyEps     = 1e-7;
  G_SteadyNeed    = 250;
  G_StateLimitRMS = 1e6;
  G_ObsLimitRMS   = 1e6;

  G_SigmaBase       = 0.02;
  G_JumpLambdaBase  = 0.20;
  G_JumpMagBase     = 0.20;
  G_RegRate01       = 0.15;
  G_RegRate10       = 0.10;

  int   S = 0;
  float A[TRI_A];
  float b[MAXS];
  float c[MAXS];
  make_swappable_rk4(&S, A, b, c);

  int   SB = 0;
  float AB[TRI_A];
  float bB[MAXS];
  float cB[MAXS];

  int p[MAXS];
  p[0]=0; p[1]=2; p[2]=1; p[3]=3;
  permute_method_into(S, A, b, c, p, &SB, AB, bB, cB);

  int   SW = S;
  float AW[TRI_A];
  float bW[MAXS];
  float cW[MAXS];

  for(int i=0;i<TRI_A;i++) AW[i] = A[i];
  for(int i=0;i<MAXS;i++)  { bW[i]=0; cW[i]=0; }
  for(int i=0;i<SW;i++)
  {
    int pi = p[i];
    bW[i] = b[pi];
    cW[i] = c[pi];
  }

  for(int k=0;k<N;k++)
    Y0[k] = 0.2*sin(0.31*(k+1)) + 0.15*cos(0.17*(k+1))
          + 0.05*sin(0.07*(k+1)*(k+1));

  var t0 = 0.0;

  // seed for custom RNG
  var now = wdate(NOW);
  unsigned int MySeed = (unsigned int)utm(now);
  var frac = now - floor(now);
  var ms = frac * 86400000.0;
  MySeed = 1664525u * MySeed + (unsigned int)ms + 1013904223u;
  if(MySeed == 0) MySeed = 1;
  printf("\nMySeed=%u", MySeed);

  // ---- PROOF: print GPU numerical results + CPU comparison ----
  drift_selftest();

  int stepsRef=0,  rejRef=0;   var dtMinRef=0;  int stopRef=0;
  int stepsSafe=0, rejSafe=0;  var dtMinSafe=0; int stopSafe=0;
  int stepsBug=0,  rejBug=0;   var dtMinBug=0;  int stopBug=0;

  integrate_adaptive_sde(S,  A,  b,  c,  t0, Y0, T_END, MySeed, Yref,
                         &stepsRef,  &rejRef,  &dtMinRef,  &stopRef);

  integrate_adaptive_sde(SB, AB, bB, cB, t0, Y0, T_END, MySeed, Ysafe,
                         &stepsSafe, &rejSafe, &dtMinSafe, &stopSafe);

  integrate_adaptive_sde(SW, AW, bW, cW, t0, Y0, T_END, MySeed, Ybug,
                         &stepsBug,  &rejBug,  &dtMinBug,  &stopBug);

  var diffSafe = err_norm(Ysafe, Yref);
  var diffBug  = err_norm(Ybug,  Yref);

  printf("\n\nNeural SDE / Hybrid ODE integration (N=%d) to T=%g  baseTol=%g", N, T_END, TOL);

  printf("\nA) Reference tableau:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsRef, rejRef, dtMinRef, stopRef);

  printf("\nB) Theorem-382A safe stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsSafe, rejSafe, dtMinSafe, stopSafe);
  printf("\n   diff to reference (RMS) = %.3e", (double)diffSafe);

  printf("\nC) Buggy stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsBug, rejBug, dtMinBug, stopBug);
  printf("\n   diff to reference (RMS) = %.3e", (double)diffBug);

#if USE_OPENCL
  printf("\n\nOpenCL usage: gpu_calls=%d failed_calls=%d last_err=%d",
         gCL_UsedCalls, gCL_FailedCalls, (int)gCL_LastErr);
  cl_release_all();
#endif

  printf("\n\nStop codes: 0=tEnd, 1=steady-state, 2=event(blow-up), 3=rejectBudget, 4=dtMinBudget\n");

  return quit("done");
}

The Clockwork Storm Weaver (CUDA) [Re: TipmyPip] #489143
40 minutes ago
Joined: Sep 2017
Posts: 192
TipmyPip Online OP
Member

Code
// demo_neuralODE_complex_382A_adaptive_main_cuda_nosdk.cpp (Zorro64 C++)
//
// Headerless CUDA (Driver API) + NVRTC runtime compilation (no cuda.h, no nvrtc.h, no .lib).
// - Loads nvcuda.dll and nvrtc64_*.dll dynamically.
// - Compiles CUDA C kernel to PTX at runtime, loads PTX, launches kernel.
// - Prints numerical outputs proving CUDA drift is computed, reports GPU usage.
// - If CUDA/NVRTC missing or any failure -> CPU fallback.

#include <zorro.h>
#include <math.h>
#include <windows.h>
#include <stdio.h>   // sprintf for the NVRTC arch option string (in case zorro.h does not already pull it in)
#include <stdlib.h>  // malloc/free for the NVRTC log and PTX buffers

#define USE_CUDA  1
#define CU_DEBUG  1   // set 0 to reduce prints

#define MAXS 4
#define TRI_A 6

// -------------------- Dimensions --------------------
#define N         48
#define K_SIZE    192

#define U_DIM     3
#define NOBS      12

// -------------------- Base knobs --------------------
#define T_END     20.0
#define DT_INIT   0.20
#define DT_MIN    1e-6
#define DT_MAX    0.50
#define TOL       1e-6

// -------------------- Shedding --------------------
#define MEM_BUDGET_MB 2000
static int G_ShedTriggered = 0;

// runtime knobs
static var G_Tol   = TOL;
static var G_DtMax = DT_MAX;

// budgets
static int G_MaxRejects  = 2000000;
static int G_DtMinHitMax = 500000;

// termination knobs
static var G_SteadyEps     = 1e-7;
static int G_SteadyNeed    = 250;
static var G_StateLimitRMS = 1e6;
static var G_ObsLimitRMS   = 1e6;

// SDE / hybrid knobs
static var G_SigmaBase      = 0.02;
static var G_JumpLambdaBase = 0.20;
static var G_JumpMagBase    = 0.20;
static var G_RegRate01      = 0.15;
static var G_RegRate10      = 0.10;

#define TIGHT_MEM 1
#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef var fvar;
#endif

// -------------------- Global buffers --------------------
static fvar G_K[K_SIZE]  = { 0 };
static var  G_stage[N]   = { 0 };
static var  G_dh[N]      = { 0 };

static var  G_ycur[N]    = { 0 };
static var  G_full[N]    = { 0 };
static var  G_half[N]    = { 0 };
static var  G_mid[N]     = { 0 };

static var  Y0[N]        = { 0 };
static var  Yref[N]      = { 0 };
static var  Ysafe[N]     = { 0 };
static var  Ybug[N]      = { 0 };

static var  G_xhat[NOBS] = { 0 };
static var  G_xobs[NOBS] = { 0 };

static var  G_u[U_DIM]   = { 0 };

// -------------------- RNG --------------------
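// Deterministic PRNG so a run is reproducible from MySeed: rng_u01() is a 32-bit
// linear congruential generator (constants 1664525 / 1013904223) mapped into (0,1),
// and rng_norm() turns two uniforms into a standard normal draw via Box-Muller.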
static unsigned int G_Rng = 1;

static void rng_seed(unsigned int s)
{
  if(s == 0) s = 1;
  G_Rng = s;
}

static var rng_u01()
{
  G_Rng = 1664525u * G_Rng + 1013904223u;
  unsigned int x = (G_Rng >> 8) & 0x00FFFFFFu;
  return ((var)x + 1.0) / 16777217.0;
}

static var rng_norm()
{
  var u1 = rng_u01();
  var u2 = rng_u01();
  var r  = sqrt(-2.0 * log(u1));
  var th = 6.283185307179586 * u2;
  return r * cos(th);
}

// -------------------- Utility --------------------
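// memMB() reads the script's current allocation via Zorro's memory(0);
// shed_if_needed() is a one-shot load shedder: once the footprint exceeds
// MEM_BUDGET_MB it loosens G_Tol by 10x, halves the dt cap, and halves the
// reject / dt-min budgets so the run degrades gracefully instead of aborting.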
static int memMB()
{
  var m = memory(0);
  if(!(m > 0)) return 0;
  return (int)(m/(1024*1024));
}

static void shed_if_needed()
{
  if(G_ShedTriggered) return;

  int mb = memMB();
  if(mb >= MEM_BUDGET_MB)
  {
    G_ShedTriggered = 1;
    G_Tol *= 10.0;

    G_DtMax *= 0.5;
    if(G_DtMax < DT_MIN) G_DtMax = DT_MIN;

    G_MaxRejects = (int)(G_MaxRejects/2);
    if(G_MaxRejects < 10000) G_MaxRejects = 10000;

    G_DtMinHitMax = (int)(G_DtMinHitMax/2);
    if(G_DtMinHitMax < 10000) G_DtMinHitMax = 10000;
  }
}

// -------------------- Triangular indexing --------------------
static int Aidx(int i, int j) { return (i*(i-1))/2 + j; }

static var getA(float* Atri, int i, int j)
{
  if(i <= j) return 0;
  return (var)Atri[Aidx(i,j)];
}

static void setA(float* Atri, int i, int j, var v)
{
  if(i <= j) return;
  Atri[Aidx(i,j)] = (float)v;
}

static void zeroTableau(int* s, float* Atri, float* b, float* c)
{
  int i;
  *s = 0;
  for(i=0;i<TRI_A;i++) Atri[i] = 0;
  for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}

// -------------------- Activation --------------------
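// softsign(z) = z/(1+|z|): a smooth, bounded activation in (-1,1); the CUDA kernel
// further below reimplements the same formula in single precision.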
static var softsign(var z) { return z / (1.0 + fabs(z)); }

// -------------------- Exogenous input u(t) --------------------
static int G_Regime = 0;

static void eval_u(var t)
{
  var u0 = sin(0.11*t) * 0.8;
  if(G_Regime) u0 += 0.35;

  var u1 = cos(0.37*t) * 0.6;
  var u2 = sin(2.30*t) * 0.3;

  G_u[0] = u0;
  G_u[1] = u1;
  G_u[2] = u2;
}

// -------------------- theta(t) --------------------
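// Time-varying drift parameters: damping, forcing amplitude and coupling scale drift
// slowly with sin(0.03*t), are modulated by the exogenous inputs u0/u1 and by the
// regime flag, and a fixed stiffness multiplier (2.8) is returned for the components
// that the drift marks as stiff (every 8th state, see (i & 7) == 0 below).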
static void theta_t(var t, var* dampOut, var* ampOut, var* scaleOut, var* stiffOut)
{
  var damp  = -0.40;
  var amp   =  0.35;
  var scale =  0.06;

  var drift = sin(0.03*t) * 0.25;

  var u0 = G_u[0];
  var u1 = G_u[1];

  if(G_Regime)
  {
    amp   *= 1.15;
    scale *= 1.10;
    damp  *= 1.05;
  }

  amp   *= (1.0 + 0.30*drift + 0.15*u0);
  scale *= (1.0 + 0.20*drift + 0.10*u1);
  damp  *= (1.0 + 0.15*drift - 0.10*u0);

  *dampOut  = damp;
  *ampOut   = amp;
  *scaleOut = scale;
  *stiffOut = 2.8;
}

// -------------------- Regime switching --------------------
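// Two-state Markov regime switch: the hazard rates r01/r10 grow with |u0| and |u1|,
// and the switching probability for a step of size dt is 1 - exp(-rate*dt), i.e.
// exponentially distributed waiting times sampled at the step size actually taken.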
static void update_regime(var t, var dt)
{
  var u0 = G_u[0];
  var u1 = G_u[1];

  var stress = fabs(u0) + 0.5*fabs(u1);
  var r01 = G_RegRate01 * (1.0 + 0.8*stress);
  var r10 = G_RegRate10 * (1.0 + 0.5*stress);

  var p;
  if(G_Regime == 0)
  {
    p = 1.0 - exp(-r01*dt);
    if(rng_u01() < p) G_Regime = 1;
  }
  else
  {
    p = 1.0 - exp(-r10*dt);
    if(rng_u01() < p) G_Regime = 0;
  }
}

// -------------------- Jumps --------------------
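// Jump component: a jump event fires with probability 1 - exp(-lambda*dt) per accepted
// step; its size is a heavy-tailed Cauchy draw tan(pi*(U-0.5)) clipped to [-6, 6] and
// scaled by G_JumpMagBase, and each state component is hit independently with
// probability 0.25.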
static void apply_jump_if_needed(var t, var dt, var* y)
{
  var u0  = G_u[0];
  var lam = G_JumpLambdaBase * (1.0 + 0.7*fabs(u0));
  if(G_Regime) lam *= 1.25;

  var p = 1.0 - exp(-lam*dt);
  if(rng_u01() >= p) return;

  var uu = rng_u01() - 0.5;
  var j  = tan(3.141592653589793 * uu);
  if(j >  6.0) j =  6.0;
  if(j < -6.0) j = -6.0;

  var mag = G_JumpMagBase * (1.0 + 0.5*fabs(u0));
  if(G_Regime) mag *= 1.2;

  for(int k=0;k<N;k++)
  {
    if(rng_u01() < 0.25)
    {
      var delta = mag * j;
      delta *= (1.0 + 0.01*(k+1));
      y[k] += delta;
    }
  }
}

// -------------------- Observations --------------------
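// Observation model: each of the NOBS outputs mixes two hidden states (h[k] and
// h[k+NOBS]) plus a small periodic term, squashes the result with softsign into
// G_xhat, and adds Gaussian measurement noise with standard deviation 0.01 to
// form G_xobs.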
static void observe_from_hidden(var t, var* h)
{
  for(int k=0;k<NOBS;k++)
  {
    var a = h[k];
    var b = h[k + NOBS];

    var z = a + 0.30*b + 0.05*sin(0.17*t + 0.3*(k+1));
    var xhat = z / (1.0 + fabs(z));
    G_xhat[k] = xhat;

    G_xobs[k] = xhat + 0.01*rng_norm();
  }
}

// ===========================================================
//                    CUDA (Driver API + NVRTC)
// ===========================================================
#if USE_CUDA

// --- minimal CUDA driver types ---
typedef int CUresult;
typedef int CUdevice;
typedef struct CUctx_st* CUcontext;
typedef struct CUmod_st* CUmodule;
typedef struct CUfunc_st* CUfunction;
typedef unsigned long long CUdeviceptr;

#define CUDA_SUCCESS 0

// function pointer types (nvcuda.dll)
typedef CUresult (__stdcall *PFN_cuInit)(unsigned int);
typedef CUresult (__stdcall *PFN_cuDeviceGetCount)(int*);
typedef CUresult (__stdcall *PFN_cuDeviceGet)(CUdevice*, int);
typedef CUresult (__stdcall *PFN_cuDeviceComputeCapability)(int*, int*, CUdevice);
typedef CUresult (__stdcall *PFN_cuCtxCreate_v2)(CUcontext*, unsigned int, CUdevice);
typedef CUresult (__stdcall *PFN_cuCtxDestroy_v2)(CUcontext);

typedef CUresult (__stdcall *PFN_cuModuleLoadDataEx)(CUmodule*, const void*, unsigned int, void*, void*);
typedef CUresult (__stdcall *PFN_cuModuleGetFunction)(CUfunction*, CUmodule, const char*);
typedef CUresult (__stdcall *PFN_cuModuleUnload)(CUmodule);

typedef CUresult (__stdcall *PFN_cuMemAlloc_v2)(CUdeviceptr*, size_t);
typedef CUresult (__stdcall *PFN_cuMemFree_v2)(CUdeviceptr);
typedef CUresult (__stdcall *PFN_cuMemcpyHtoD_v2)(CUdeviceptr, const void*, size_t);
typedef CUresult (__stdcall *PFN_cuMemcpyDtoH_v2)(void*, CUdeviceptr, size_t);
typedef CUresult (__stdcall *PFN_cuLaunchKernel)(
  CUfunction,
  unsigned int, unsigned int, unsigned int,
  unsigned int, unsigned int, unsigned int,
  unsigned int,
  void*,
  void**,
  void**);
typedef CUresult (__stdcall *PFN_cuCtxSynchronize)(void);

// --- minimal NVRTC types ---
typedef int nvrtcResult;
typedef struct _nvrtcProgram* nvrtcProgram;

#define NVRTC_SUCCESS 0

typedef nvrtcResult (__stdcall *PFN_nvrtcCreateProgram)(
  nvrtcProgram*,
  const char*,
  const char*,
  int,
  const char* const*,
  const char* const*);
typedef nvrtcResult (__stdcall *PFN_nvrtcCompileProgram)(nvrtcProgram, int, const char* const*);
typedef nvrtcResult (__stdcall *PFN_nvrtcGetProgramLogSize)(nvrtcProgram, size_t*);
typedef nvrtcResult (__stdcall *PFN_nvrtcGetProgramLog)(nvrtcProgram, char*);
typedef nvrtcResult (__stdcall *PFN_nvrtcGetPTXSize)(nvrtcProgram, size_t*);
typedef nvrtcResult (__stdcall *PFN_nvrtcGetPTX)(nvrtcProgram, char*);
typedef nvrtcResult (__stdcall *PFN_nvrtcDestroyProgram)(nvrtcProgram*);

// loaded DLLs
static HMODULE gCU_Dll = 0;
static HMODULE gNVRTC_Dll = 0;

// CUDA pointers
static PFN_cuInit p_cuInit = 0;
static PFN_cuDeviceGetCount p_cuDeviceGetCount = 0;
static PFN_cuDeviceGet p_cuDeviceGet = 0;
static PFN_cuDeviceComputeCapability p_cuDeviceComputeCapability = 0;
static PFN_cuCtxCreate_v2 p_cuCtxCreate = 0;
static PFN_cuCtxDestroy_v2 p_cuCtxDestroy = 0;

static PFN_cuModuleLoadDataEx p_cuModuleLoadDataEx = 0;
static PFN_cuModuleGetFunction p_cuModuleGetFunction = 0;
static PFN_cuModuleUnload p_cuModuleUnload = 0;

static PFN_cuMemAlloc_v2 p_cuMemAlloc = 0;
static PFN_cuMemFree_v2 p_cuMemFree = 0;
static PFN_cuMemcpyHtoD_v2 p_cuMemcpyHtoD = 0;
static PFN_cuMemcpyDtoH_v2 p_cuMemcpyDtoH = 0;
static PFN_cuLaunchKernel p_cuLaunchKernel = 0;
static PFN_cuCtxSynchronize p_cuCtxSynchronize = 0;

// NVRTC pointers
static PFN_nvrtcCreateProgram p_nvrtcCreateProgram = 0;
static PFN_nvrtcCompileProgram p_nvrtcCompileProgram = 0;
static PFN_nvrtcGetProgramLogSize p_nvrtcGetProgramLogSize = 0;
static PFN_nvrtcGetProgramLog p_nvrtcGetProgramLog = 0;
static PFN_nvrtcGetPTXSize p_nvrtcGetPTXSize = 0;
static PFN_nvrtcGetPTX p_nvrtcGetPTX = 0;
static PFN_nvrtcDestroyProgram p_nvrtcDestroyProgram = 0;

// CUDA state
static int gCU_Ready = 0;
static int gCU_UsedCalls = 0;
static int gCU_FailedCalls = 0;
static CUresult gCU_LastErr = 0;

static CUdevice  gCU_Device = 0;
static CUcontext gCU_Ctx = 0;
static CUmodule  gCU_Mod = 0;
static CUfunction gCU_Func = 0;

static CUdeviceptr gCU_dStage = 0; // float[N]
static CUdeviceptr gCU_dDh    = 0; // float[N]

// CUDA kernel code compiled by NVRTC
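// Note: the kernel hard-codes the state dimension (48) in its bounds check and inner
// loop, so it must stay in sync with #define N above. The whole kernel works in
// single precision (__sinf, fabsf), so small differences against the double-precision
// CPU path are expected in the self-test.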
static const char* gCU_Source =
"extern \"C\" __global__ void dh_kernel(const float* stage, float* dh,              \n"
"  float t, float damp, float amp, float scale, float stiff, int regime,           \n"
"  float u0, float u1, float u2)                                                   \n"
"{                                                                                \n"
"  int i = (int)(blockIdx.x * blockDim.x + threadIdx.x);                            \n"
"  if(i >= 48) return;                                                             \n"
"  float ii = (float)(i+1);                                                        \n"
"  float sum = 0.0f;                                                               \n"
"  #pragma unroll                                                                  \n"
"  for(int j=0;j<48;j++)                                                           \n"
"  {                                                                              \n"
"    float jj = (float)(j+1);                                                      \n"
"    float bj  = __sinf(0.17f*jj) * 0.03f;                                          \n"
"    float inj = u0 * 0.02f * __sinf(0.07f*jj + 0.19f*t);                           \n"
"    float hj = stage[j];                                                          \n"
"    float in = hj + bj + inj;                                                     \n"
"    float aj = in / (1.0f + fabsf(in));                                           \n"
"    float ang = (0.013f*ii*jj) + (0.11f*ii) - (0.07f*jj);                          \n"
"    float ww = __sinf(ang) * scale;                                               \n"
"    sum += ww * aj;                                                               \n"
"  }                                                                              \n"
"  float forcing = __sinf(t + 0.05f*ii) * amp;                                      \n"
"  forcing += __sinf(20.0f*t + 0.10f*ii) * 0.08f;                                   \n"
"  forcing += (u1 * 0.10f) + (0.05f*u2*__sinf(0.9f*t));                             \n"
"  float hi = stage[i];                                                            \n"
"  float self = ((hi*1.8f) / (1.0f + fabsf(hi*1.8f))) * 0.15f;                      \n"
"  float di = damp;                                                                \n"
"  if((i & 7) == 0) di *= stiff;                                                   \n"
"  float ghi = hi * di;                                                            \n"
"  dh[i] = sum + ghi + forcing + self;                                             \n"
"}                                                                                \n";

static void cu_fail(CUresult e)
{
  gCU_LastErr = e;
  gCU_FailedCalls++;
  gCU_Ready = 0;
#if CU_DEBUG
  printf("\n[CUDA] ERROR %d -> fallback to CPU", (int)e);
#endif
}

// try load nvrtc dll from a list of common names
static HMODULE load_nvrtc_any()
{
  const char* names[] = {
    "nvrtc64_123_0.dll",
    "nvrtc64_122_0.dll",
    "nvrtc64_121_0.dll",
    "nvrtc64_120_0.dll",
    "nvrtc64_118_0.dll",
    "nvrtc64_117_0.dll",
    "nvrtc64_116_0.dll",
    "nvrtc64_115_0.dll",
    "nvrtc64_114_0.dll",
    "nvrtc64_113_0.dll",
    "nvrtc64_112_0.dll",
    "nvrtc64_111_0.dll",
    "nvrtc64_110_0.dll",
    "nvrtc64_102_0.dll",
    "nvrtc64_101_0.dll",
    "nvrtc64_100_0.dll",
    "nvrtc64_92.dll"
  };
  for(int i=0;i<(int)(sizeof(names)/sizeof(names[0])); i++)
  {
    HMODULE h = LoadLibraryA(names[i]);
    if(h) return h;
  }
  // last resort: generic name (some setups provide it)
  return LoadLibraryA("nvrtc64.dll");
}

static FARPROC sym(HMODULE h, const char* name)
{
  if(!h) return 0;
  return GetProcAddress(h, name);
}

static void cu_release_all()
{
  if(gCU_dDh && p_cuMemFree)    { p_cuMemFree(gCU_dDh);    gCU_dDh = 0; }
  if(gCU_dStage && p_cuMemFree) { p_cuMemFree(gCU_dStage); gCU_dStage = 0; }
  if(gCU_Mod && p_cuModuleUnload) { p_cuModuleUnload(gCU_Mod); gCU_Mod = 0; }
  if(gCU_Ctx && p_cuCtxDestroy) { p_cuCtxDestroy(gCU_Ctx); gCU_Ctx = 0; }

  gCU_Ready = 0;

  if(gNVRTC_Dll) { FreeLibrary(gNVRTC_Dll); gNVRTC_Dll = 0; }
  if(gCU_Dll)    { FreeLibrary(gCU_Dll);    gCU_Dll = 0; }
}

static int cu_load()
{
  // CUDA driver
  gCU_Dll = LoadLibraryA("nvcuda.dll");
  if(!gCU_Dll) return 0;

  p_cuInit = (PFN_cuInit)sym(gCU_Dll, "cuInit");
  p_cuDeviceGetCount = (PFN_cuDeviceGetCount)sym(gCU_Dll, "cuDeviceGetCount");
  p_cuDeviceGet = (PFN_cuDeviceGet)sym(gCU_Dll, "cuDeviceGet");
  p_cuDeviceComputeCapability = (PFN_cuDeviceComputeCapability)sym(gCU_Dll, "cuDeviceComputeCapability");
  p_cuCtxCreate = (PFN_cuCtxCreate_v2)sym(gCU_Dll, "cuCtxCreate_v2");
  p_cuCtxDestroy = (PFN_cuCtxDestroy_v2)sym(gCU_Dll, "cuCtxDestroy_v2");

  p_cuModuleLoadDataEx = (PFN_cuModuleLoadDataEx)sym(gCU_Dll, "cuModuleLoadDataEx");
  p_cuModuleGetFunction = (PFN_cuModuleGetFunction)sym(gCU_Dll, "cuModuleGetFunction");
  p_cuModuleUnload = (PFN_cuModuleUnload)sym(gCU_Dll, "cuModuleUnload");

  p_cuMemAlloc = (PFN_cuMemAlloc_v2)sym(gCU_Dll, "cuMemAlloc_v2");
  p_cuMemFree = (PFN_cuMemFree_v2)sym(gCU_Dll, "cuMemFree_v2");
  p_cuMemcpyHtoD = (PFN_cuMemcpyHtoD_v2)sym(gCU_Dll, "cuMemcpyHtoD_v2");
  p_cuMemcpyDtoH = (PFN_cuMemcpyDtoH_v2)sym(gCU_Dll, "cuMemcpyDtoH_v2");
  p_cuLaunchKernel = (PFN_cuLaunchKernel)sym(gCU_Dll, "cuLaunchKernel");
  p_cuCtxSynchronize = (PFN_cuCtxSynchronize)sym(gCU_Dll, "cuCtxSynchronize");

  if(!p_cuInit || !p_cuDeviceGetCount || !p_cuDeviceGet || !p_cuCtxCreate || !p_cuCtxDestroy ||
     !p_cuModuleLoadDataEx || !p_cuModuleGetFunction || !p_cuModuleUnload ||
     !p_cuMemAlloc || !p_cuMemFree || !p_cuMemcpyHtoD || !p_cuMemcpyDtoH ||
     !p_cuLaunchKernel || !p_cuCtxSynchronize)
  {
    cu_release_all();
    return 0;
  }

  // NVRTC
  gNVRTC_Dll = load_nvrtc_any();
  if(!gNVRTC_Dll)
  {
#if CU_DEBUG
    printf("\n[CUDA] NVRTC dll not found -> CPU fallback");
#endif
    cu_release_all();
    return 0;
  }

  p_nvrtcCreateProgram = (PFN_nvrtcCreateProgram)sym(gNVRTC_Dll, "nvrtcCreateProgram");
  p_nvrtcCompileProgram = (PFN_nvrtcCompileProgram)sym(gNVRTC_Dll, "nvrtcCompileProgram");
  p_nvrtcGetProgramLogSize = (PFN_nvrtcGetProgramLogSize)sym(gNVRTC_Dll, "nvrtcGetProgramLogSize");
  p_nvrtcGetProgramLog = (PFN_nvrtcGetProgramLog)sym(gNVRTC_Dll, "nvrtcGetProgramLog");
  p_nvrtcGetPTXSize = (PFN_nvrtcGetPTXSize)sym(gNVRTC_Dll, "nvrtcGetPTXSize");
  p_nvrtcGetPTX = (PFN_nvrtcGetPTX)sym(gNVRTC_Dll, "nvrtcGetPTX");
  p_nvrtcDestroyProgram = (PFN_nvrtcDestroyProgram)sym(gNVRTC_Dll, "nvrtcDestroyProgram");

  if(!p_nvrtcCreateProgram || !p_nvrtcCompileProgram || !p_nvrtcGetProgramLogSize ||
     !p_nvrtcGetProgramLog || !p_nvrtcGetPTXSize || !p_nvrtcGetPTX || !p_nvrtcDestroyProgram)
  {
    cu_release_all();
    return 0;
  }

  return 1;
}

static int cu_init()
{
  if(!cu_load()) return 0;

  CUresult e = p_cuInit(0);
  if(e != CUDA_SUCCESS) { cu_release_all(); return 0; }

  int count = 0;
  e = p_cuDeviceGetCount(&count);
  if(e != CUDA_SUCCESS || count <= 0) { cu_release_all(); return 0; }

  e = p_cuDeviceGet(&gCU_Device, 0);
  if(e != CUDA_SUCCESS) { cu_release_all(); return 0; }

  int maj = 0, min = 0;
  if(p_cuDeviceComputeCapability)
    p_cuDeviceComputeCapability(&maj, &min, gCU_Device);

  e = p_cuCtxCreate(&gCU_Ctx, 0, gCU_Device);
  if(e != CUDA_SUCCESS || !gCU_Ctx) { cu_release_all(); return 0; }

  // Compile kernel to PTX via NVRTC
  nvrtcProgram prog = 0;
  nvrtcResult r = p_nvrtcCreateProgram(&prog, gCU_Source, "dh_kernel.cu", 0, 0, 0);
  if(r != NVRTC_SUCCESS || !prog) { cu_release_all(); return 0; }

  char archOpt[64];
  if(maj <= 0) { maj = 5; min = 2; } // conservative fallback
  sprintf(archOpt, "--gpu-architecture=compute_%d%d", maj, min);

  const char* opts[] = {
    archOpt,
    "--use_fast_math"
  };

  r = p_nvrtcCompileProgram(prog, (int)(sizeof(opts)/sizeof(opts[0])), opts);

  // log
  size_t logSize = 0;
  p_nvrtcGetProgramLogSize(prog, &logSize);
  if(logSize > 1)
  {
    char* logbuf = (char*)malloc(logSize + 1);
    if(logbuf)
    {
      logbuf[0] = 0;
      p_nvrtcGetProgramLog(prog, logbuf);
#if CU_DEBUG
      printf("\n[NVRTC] %s", logbuf);
#endif
      free(logbuf);
    }
  }

  if(r != NVRTC_SUCCESS)
  {
    p_nvrtcDestroyProgram(&prog);
    cu_release_all();
    return 0;
  }

  size_t ptxSize = 0;
  r = p_nvrtcGetPTXSize(prog, &ptxSize);
  if(r != NVRTC_SUCCESS || ptxSize < 8) { p_nvrtcDestroyProgram(&prog); cu_release_all(); return 0; }

  char* ptx = (char*)malloc(ptxSize + 1);
  if(!ptx) { p_nvrtcDestroyProgram(&prog); cu_release_all(); return 0; }
  ptx[0] = 0;

  r = p_nvrtcGetPTX(prog, ptx);
  p_nvrtcDestroyProgram(&prog);
  if(r != NVRTC_SUCCESS) { free(ptx); cu_release_all(); return 0; }

  // Load PTX into driver
  e = p_cuModuleLoadDataEx(&gCU_Mod, (const void*)ptx, 0, 0, 0);
  free(ptx);
  if(e != CUDA_SUCCESS || !gCU_Mod) { cu_release_all(); return 0; }

  e = p_cuModuleGetFunction(&gCU_Func, gCU_Mod, "dh_kernel");
  if(e != CUDA_SUCCESS || !gCU_Func) { cu_release_all(); return 0; }

  // Allocate device buffers
  e = p_cuMemAlloc(&gCU_dStage, sizeof(float)*N);
  if(e != CUDA_SUCCESS || !gCU_dStage) { cu_release_all(); return 0; }

  e = p_cuMemAlloc(&gCU_dDh, sizeof(float)*N);
  if(e != CUDA_SUCCESS || !gCU_dDh) { cu_release_all(); return 0; }

  gCU_Ready = 1;
  gCU_UsedCalls = 0;
  gCU_FailedCalls = 0;
  gCU_LastErr = 0;

#if CU_DEBUG
  printf("\n[CUDA] init OK (device cc=%d.%d)", maj, min);
#endif
  return 1;
}

// GPU drift eval (CUDA)
static void f_theta_vec_cuda(var t)
{
  eval_u(t);

  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  float stageF[N];
  float dhF[N] = {0};

  for(int k=0;k<N;k++) stageF[k] = (float)G_stage[k];

  CUresult e = p_cuMemcpyHtoD(gCU_dStage, stageF, sizeof(float)*N);
  if(e != CUDA_SUCCESS) { cu_fail(e); return; }

  // Kernel params (must match kernel signature)
  float tf = (float)t;
  float dampf  = (float)damp;
  float ampf   = (float)amp;
  float scalef = (float)scale;
  float stifff = (float)stiff;
  int regime = (int)G_Regime;
  float u0 = (float)G_u[0];
  float u1 = (float)G_u[1];
  float u2 = (float)G_u[2];

  void* params[] = {
    (void*)&gCU_dStage,
    (void*)&gCU_dDh,
    (void*)&tf,
    (void*)&dampf,
    (void*)&ampf,
    (void*)&scalef,
    (void*)&stifff,
    (void*)&regime,
    (void*)&u0,
    (void*)&u1,
    (void*)&u2
  };

  // launch: 1 block of 64 threads is enough for N=48
  unsigned int block = 64;
  unsigned int grid  = 1;

  e = p_cuLaunchKernel(gCU_Func,
                      grid,1,1,
                      block,1,1,
                      0, 0,
                      params, 0);
  if(e != CUDA_SUCCESS) { cu_fail(e); return; }

  e = p_cuCtxSynchronize();
  if(e != CUDA_SUCCESS) { cu_fail(e); return; }

  e = p_cuMemcpyDtoH(dhF, gCU_dDh, sizeof(float)*N);
  if(e != CUDA_SUCCESS) { cu_fail(e); return; }

  for(int k=0;k<N;k++) G_dh[k] = (var)dhF[k];

  gCU_UsedCalls++;

#if CU_DEBUG
  static int printed = 0;
  if(!printed)
  {
    printed = 1;
    printf("\n[CUDA] dh sample (t=%g): dh[0]=%.6g dh[1]=%.6g dh[2]=%.6g dh[3]=%.6g",
      t, (double)G_dh[0], (double)G_dh[1], (double)G_dh[2], (double)G_dh[3]);
  }
#endif
}
#endif // USE_CUDA

// ===========================================================
//                    CPU drift eval
// ===========================================================
static void f_theta_vec_cpu(var t)
{
  eval_u(t);

  var damp, amp, scale, stiff;
  theta_t(t, &damp, &amp, &scale, &stiff);

  for(int i=0;i<N;i++)
  {
    var sum = 0;
    var ii = (i+1);

    for(int j=0;j<N;j++)
    {
      var jj = (j+1);

      var bj  = sin(0.17*jj) * 0.03;
      var inj = G_u[0] * 0.02 * sin(0.07*jj + 0.19*t);

      var hj = G_stage[j];
      var in = hj + bj + inj;
      var aj = softsign(in);

      var ang = (0.013*ii*jj) + (0.11*ii) - (0.07*jj);
      var ww  = sin(ang) * scale;

      sum += ww * aj;
    }

    var forcing = sin(t + 0.05*ii) * amp;
    forcing += sin(20.0*t + 0.10*ii) * 0.08;
    forcing += (G_u[1] * 0.10) + (0.05*G_u[2]*sin(0.9*t));

    var hi   = G_stage[i];
    var self = softsign(hi*1.8) * 0.15;

    var di = damp;
    if((i & 7) == 0) di *= stiff;

    var ghi = hi * di;

    G_dh[i] = sum + ghi + forcing + self;
  }
}

static void f_theta_vec(var t)
{
#if USE_CUDA
  if(gCU_Ready)
  {
    f_theta_vec_cuda(t);
    if(gCU_Ready) return;
  }
#endif
  f_theta_vec_cpu(t);
}

// -------------------- Vector RK step --------------------
static void rk_step_vec(int s, float* Atri, float* b, float* c,
                        var t, var* y, var h, var* yout)
{
  if(s < 1)
  {
    for(int k=0;k<N;k++) yout[k] = y[k];
    return;
  }
  if(s > MAXS) s = MAXS;

  for(int i=0;i<s;i++)
    for(int k=0;k<N;k++)
      G_K[i*N + k] = (fvar)0;

  for(int i=0;i<s;i++)
  {
    for(int k=0;k<N;k++)
    {
      var acc = y[k];
      for(int j=0;j<i;j++)
        acc += ((var)G_K[j*N + k]) * getA(Atri,i,j) * h;

      G_stage[k] = acc;
    }

    var dt = ((var)c[i]) * h;

    f_theta_vec(t + dt);

    for(int k=0;k<N;k++)
      G_K[i*N + k] = (fvar)G_dh[k];
  }

  for(int k=0;k<N;k++)
  {
    var acc = y[k];
    for(int i=0;i<s;i++)
      acc += ((var)G_K[i*N + k]) * (var)b[i] * h;

    yout[k] = acc;
  }
}

static var err_norm(var* a, var* b)
{
  var s = 0;
  for(int k=0;k<N;k++) { var d = a[k] - b[k]; s += d*d; }
  return sqrt(s / (var)N);
}

static var rms_norm(var* a, int n)
{
  var s = 0;
  for(int k=0;k<n;k++) { var x = a[k]; s += x*x; }
  return sqrt(s / (var)n);
}

// -------------------- Tableau permutation --------------------
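// Builds a stage-permuted copy of a Butcher tableau: b and c are reordered by p, and
// each strictly-lower A entry (i,j) is taken from the permuted pair (p[i], p[j]).
// Indices are clamped to [0, s-1] so an out-of-range permutation cannot read outside
// the source arrays.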
static void permute_method_into(int s, float* A, float* b, float* c,
                                int* p,
                                int* sOut, float* Aout, float* bout, float* cout)
{
  *sOut = s;
  if(*sOut < 0) *sOut = 0;
  if(*sOut > MAXS) *sOut = MAXS;

  for(int i=0;i<TRI_A;i++) Aout[i] = 0;
  for(int i=0;i<MAXS;i++)  { bout[i]=0; cout[i]=0; }

  for(int i=0;i<*sOut;i++)
  {
    int pi = p[i];
    if(pi < 0) pi = 0;
    if(pi >= s) pi = s-1;
    bout[i] = b[pi];
    cout[i] = c[pi];
  }

  for(int i=0;i<*sOut;i++)
    for(int j=0;j<i;j++)
    {
      int pi = p[i];
      int pj = p[j];
      if(pi < 0) pi = 0;
      if(pj < 0) pj = 0;
      if(pi >= s) pi = s-1;
      if(pj >= s) pj = s-1;
      setA(Aout, i, j, getA(A, pi, pj));
    }
}

static void make_swappable_rk4(int* s, float* Atri, float* b, float* c)
{
  zeroTableau(s, Atri, b, c);
  *s = 4;

  c[0] = 0.0f;
  c[1] = 0.30f;
  c[2] = 0.80f;
  c[3] = 1.00f;

  setA(Atri, 1, 0, 0.30);
  setA(Atri, 2, 0, 0.80);
  setA(Atri, 3, 1, 0.50);
  setA(Atri, 3, 2, 0.50);

  b[0] = 0.10f;
  b[1] = 0.20f;
  b[2] = 0.30f;
  b[3] = 0.40f;
}

static void apply_diffusion(var t, var dt, var* y)
{
  eval_u(t);

  var u0 = G_u[0];
  var u2 = G_u[2];

  var sigma = G_SigmaBase * (1.0 + 0.6*fabs(u0) + 0.3*fabs(u2));
  if(G_Regime) sigma *= 1.25;

  var sdt = sqrt(dt);

  for(int k=0;k<N;k++)
  {
    var gmul = 1.0 + (fabs(y[k]) * 0.15);
    y[k] += sigma * gmul * sdt * rng_norm();
  }
}

static void integrate_adaptive_sde(int s, float* A, float* b, float* c,
                                   var t0, var* y0, var tEnd,
                                   unsigned int seed,
                                   var* yOut,
                                   int* stepsOut, int* rejectsOut,
                                   var* dtMinOut,
                                   int* stopReasonOut)
{
  rng_seed(seed);
  G_Regime = 0;

  var t = t0;
  var dt = DT_INIT;
  var dtMinSeen = 1e9;

  for(int k=0;k<N;k++) G_ycur[k] = y0[k];

  int steps = 0, rejects = 0, iters = 0, dtMinHits = 0, steadyCount = 0;
  int stopReason = 0;

  while(t < tEnd)
  {
    if((iters & 255) == 0) shed_if_needed();
    iters++;

    if(dt < DT_MIN) dt = DT_MIN;
    if(dt > G_DtMax) dt = G_DtMax;
    if(t + dt > tEnd) dt = tEnd - t;

    if(dt < dtMinSeen) dtMinSeen = dt;
    if(dt <= DT_MIN) dtMinHits++;

    if(dtMinHits > G_DtMinHitMax) { stopReason = 4; break; }

    rk_step_vec(s, A, b, c, t, G_ycur, dt, G_full);

    var halfdt = dt * 0.5;
    rk_step_vec(s, A, b, c, t,        G_ycur, halfdt, G_mid);
    rk_step_vec(s, A, b, c, t+halfdt, G_mid,  halfdt, G_half);

    var e = err_norm(G_half, G_full);

    if(e <= G_Tol || dt <= DT_MIN)
    {
      for(int k=0;k<N;k++) G_ycur[k] = G_half[k];

      eval_u(t);
      update_regime(t, dt);

      apply_diffusion(t, dt, G_ycur);
      apply_jump_if_needed(t, dt, G_ycur);

      observe_from_hidden(t + dt, G_ycur);

      var yr = rms_norm(G_ycur, N);
      var xr = rms_norm(G_xobs, NOBS);
      if(yr > G_StateLimitRMS || xr > G_ObsLimitRMS)
      {
        stopReason = 2;
        t += dt; steps++;
        break;
      }

      for(int k=0;k<N;k++) G_stage[k] = G_ycur[k];
      f_theta_vec(t + dt);
      var dr = rms_norm(G_dh, N);

      if(dr < G_SteadyEps) steadyCount++; else steadyCount = 0;

      t += dt; steps++;
      if(steadyCount >= G_SteadyNeed) { stopReason = 1; break; }

      if(e < 1e-16) dt *= 2.0;
      else
      {
        var fac = pow(G_Tol / e, 0.2) * 0.9;
        if(fac < 0.5) fac = 0.5;
        if(fac > 2.0) fac = 2.0;
        dt *= fac;
      }
    }
    else
    {
      rejects++;
      if(rejects > G_MaxRejects) { stopReason = 3; break; }
      dt *= 0.5;
    }
  }

  for(int k=0;k<N;k++) yOut[k] = G_ycur[k];
  *stepsOut = steps;
  *rejectsOut = rejects;
  *dtMinOut = dtMinSeen;
  *stopReasonOut = stopReason;
}

// self-test: CPU vs CUDA drift numbers
static void drift_selftest()
{
  for(int k=0;k<N;k++) G_stage[k] = Y0[k];
  var ttest = 0.5;

  var dh_cpu[N];
  f_theta_vec_cpu(ttest);
  for(int k=0;k<N;k++) dh_cpu[k] = G_dh[k];

#if USE_CUDA
  if(gCU_Ready)
  {
    f_theta_vec_cuda(ttest);
    var dh_gpu[N];
    for(int k=0;k<N;k++) dh_gpu[k] = G_dh[k];

    var s = 0;
    for(int k=0;k<N;k++) { var d = dh_gpu[k] - dh_cpu[k]; s += d*d; }
    var rms = sqrt(s / (var)N);

    printf("\n[SelfTest] dh CPU vs CUDA @t=%g: RMS diff = %.3e", ttest, (double)rms);
    printf("\n[SelfTest] CPU dh[0..3]=%.6g %.6g %.6g %.6g",
      (double)dh_cpu[0], (double)dh_cpu[1], (double)dh_cpu[2], (double)dh_cpu[3]);
    printf("\n[SelfTest] CUDA dh[0..3]=%.6g %.6g %.6g %.6g",
      (double)dh_gpu[0], (double)dh_gpu[1], (double)dh_gpu[2], (double)dh_gpu[3]);
  }
  else
  {
    printf("\n[SelfTest] CUDA not ready -> skipped GPU compare");
  }
#endif
}

// -------------------- Entry point --------------------
DLLFUNC int main()
{
#if USE_CUDA
  if(cu_init())
    printf("\nCUDA: enabled (Driver API + NVRTC)");
  else
    printf("\nCUDA: not available -> CPU fallback");
#endif

  // reset knobs
  G_ShedTriggered = 0;
  G_Tol   = TOL;
  G_DtMax = DT_MAX;

  G_MaxRejects  = 2000000;
  G_DtMinHitMax = 500000;

  G_SteadyEps     = 1e-7;
  G_SteadyNeed    = 250;
  G_StateLimitRMS = 1e6;
  G_ObsLimitRMS   = 1e6;

  G_SigmaBase       = 0.02;
  G_JumpLambdaBase  = 0.20;
  G_JumpMagBase     = 0.20;
  G_RegRate01       = 0.15;
  G_RegRate10       = 0.10;

  int   S = 0;
  float A[TRI_A];
  float b[MAXS];
  float c[MAXS];
  make_swappable_rk4(&S, A, b, c);

  int   SB = 0;
  float AB[TRI_A];
  float bB[MAXS];
  float cB[MAXS];

  int p[MAXS];
  p[0]=0; p[1]=2; p[2]=1; p[3]=3;
  permute_method_into(S, A, b, c, p, &SB, AB, bB, cB);

  int   SW = S;
  float AW[TRI_A];
  float bW[MAXS];
  float cW[MAXS];

  for(int i=0;i<TRI_A;i++) AW[i] = A[i];
  for(int i=0;i<MAXS;i++)  { bW[i]=0; cW[i]=0; }
  for(int i=0;i<SW;i++)
  {
    int pi = p[i];
    bW[i] = b[pi];
    cW[i] = c[pi];
  }

  for(int k=0;k<N;k++)
    Y0[k] = 0.2*sin(0.31*(k+1)) + 0.15*cos(0.17*(k+1))
          + 0.05*sin(0.07*(k+1)*(k+1));

  var t0 = 0.0;

  var now = wdate(NOW);
  unsigned int MySeed = (unsigned int)utm(now);
  var frac = now - floor(now);
  var ms = frac * 86400000.0;
  MySeed = 1664525u * MySeed + (unsigned int)ms + 1013904223u;
  if(MySeed == 0) MySeed = 1;
  printf("\nMySeed=%u", MySeed);

  drift_selftest();

  int stepsRef=0,  rejRef=0;   var dtMinRef=0;  int stopRef=0;
  int stepsSafe=0, rejSafe=0;  var dtMinSafe=0; int stopSafe=0;
  int stepsBug=0,  rejBug=0;   var dtMinBug=0;  int stopBug=0;

  integrate_adaptive_sde(S,  A,  b,  c,  t0, Y0, T_END, MySeed, Yref,
                         &stepsRef,  &rejRef,  &dtMinRef,  &stopRef);

  integrate_adaptive_sde(SB, AB, bB, cB, t0, Y0, T_END, MySeed, Ysafe,
                         &stepsSafe, &rejSafe, &dtMinSafe, &stopSafe);

  integrate_adaptive_sde(SW, AW, bW, cW, t0, Y0, T_END, MySeed, Ybug,
                         &stepsBug,  &rejBug,  &dtMinBug,  &stopBug);

  var diffSafe = err_norm(Ysafe, Yref);
  var diffBug  = err_norm(Ybug,  Yref);

  printf("\n\nNeural SDE / Hybrid ODE integration (N=%d) to T=%g baseTol=%g", N, T_END, TOL);

  printf("\nA) Reference tableau:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsRef, rejRef, dtMinRef, stopRef);

  printf("\nB) Safe stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsSafe, rejSafe, dtMinSafe, stopSafe);
  printf("\n   diff to reference (RMS) = %.3e", (double)diffSafe);

  printf("\nC) Buggy stage-refactor:");
  printf("\n   accepted=%d rejected=%d minDt=%g stop=%d", stepsBug, rejBug, dtMinBug, stopBug);
  printf("\n   diff to reference (RMS) = %.3e", (double)diffBug);

#if USE_CUDA
  printf("\n\nCUDA usage: gpu_calls=%d failed_calls=%d last_err=%d",
         gCU_UsedCalls, gCU_FailedCalls, (int)gCU_LastErr);
  cu_release_all();
#endif

  printf("\n\nStop codes: 0=tEnd, 1=steady-state, 2=event(blow-up), 3=rejectBudget, 4=dtMinBudget\n");

  return quit("done");
}
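
Optional debugging aid, a minimal sketch that is not part of the listing above: right after p_nvrtcGetPTX(prog, ptx) succeeds in cu_init(), the complete PTX text sits in the ptx buffer and can be dumped to disk for inspection. The helper below assumes plain <stdio.h>/<string.h> are available; the file name dh_kernel.ptx is only an example.

Code
// hypothetical helper - call in cu_init() after p_nvrtcGetPTX() returns NVRTC_SUCCESS
static void dump_ptx(const char* ptx)
{
  FILE* f = fopen("dh_kernel.ptx", "wb"); // example output path
  if(!f) return;
  fwrite(ptx, 1, strlen(ptx), f);         // PTX is a null-terminated text blob
  fclose(f);
}

Checking the .target line in the dump is a quick way to confirm which compute capability the kernel was actually compiled for.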

Last edited by TipmyPip; 38 minutes ago.