The following code is a demonstration tale about stage shuffling. It shows that you can reorder the internal steps of an explicit solver, rebuild its coefficient tables to match the new order, and still land on the same final state after one composed update. At the same time, the new order can be cheaper, because it improves the reuse of an expensive time-based feature.

The program defines a tiny time-dependent neural-style rule that drives a single hidden variable forward. The rule mixes the current hidden value with a transformed view of the current time, pushes the mixture through a safe softsign activation, and returns a rate of change. The time transform uses the sine function, treated here as the costly part. A deliberately small cache stores only the most recent time and the most recent sine value, and a counter records how many real sine evaluations were required.

Two explicit methods are built from compact Runge-Kutta tables. The first is a two-stage Heun step used as a finishing pass. The second is a custom four-stage step designed to include a repeated stage time. In the original stage order, two stages share the same time offset within the step, but they are separated by another stage that uses a different time. With a one-entry cache, that separation forces the cache to forget the repeated time and compute sine again later.

To keep memory small, the dependency coefficients are stored in packed triangular form. Helper functions translate a stage pair into a packed index, read a coefficient, and write a coefficient. Another helper clears a tableau so the build routines start from a clean slate.

The core routine that advances the state performs one explicit step: it loops over stages, forms each stage state from earlier stage derivatives and dependency weights, evaluates the neural rule at the stage time, then combines all stage derivatives with the output weights to form the step result. A composition routine chains two such steps, advancing time between them because the rule depends on time.

The stage-permutation routine builds an equivalent tableau from an original tableau and a permutation map. It permutes the stage-time list and the output weights, and it relabels the dependency weights consistently, so each new stage still refers to the correct original stage values. In this demo, the Heun method is left in its natural order, while the custom four-stage method is permuted by swapping two stages so the two equal-time evaluations become adjacent.

The entry point runs the composed update twice. First it resets the cache and counter, applies the original methods, and records the output and sine count. Then it resets again, applies the permuted methods, and records the output and sine count. It prints a correctness check that the two outputs match closely, and an advantage check showing fewer sine calls after the shuffle.

The story is simple: the reveal stays the same, but the backstage choreography becomes smarter. In real Zorro strategies, the expensive piece could be an indicator update, a custom feature lookup, or a broker query; arranging repeated inputs next to each other can reduce overhead without changing the trading logic.

// demo_neuralODE_theorem_382A_advantage_main.c (Zorro lite-C)
//
// What this program demonstrates
// ------------------------------
// We integrate a simple *time-dependent* Neural ODE:
//
// dh/dt = f_theta(t, h)
//
// using a composed integrator: first a 4-stage explicit RK method (M2),
// then a 2-stage Heun RK2 method (M1).
//
// Theorem 382A (as used here) says:
// If you *relabel / reorder* the internal RK stages and you permute the
// RK coefficient tables (A,b,c) *consistently*, you obtain an *equivalent*
// RK method that produces the same one-step result. This remains true
// when chaining methods (composition).
//
// The "advantage" shown here:
// The ODE right-hand-side f_theta() uses sin(t) (treated as expensive).
// M2 evaluates f_theta twice at the same stage time (t + 0.5*dt), but those
// two equal-time stages are NOT adjacent in the original ordering. We keep a
// very small cache that remembers only the *last* sin(t) computed.
// By permuting M2’s stages so the equal-time stages become adjacent, the cache
// hits more often -> fewer sin() computations, while the final result is unchanged.
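//
// Consistent permutation, concretely (a sketch of the relabeling built below):
// if p[] maps new stage i to old stage p[i], the equivalent tableau is
//
//     c'[i] = c[p[i]],   b'[i] = b[p[i]],   A'[i][j] = A[p[i]][p[j]]
//
// so every new stage reads exactly the same stage values as its old counterpart.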
#define MAXS 4
#define TRI_A 6 // number of stored A-entries for an explicit 4-stage RK (lower triangle only)
static var G_k[MAXS]; // stage derivatives k[i] for one RK step
// ---- Method M1: Heun RK2 (2 stages) ----
// Stored as a classic RK tableau (A,b,c) in compact triangular form for A.
static int M1_s = 0; // number of stages
static float M1_A[TRI_A]; // A coefficients (lower triangle packed)
static float M1_b[MAXS]; // output weights
static float M1_c[MAXS]; // stage time offsets (as fraction of step size)
// ---- Method M2: custom 4-stage explicit RK with a repeated stage time ----
// Stage evaluation times (c values) are: 0, 0.5, 1.0, 0.5
// Stages 1 and 3 both evaluate at t + 0.5*dt, but they are separated by stage 2.
// With a "remember only last t" cache, this separation causes cache misses.
// We'll build a permuted version where the two 0.5 stages are adjacent.
static int M2_s = 0;
static float M2_A[TRI_A];
static float M2_b[MAXS];
static float M2_c[MAXS];
// ---- Permuted (“B”) versions of the same methods ----
// These are produced by reordering stages and permuting A,b,c consistently.
// They should yield the same numerical one-step result as the originals.
static int M1B_s = 0;
static float M1B_A[TRI_A];
static float M1B_b[MAXS];
static float M1B_c[MAXS];
static int M2B_s = 0;
static float M2B_A[TRI_A];
static float M2B_b[MAXS];
static float M2B_c[MAXS];
// ---------------------------------------------------------------------------
// Compact storage for explicit RK tableaus
// ---------------------------------------------------------------------------
// For explicit RK, only entries with i>j are used (lower triangle).
// We store those coefficients in a 1D array using triangular indexing.
int Aidx(int i, int j)
{
// maps (i,j) with i>j into packed array position
return (i*(i-1))/2 + j;
}
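// Layout sketch for a 4-stage method, for reference:
//   (i,j): (1,0)->0  (2,0)->1  (2,1)->2  (3,0)->3  (3,1)->4  (3,2)->5
// which is why TRI_A = 4*3/2 = 6 packed entries suffice.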
var getA(float* Atri, int i, int j)
{
// explicit RK has A[i,j] = 0 for i<=j
if(i <= j) return 0;
return (var)Atri[Aidx(i,j)];
}
void setA(float* Atri, int i, int j, var v)
{
// only store lower-triangular entries
if(i <= j) return;
Atri[Aidx(i,j)] = (float)v;
}
void zeroTableau(int* s, float* Atri, float* b, float* c)
{
// reset an RK tableau to all zeros
int i;
*s = 0;
for(i=0;i<TRI_A;i++) Atri[i] = 0;
for(i=0;i<MAXS;i++) { b[i]=0; c[i]=0; }
}
// ---------------------------------------------------------------------------
// Neural ODE right-hand side + cost accounting
// ---------------------------------------------------------------------------
// We count how many *actual* sin() computations happen to show the optimization.
// sin_cached() is intentionally tiny: it remembers only the last (t, sin(t)) pair.
// Therefore, two equal-time evaluations only benefit if they occur consecutively.
static int G_sinCalls = 0; // number of times we really call sin()
static var G_lastSinT = 1e99;
static var G_lastSinV = 0;
var sin_cached(var t)
{
// if the time matches the previous call (within tolerance), reuse the last value
if(abs(t - G_lastSinT) < 1e-12)
return G_lastSinV;
// otherwise compute and remember
G_lastSinT = t;
G_lastSinV = sin(t);
G_sinCalls += 1;
return G_lastSinV;
}
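// Illustration (an assumed call sequence, not part of the demo run):
//   sin_cached(0.5); sin_cached(0.5);                  // 1 real sin(), then a hit
//   sin_cached(0.5); sin_cached(1.0); sin_cached(0.5); // 3 real sin(), no hits
// Equal times only pay off when they arrive back to back, which is exactly
// what the stage permutation below arranges.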
// smooth bounded activation used inside the toy neural function
var softsign(var z)
{
return z / (1.0 + abs(z));
}
// Neural ODE: dh/dt = f_theta(t,h)
// Uses sin_cached(t) so stage ordering can reduce sin() calls.
var f_theta(var t, var h)
{
// treat this as a small "neural net" with hard-coded weights.
// sin(t) acts as an expensive feature transform.
var s = sin_cached(t);
var z = 1.20*h + 0.70*s - 0.30;
var a = softsign(z);
return 0.90*a + 0.10;
}
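// Since softsign maps any z into (-1,1), f_theta always returns a rate inside
// (-0.8, 1.0); with output weights that are nonnegative and sum to 1, the
// hidden state therefore changes by less than the step size per substep.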
// ---------------------------------------------------------------------------
// One explicit RK step: y -> y1 over step size h
// ---------------------------------------------------------------------------
// Computes stages k[i] = f(t + c[i]*h, y + h * sum_j A[i,j]*k[j])
// then combines them with weights b[i].
var rk_step(int s, float* Atri, float* b, float* c, var t, var y, var h)
{
int i,j;
if(s < 1) return y;
if(s > MAXS) s = MAXS;
// clear stage array
for(i=0;i<s;i++) G_k[i] = 0;
// stage loop
for(i=0;i<s;i++)
{
// build the stage state yi from previous stages
var yi = y;
for(j=0;j<i;j++)
yi += h * getA(Atri,i,j) * G_k[j];
// evaluate derivative at stage time
G_k[i] = f_theta(t + (var)c[i]*h, yi);
}
// combine stages into output
var y1 = y;
for(i=0;i<s;i++)
y1 += h * (var)b[i] * G_k[i];
return y1;
}
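// Worked example: with the Heun tableau defined below (s=2, c={0,1},
// A[1][0]=1, b={0.5,0.5}) this loop unrolls to the explicit trapezoid rule:
//   k0 = f_theta(t, y);
//   k1 = f_theta(t + h, y + h*k0);
//   y1 = y + h*(0.5*k0 + 0.5*k1);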
// ---------------------------------------------------------------------------
// Method composition: do M2 step first, then M1 step
// ---------------------------------------------------------------------------
// Because f_theta depends on time, the second step must start at (t + h).
// This function advances the solution through two substeps of size h,
// i.e. from time t to time t + 2*h.
var compose_step(int s1,float* A1,float* b1,float* c1,
int s2,float* A2,float* b2,float* c2,
var t, var y, var h)
{
var y_mid = rk_step(s2,A2,b2,c2,t, y, h); // first substep at time t
var y_out = rk_step(s1,A1,b1,c1,t+h, y_mid, h); // second substep at time t+h
return y_out;
}
// ---------------------------------------------------------------------------
// Build a stage-permuted (equivalent) RK tableau
// ---------------------------------------------------------------------------
// Input: original (A,b,c) with s stages and a permutation array p[].
// Output: (Aout,bout,cout) where stage i of the new method corresponds to stage p[i]
// of the old method. This is the “stage relabeling” used for Theorem 382A.
// Note: p[] must respect the explicit dependency order: every stage that a
// relabeled stage depends on must still come earlier in the new order;
// otherwise the lower-triangle storage below would silently drop coefficients.
void permute_method_into(int s, float* A, float* b, float* c,
int* p,
int* sOut, float* Aout, float* bout, float* cout)
{
int i,j;
*sOut = s;
if(*sOut < 0) *sOut = 0;
if(*sOut > MAXS) *sOut = MAXS;
// clear outputs
for(i=0;i<TRI_A;i++) Aout[i] = 0;
for(i=0;i<MAXS;i++) { bout[i]=0; cout[i]=0; }
// permute b and c (stage weights and stage times)
for(i=0;i<*sOut;i++)
{
int pi = p[i];
if(pi < 0) pi = 0;
if(pi >= s) pi = s-1;
bout[i] = b[pi];
cout[i] = c[pi];
}
// permute A (stage-to-stage dependencies), lower triangle only
for(i=0;i<*sOut;i++)
for(j=0;j<i;j++)
{
int pi = p[i];
int pj = p[j];
if(pi < 0) pi = 0;
if(pj < 0) pj = 0;
if(pi >= s) pi = s-1;
if(pj >= s) pj = s-1;
setA(Aout, i, j, getA(A, pi, pj));
}
}
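// Worked example for M2 with p = {0,1,3,2} (the swap used in buildAll below):
//   c' = {0, 0.5, 0.5, 1.0}                    (from c = {0, 0.5, 1.0, 0.5})
//   A'[2][0] = A[p[2]][p[0]] = A[3][0] = 0.5   (new stage 2 is old stage 3)
//   A'[3][1] = A[p[3]][p[1]] = A[2][1] = 1.0   (new stage 3 is old stage 2)
// Every dependency still points at an earlier stage, so the permuted method
// remains explicit and reproduces the original stage values exactly.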
// ---------------------------------------------------------------------------
// Define the two base RK methods (M1, M2)
// ---------------------------------------------------------------------------
void make_heun_rk2()
{
// Heun / explicit trapezoid method (2 stages)
zeroTableau(&M1_s, M1_A, M1_b, M1_c);
M1_s = 2;
M1_c[0] = 0.0; // stage 0 at t
M1_c[1] = 1.0; // stage 1 at t+h
setA(M1_A, 1, 0, 1.0); // y1 stage uses k0
M1_b[0] = 0.5; // average of k0 and k1
M1_b[1] = 0.5;
}
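// Butcher tableau of M1 as stored above:
//   0 |
//   1 | 1
//  ---+----------
//     | 1/2  1/2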
void make_custom4_repeatTime()
{
// A simple 4-stage explicit method that evaluates two stages at the same time (0.5)
// but places them apart in the stage sequence to create cache misses.
zeroTableau(&M2_s, M2_A, M2_b, M2_c);
M2_s = 4;
// stage times (fractions of step size)
M2_c[0] = 0.0;
M2_c[1] = 0.5; // repeated time #1
M2_c[2] = 1.0;
M2_c[3] = 0.5; // repeated time #2 (same as stage 1)
// explicit dependencies between stages
setA(M2_A, 1, 0, 0.5); // stage1 uses stage0
setA(M2_A, 2, 1, 1.0); // stage2 uses stage1
setA(M2_A, 3, 0, 0.5); // stage3 uses stage0 (so we can move it relative to stage2)
// simple equal weights (this is a demo method, not a named high-order scheme)
M2_b[0] = 0.25;
M2_b[1] = 0.25;
M2_b[2] = 0.25;
M2_b[3] = 0.25;
}
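// Butcher tableau of M2 as stored above:
//   0   |
//   1/2 | 1/2
//   1   | 0    1
//   1/2 | 1/2  0    0
//  -----+--------------------
//       | 1/4  1/4  1/4  1/4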
void buildAll()
{
// Build original methods
make_heun_rk2();
make_custom4_repeatTime();
// Stage permutations used to construct equivalent “B” methods:
//
// - M1 (Heun) is left unchanged (identity permutation).
// - M2 swaps stages 2 and 3 so the two 0.5-time stages become adjacent:
// old: 0, 1(0.5), 2(1.0), 3(0.5)
// new: 0, 1(0.5), 3(0.5), 2(1.0)
//
// This improves the hit rate of the tiny sin(t) cache.
int p1[MAXS]; p1[0]=0; p1[1]=1; p1[2]=0; p1[3]=0; // identity (entries past M1_s unused)
int p2[MAXS]; p2[0]=0; p2[1]=1; p2[2]=3; p2[3]=2; // swap stages 2 and 3
permute_method_into(M1_s, M1_A, M1_b, M1_c, p1, &M1B_s, M1B_A, M1B_b, M1B_c);
permute_method_into(M2_s, M2_A, M2_b, M2_c, p2, &M2B_s, M2B_A, M2B_b, M2B_c);
}
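// Expected sin() evaluation times for one composed step with dt = 0.10
// (a hand trace, assuming the cache starts empty; actual counts print in main):
//   original: M2 at 0, 0.05, 0.10, 0.05  -> 4 calls (0.05 repeats non-adjacently)
//             M1 at 0.10, 0.20           -> 2 calls (cache still holds 0.05)
//   permuted: M2B at 0, 0.05, 0.05, 0.10 -> 3 calls (second 0.05 is a hit)
//             M1B at 0.10, 0.20          -> 1 call  (0.10 is still cached)
// So the permuted composition should save 2 sin() evaluations per composed step.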
// reset sin-cache state and call counter (for fair “before vs after” cost comparison)
void reset_cost_counters()
{
G_sinCalls = 0;
G_lastSinT = 1e99;
G_lastSinV = 0;
}
// ---------------------------------------------------------------------------
// Program entry point
// ---------------------------------------------------------------------------
// 1) Build original and permuted RK methods.
// 2) Run the composed integrator once with original methods; record sin() count.
// 3) Run again with permuted-but-equivalent methods; record sin() count.
// 4) Print:
// - output equality (correctness / equivalence)
// - sin() calls saved (cost advantage)
function main()
{
buildAll();
var t0 = 0.0; // start time
var h0 = 0.25; // initial hidden state (not a step size; rk_step's h is dt below)
var dt = 0.10; // step size used by each substep of the composition
// ---- Original composition (M2 then M1) ----
reset_cost_counters();
var outA = compose_step(M1_s, M1_A, M1_b, M1_c,
M2_s, M2_A, M2_b, M2_c,
t0, h0, dt);
int sinA = G_sinCalls;
// ---- Permuted composition (M2B then M1B) ----
reset_cost_counters();
var outB = compose_step(M1B_s, M1B_A, M1B_b, M1B_c,
M2B_s, M2B_A, M2B_b, M2B_c,
t0, h0, dt);
int sinB = G_sinCalls;
// ---- Report results: same output, fewer sin() calls ----
printf("\nNeural ODE: dh/dt = f_theta(t,h) with sin_cached(t) inside f_theta");
printf("\nStart: t=%.3f h=%.17g dt=%.3f", t0, h0, dt);
printf("\n\n(1) Correctness: permuting stages changes nothing in the final output");
printf("\n outA (original stages) = %.17g", outA);
printf("\n outB (permuted stages) = %.17g", outB);
printf("\n abs diff = %.3e", abs(outA - outB));
printf("\n\n(2) Advantage: permuting stages reduces repeated expensive sin(t) work");
printf("\n sin() calls original = %d", sinA);
printf("\n sin() calls permuted = %d", sinB);
printf("\n sin() calls saved = %d", (sinA - sinB));
printf("\n");
quit("done");
}