FractalBandit Compass is a two-layer learning strategy for daily-bar trading on EUR/USD. It combines adaptive parameter selection with supervised machine learning that is trained on realized trade outcomes. The core idea is to let the strategy continuously reshape its own settings based on performance, while it learns when to trade from a fixed set of market descriptors computed on two separate timeframes.

The first layer is a group of reinforcement learning bandit agents, one per strategy parameter. Each agent treats the possible settings of its parameter as a set of discrete choices ("arms"), much like picking one item from a menu. The parameters are the first timeframe, the offset that defines the second timeframe, the window lengths for fractal dimension, slope, and volatility estimation on each timeframe, and the risk shaping controls: leverage scaling, maximum leverage, prediction threshold, and holding time. Whenever the system is flat and ready to begin a new trade episode, every agent selects an arm with an epsilon-greedy routine. With the EPSILON of 0.10 used in the code, an agent draws a uniformly random arm on about 10% of picks and otherwise exploits the arm with the highest value estimate. The occasional exploration keeps an agent from locking onto a choice that looked good early but may stop working later, which is what lets the strategy adapt as regimes change.
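
The selection step can be sketched in plain C as below. This is a minimal illustration, not the strategy's lite-C code: the function names and the idea of passing the two random draws in as arguments (`u_explore`, `u_pick`) are assumptions made here so the behavior is easy to check by hand.

```c
/* Index of the highest value estimate; ties go to the lowest index. */
int best_arm(const double *q, int n_arms)
{
    int best = 0;
    for (int a = 1; a < n_arms; a++)
        if (q[a] > q[best]) best = a;
    return best;
}

/* Epsilon-greedy pick for one agent.
   u_explore and u_pick are uniform draws in [0,1) supplied by the caller
   (a hypothetical interface that keeps the sketch deterministic). */
int select_arm(const double *q, int n_arms, double epsilon,
               double u_explore, double u_pick)
{
    if (u_explore < epsilon) {
        /* Explore: map the second draw onto an arm index. */
        int a = (int)(u_pick * n_arms);
        if (a >= n_arms) a = n_arms - 1;  /* guard in case u_pick is 1.0 */
        return a;
    }
    /* Exploit: take the arm with the best estimate so far. */
    return best_arm(q, n_arms);
}
```

With value estimates `{0.0, 2.5, -1.0, 2.5}` and `epsilon = 0.10`, a first draw of 0.50 exploits and returns arm 1, while a first draw of 0.05 explores and returns whichever arm the second draw lands on.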

The second layer is the market signal engine, which produces a compact feature vector from two timeframes. On each timeframe the strategy computes three measures. A Katz-style fractal dimension estimate describes how smooth or jagged the recent price path is: values near 1 indicate a straight, trend-like path, and values near 2 a rough, erratic one. A least-squares price slope over a configurable window summarizes direction and persistence. The standard deviation of log returns over a configurable window summarizes turbulence. Three measures on two timeframes give the six values that form the feature signature used by the machine learning model.
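
For reference, the fractal dimension estimate computed by `fractalDimKatz` in the listing can be written out as follows. The symbols are notation chosen here: $N$ is the window length, $P_0$ the newest price in the window, $L$ the summed absolute price changes, and $d$ the largest distance of any price in the window from $P_0$.

```latex
L = \sum_{i=0}^{N-2} \lvert P_i - P_{i+1} \rvert, \qquad
d = \max_{1 \le i \le N-1} \lvert P_i - P_0 \rvert, \qquad
\mathrm{FD} = \frac{\ln N}{\ln N + \ln (d / L)}
```

For a window that moves in one direction only, $d = L$, so $\ln(d/L) = 0$ and $\mathrm{FD} = 1$. The more the path doubles back on itself, the smaller $d/L$ becomes and the higher the estimate rises; the listing limits the result to the range from 1 to 2.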

The machine learning model is trained in a returns-based mode (the RETURNS flag in the code), so it learns from the eventual profit or loss of each trade rather than from a single bar label. During training runs, the strategy forces alternating long and short entries whenever it is flat. This is not meant to be profitable by itself; it guarantees that the model sees a broad variety of examples in different market conditions and does not stall with no trades and no learning. During test and trade runs, the model produces long and short preferences from the same feature signature. If the model has not learned enough yet and both preferences are zero, the strategy bootstraps a one-lot trade using a simple directional hint from the sum of the two slopes, or a random choice when that sum is exactly zero. This keeps the learning loop generating outcomes.
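
The test and trade decision described above can be condensed into one function. The sketch below is plain C rather than lite-C, assumes the account is flat when it is called, and uses a hypothetical function name and a caller-supplied random draw (`coin`); it mirrors the order of checks in the listing but is not a drop-in replacement for it.

```c
/* Returns +1 for long, -1 for short, 0 for no position. */
int decide_direction(double pred_long, double pred_short,
                     double slope_tf1, double slope_tf2,
                     double lev_scale, double max_lev, double pred_thr,
                     double coin /* uniform draw in [0,1), caller-supplied */)
{
    /* Bootstrap: the model has no preference yet. */
    if (pred_long == 0.0 && pred_short == 0.0) {
        double s = slope_tf1 + slope_tf2;
        if (s > 0.0) return 1;
        if (s < 0.0) return -1;
        return coin < 0.5 ? 1 : -1;   /* direction unclear: flip a coin */
    }

    /* Model-driven: scale the net preference, cap it, compare to threshold. */
    double lev = (pred_long - pred_short) * lev_scale;
    if (lev >  max_lev) lev =  max_lev;
    if (lev < -max_lev) lev = -max_lev;

    if (lev >  pred_thr) return 1;
    if (lev < -pred_thr) return -1;
    return 0;
}
```

For example, preferences of 0.03 long and 0.01 short with a leverage scale of 10 give a scaled value of 0.2, which clears a threshold of 0.1 and yields a long decision.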

An episode ends when the strategy returns to flat after having held a position. At that point, the reward is computed as the change in account balance since the start of the episode. That single reward is then used to update every bandit agent on the exact arm it chose for the episode; in the code, an episode whose reward is exactly zero is skipped. Each update moves the arm's value estimate a fixed fraction (ALPHA = 0.10) of the way toward the new reward, so recent episodes weigh more than old ones. Over time, parameter options that consistently lead to better episodes gain higher value estimates, and the agents increasingly favor them while still occasionally testing alternatives. The result is a self-tuning strategy that adapts both its market interpretation settings and its execution behavior, with the aim of staying robust as market structure shifts.
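
The shared-reward update can be sketched in plain C as follows. The array sizes and function names are placeholders chosen for this illustration (the strategy itself uses 12 parameters and up to 64 arms), and the zero-reward skip mentioned above is left to the caller.

```c
#define N_PARAMS 3   /* hypothetical size for the sketch */
#define MAX_ARMS 4   /* hypothetical size for the sketch */

/* Constant-step update: move the estimate a fraction alpha toward the reward. */
double update_value(double q, double reward, double alpha)
{
    return q + alpha * (reward - q);
}

/* Credit one episode reward to the arm each agent used in that episode. */
void credit_episode(double q[N_PARAMS][MAX_ARMS], const int chosen[N_PARAMS],
                    double reward, double alpha)
{
    for (int p = 0; p < N_PARAMS; p++)
        q[p][chosen[p]] = update_value(q[p][chosen[p]], reward, alpha);
}
```

Starting from estimates of zero, an episode that gains 50 with a step size of 0.10 lifts each chosen arm to 5.0; a following loss of 20 on the same arms pulls them back to 2.5. Arms that were not chosen keep their previous estimates.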

Code
// ============================================================================
// Fractal Learner - RL Bandits for Parameters + 2TF ML (RETURNS)
// File: Fractal_EURUSD_2TF_RL_Bandits_v1.c   (Zorro / lite-C)
//
// What changed vs optimize():
// - Replaced optimize() with epsilon-greedy multi-armed bandits (one agent per parameter).
// - When flat, agents pick new parameter "arms".
// - A trade is opened (Train: alternating long/short; Test/Trade: ML-driven + bootstrap).
// - When positions go flat again, reward = Balance delta, and all agents update Q for used arms.
// - Next episode starts with new parameter picks.
//
// lite-C safe:
// - No adjacent string literals in one file_append call (header uses multiple calls)
// - No inline/ternary helpers; uses function keyword
// - strf format string is ONE literal
// ============================================================================

#define NPAR 12
#define MAXARMS 64

#define EPSILON 0.10
#define ALPHA   0.10

#define P_INT 0
#define P_VAR 1

// Parameter indices (RL-controlled)
#define P_TF1        0
#define P_TF2D       1
#define P_FDLEN1     2
#define P_SLOPELEN1  3
#define P_VOLLEN1    4
#define P_FDLEN2     5
#define P_SLOPELEN2  6
#define P_VOLLEN2    7
#define P_LEVSCALE   8
#define P_MAXLEV     9
#define P_PREDTHR    10
#define P_HOLDBARS   11

// -----------------------------
// RL storage
// -----------------------------
string ParName[NPAR];

int ParType[NPAR] =
{
	P_INT,  // TF1
	P_INT,  // TF2d
	P_INT,  // FDLen1
	P_INT,  // SlopeLen1
	P_INT,  // VolLen1
	P_INT,  // FDLen2
	P_INT,  // SlopeLen2
	P_INT,  // VolLen2
	P_VAR,  // LevScale
	P_VAR,  // MaxLev
	P_VAR,  // PredThr
	P_INT   // HoldBars
};

var ParMin[NPAR];
var ParMax[NPAR];
var ParStep[NPAR];

var Q[NPAR][MAXARMS];
int Ncnt[NPAR][MAXARMS];
int ArmsCount[NPAR];
int CurArm[NPAR];

void initParNames()
{
	ParName[P_TF1]       = "TF1";
	ParName[P_TF2D]      = "TF2d";
	ParName[P_FDLEN1]    = "FDLen1";
	ParName[P_SLOPELEN1] = "SlopeLen1";
	ParName[P_VOLLEN1]   = "VolLen1";
	ParName[P_FDLEN2]    = "FDLen2";
	ParName[P_SLOPELEN2] = "SlopeLen2";
	ParName[P_VOLLEN2]   = "VolLen2";
	ParName[P_LEVSCALE]  = "LevScale";
	ParName[P_MAXLEV]    = "MaxLev";
	ParName[P_PREDTHR]   = "PredThr";
	ParName[P_HOLDBARS]  = "HoldBars";
}

int calcArms(var mn, var mx, var stp)
{
	if(stp <= 0) return 1;
	int n = (int)floor((mx - mn)/stp + 1.000001);
	if(n < 1) n = 1;
	if(n > MAXARMS) n = MAXARMS;
	return n;
}

var armValue(int p, int a)
{
	var v = ParMin[p] + (var)a * ParStep[p];
	if(v < ParMin[p]) v = ParMin[p];
	if(v > ParMax[p]) v = ParMax[p];
	if(ParType[p] == P_INT) v = (var)(int)(v + 0.5);
	return v;
}

int bestArm(int p)
{
	int a, best = 0;
	var bestQ = Q[p][0];
	for(a=1; a<ArmsCount[p]; a++)
	{
		if(Q[p][a] > bestQ)
		{
			bestQ = Q[p][a];
			best = a;
		}
	}
	return best;
}

int selectArm(int p)
{
	if(random(1) < EPSILON)
		return (int)random((var)ArmsCount[p]);
	return bestArm(p);
}

void updateArm(int p, int a, var reward)
{
	Q[p][a] = Q[p][a] + ALPHA*(reward - Q[p][a]);
	Ncnt[p][a] += 1;
}

void initParamsRL()
{
	// Ranges and steps roughly match the earlier optimize() version
	ParMin[P_TF1]        = 1;    ParMax[P_TF1]        = 3;    ParStep[P_TF1]        = 1;
	ParMin[P_TF2D]       = 1;    ParMax[P_TF2D]       = 11;   ParStep[P_TF2D]       = 1;

	ParMin[P_FDLEN1]     = 20;   ParMax[P_FDLEN1]     = 220;  ParStep[P_FDLEN1]     = 5;
	ParMin[P_SLOPELEN1]  = 20;   ParMax[P_SLOPELEN1]  = 200;  ParStep[P_SLOPELEN1]  = 5;
	ParMin[P_VOLLEN1]    = 20;   ParMax[P_VOLLEN1]    = 200;  ParStep[P_VOLLEN1]    = 5;

	ParMin[P_FDLEN2]     = 10;   ParMax[P_FDLEN2]     = 160;  ParStep[P_FDLEN2]     = 5;
	ParMin[P_SLOPELEN2]  = 10;   ParMax[P_SLOPELEN2]  = 140;  ParStep[P_SLOPELEN2]  = 5;
	ParMin[P_VOLLEN2]    = 10;   ParMax[P_VOLLEN2]    = 140;  ParStep[P_VOLLEN2]    = 5;

	ParMin[P_LEVSCALE]   = 2;    ParMax[P_LEVSCALE]   = 30;   ParStep[P_LEVSCALE]   = 1;
	ParMin[P_MAXLEV]     = 0.1;  ParMax[P_MAXLEV]     = 1.0;  ParStep[P_MAXLEV]     = 0.1;
	ParMin[P_PREDTHR]    = 0.0;  ParMax[P_PREDTHR]    = 0.20; ParStep[P_PREDTHR]    = 0.01;
	ParMin[P_HOLDBARS]   = 1;    ParMax[P_HOLDBARS]   = 30;   ParStep[P_HOLDBARS]   = 1;

	int p, a;
	for(p=0; p<NPAR; p++)
	{
		ArmsCount[p] = calcArms(ParMin[p], ParMax[p], ParStep[p]);
		CurArm[p] = 0;
		for(a=0; a<ArmsCount[p]; a++)
		{
			Q[p][a] = 0;
			Ncnt[p][a] = 0;
		}
	}
}

void pickParams()
{
	int p;
	for(p=0; p<NPAR; p++)
		CurArm[p] = selectArm(p);
}

// -----------------------------
// Feature helpers (lite-C safe)
// -----------------------------

function fractalDimKatz(vars P, int N)
{
	if(N < 2) return 1.0;

	var L = 0;
	int i;
	for(i=0; i<N-1; i++)
		L += abs(P[i] - P[i+1]);

	var d = 0;
	for(i=1; i<N; i++)
	{
		var di = abs(P[i] - P[0]);
		if(di > d) d = di;
	}

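	if(L <= 0 || d <= 0) return 1.0;

	var n     = (var)N;
	var denom = log(n) + log(d / L);
	// d/L <= 1/n makes the denominator zero or negative (a very jagged path);
	// report the upper bound instead of dividing by zero or clamping a negative value to 1
	if(denom <= 1e-12) return 2.0;

	var fd = log(n) / denom;
	return clamp(fd, 1.0, 2.0);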
}

function linSlope(vars P, int N)
{
	if(N < 2) return 0;

	var sumT=0, sumP=0, sumTT=0, sumTP=0;
	int i;
	for(i=0; i<N; i++)
	{
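		// Series index 0 is the newest bar, so count time forward from the oldest
		// bar; a positive slope then means rising prices
		var t = (var)(N-1-i);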
		sumT  += t;
		sumP  += P[i];
		sumTT += t*t;
		sumTP += t*P[i];
	}

	var denom = (var)N*sumTT - sumT*sumT;
	if(abs(denom) < 1e-12) return 0;

	return ((var)N*sumTP - sumT*sumP) / denom;
}

function stdevReturns(vars R, int N)
{
	if(N < 2) return 0;

	var mean = 0;
	int i;
	for(i=0; i<N; i++) mean += R[i];
	mean /= (var)N;

	var v = 0;
	for(i=0; i<N; i++)
	{
		var d = R[i] - mean;
		v += d*d;
	}
	v /= (var)(N-1);

	return sqrt(max(0, v));
}

function run()
{
	// ------------------------------------------------------------------------
	// SESSION / DATA SETTINGS
	// ------------------------------------------------------------------------
	BarPeriod = 1440;
	StartDate = 20100101;
	EndDate   = 0;

	set(PLOTNOW|RULES|LOGFILE);

	asset("EUR/USD");
	algo("FRACTAL2TF_EUR_RL_v1");

	var eps = 1e-12;
	DataSplit = 50;

	// LookBack must cover the largest TF * window product plus padding.
	// TF2 is capped at 12 with windows up to 160 (1920 bars);
	// TF1 goes up to 3 with windows up to 220 (660 bars).
	LookBack = 3000;

	// ------------------------------------------------------------------------
	// One-time init
	// ------------------------------------------------------------------------
	static int Inited = 0;
	static int PrevOpenTotal = 0;
	static var LastBalance = 0;
	static int Flip = 0; // for forced Train trades

	string LogFN = "Log\\FRACTAL2TF_EUR_RL_v1.csv";

	if(is(FIRSTINITRUN))
	{
		file_delete(LogFN);

		// lite-C safe header (multiple calls)
		file_append(LogFN,"Date,Time,Mode,Bar,");
		file_append(LogFN,"TF1,TF2,TF2d,FDLen1,SlopeLen1,VolLen1,FDLen2,SlopeLen2,VolLen2,");
		file_append(LogFN,"LevScale,MaxLev,PredThr,HoldBars,");
		file_append(LogFN,"FD1,Slope1,Vol1,FD2,Slope2,Vol2,");
		file_append(LogFN,"PredL,PredS,Pred,Lev,Reward\n");

		Inited = 0;
		PrevOpenTotal = 0;
		LastBalance = 0;
		Flip = 0;
	}

	if(!Inited)
	{
		initParNames();
		initParamsRL();
		pickParams();

		LastBalance = Balance;
		PrevOpenTotal = NumOpenTotal;

		Inited = 1;
	}

	// ------------------------------------------------------------------------
	// Convert chosen arms -> parameter values (current episode)
	// ------------------------------------------------------------------------
	int TF1 = (int)armValue(P_TF1, CurArm[P_TF1]);
	int TF2d = (int)armValue(P_TF2D, CurArm[P_TF2D]);
	int TF2 = TF1 + TF2d;
	if(TF2 > 12) TF2 = 12;

	int FDLen1    = (int)armValue(P_FDLEN1,    CurArm[P_FDLEN1]);
	int SlopeLen1 = (int)armValue(P_SLOPELEN1, CurArm[P_SLOPELEN1]);
	int VolLen1   = (int)armValue(P_VOLLEN1,   CurArm[P_VOLLEN1]);

	int FDLen2    = (int)armValue(P_FDLEN2,    CurArm[P_FDLEN2]);
	int SlopeLen2 = (int)armValue(P_SLOPELEN2, CurArm[P_SLOPELEN2]);
	int VolLen2   = (int)armValue(P_VOLLEN2,   CurArm[P_VOLLEN2]);

	var LevScale  = armValue(P_LEVSCALE, CurArm[P_LEVSCALE]);
	var MaxLev    = armValue(P_MAXLEV,   CurArm[P_MAXLEV]);
	var PredThr   = armValue(P_PREDTHR,  CurArm[P_PREDTHR]);
	int HoldBars  = (int)armValue(P_HOLDBARS, CurArm[P_HOLDBARS]);

	// ------------------------------------------------------------------------
	// Build series (2 TF)
	// ------------------------------------------------------------------------
	TimeFrame = TF1;
	vars P1 = series(priceClose());
	vars R1 = series(log(max(eps,P1[0]) / max(eps,P1[1])));

	vars FD1S    = series(0);
	vars Slope1S = series(0);
	vars Vol1S   = series(0);

	TimeFrame = TF2;
	vars P2 = series(priceClose());
	vars R2 = series(log(max(eps,P2[0]) / max(eps,P2[1])));

	vars FD2S    = series(0);
	vars Slope2S = series(0);
	vars Vol2S   = series(0);

	TimeFrame = 1;

	// Warmup gate based on current episode params
	int Need1 = max(max(FDLen1, SlopeLen1), VolLen1) + 5;
	int Need2 = max(max(FDLen2, SlopeLen2), VolLen2) + 5;
	int WarmupBars = max(TF1*Need1, TF2*Need2) + 10;

	if(Bar < WarmupBars)
		return;

	// Do NOT block TRAIN during LOOKBACK
	if(is(LOOKBACK) && !Train)
		return;

	// ------------------------------------------------------------------------
	// Compute features
	// ------------------------------------------------------------------------
	TimeFrame = TF1;
	FD1S[0]    = fractalDimKatz(P1, FDLen1);
	Slope1S[0] = linSlope(P1, SlopeLen1);
	Vol1S[0]   = stdevReturns(R1, VolLen1);

	TimeFrame = TF2;
	FD2S[0]    = fractalDimKatz(P2, FDLen2);
	Slope2S[0] = linSlope(P2, SlopeLen2);
	Vol2S[0]   = stdevReturns(R2, VolLen2);

	TimeFrame = 1;

	// Feature vector for ML
	var Sig[6];
	Sig[0] = FD1S[0];
	Sig[1] = Slope1S[0];
	Sig[2] = Vol1S[0];
	Sig[3] = FD2S[0];
	Sig[4] = Slope2S[0];
	Sig[5] = Vol2S[0];

	// ------------------------------------------------------------------------
	// Trading logic
	// ------------------------------------------------------------------------
	int MethodBase = PERCEPTRON + FUZZY + BALANCED;
	int MethodRet  = MethodBase + RETURNS;

	var PredL=0, PredS=0, Pred=0, Lev=0;

	// time-based exit
	if(NumOpenTotal > 0)
		for(open_trades)
			if(TradeIsOpen && TradeBars >= HoldBars)
				exitTrade(ThisTrade);

	if(Train)
	{
		// Forced alternating trades so ML always gets samples
		if(NumOpenTotal == 0)
		{
			Flip = 1 - Flip;
			LastBalance = Balance; // episode start for reward

			if(Flip)
			{
				adviseLong(MethodRet, 0, Sig, 6);
				Lots = 1; enterLong();
			}
			else
			{
				adviseShort(MethodRet, 0, Sig, 6);
				Lots = 1; enterShort();
			}
		}
	}
	else
	{
		PredL = adviseLong(MethodBase, 0, Sig, 6);
		PredS = adviseShort(MethodBase, 0, Sig, 6);

		// Bootstrap if model has no signal yet
		if(NumOpenTotal == 0 && PredL == 0 && PredS == 0)
		{
			LastBalance = Balance; // episode start for reward
			var s = Sig[1] + Sig[4];
			if(s > 0) { Lots=1; enterLong(); }
			else if(s < 0) { Lots=1; enterShort(); }
			else
			{
				if(random(1) < 0.5) { Lots=1; enterLong(); }
				else                { Lots=1; enterShort(); }
			}
		}
		else
		{
			// Net preference, scaled and capped. Lev only gates direction here;
			// position size stays at 1 lot.
			Pred = PredL - PredS;
			Lev  = clamp(Pred * LevScale, -MaxLev, MaxLev);

			if(Lev > PredThr)       { exitShort(); Lots=1; enterLong();  }
			else if(Lev < -PredThr) { exitLong();  Lots=1; enterShort(); }
			else                    { exitLong();  exitShort(); }
		}
	}

	// ------------------------------------------------------------------------
	// RL reward + update (episode ends when we go from having positions to flat)
	// ------------------------------------------------------------------------
	var Reward = 0;

	if(PrevOpenTotal > 0 && NumOpenTotal == 0)
	{
		Reward = Balance - LastBalance;

		if(Reward != 0)
		{
			int p;
			for(p=0; p<NPAR; p++)
				updateArm(p, CurArm[p], Reward);
		}

		// Next episode parameters
		pickParams();
		LastBalance = Balance;
	}

	PrevOpenTotal = NumOpenTotal;

	// ------------------------------------------------------------------------
	// Logging (one-literal format string)
	// ------------------------------------------------------------------------
	string ModeStr = "Trade";
	if(Train) ModeStr = "Train";
	else if(Test) ModeStr = "Test";

	file_append(LogFN, strf("%04i-%02i-%02i,%02i:%02i,%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%.6f,%.3f,%.6f,%d,%.6f,%.8f,%.8f,%.6f,%.8f,%.8f,%.8f,%.8f,%.8f,%.6f,%.6f\n",
		year(0),month(0),day(0), hour(0),minute(0),
		ModeStr, Bar,
		TF1, TF2, TF2d,
		FDLen1, SlopeLen1, VolLen1,
		FDLen2, SlopeLen2, VolLen2,
		LevScale, MaxLev, PredThr, HoldBars,
		Sig[0], Sig[1], Sig[2], Sig[3], Sig[4], Sig[5],
		PredL, PredS, Pred, Lev, Reward
	));

	// ------------------------------------------------------------------------
	// Plots
	// ------------------------------------------------------------------------
	plot("FD_TF1",    Sig[0], NEW, 0);
	plot("FD_TF2",    Sig[3], 0, 0);
	plot("Slope_TF1", Sig[1], 0, 0);
	plot("Slope_TF2", Sig[4], 0, 0);
	plot("Vol_TF1",   Sig[2], 0, 0);
	plot("Vol_TF2",   Sig[5], 0, 0);
	plot("Pred",      Pred, 0, 0);
	plot("Lev",       Lev, 0, 0);
	plot("Reward",    Reward, 0, 0);
}