Gamestudio Links
Zorro Links
Newest Posts
ZorroGPT
by TipmyPip. 02/26/26 21:35
WFO Training with parallel cores Zorro64
by Martin_HH. 02/26/26 16:03
Zorro version 3.0 prerelease!
by TipmyPip. 02/25/26 16:38
Camera always moves upwards?
by clonman. 02/21/26 09:29
Sam Foster Sound | Experienced Game Composer for Hire
by titanicpiano14. 02/19/26 13:22
AUM Magazine
Latest Screens
Dorifto samurai
Shadow 2
Rocker`s Revenge
Stug 3 Stormartillery
Who's Online Now
3 registered members (TipmyPip, Martin_HH, JMMAC), 5,487 guests, and 2 spiders.
Key: Admin, Global Mod, Mod
Newest Members
the1, alx, ApprenticeInMuc, PatrickH90, USER0328
19200 Registered Users
Previous Thread
Next Thread
Print Thread
Rate Thread
Page 19 of 20 1 2 17 18 19 20
PrismWeave Regime Switcher v9 (RL) [Re: TipmyPip] #489263
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
PrismWeave Regime Switcher is a multi pair portfolio engine that watches a whole currency basket as if it were a single organism, then repeatedly reshapes its attention toward the pairs that look structurally strongest under the current market mood. Each pair produces a small bundle of features that act like descriptive “facets” of that pair, including short and medium returns, volatility, a price deviation measure, a range pressure measure, a flow like activity proxy, a coarse regime flag, a volatility of volatility proxy, and a persistence proxy. These facets are stored in a compact ring buffer that is organized for fast access across features, assets, and time.

At regular update intervals the engine builds a pair to pair relationship map by measuring similarity across the feature history of every pair combination. This step is heavy, so the strategy can optionally offload it to a graphics or compute device using an OpenCL kernel. If OpenCL is not available or fails at runtime, the engine automatically falls back to a full CPU path without changing behavior. The resulting relationship map is blended with a separate exposure distance table, allowing the model to consider both statistical similarity and structural currency overlap.

The blended distances are then transformed into a connected network view through an all pairs shortest path pass. From that network view the strategy derives a compactness score per pair, which acts as a proxy for how well each pair sits inside the current market geometry. A higher compactness implies a pair is well integrated through short relational paths, while a lower compactness implies isolation or conflict with the broader structure. A simple volatility based detector assigns a coarse market regime label that is used as an additional context signal.

A controller layer then monitors portfolio level summaries and adapts behavior. It combines an unsupervised classifier, a reinforcement style action selector, a principal component style tracker, and optional mixture and hidden state regime models. These modules adjust risk scale, selection breadth, and coefficient balance, and they impose cool down behavior when the regime looks unstable or uncertain. Finally the engine converts compactness, coupling pressure from neighbors, and regime context into bounded per pair scores, selects a diversified top set using a lightweight clustering guard, and reports the chosen leaders as the current focus list for trading execution.

Code
// TGr06C_RegimeSwitcher_v9.cpp - Zorro64 Strategy DLL
// Strategy C v9: Regime-Switching with MX06 OOP + OpenCL + Learning Controller
//
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 4
#define TOP_K 5

#define ALPHA 0.5
#define BETA 0.2
#define GAMMA 2.0
#define LAMBDA_META 0.6

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 2
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;

    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;

    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class RegimeSwitcherStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> regime;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  int currentRegime;
  StrategyController controller;
  SpectralClusterModel spectral;

  RegimeSwitcherStrategy() : barCount(0), updateCount(0), currentRegime(0) {}

  void init() {
    printf("RegimeSwitcher_v9: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    regime.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("RegimeSwitcher_v9: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    spectral.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("RegimeSwitcher_v9: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    regime.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);

    fvar reg = 0;
    if(vol > 0.001) reg = (fvar)1.0;
    else reg = (fvar)0.0;

    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, reg);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  fvar detectRegime() {
    fvar v = 0;
    for(int i=0;i<N_ASSETS;i++) v += featSoA.get(2, i, 0);
    v /= (fvar)N_ASSETS;

    if(v > (fvar)0.0015) currentRegime = 2;
    else if(v > (fvar)0.0008) currentRegime = 1;
    else currentRegime = 0;

    return (fvar)currentRegime;
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);

          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;

      regime[i] = detectRegime();
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)ALPHA * regime[i] + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = currentRegime;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_SPECTRAL
      spectral.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===RegimeSwitcher_v9 Top-K(update#%d,Reg=%d,OpenCL=%d)===\n",
        updateCount, currentRegime, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_SPECTRAL
      int usedCluster[SPECTRAL_K];
      for(int c=0;c<SPECTRAL_K;c++) usedCluster[c] = 0;
      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = spectral.clusterId[idx];
        if(cid < 0 || cid >= SPECTRAL_K) cid = 0;
        if(usedCluster[cid]) continue;
        usedCluster[cid] = 1;
        selected[selCount++] = idx;
      }
      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(!dup) selected[selCount++] = idx;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static RegimeSwitcherStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new RegimeSwitcherStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

VolNet Sentinel Nine v9 (RL) [Re: TipmyPip] #489264
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
VolNet Sentinel Nine is a portfolio selector and risk adjuster that treats a basket of currency pairs as a living network whose shape changes with market conditions. It builds a consistent set of nine behavior signals for every pair, capturing short movement, longer movement, volatility level, price deviation, range pressure, activity flow, a simple regime flag, volatility of volatility, and persistence. These signals are stored in a compact ring buffer so the strategy always works with a fixed recent history and predictable memory use.

At regular update steps, the strategy measures how similar pairs are by comparing their nine signal histories. It does this by producing a full relationship matrix that reflects how pairs co move across the entire feature set, not just price return. Because this step is expensive, the strategy can accelerate it with an optional OpenCL backend. If OpenCL is not available, fails to load, finds no device, or the kernel fails, the system automatically falls back to a full CPU path without changing behavior. This makes the engine portable while still able to scale when a compatible device exists.

Once pair similarities are estimated, the strategy converts them into a distance view of the market and blends that distance with a currency exposure distance table. This blend pushes the model to respect both statistical co movement and structural currency overlap. It then performs a shortest path pass across the whole basket, allowing indirect links to matter. From those shortest paths it derives a compactness score for each pair, describing whether that pair sits inside a tight, strongly connected neighborhood or a more isolated, loosely connected region.

Scores are then created by mixing three ideas: the pair’s own compactness, its current volatility, and a crowding penalty that reflects how tightly the surrounding neighborhood is compacted. High compactness supports selection, high volatility can either boost opportunity or reflect risk depending on tuning, and crowding reduces preference for overly redundant choices. The result is a bounded score for every pair, used to rank the universe.

A learning controller watches the system level averages and adapts the strategy profile. It combines lightweight unsupervised clustering, reinforcement style action selection, principal component tracking, and regime models that estimate stability. When uncertainty rises or regime switching accelerates, it reduces risk and reduces how many pairs are selected. A spectral style clustering step encourages diversity by avoiding choosing too many pairs from the same similarity cluster. The strategy prints a curated top list at intervals, acting as a volatility aware, network aware, self tuning selector for portfolio deployment.

Code
// TGr06D_VolAdjuster_v9.cpp - Zorro64 Strategy DLL
// Strategy D v9: Volatility-Adjusted with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 4.0
#define LAMBDA_META 0.6

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 3
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class VolAdjusterStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> volatility;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  SpectralClusterModel spectral;

  VolAdjusterStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("VolAdjuster_v9: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    volatility.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("VolAdjuster_v9: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    spectral.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("VolAdjuster_v9: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    volatility.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      volatility[i] = featSoA.get(2, i, 0);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)ALPHA * volatility[i] + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_SPECTRAL
      spectral.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===VolAdjuster_v9 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_SPECTRAL
      int usedCluster[SPECTRAL_K];
      for(int c=0;c<SPECTRAL_K;c++) usedCluster[c] = 0;
      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = spectral.clusterId[idx];
        if(cid < 0 || cid >= SPECTRAL_K) cid = 0;
        if(usedCluster[cid]) continue;
        usedCluster[cid] = 1;
        selected[selCount++] = idx;
      }
      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(!dup) selected[selCount++] = idx;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f, V=%.6f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx], (double)volatility[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static VolAdjusterStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new VolAdjusterStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

PrismWeave Momentum Atlas v9 (RL) [Re: TipmyPip] #489265
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
PrismWeave Momentum Atlas is a multi layer portfolio selector that treats a basket of currency pairs as a living map whose structure changes over time. It builds a nine channel feature stream for each pair and uses those streams to measure how pairs move together, how tightly the basket clusters, and which pairs carry the strongest forward tilt. The engine is designed for speed and robustness: it can accelerate the heaviest step with optional OpenCL, but it always keeps a full CPU fallback so the strategy remains operational on any machine.

Each update cycle begins by collecting recent price history per asset and transforming it into a compact feature set that represents short return, medium return, volatility, price deviation, range pressure, flow proxy, regime flag, volatility of volatility, and persistence. These features are stored in a structure of arrays ring buffer for cache friendly access and predictable memory use. When it is time to refresh the portfolio map, the strategy computes a pairwise similarity table across assets by averaging correlations computed independently for each feature channel. If OpenCL is available, the correlations are computed on the accelerator in floating precision and returned to the host; otherwise the CPU performs the same calculation.

The similarity table is converted into a distance table that blends statistical closeness with an exposure based separation term, allowing the system to respect both behavior and structural overlap. A shortest path pass then turns local distances into global reachability, producing a compactness signal per asset that summarizes how centrally or cleanly that asset sits within the current basket geometry. In parallel, a momentum measure is taken from the medium horizon return channel.

Scores are then formed from three forces: a momentum bias, a compactness support term, and a coupling penalty that discourages crowded exposure to highly connected neighbors. A controller layer observes the basket level snapshot of score, compactness, and volatility, and uses multiple lightweight learners to infer stability and regime. It adapts internal scaling, adjusts how many assets to select, and applies a cooldown when confidence deteriorates. A spectral style clustering surrogate can enforce diversity by preventing the top list from collapsing into one cluster. The result is a fast adaptive ranking engine that continuously rebalances attention toward strong and structurally healthy pairs while throttling risk when the map becomes uncertain.

Code
// TGr06E_MomentumBias_v9.cpp - Zorro64 Strategy DLL
// Strategy E v9: Momentum-Biased with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 3.5
#define LAMBDA_META 0.7

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 4
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class MomentumBiasStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> momentum;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  SpectralClusterModel spectral;

  MomentumBiasStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("MomentumBias_v9: Initializing...\n");
    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);
    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    momentum.init(N_ASSETS);
    scores.init(N_ASSETS);
    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);
    openCL.init();
    printf("MomentumBias_v9: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    spectral.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("MomentumBias_v9: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    momentum.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      momentum[i] = featSoA.get(1, i, 0);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)GAMMA * momentum[i] + (fvar)ALPHA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_SPECTRAL
      spectral.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===MomentumBias_v9 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_SPECTRAL
      int usedCluster[SPECTRAL_K];
      for(int c=0;c<SPECTRAL_K;c++) usedCluster[c] = 0;
      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = spectral.clusterId[idx];
        if(cid < 0 || cid >= SPECTRAL_K) cid = 0;
        if(usedCluster[cid]) continue;
        usedCluster[cid] = 1;
        selected[selCount++] = idx;
      }
      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(!dup) selected[selCount++] = idx;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, M=%.4f, C=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)momentum[idx], (double)compactness[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static MomentumBiasStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new MomentumBiasStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

CompactCrown Navigator v10 (RL) [Re: TipmyPip] #489266
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
CompactCrown Navigator is a portfolio selection engine that treats a basket of currency pairs as a living network and then ranks the pairs by how cleanly they fit into the network’s current shape. Each update cycle, it builds a dense feature stream for every pair, stores the streams in a compact ring buffer, and measures how similarly the pairs behave across multiple aspects such as short return, longer return, volatility, relative price stretch, range pressure, flow proxy, regime flag, volatility of volatility, and persistence. These feature streams form the raw material for a correlation fabric that links every pair to every other pair.

The strategy’s central task is to compute that correlation fabric frequently without becoming too slow. It therefore contains two execution paths: a CPU path that always works, and an optional OpenCL path that attempts to offload the heaviest step to the GPU. The OpenCL backend is loaded dynamically, selects a device if available, builds a small kernel, and runs a pairwise correlation sweep in parallel. If anything fails, the engine falls back to the CPU computation automatically. This makes performance adaptive while keeping robustness.

Once the correlation fabric exists, the strategy transforms it into a distance fabric, blending correlation distance with a second distance based on currency exposure relationships. It then runs a shortest path pass so that indirect links between pairs can reduce effective separation. From the resulting distances it derives a compactness value for each pair, meaning how tightly that pair sits inside the overall network when all routes are considered.

Scores are then produced by combining three influences: a local regime proxy, the pair’s compactness, and a penalty based on how crowded its neighborhood is. After scoring, a controller layer monitors summary statistics of the system and adjusts behavior. It uses a mix of clustering, dimensional compression, mixture modeling, hidden state filtering, and a simple reinforcement style selector to tune scaling, control risk, and change how many pairs are selected. When uncertainty rises or the system signals frequent state switching, the controller reduces risk and shrinks selection breadth. Finally, a diversification gate based on hierarchical clustering prevents the top list from concentrating in one cluster, keeping the chosen set spread across different structural groups.

Code
// TGr06A_CompactDominant_v10.cpp - Zorro64 Strategy DLL
// Strategy A v10: Compactness-Dominant with MX06 OOP + OpenCL + Learning Controller
//
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 3.0
#define LAMBDA_META 0.7

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 0
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3];
  int counts[3];
  int initialized;

  UnsupervisedModel() : initialized(0) {
    memset(centroids, 0, sizeof(centroids));
    memset(counts, 0, sizeof(counts));
  }

  void init() {
    initialized = 0;
    memset(centroids, 0, sizeof(centroids));
    memset(counts, 0, sizeof(counts));
  }

  void update(const LearningSnapshot& s, int* regimeOut, double* confOut) {
    double x[3];
    x[0] = s.meanScore;
    x[1] = s.meanCompactness;
    x[2] = s.meanVol;

    if(!initialized) {
      for(int k=0; k<3; k++) {
        centroids[k][0] = x[0] + 0.01 * (k - 1);
        centroids[k][1] = x[1] + 0.01 * (1 - k);
        centroids[k][2] = x[2] + 0.005 * (k - 1);
        counts[k] = 1;
      }
      initialized = 1;
    }

    int best = 0;
    double bestDist = INF;
    double secondDist = INF;
    for(int k=0; k<3; k++) {
      double d0 = x[0] - centroids[k][0];
      double d1 = x[1] - centroids[k][1];
      double d2 = x[2] - centroids[k][2];
      double dist = d0*d0 + d1*d1 + d2*d2;
      if(dist < bestDist) {
        secondDist = bestDist;
        bestDist = dist;
        best = k;
      } else if(dist < secondDist) {
        secondDist = dist;
      }
    }

    counts[best]++;
    double lr = 1.0 / (double)counts[best];
    centroids[best][0] += lr * (x[0] - centroids[best][0]);
    centroids[best][1] += lr * (x[1] - centroids[best][1]);
    centroids[best][2] += lr * (x[2] - centroids[best][2]);

    *regimeOut = best;
    *confOut = 1.0 / (1.0 + sqrt(fabs(secondDist - bestDist) + EPS));
  }
};

class RLAgent {
public:
  double q[4];
  int n[4];
  double epsilon;
  int lastAction;
  double lastMeanScore;

  RLAgent() : epsilon(0.10), lastAction(0), lastMeanScore(0) {
    for(int i=0;i<4;i++){ q[i]=0; n[i]=0; }
  }

  void init() {
    epsilon = 0.10;
    lastAction = 0;
    lastMeanScore = 0;
    for(int i=0;i<4;i++){ q[i]=0; n[i]=0; }
  }

  int chooseAction(int updateCount) {
    int exploratory = ((updateCount % 10) == 0);
    if(exploratory) return updateCount % 4;
    int best = 0;
    for(int i=1;i<4;i++) if(q[i] > q[best]) best = i;
    return best;
  }

  void updateReward(double newMeanScore) {
    double reward = newMeanScore - lastMeanScore;
    n[lastAction]++;
    q[lastAction] += (reward - q[lastAction]) / (double)n[lastAction];
    lastMeanScore = newMeanScore;
  }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class CompactDominantStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;

  CompactDominantStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("CompactDominant_v10: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("CompactDominant_v10: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("CompactDominant_v10: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar regime = featSoA.get(6, i, 0);
      fvar rawScore = (fvar)ALPHA * regime + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0;
    s.meanCompactness = 0;
    s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_HCLUST
      hclust.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===CompactDominant_v10 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_HCLUST
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = hclust.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static CompactDominantStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new CompactDominantStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

CrowdAverse Nexus v10 (RL) [Re: TipmyPip] #489267
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
CrowdAverse Nexus is a portfolio selection engine that tries to avoid crowded trades by reading the market as a living network and then choosing only the most distinctive opportunities. It watches a fixed universe of currency pairs and converts each pair into a stream of features that describe how it behaves right now and how it has behaved recently. Those features include short and longer returns, volatility, price surprise, range pressure, flow proxy, a simple regime flag, volatility of volatility, and persistence. All feature values are stored in a fast ring buffer layout designed for low overhead and predictable memory use.

At regular update intervals the strategy builds a connectivity picture across the whole universe. It measures similarity between pairs by computing pairwise correlations, averaged across all feature channels. This correlation step is the heavy part, so the code optionally accelerates it with an OpenCL kernel. If OpenCL is unavailable or fails at runtime, the strategy automatically falls back to a full CPU implementation without changing behavior. The correlation map is then converted into a distance map, and this distance is blended with an exposure distance table so that similarity is not only about price behavior but also about shared currency exposure. This creates a crowd map that reflects both statistical sameness and structural overlap.

The strategy then runs a shortest path pass over the distance map to capture indirect relationships. From those shortest paths it derives a compactness score per pair, representing how centrally connected that pair is within the network. It also computes a simple entropy proxy from recent return variability. Each pair receives a score that rewards internal compactness and controlled randomness while penalizing crowd proximity, meaning it prefers pairs that look structured yet not too entangled with the rest.

A learning controller sits above the scoring layer. It summarizes the whole universe into a snapshot and uses multiple unsupervised models, a reinforcement style selector, and a lightweight dimensional reduction lens to adapt risk posture, adjust weightings, and even reduce how many pairs are selected when uncertainty rises. Clustering adds a diversity constraint so the final selection is spread across different groups rather than concentrated in one theme. The result is a self regulating selector that seeks clear, less crowded opportunities and tightens participation when the market becomes unstable or overly synchronized.

Code
// TGr06B_CrowdAverse_v10.cpp - Zorro64 Strategy DLL
// Strategy B v10: Crowd-Averse with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.3
#define GAMMA 2.5
#define LAMBDA_META 0.5

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 1
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3];
  int counts[3];
  int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids, 0, sizeof(centroids)); memset(counts, 0, sizeof(counts)); }
  void init() { initialized = 0; memset(centroids, 0, sizeof(centroids)); memset(counts, 0, sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut) {
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized) {
      for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; }
      initialized = 1;
    }
    int best=0; double bestDist=INF, secondDist=INF;
    for(int k=0;k<3;k++) {
      double d0=x0-centroids[k][0], d1=x1-centroids[k][1], d2=x2-centroids[k][2];
      double dist=d0*d0+d1*d1+d2*d2;
      if(dist < bestDist){ secondDist=bestDist; bestDist=dist; best=k; }
      else if(dist < secondDist){ secondDist=dist; }
    }
    counts[best]++;
    double lr = 1.0/(double)counts[best];
    centroids[best][0] += lr*(x0-centroids[best][0]);
    centroids[best][1] += lr*(x1-centroids[best][1]);
    centroids[best][2] += lr*(x2-centroids[best][2]);
    *regimeOut = best;
    *confOut = 1.0/(1.0 + sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class CrowdAverseStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> entropy;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;

  CrowdAverseStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("CrowdAverse_v10: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    entropy.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("CrowdAverse_v10: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("CrowdAverse_v10: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    entropy.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  fvar computeEntropy(int assetIdx) {
    fvar mean = 0;
    for(int t=0; t<FEAT_WINDOW; t++) mean += featSoA.get(0, assetIdx, t);
    mean /= FEAT_WINDOW;
    fvar var = 0;
    for(int t=0; t<FEAT_WINDOW; t++) { fvar d = featSoA.get(0, assetIdx, t) - mean; var += d*d; }
    return (fvar)(var / FEAT_WINDOW);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      entropy[i] = computeEntropy(i);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar C_A = compactness[i];
      fvar Ent = entropy[i];

      fvar rawScore = (fvar)ALPHA * Ent + (fvar)GAMMA * C_A - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_HCLUST
      hclust.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===CrowdAverse_v10 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_HCLUST
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = hclust.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f, Ent=%.6f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx], (double)entropy[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static CrowdAverseStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new CrowdAverseStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

PrismSwitch Nexus v10 (RL) [Re: TipmyPip] #489268
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
PrismSwitch Nexus is a regime switching portfolio engine that treats a basket of currency pairs as a living network and then continuously rebalances attention toward the most coherent opportunities. It runs as an object oriented Zorro DLL and is built around a fast update loop that gathers multiple aspects of each pair, compares them across the entire universe, and converts those comparisons into ranked trade candidates. Each pair produces a compact feature signature made from short and longer return behavior, volatility, standardized price deviation, range tendency, flow like activity, a simple regime flag, volatility of volatility, and persistence. These features are stored in a structured ring buffer so the system can quickly retrieve recent history for every pair and every feature without heavy allocation.

At regular update intervals the engine builds a relationship map between pairs by measuring how similarly their feature histories move. This is the most expensive step, so the system optionally uses an OpenCL backend that loads dynamically and accelerates the pairwise correlation work on available hardware. If OpenCL is missing or fails at any stage, it cleanly falls back to a pure CPU implementation with identical intent. The correlation map is then converted into a distance map, blended with a separate exposure distance map so that structural similarity and portfolio overlap are both considered. The distance map is transformed through a shortest path pass, allowing indirect relationships to shape the final connectivity view. From this, the engine derives a compactness score for each pair, which acts as a measure of how tightly connected that pair is to the rest of the market network.

A learning controller sits above the scoring layer and continuously interprets the global state of the system using multiple complementary lenses. It forms snapshots of average score, average compactness, and average volatility and feeds them into online unsupervised grouping, reinforcement style action selection, a compact latent representation, and probabilistic regime models with confidence and uncertainty tracking. When uncertainty rises or the environment appears to be changing, the controller reduces risk and enforces a cooldown. It also adapts the internal weights that govern how much the engine rewards regime alignment, rewards compactness, and penalizes crowding, and it adjusts how many pairs are allowed into the final shortlist.

Finally, the engine prints a filtered top list, optionally constrained by hierarchical clustering so that selections are diversified across coarse and fine clusters rather than concentrated in one tight group. The result is a self adjusting selector that blends fast network structure, regime awareness, and adaptive control into a single coherent pipeline.

Code
// TGr06C_RegimeSwitcher_v10.cpp - Zorro64 Strategy DLL
// Strategy C v10: Regime-Switching with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 4
#define TOP_K 5

#define ALPHA 0.5
#define BETA 0.2
#define GAMMA 2.0
#define LAMBDA_META 0.6

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 2
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;

    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;

    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class RegimeSwitcherStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> regime;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  int currentRegime;
  StrategyController controller;
  HierarchicalClusteringModel hclust;

  RegimeSwitcherStrategy() : barCount(0), updateCount(0), currentRegime(0) {}

  void init() {
    printf("RegimeSwitcher_v10: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    regime.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("RegimeSwitcher_v10: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("RegimeSwitcher_v10: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    regime.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);

    fvar reg = 0;
    if(vol > 0.001) reg = (fvar)1.0;
    else reg = (fvar)0.0;

    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, reg);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  fvar detectRegime() {
    fvar v = 0;
    for(int i=0;i<N_ASSETS;i++) v += featSoA.get(2, i, 0);
    v /= (fvar)N_ASSETS;

    if(v > (fvar)0.0015) currentRegime = 2;
    else if(v > (fvar)0.0008) currentRegime = 1;
    else currentRegime = 0;

    return (fvar)currentRegime;
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);

          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;

      regime[i] = detectRegime();
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)ALPHA * regime[i] + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = currentRegime;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_HCLUST
      hclust.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===RegimeSwitcher_v10 Top-K(update#%d,Reg=%d,OpenCL=%d)===\n",
        updateCount, currentRegime, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_HCLUST
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = hclust.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static RegimeSwitcherStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new RegimeSwitcherStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

Last edited by TipmyPip; 5 hours ago.
VolMesh Sentinel v10 (RL) [Re: TipmyPip] #489269
5 hours ago
5 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
VolMesh Sentinel is a multi asset selection and risk shaping engine built for a basket of currency pairs. It watches each pair through a shared feature lens, then turns those feature streams into a living network that describes which pairs are behaving alike and which pairs are behaving independently. The strategy maintains a compact rolling memory for nine aspects per pair, such as short return, longer return, volatility, price deviation, range pressure, flow proxy, a simple regime flag, volatility of volatility, and persistence. These aspects are stored in a tightly packed ring buffer designed for speed and predictable memory usage.

At regular update intervals the engine computes a full relationship map between all pairs. The relationship is not based on a single return correlation, but on an average similarity across all nine aspects. This creates a richer measure of likeness that captures more than direction, such as shared volatility and shared persistence. The heavy relationship step can run on the CPU or be offloaded to a graphics device through an optional OpenCL backend. If OpenCL is unavailable or fails at runtime, the system automatically falls back to the CPU path without breaking the run.

The relationship map is blended with an exposure distance table so the network reflects both statistical similarity and structural currency overlap. The blended distance map is then transformed into shortest path distances, allowing indirect links through other pairs to influence the perceived closeness. From these distances the strategy derives a compactness score per pair, reflecting how coherently that pair sits inside the evolving network. Pairs that are locally well connected but not overly crowded can earn higher standing.

Scores are then built from three forces: the pair’s current volatility, its compactness, and a coupling penalty computed from its neighbors. A learning controller sits above the scorer and acts like an adaptive governor. It uses unsupervised clustering, a reinforcement style action selector, and a compact dimensionality reducer to infer market regimes and adjust selection breadth and score scaling. Additional regime models can refine confidence and trigger cooldown when uncertainty rises.

Finally, the engine prints a dynamically sized top list, optionally enforcing diversity using hierarchical clustering so selections do not collapse into one crowded theme. It is an adaptive filter that aims to choose tradable leaders while reducing concentration risk as the market network tightens.

Code
// TGr06D_VolAdjuster_v10.cpp - Zorro64 Strategy DLL
// Strategy D v10: Volatility-Adjusted with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 4.0
#define LAMBDA_META 0.6

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 3
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class VolAdjusterStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> volatility;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;

  VolAdjusterStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("VolAdjuster_v10: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    volatility.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("VolAdjuster_v10: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("VolAdjuster_v10: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    volatility.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      volatility[i] = featSoA.get(2, i, 0);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)ALPHA * volatility[i] + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_HCLUST
      hclust.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===VolAdjuster_v10 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_HCLUST
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = hclust.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f, V=%.6f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx], (double)volatility[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static VolAdjusterStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new VolAdjusterStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

Momentum Weave Nexus v10 (RL) [Re: TipmyPip] #489270
4 hours ago
4 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
Momentum Weave Nexus is a portfolio level selection and risk shaping engine designed for a basket of currency pairs, built as an object oriented system with optional acceleration. At every bar it computes a compact set of feature streams for each pair, including short return, longer return, volatility, activity, and persistence style signals. These features are stored in a memory efficient ring buffer that keeps a fixed history window for all pairs and all features, enabling fast access and consistent timing across the basket.

On scheduled updates the strategy builds a relationship map between every pair and every other pair. The map is not based on a single indicator, but on an averaged similarity across all feature channels, so that the connections reflect a broader behavioral match rather than a single return correlation. The most expensive part of this step is the pairwise similarity calculation, so the engine can optionally offload it to a GPU through a dynamically loaded OpenCL backend. If OpenCL is missing or fails, it automatically falls back to a CPU path with the same logic.

The strategy then blends this similarity map with a separate exposure distance concept, creating a combined distance matrix that reflects both market co movement and structural overlap. With this blended distance matrix it runs a shortest path pass, turning direct distances into effective network distances. From those distances it derives a compactness measure for each pair, describing how tightly that pair sits inside the basket network. In parallel it extracts a momentum bias signal from the longer return feature.

Scores are then produced for every pair by combining momentum bias, compactness support, and a crowding penalty based on the compactness of neighboring pairs. A learning controller sits above this scoring layer. It summarizes the basket into a snapshot and uses multiple lightweight learning modules to estimate regime, confidence, stability, and rotation. Based on that regime view it adapts internal scaling, reduces risk when the environment looks uncertain, and dynamically adjusts how many pairs can be selected. Finally, it prints a curated top list, optionally enforcing diversity through hierarchical clustering so the chosen set is not concentrated in one cluster.

Code
// TGr06E_MomentumBias_v10.cpp - Zorro64 Strategy DLL
// Strategy E v10: Momentum-Biased with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 3.5
#define LAMBDA_META 0.7

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 4
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3]; int counts[3]; int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void init(){ initialized=0; memset(centroids,0,sizeof(centroids)); memset(counts,0,sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut){
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized){ for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; } initialized=1; }
    int best=0; double bestDist=INF,secondDist=INF;
    for(int k=0;k<3;k++){ double d0=x0-centroids[k][0],d1=x1-centroids[k][1],d2=x2-centroids[k][2]; double dist=d0*d0+d1*d1+d2*d2; if(dist<bestDist){ secondDist=bestDist; bestDist=dist; best=k; } else if(dist<secondDist) secondDist=dist; }
    counts[best]++; double lr=1.0/(double)counts[best]; centroids[best][0]+=lr*(x0-centroids[best][0]); centroids[best][1]+=lr*(x1-centroids[best][1]); centroids[best][2]+=lr*(x2-centroids[best][2]);
    *regimeOut=best; *confOut=1.0/(1.0+sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class MomentumBiasStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> momentum;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;

  MomentumBiasStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("MomentumBias_v10: Initializing...\n");
    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);
    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    momentum.init(N_ASSETS);
    scores.init(N_ASSETS);
    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);
    openCL.init();
    printf("MomentumBias_v10: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("MomentumBias_v10: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    momentum.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      momentum[i] = featSoA.get(1, i, 0);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar rawScore = (fvar)GAMMA * momentum[i] + (fvar)ALPHA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_HCLUST
      hclust.update(distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===MomentumBias_v10 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_HCLUST
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = hclust.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = hclust.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, M=%.4f, C=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)momentum[idx], (double)compactness[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static MomentumBiasStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new MomentumBiasStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

CompactClaw Nexus v11 (RL) [Re: TipmyPip] #489271
4 hours ago
4 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
CompactClaw Nexus is a multi pair selection engine built to hunt for orderly structure in a noisy currency universe while staying fast enough for repeated updates. It watches a fixed basket of pairs and turns each pair into a living stream of nine behavioral signals. Those signals summarize short movement, slower movement, volatility, normalized deviation, range pressure, activity flow, a simple regime flag, volatility of volatility, and persistence. The signals are stored in a ring buffer layout designed for speed and predictable memory use, so the strategy can scale across many assets without constant allocation overhead.

At each update cycle the engine measures how similarly pairs behave, not from one signal but from all nine signals blended together. This similarity step is the heaviest part of the process, so the system can optionally push the pairwise correlation workload to an OpenCL kernel. If OpenCL is missing or fails, it falls back to a pure CPU path with identical intent. The resulting similarity field is converted into a distance field that represents separation between pairs. That distance is blended with an exposure based separation table so the model respects currency overlap and portfolio concentration risk rather than correlation alone.

Once distances are built, the strategy runs a shortest path pass to reveal indirect structure, allowing a pair to be linked through intermediate neighbors. From those shortest paths it extracts a compactness value per pair that acts like a cleanliness score. A clean pair is one that sits inside a coherent neighborhood rather than a fractured web. Scores are then produced by combining the pair’s compactness, the pair’s local regime flag, and a penalty derived from how strongly it is surrounded by other compact pairs. This encourages selecting strong structure without piling into the same crowd.

A learning controller sits above the scoring layer. It monitors average score, average compactness, and average volatility, then adapts the scoring weights and risk scale using several lightweight regime detectors and a simple reinforcement loop. It also varies how many pairs are selected, tightening the shortlist when the market looks unstable and relaxing it when the structure appears stable. Finally, a community and hierarchical grouping module can enforce diversity so the top selections are not all from the same cluster. The output is a periodically refreshed top list that favors compact structure, controlled exposure, and adaptive caution.

Code
// TGr06A_CompactDominant_v11.cpp - Zorro64 Strategy DLL
// Strategy A v11: Compactness-Dominant with MX06 OOP + OpenCL + Learning Controller
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.2
#define GAMMA 3.0
#define LAMBDA_META 0.7

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define USE_COMMUNITY 1
#define COMM_W_MIN 0.15
#define COMM_TOPM 6
#define COMM_ITERS 4
#define COMM_Q_EMA 0.20
#define COMM_Q_LOW 0.20
#define COMM_Q_HIGH 0.45
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 0
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3];
  int counts[3];
  int initialized;

  UnsupervisedModel() : initialized(0) {
    memset(centroids, 0, sizeof(centroids));
    memset(counts, 0, sizeof(counts));
  }

  void init() {
    initialized = 0;
    memset(centroids, 0, sizeof(centroids));
    memset(counts, 0, sizeof(counts));
  }

  void update(const LearningSnapshot& s, int* regimeOut, double* confOut) {
    double x[3];
    x[0] = s.meanScore;
    x[1] = s.meanCompactness;
    x[2] = s.meanVol;

    if(!initialized) {
      for(int k=0; k<3; k++) {
        centroids[k][0] = x[0] + 0.01 * (k - 1);
        centroids[k][1] = x[1] + 0.01 * (1 - k);
        centroids[k][2] = x[2] + 0.005 * (k - 1);
        counts[k] = 1;
      }
      initialized = 1;
    }

    int best = 0;
    double bestDist = INF;
    double secondDist = INF;
    for(int k=0; k<3; k++) {
      double d0 = x[0] - centroids[k][0];
      double d1 = x[1] - centroids[k][1];
      double d2 = x[2] - centroids[k][2];
      double dist = d0*d0 + d1*d1 + d2*d2;
      if(dist < bestDist) {
        secondDist = bestDist;
        bestDist = dist;
        best = k;
      } else if(dist < secondDist) {
        secondDist = dist;
      }
    }

    counts[best]++;
    double lr = 1.0 / (double)counts[best];
    centroids[best][0] += lr * (x[0] - centroids[best][0]);
    centroids[best][1] += lr * (x[1] - centroids[best][1]);
    centroids[best][2] += lr * (x[2] - centroids[best][2]);

    *regimeOut = best;
    *confOut = 1.0 / (1.0 + sqrt(fabs(secondDist - bestDist) + EPS));
  }
};

class RLAgent {
public:
  double q[4];
  int n[4];
  double epsilon;
  int lastAction;
  double lastMeanScore;

  RLAgent() : epsilon(0.10), lastAction(0), lastMeanScore(0) {
    for(int i=0;i<4;i++){ q[i]=0; n[i]=0; }
  }

  void init() {
    epsilon = 0.10;
    lastAction = 0;
    lastMeanScore = 0;
    for(int i=0;i<4;i++){ q[i]=0; n[i]=0; }
  }

  int chooseAction(int updateCount) {
    int exploratory = ((updateCount % 10) == 0);
    if(exploratory) return updateCount % 4;
    int best = 0;
    for(int i=1;i<4;i++) if(q[i] > q[best]) best = i;
    return best;
  }

  void updateReward(double newMeanScore) {
    double reward = newMeanScore - lastMeanScore;
    n[lastAction]++;
    q[lastAction] += (reward - q[lastAction]) / (double)n[lastAction];
    lastMeanScore = newMeanScore;
  }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};


class CommunityDetectionModel {
public:
  int communityId[N_ASSETS];
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCommunities;
  fvar modularityQ;
  fvar qSmooth;

  void init() {
    nCommunities = 1;
    modularityQ = 0;
    qSmooth = 0;
    for(int i=0;i<N_ASSETS;i++) {
      communityId[i] = 0;
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  static int argmaxLabel(const fvar w[N_ASSETS], const int label[N_ASSETS], int node) {
    fvar acc[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) acc[i] = 0;
    for(int j=0;j<N_ASSETS;j++) {
      if(j == node) continue;
      int l = label[j];
      if(l < 0 || l >= N_ASSETS) continue;
      acc[l] += w[j];
    }
    int best = label[node];
    fvar bestV = -1;
    for(int l=0;l<N_ASSETS;l++) {
      if(acc[l] > bestV) { bestV = acc[l]; best = l; }
    }
    return best;
  }

  void update(const fvar* corrMatrix, const fvar* distMatrix) {
    if(!corrMatrix || !distMatrix) return;

    fvar W[N_ASSETS][N_ASSETS];
    fvar degree[N_ASSETS];
    int label[N_ASSETS];

    for(int i=0;i<N_ASSETS;i++) {
      degree[i] = 0;
      label[i] = i;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) W[i][j] = 0;
        else {
          fvar w = (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          if(w < (fvar)COMM_W_MIN) w = 0;
          W[i][j] = w;
          degree[i] += w;
        }
      }
    }

    // Optional top-M pruning for determinism/noise control
    for(int i=0;i<N_ASSETS;i++) {
      int keep[N_ASSETS];
      for(int j=0;j<N_ASSETS;j++) keep[j] = 0;
      for(int k=0;k<COMM_TOPM;k++) {
        int best = -1;
        fvar bestW = 0;
        for(int j=0;j<N_ASSETS;j++) {
          if(i==j || keep[j]) continue;
          if(W[i][j] > bestW) { bestW = W[i][j]; best = j; }
        }
        if(best >= 0) keep[best] = 1;
      }
      for(int j=0;j<N_ASSETS;j++) if(i!=j && !keep[j]) W[i][j] = 0;
    }

    for(int it=0; it<COMM_ITERS; it++) {
      for(int i=0;i<N_ASSETS;i++) {
        label[i] = argmaxLabel(W[i], label, i);
      }
    }

    // compress labels
    int map[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) map[i] = -1;
    int nLab = 0;
    for(int i=0;i<N_ASSETS;i++) {
      int l = label[i];
      if(l < 0 || l >= N_ASSETS) l = 0;
      if(map[l] < 0) map[l] = nLab++;
      communityId[i] = map[l];
    }
    if(nLab < 1) nLab = 1;
    nCommunities = nLab;

    // modularity approximation
    fvar m2 = 0;
    for(int i=0;i<N_ASSETS;i++) for(int j=0;j<N_ASSETS;j++) m2 += W[i][j];
    if(m2 < (fvar)EPS) {
      modularityQ = 0;
    } else {
      fvar q = 0;
      for(int i=0;i<N_ASSETS;i++) {
        for(int j=0;j<N_ASSETS;j++) {
          if(communityId[i] == communityId[j]) {
            q += W[i][j] - (degree[i] * degree[j] / m2);
          }
        }
      }
      modularityQ = q / m2;
    }

    qSmooth = (fvar)(1.0 - COMM_Q_EMA) * qSmooth + (fvar)COMM_Q_EMA * modularityQ;

    for(int i=0;i<N_ASSETS;i++) {
      int c = communityId[i];
      if(c < 0) c = 0;
      clusterCoarse[i] = c % HCLUST_COARSE_K;
      clusterFine[i] = c % HCLUST_FINE_K;
    }
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class CompactDominantStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;
  CommunityDetectionModel comm;

  CompactDominantStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("CompactDominant_v11: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("CompactDominant_v11: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();
    comm.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("CompactDominant_v11: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar regime = featSoA.get(6, i, 0);
      fvar rawScore = (fvar)ALPHA * regime + (fvar)GAMMA * compactness[i] - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0;
    s.meanCompactness = 0;
    s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_COMMUNITY
      hclust.update(distMatrix.data);
#endif
#if USE_COMMUNITY
      comm.update(corrMatrix.data, distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
#if USE_COMMUNITY
    if(comm.qSmooth < (fvar)COMM_Q_LOW && topN > 2) topN--;
    if(comm.qSmooth > (fvar)COMM_Q_HIGH && topN < TOP_K) topN++;
#endif
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===CompactDominant_v11 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);
#if USE_COMMUNITY
      printf(" communities=%d Q=%.4f\n", comm.nCommunities, (double)comm.qSmooth);
#endif

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_COMMUNITY
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = comm.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = comm.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = comm.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static CompactDominantStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new CompactDominantStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

CrowdAverse Lattice Engine [Re: TipmyPip] #489272
4 hours ago
4 hours ago
Joined: Sep 2017
Posts: 263
TipmyPip Online OP
Member
TipmyPip  Online OP
Member

Joined: Sep 2017
Posts: 263
CrowdAverse Lattice Engine is a portfolio selector and risk controller designed to avoid crowded exposure while staying adaptive to changing market structure. It watches a fixed universe of currency pairs and builds a living map of how those pairs behave, how tightly they move together, and how concentrated the market has become. Each pair is described by a compact set of nine feature streams that represent movement, longer drift, volatility, relative displacement, range pressure, flow activity, a regime flag, volatility of volatility, and persistence. These features are stored in a ring buffer layout that is optimized for fast access and low overhead.

At regular update intervals, the engine measures pair to pair similarity by computing average correlations across all feature channels. This is the heaviest computation, so the system can offload it to an OpenCL kernel when available. If the OpenCL library is missing, no compatible device exists, or the kernel fails to build or run, the engine automatically falls back to a full CPU implementation. The result is a correlation matrix that reflects both price behavior and feature level structure rather than raw returns alone.

Next, the engine blends this correlation based closeness with an exposure based distance map. This second distance stream is intended to represent how similar two pairs are in their currency composition, so that the model can detect and penalize hidden concentration even when correlations are temporarily low. The blend creates a single distance matrix that becomes the basis for a network routing step. A shortest path pass converts direct distances into effective distances through the full network, then each pair receives a compactness score that represents how centrally connected it is within the current market web. A separate entropy like measure is computed from feature variance to represent local instability.

Scores are then built to reward clarity and penalize crowding: a pair is favored when it is structurally coherent and not surrounded by similarly favored neighbors, and it is discouraged when it sits inside a dense cluster of competing exposures. A learning controller sits above this scoring layer. It snapshots market wide averages, runs several lightweight unsupervised and sequential regime detectors, tracks stability, and adjusts selection breadth and score scaling. When regime confidence falls or switching risk rises, it enters a cooldown behavior that reduces aggressiveness. Finally, a community and hierarchy layer attempts to diversify the chosen set by spreading picks across clusters, so the engine does not select the same theme repeatedly even when scores are high.

Code
// TGr06B_CrowdAverse_v11.cpp - Zorro64 Strategy DLL
// Strategy B v11: Crowd-Averse with MX06 OOP + OpenCL + Learning Controller
//
// Notes:
// - Keeps full CPU fallback.
// - OpenCL is optional: if OpenCL.dll missing / no device / kernel build fails -> CPU path.
// - OpenCL accelerates the heavy correlation matrix step by offloading pairwise correlations.
// - Correlation is computed in float on GPU; results are stored back into fvar corrMatrix.

#define _CRT_SECURE_NO_WARNINGS
#include <zorro.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <windows.h>
#include <stddef.h>

#define INF 1e30
#define EPS 1e-12
#define N_ASSETS 28
#define FEAT_N 9
#define FEAT_WINDOW 200
#define UPDATE_EVERY 5
#define TOP_K 5

#define ALPHA 0.1
#define BETA 0.3
#define GAMMA 2.5
#define LAMBDA_META 0.5

#define USE_ML 1
#define USE_UNSUP 1
#define USE_RL 1
#define USE_PCA 1
#define USE_GMM 1
#define USE_HMM 1
#define HMM_K 3
#define HMM_DIM 8
#define HMM_VAR_FLOOR 1e-4
#define HMM_SMOOTH 0.02
#define HMM_ENTROPY_TH 0.85
#define HMM_SWITCH_TH 0.35
#define HMM_MIN_RISK 0.25
#define HMM_COOLDOWN_UPDATES 2
#define HMM_ONLINE_UPDATE 1
#define USE_KMEANS 1
#define KMEANS_K 3
#define KMEANS_DIM 8
#define KMEANS_ETA 0.03
#define KMEANS_DIST_EMA 0.08
#define KMEANS_STABILITY_MIN 0.35
#define KMEANS_ONLINE_UPDATE 1
#define USE_SPECTRAL 1
#define SPECTRAL_K 4
#define USE_HCLUST 1
#define HCLUST_COARSE_K 4
#define HCLUST_FINE_K 8
#define USE_COMMUNITY 1
#define COMM_W_MIN 0.15
#define COMM_TOPM 6
#define COMM_ITERS 4
#define COMM_Q_EMA 0.20
#define COMM_Q_LOW 0.20
#define COMM_Q_HIGH 0.45
#define GMM_K 3
#define GMM_DIM 8
#define GMM_ALPHA 0.02
#define GMM_VAR_FLOOR 1e-4
#define GMM_ENTROPY_COEFF 0.45
#define GMM_MIN_RISK 0.25
#define GMM_ONLINE_UPDATE 1
#define STRATEGY_PROFILE 1
#define PCA_DIM 6
#define PCA_COMP 3
#define PCA_WINDOW 128
#define PCA_REBUILD_EVERY 4

#ifdef TIGHT_MEM
typedef float fvar;
#else
typedef double fvar;
#endif

static const char* ASSET_NAMES[] = {
  "EURUSD","GBPUSD","USDCHF","USDJPY","AUDUSD","AUDCAD","AUDCHF","AUDJPY","AUDNZD",
  "CADJPY","CADCHF","EURAUD","EURCAD","EURCHF","EURGBP","EURJPY","EURNZD","GBPAUD",
  "GBPCAD","GBPCHF","GBPJPY","GBPNZD","NZDCAD","NZDCHF","NZDJPY","NZDUSD","USDCAD"
};
static const char* CURRENCIES[] = {"EUR","GBP","USD","CHF","JPY","AUD","CAD","NZD"};
#define N_CURRENCIES 8

// ---------------------------- Exposure Table ----------------------------

struct ExposureTable {
  int exposure[N_ASSETS][N_CURRENCIES];
  double exposureDist[N_ASSETS][N_ASSETS];

  void init() {
    for(int i=0;i<N_ASSETS;i++){
      for(int c=0;c<N_CURRENCIES;c++){
        exposure[i][c] = 0;
      }
    }
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        exposureDist[i][j] = 0.0;
      }
    }
  }

  inline double getDist(int i,int j) const { return exposureDist[i][j]; }
};

// ---------------------------- Slab Allocator ----------------------------

template<typename T>
class SlabAllocator {
public:
  T* data;
  int capacity;

  SlabAllocator() : data(NULL), capacity(0) {}
  ~SlabAllocator() { shutdown(); }

  void init(int size) {
    shutdown();
    capacity = size;
    data = (T*)malloc((size_t)capacity * sizeof(T));
    if(data) memset(data, 0, (size_t)capacity * sizeof(T));
  }

  void shutdown() {
    if(data) free(data);
    data = NULL;
    capacity = 0;
  }

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }
};

// ---------------------------- Feature Buffer (SoA ring) ----------------------------

struct FeatureBufferSoA {
  SlabAllocator<fvar> buffer;
  int windowSize;
  int currentIndex;

  void init(int assets, int window) {
    windowSize = window;
    currentIndex = 0;
    buffer.init(FEAT_N * assets * window);
  }

  void shutdown() { buffer.shutdown(); }

  inline int offset(int feat,int asset,int t) const {
    return (feat * N_ASSETS + asset) * windowSize + t;
  }

  void push(int feat,int asset,fvar value) {
    buffer[offset(feat, asset, currentIndex)] = value;
    currentIndex = (currentIndex + 1) % windowSize;
  }

  // t=0 => most recent
  fvar get(int feat,int asset,int t) const {
    int idx = (currentIndex - 1 - t + windowSize) % windowSize;
    return buffer[offset(feat, asset, idx)];
  }
};

// ---------------------------- Minimal OpenCL (dynamic) ----------------------------

typedef struct _cl_platform_id*   cl_platform_id;
typedef struct _cl_device_id*     cl_device_id;
typedef struct _cl_context*       cl_context;
typedef struct _cl_command_queue* cl_command_queue;
typedef struct _cl_program*       cl_program;
typedef struct _cl_kernel*        cl_kernel;
typedef struct _cl_mem*           cl_mem;
typedef unsigned int              cl_uint;
typedef int                       cl_int;
typedef unsigned long long        cl_ulong;
typedef size_t                    cl_bool;

#define CL_SUCCESS 0
#define CL_DEVICE_TYPE_CPU (1ULL << 1)
#define CL_DEVICE_TYPE_GPU (1ULL << 2)
#define CL_MEM_READ_ONLY   (1ULL << 2)
#define CL_MEM_WRITE_ONLY  (1ULL << 1)
#define CL_MEM_READ_WRITE  (1ULL << 0)
#define CL_TRUE  1
#define CL_FALSE 0
#define CL_PROGRAM_BUILD_LOG 0x1183

class OpenCLBackend {
public:
  HMODULE hOpenCL;
  int ready;

  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_command_queue queue;
  cl_program program;
  cl_kernel kCorr;

  cl_mem bufFeat;
  cl_mem bufCorr;

  int featBytes;
  int corrBytes;

  cl_int (*clGetPlatformIDs)(cl_uint, cl_platform_id*, cl_uint*);
  cl_int (*clGetDeviceIDs)(cl_platform_id, cl_ulong, cl_uint, cl_device_id*, cl_uint*);
  cl_context (*clCreateContext)(void*, cl_uint, const cl_device_id*, void*, void*, cl_int*);
  cl_command_queue (*clCreateCommandQueue)(cl_context, cl_device_id, cl_ulong, cl_int*);
  cl_program (*clCreateProgramWithSource)(cl_context, cl_uint, const char**, const size_t*, cl_int*);
  cl_int (*clBuildProgram)(cl_program, cl_uint, const cl_device_id*, const char*, void*, void*);
  cl_int (*clGetProgramBuildInfo)(cl_program, cl_device_id, cl_uint, size_t, void*, size_t*);
  cl_kernel (*clCreateKernel)(cl_program, const char*, cl_int*);
  cl_int (*clSetKernelArg)(cl_kernel, cl_uint, size_t, const void*);
  cl_mem (*clCreateBuffer)(cl_context, cl_ulong, size_t, void*, cl_int*);
  cl_int (*clEnqueueWriteBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueReadBuffer)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void*, cl_uint, const void*, void*);
  cl_int (*clEnqueueNDRangeKernel)(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, const void*, void*);
  cl_int (*clFinish)(cl_command_queue);
  cl_int (*clReleaseMemObject)(cl_mem);
  cl_int (*clReleaseKernel)(cl_kernel);
  cl_int (*clReleaseProgram)(cl_program);
  cl_int (*clReleaseCommandQueue)(cl_command_queue);
  cl_int (*clReleaseContext)(cl_context);

  OpenCLBackend()
  : hOpenCL(NULL), ready(0),
    platform(NULL), device(NULL), context(NULL), queue(NULL), program(NULL), kCorr(NULL),
    bufFeat(NULL), bufCorr(NULL),
    featBytes(0), corrBytes(0),
    clGetPlatformIDs(NULL), clGetDeviceIDs(NULL), clCreateContext(NULL), clCreateCommandQueue(NULL),
    clCreateProgramWithSource(NULL), clBuildProgram(NULL), clGetProgramBuildInfo(NULL),
    clCreateKernel(NULL), clSetKernelArg(NULL),
    clCreateBuffer(NULL), clEnqueueWriteBuffer(NULL), clEnqueueReadBuffer(NULL),
    clEnqueueNDRangeKernel(NULL), clFinish(NULL),
    clReleaseMemObject(NULL), clReleaseKernel(NULL), clReleaseProgram(NULL),
    clReleaseCommandQueue(NULL), clReleaseContext(NULL)
  {}

  int loadSymbol(void** fp, const char* name) {
    *fp = (void*)GetProcAddress(hOpenCL, name);
    return (*fp != NULL);
  }

  const char* kernelSource() {
    return
      "__kernel void corr_pairwise(\n"
      "  __global const float* feat,\n"
      "  __global float* outCorr,\n"
      "  const int nAssets,\n"
      "  const int nFeat,\n"
      "  const int windowSize,\n"
      "  const float eps\n"
      "){\n"
      "  int a = (int)get_global_id(0);\n"
      "  int b = (int)get_global_id(1);\n"
      "  if(a >= nAssets || b >= nAssets) return;\n"
      "  if(a >= b) return;\n"
      "  float acc = 0.0f;\n"
      "  for(int f=0; f<nFeat; f++){\n"
      "    int baseA = (f*nAssets + a) * windowSize;\n"
      "    int baseB = (f*nAssets + b) * windowSize;\n"
      "    float mx = 0.0f;\n"
      "    float my = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      mx += feat[baseA + t];\n"
      "      my += feat[baseB + t];\n"
      "    }\n"
      "    mx /= (float)windowSize;\n"
      "    my /= (float)windowSize;\n"
      "    float sxx = 0.0f;\n"
      "    float syy = 0.0f;\n"
      "    float sxy = 0.0f;\n"
      "    for(int t=0; t<windowSize; t++){\n"
      "      float dx = feat[baseA + t] - mx;\n"
      "      float dy = feat[baseB + t] - my;\n"
      "      sxx += dx*dx;\n"
      "      syy += dy*dy;\n"
      "      sxy += dx*dy;\n"
      "    }\n"
      "    float den = sqrt(sxx*syy + eps);\n"
      "    float corr = (den > eps) ? (sxy/den) : 0.0f;\n"
      "    acc += corr;\n"
      "  }\n"
      "  outCorr[a*nAssets + b] = acc / (float)nFeat;\n"
      "}\n";
  }

  void printBuildLog() {
    if(!clGetProgramBuildInfo || !program || !device) return;
    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    if(logSize == 0) return;
    char* log = (char*)malloc(logSize + 1);
    if(!log) return;
    memset(log, 0, logSize + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("OpenCL build log:\n%s\n", log);
    free(log);
  }

  void init() {
    ready = 0;

    hOpenCL = LoadLibraryA("OpenCL.dll");
    if(!hOpenCL) {
      printf("OpenCL: CPU (OpenCL.dll missing)\n");
      return;
    }

    if(!loadSymbol((void**)&clGetPlatformIDs,       "clGetPlatformIDs")) return;
    if(!loadSymbol((void**)&clGetDeviceIDs,         "clGetDeviceIDs")) return;
    if(!loadSymbol((void**)&clCreateContext,        "clCreateContext")) return;
    if(!loadSymbol((void**)&clCreateCommandQueue,   "clCreateCommandQueue")) return;
    if(!loadSymbol((void**)&clCreateProgramWithSource,"clCreateProgramWithSource")) return;
    if(!loadSymbol((void**)&clBuildProgram,         "clBuildProgram")) return;
    if(!loadSymbol((void**)&clGetProgramBuildInfo,  "clGetProgramBuildInfo")) return;
    if(!loadSymbol((void**)&clCreateKernel,         "clCreateKernel")) return;
    if(!loadSymbol((void**)&clSetKernelArg,         "clSetKernelArg")) return;
    if(!loadSymbol((void**)&clCreateBuffer,         "clCreateBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueWriteBuffer,   "clEnqueueWriteBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueReadBuffer,    "clEnqueueReadBuffer")) return;
    if(!loadSymbol((void**)&clEnqueueNDRangeKernel, "clEnqueueNDRangeKernel")) return;
    if(!loadSymbol((void**)&clFinish,               "clFinish")) return;
    if(!loadSymbol((void**)&clReleaseMemObject,     "clReleaseMemObject")) return;
    if(!loadSymbol((void**)&clReleaseKernel,        "clReleaseKernel")) return;
    if(!loadSymbol((void**)&clReleaseProgram,       "clReleaseProgram")) return;
    if(!loadSymbol((void**)&clReleaseCommandQueue,  "clReleaseCommandQueue")) return;
    if(!loadSymbol((void**)&clReleaseContext,       "clReleaseContext")) return;

    cl_uint nPlat = 0;
    if(clGetPlatformIDs(0, NULL, &nPlat) != CL_SUCCESS || nPlat == 0) {
      printf("OpenCL: CPU (no platform)\n");
      return;
    }
    clGetPlatformIDs(1, &platform, NULL);

    cl_uint nDev = 0;
    cl_int ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &nDev);
    if(ok != CL_SUCCESS || nDev == 0) {
      ok = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, &nDev);
      if(ok != CL_SUCCESS || nDev == 0) {
        printf("OpenCL: CPU (no device)\n");
        return;
      }
    }

    cl_int err = 0;
    context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if(err != CL_SUCCESS || !context) {
      printf("OpenCL: CPU (context fail)\n");
      return;
    }

    queue = clCreateCommandQueue(context, device, 0, &err);
    if(err != CL_SUCCESS || !queue) {
      printf("OpenCL: CPU (queue fail)\n");
      return;
    }

    const char* src = kernelSource();
    program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    if(err != CL_SUCCESS || !program) {
      printf("OpenCL: CPU (program fail)\n");
      return;
    }

    err = clBuildProgram(program, 1, &device, "", NULL, NULL);
    if(err != CL_SUCCESS) {
      printf("OpenCL: CPU (build fail)\n");
      printBuildLog();
      return;
    }

    kCorr = clCreateKernel(program, "corr_pairwise", &err);
    if(err != CL_SUCCESS || !kCorr) {
      printf("OpenCL: CPU (kernel fail)\n");
      printBuildLog();
      return;
    }

    featBytes = FEAT_N * N_ASSETS * FEAT_WINDOW * (int)sizeof(float);
    corrBytes = N_ASSETS * N_ASSETS * (int)sizeof(float);

    bufFeat = clCreateBuffer(context, CL_MEM_READ_ONLY, (size_t)featBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufFeat) {
      printf("OpenCL: CPU (bufFeat fail)\n");
      return;
    }

    bufCorr = clCreateBuffer(context, CL_MEM_WRITE_ONLY, (size_t)corrBytes, NULL, &err);
    if(err != CL_SUCCESS || !bufCorr) {
      printf("OpenCL: CPU (bufCorr fail)\n");
      return;
    }

    ready = 1;
    printf("OpenCL: READY (kernel+buffers)\n");
  }

  void shutdown() {
    if(bufCorr) { clReleaseMemObject(bufCorr); bufCorr = NULL; }
    if(bufFeat) { clReleaseMemObject(bufFeat); bufFeat = NULL; }
    if(kCorr) { clReleaseKernel(kCorr); kCorr = NULL; }
    if(program) { clReleaseProgram(program); program = NULL; }
    if(queue) { clReleaseCommandQueue(queue); queue = NULL; }
    if(context) { clReleaseContext(context); context = NULL; }
    if(hOpenCL) { FreeLibrary(hOpenCL); hOpenCL = NULL; }
    ready = 0;
  }

  int computeCorrelationMatrixCL(const float* featLinear, float* outCorr, int nAssets, int nFeat, int windowSize) {
    if(!ready) return 0;
    if(!featLinear || !outCorr) return 0;

    cl_int err = clEnqueueWriteBuffer(queue, bufFeat, CL_TRUE, 0, (size_t)featBytes, featLinear, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    float eps = 1e-12f;
    err = CL_SUCCESS;
    err |= clSetKernelArg(kCorr, 0, sizeof(cl_mem), &bufFeat);
    err |= clSetKernelArg(kCorr, 1, sizeof(cl_mem), &bufCorr);
    err |= clSetKernelArg(kCorr, 2, sizeof(int), &nAssets);
    err |= clSetKernelArg(kCorr, 3, sizeof(int), &nFeat);
    err |= clSetKernelArg(kCorr, 4, sizeof(int), &windowSize);
    err |= clSetKernelArg(kCorr, 5, sizeof(float), &eps);
    if(err != CL_SUCCESS) return 0;

    size_t global[2];
    global[0] = (size_t)nAssets;
    global[1] = (size_t)nAssets;

    err = clEnqueueNDRangeKernel(queue, kCorr, 2, NULL, global, NULL, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    err = clFinish(queue);
    if(err != CL_SUCCESS) return 0;

    err = clEnqueueReadBuffer(queue, bufCorr, CL_TRUE, 0, (size_t)corrBytes, outCorr, 0, NULL, NULL);
    if(err != CL_SUCCESS) return 0;

    return 1;
  }
};

// ---------------------------- Learning Layer ----------------------------

struct LearningSnapshot {
  double meanScore;
  double meanCompactness;
  double meanVol;
  int regime;
  double regimeConfidence;
};

class UnsupervisedModel {
public:
  double centroids[3][3];
  int counts[3];
  int initialized;
  UnsupervisedModel() : initialized(0) { memset(centroids, 0, sizeof(centroids)); memset(counts, 0, sizeof(counts)); }
  void init() { initialized = 0; memset(centroids, 0, sizeof(centroids)); memset(counts, 0, sizeof(counts)); }
  void update(const LearningSnapshot& s, int* regimeOut, double* confOut) {
    double x0=s.meanScore,x1=s.meanCompactness,x2=s.meanVol;
    if(!initialized) {
      for(int k=0;k<3;k++){ centroids[k][0]=x0+0.01*(k-1); centroids[k][1]=x1+0.01*(1-k); centroids[k][2]=x2+0.005*(k-1); counts[k]=1; }
      initialized = 1;
    }
    int best=0; double bestDist=INF, secondDist=INF;
    for(int k=0;k<3;k++) {
      double d0=x0-centroids[k][0], d1=x1-centroids[k][1], d2=x2-centroids[k][2];
      double dist=d0*d0+d1*d1+d2*d2;
      if(dist < bestDist){ secondDist=bestDist; bestDist=dist; best=k; }
      else if(dist < secondDist){ secondDist=dist; }
    }
    counts[best]++;
    double lr = 1.0/(double)counts[best];
    centroids[best][0] += lr*(x0-centroids[best][0]);
    centroids[best][1] += lr*(x1-centroids[best][1]);
    centroids[best][2] += lr*(x2-centroids[best][2]);
    *regimeOut = best;
    *confOut = 1.0/(1.0 + sqrt(fabs(secondDist-bestDist)+EPS));
  }
};

class RLAgent {
public:
  double q[4]; int n[4]; int lastAction; double lastMeanScore;
  RLAgent() : lastAction(0), lastMeanScore(0) { for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  void init(){ lastAction=0; lastMeanScore=0; for(int i=0;i<4;i++){q[i]=0;n[i]=0;} }
  int chooseAction(int updateCount){ if((updateCount%10)==0) return updateCount%4; int b=0; for(int i=1;i<4;i++) if(q[i]>q[b]) b=i; return b; }
  void updateReward(double newMeanScore){ double r=newMeanScore-lastMeanScore; n[lastAction]++; q[lastAction]+=(r-q[lastAction])/(double)n[lastAction]; lastMeanScore=newMeanScore; }
};

class PCAModel {
public:
  double hist[PCA_WINDOW][PCA_DIM];
  double mean[PCA_DIM];
  double stdev[PCA_DIM];
  double latent[PCA_COMP];
  double explainedVar[PCA_COMP];
  int writeIdx;
  int count;
  int rebuildEvery;
  int updates;
  double dom;
  double rot;
  double prevExplained0;

  PCAModel() : writeIdx(0), count(0), rebuildEvery(PCA_REBUILD_EVERY), updates(0), dom(0), rot(0), prevExplained0(0) {
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void init() {
    writeIdx = 0;
    count = 0;
    updates = 0;
    dom = 0;
    rot = 0;
    prevExplained0 = 0;
    memset(hist, 0, sizeof(hist));
    memset(mean, 0, sizeof(mean));
    memset(stdev, 0, sizeof(stdev));
    memset(latent, 0, sizeof(latent));
    memset(explainedVar, 0, sizeof(explainedVar));
  }

  void pushSnapshot(const double x[PCA_DIM]) {
    for(int d=0; d<PCA_DIM; d++) hist[writeIdx][d] = x[d];
    writeIdx = (writeIdx + 1) % PCA_WINDOW;
    if(count < PCA_WINDOW) count++;
  }

  void rebuildStats() {
    if(count <= 0) return;
    for(int d=0; d<PCA_DIM; d++) {
      double m = 0;
      for(int i=0; i<count; i++) m += hist[i][d];
      m /= (double)count;
      mean[d] = m;

      double v = 0;
      for(int i=0; i<count; i++) {
        double dd = hist[i][d] - m;
        v += dd * dd;
      }
      v /= (double)count;
      stdev[d] = sqrt(v + EPS);
    }
  }

  void update(const LearningSnapshot& snap, int regime, double conf) {
    double x[PCA_DIM];
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = (double)regime / 2.0;
    x[4] = conf;
    x[5] = snap.meanScore - snap.meanCompactness;

    pushSnapshot(x);
    updates++;
    if((updates % rebuildEvery) == 0 || count < 4) rebuildStats();

    double z[PCA_DIM];
    for(int d=0; d<PCA_DIM; d++) z[d] = (x[d] - mean[d]) / (stdev[d] + EPS);

    latent[0] = 0.60*z[0] + 0.30*z[1] + 0.10*z[2];
    latent[1] = 0.25*z[0] - 0.45*z[1] + 0.20*z[2] + 0.10*z[4];
    latent[2] = 0.20*z[2] + 0.50*z[3] - 0.30*z[5];

    double a0 = fabs(latent[0]);
    double a1 = fabs(latent[1]);
    double a2 = fabs(latent[2]);
    double sumA = a0 + a1 + a2 + EPS;

    explainedVar[0] = a0 / sumA;
    explainedVar[1] = a1 / sumA;
    explainedVar[2] = a2 / sumA;

    dom = explainedVar[0];
    rot = fabs(explainedVar[0] - prevExplained0);
    prevExplained0 = explainedVar[0];
  }
};

class GMMRegimeModel {
public:
  double pi[GMM_K];
  double mu[GMM_K][GMM_DIM];
  double var[GMM_K][GMM_DIM];
  double p[GMM_K];
  double entropy;
  double conf;
  int bestRegime;
  int initialized;

  GMMRegimeModel() : entropy(0), conf(0), bestRegime(0), initialized(0) {
    memset(pi, 0, sizeof(pi));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(p, 0, sizeof(p));
  }

  void init() {
    initialized = 0;
    entropy = 0;
    conf = 0;
    bestRegime = 0;
    for(int k=0;k<GMM_K;k++) {
      pi[k] = 1.0 / (double)GMM_K;
      for(int d=0; d<GMM_DIM; d++) {
        mu[k][d] = 0.02 * (k - 1);
        var[k][d] = 1.0;
      }
      p[k] = 1.0 / (double)GMM_K;
    }
    initialized = 1;
  }

  static double gaussianDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<GMM_DIM; d++) {
      double vv = v[d];
      if(vv < GMM_VAR_FLOOR) vv = GMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void infer(const double x[GMM_DIM]) {
    if(!initialized) init();
    double sum = 0;
    for(int k=0;k<GMM_K;k++) {
      double g = gaussianDiag(x, mu[k], var[k]);
      p[k] = pi[k] * g;
      sum += p[k];
    }
    if(sum < EPS) {
      for(int k=0;k<GMM_K;k++) p[k] = 1.0 / (double)GMM_K;
    } else {
      for(int k=0;k<GMM_K;k++) p[k] /= sum;
    }

    bestRegime = 0;
    conf = p[0];
    for(int k=1;k<GMM_K;k++) {
      if(p[k] > conf) {
        conf = p[k];
        bestRegime = k;
      }
    }

    entropy = 0;
    for(int k=0;k<GMM_K;k++) entropy -= p[k] * log(p[k] + EPS);

#if GMM_ONLINE_UPDATE
    // lightweight incremental update (EM-like with forgetting)
    for(int k=0;k<GMM_K;k++) {
      double w = GMM_ALPHA * p[k];
      pi[k] = (1.0 - GMM_ALPHA) * pi[k] + w;
      for(int d=0; d<GMM_DIM; d++) {
        double diff = x[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < GMM_VAR_FLOOR) var[k][d] = GMM_VAR_FLOOR;
      }
    }
#endif
  }
};


class HMMRegimeModel {
public:
  double A[HMM_K][HMM_K];
  double mu[HMM_K][HMM_DIM];
  double var[HMM_K][HMM_DIM];
  double posterior[HMM_K];
  double entropy;
  double conf;
  double switchProb;
  int regime;
  int initialized;

  HMMRegimeModel() : entropy(0), conf(0), switchProb(0), regime(0), initialized(0) {
    memset(A, 0, sizeof(A));
    memset(mu, 0, sizeof(mu));
    memset(var, 0, sizeof(var));
    memset(posterior, 0, sizeof(posterior));
  }

  void init() {
    for(int i=0;i<HMM_K;i++) {
      for(int j=0;j<HMM_K;j++) A[i][j] = (i==j) ? 0.90 : 0.10/(double)(HMM_K-1);
      for(int d=0; d<HMM_DIM; d++) {
        mu[i][d] = 0.03 * (i - 1);
        var[i][d] = 1.0;
      }
      posterior[i] = 1.0/(double)HMM_K;
    }
    regime = 0;
    conf = posterior[0];
    entropy = 0;
    switchProb = 0;
    initialized = 1;
  }

  static double emissionDiag(const double* x, const double* m, const double* v) {
    double logp = 0;
    for(int d=0; d<HMM_DIM; d++) {
      double vv = v[d];
      if(vv < HMM_VAR_FLOOR) vv = HMM_VAR_FLOOR;
      double z = x[d] - m[d];
      logp += -0.5 * (z*z / vv + log(vv + EPS));
    }
    if(logp < -80.0) logp = -80.0;
    return exp(logp);
  }

  void filter(const double obs[HMM_DIM]) {
    if(!initialized) init();

    double pred[HMM_K];
    for(int j=0;j<HMM_K;j++) {
      pred[j] = 0;
      for(int i=0;i<HMM_K;i++) pred[j] += posterior[i] * A[i][j];
    }

    double alpha[HMM_K];
    double sum = 0;
    for(int k=0;k<HMM_K;k++) {
      double emit = emissionDiag(obs, mu[k], var[k]);
      alpha[k] = pred[k] * emit;
      sum += alpha[k];
    }
    if(sum < EPS) {
      for(int k=0;k<HMM_K;k++) alpha[k] = 1.0/(double)HMM_K;
    } else {
      for(int k=0;k<HMM_K;k++) alpha[k] /= sum;
    }

    for(int k=0;k<HMM_K;k++) posterior[k] = alpha[k];

    regime = 0;
    conf = posterior[0];
    for(int k=1;k<HMM_K;k++) if(posterior[k] > conf) { conf = posterior[k]; regime = k; }

    entropy = 0;
    for(int k=0;k<HMM_K;k++) entropy -= posterior[k] * log(posterior[k] + EPS);

    switchProb = 1.0 - A[regime][regime];
    if(switchProb < 0) switchProb = 0;
    if(switchProb > 1) switchProb = 1;

#if HMM_ONLINE_UPDATE
    for(int k=0;k<HMM_K;k++) {
      double w = HMM_SMOOTH * posterior[k];
      for(int d=0; d<HMM_DIM; d++) {
        double diff = obs[d] - mu[k][d];
        mu[k][d] += w * diff;
        var[k][d] = (1.0 - w) * var[k][d] + w * diff * diff;
        if(var[k][d] < HMM_VAR_FLOOR) var[k][d] = HMM_VAR_FLOOR;
      }
    }
#endif
  }
};

class KMeansRegimeModel {
public:
  double centroids[KMEANS_K][KMEANS_DIM];
  double distEma;
  double distVarEma;
  int initialized;
  int regime;
  double dist;
  double stability;

  KMeansRegimeModel() : distEma(0), distVarEma(1), initialized(0), regime(0), dist(0), stability(0) {
    memset(centroids, 0, sizeof(centroids));
  }

  void init() {
    distEma = 0;
    distVarEma = 1;
    initialized = 0;
    regime = 0;
    dist = 0;
    stability = 0;
    memset(centroids, 0, sizeof(centroids));
  }

  void seed(const double x[KMEANS_DIM]) {
    for(int k=0;k<KMEANS_K;k++) {
      for(int d=0; d<KMEANS_DIM; d++) {
        centroids[k][d] = x[d] + 0.03 * (k - 1);
      }
    }
    initialized = 1;
  }

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void predictAndUpdate(const double x[KMEANS_DIM]) {
    if(!initialized) seed(x);

    int best = 0;
    double bestDist = INF;
    for(int k=0;k<KMEANS_K;k++) {
      double s = 0;
      for(int d=0; d<KMEANS_DIM; d++) {
        double z = x[d] - centroids[k][d];
        s += z * z;
      }
      double dk = sqrt(s + EPS);
      if(dk < bestDist) {
        bestDist = dk;
        best = k;
      }
    }

    regime = best;
    dist = bestDist;

    distEma = (1.0 - KMEANS_DIST_EMA) * distEma + KMEANS_DIST_EMA * dist;
    double dd = dist - distEma;
    distVarEma = (1.0 - KMEANS_DIST_EMA) * distVarEma + KMEANS_DIST_EMA * dd * dd;
    double distStd = sqrt(distVarEma + EPS);
    double zDist = (dist - distEma) / (distStd + EPS);
    stability = clampRange(1.0 / (1.0 + exp(zDist)), 0.0, 1.0);

#if KMEANS_ONLINE_UPDATE
    for(int d=0; d<KMEANS_DIM; d++) {
      centroids[best][d] += KMEANS_ETA * (x[d] - centroids[best][d]);
    }
#endif
  }
};


class SpectralClusterModel {
public:
  int clusterId[N_ASSETS];
  int nClusters;

  void init() {
    nClusters = SPECTRAL_K;
    for(int i=0;i<N_ASSETS;i++) clusterId[i] = i % SPECTRAL_K;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;
    // lightweight deterministic clustering surrogate from distance rows
    for(int i=0;i<N_ASSETS;i++) {
      double sig = 0;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) continue;
        double d = (double)distMatrix[i*N_ASSETS + j];
        if(d < INF) sig += d;
      }
      int cid = (int)fmod(fabs(sig * 1000.0), (double)SPECTRAL_K);
      if(cid < 0) cid = 0;
      if(cid >= SPECTRAL_K) cid = SPECTRAL_K - 1;
      clusterId[i] = cid;
    }
  }
};


class HierarchicalClusteringModel {
public:
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCoarse;
  int nFine;

  int leftChild[2*N_ASSETS];
  int rightChild[2*N_ASSETS];
  int nodeSize[2*N_ASSETS];
  double nodeHeight[2*N_ASSETS];
  double nodeDist[2*N_ASSETS][2*N_ASSETS];
  int rootNode;

  void init() {
    nCoarse = HCLUST_COARSE_K;
    nFine = HCLUST_FINE_K;
    rootNode = N_ASSETS - 1;
    for(int i=0;i<N_ASSETS;i++) {
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  void collectLeaves(int node, int clusterId, int* out) {
    int stack[2*N_ASSETS];
    int sp = 0;
    stack[sp++] = node;
    while(sp > 0) {
      int cur = stack[--sp];
      if(cur < N_ASSETS) {
        out[cur] = clusterId;
      } else {
        if(leftChild[cur] >= 0) stack[sp++] = leftChild[cur];
        if(rightChild[cur] >= 0) stack[sp++] = rightChild[cur];
      }
    }
  }

  void cutByK(int K, int* out) {
    for(int i=0;i<N_ASSETS;i++) out[i] = -1;
    if(K <= 1) {
      for(int i=0;i<N_ASSETS;i++) out[i] = 0;
      return;
    }

    int clusters[2*N_ASSETS];
    int count = 1;
    clusters[0] = rootNode;

    while(count < K) {
      int bestPos = -1;
      double bestHeight = -1;
      for(int i=0;i<count;i++) {
        int node = clusters[i];
        if(node >= N_ASSETS && nodeHeight[node] > bestHeight) {
          bestHeight = nodeHeight[node];
          bestPos = i;
        }
      }
      if(bestPos < 0) break;
      int node = clusters[bestPos];
      int l = leftChild[node];
      int r = rightChild[node];
      clusters[bestPos] = l;
      clusters[count++] = r;
    }

    for(int c=0;c<count;c++) {
      collectLeaves(clusters[c], c, out);
    }
    for(int i=0;i<N_ASSETS;i++) if(out[i] < 0) out[i] = 0;
  }

  void update(const fvar* distMatrix) {
    if(!distMatrix) return;

    int totalNodes = 2 * N_ASSETS;
    for(int i=0;i<totalNodes;i++) {
      leftChild[i] = -1;
      rightChild[i] = -1;
      nodeSize[i] = (i < N_ASSETS) ? 1 : 0;
      nodeHeight[i] = 0;
      for(int j=0;j<totalNodes;j++) nodeDist[i][j] = INF;
    }

    for(int i=0;i<N_ASSETS;i++) {
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) nodeDist[i][j] = 0;
        else {
          double d = (double)distMatrix[i*N_ASSETS + j];
          if(d < 0 || d >= INF) d = 1.0;
          nodeDist[i][j] = d;
        }
      }
    }

    int active[2*N_ASSETS];
    int nActive = N_ASSETS;
    for(int i=0;i<N_ASSETS;i++) active[i] = i;
    int nextNode = N_ASSETS;

    while(nActive > 1 && nextNode < 2*N_ASSETS) {
      int ai = 0, aj = 1;
      double best = INF;
      for(int i=0;i<nActive;i++) {
        for(int j=i+1;j<nActive;j++) {
          int a = active[i], b = active[j];
          if(nodeDist[a][b] < best) {
            best = nodeDist[a][b];
            ai = i; aj = j;
          }
        }
      }

      int a = active[ai];
      int b = active[aj];
      int m = nextNode++;

      leftChild[m] = a;
      rightChild[m] = b;
      nodeHeight[m] = best;
      nodeSize[m] = nodeSize[a] + nodeSize[b];

      for(int i=0;i<nActive;i++) {
        if(i == ai || i == aj) continue;
        int k = active[i];
        double da = nodeDist[a][k];
        double db = nodeDist[b][k];
        double dm = (nodeSize[a] * da + nodeSize[b] * db) / (double)(nodeSize[a] + nodeSize[b]);
        nodeDist[m][k] = dm;
        nodeDist[k][m] = dm;
      }
      nodeDist[m][m] = 0;

      if(aj < ai) { int t=ai; ai=aj; aj=t; }
      for(int i=aj;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      for(int i=ai;i<nActive-1;i++) active[i] = active[i+1];
      nActive--;
      active[nActive++] = m;
    }

    rootNode = active[0];

    int kc = HCLUST_COARSE_K;
    if(kc < 1) kc = 1;
    if(kc > N_ASSETS) kc = N_ASSETS;
    int kf = HCLUST_FINE_K;
    if(kf < 1) kf = 1;
    if(kf > N_ASSETS) kf = N_ASSETS;

    cutByK(kc, clusterCoarse);
    cutByK(kf, clusterFine);
    nCoarse = kc;
    nFine = kf;
  }
};


class CommunityDetectionModel {
public:
  int communityId[N_ASSETS];
  int clusterCoarse[N_ASSETS];
  int clusterFine[N_ASSETS];
  int nCommunities;
  fvar modularityQ;
  fvar qSmooth;

  void init() {
    nCommunities = 1;
    modularityQ = 0;
    qSmooth = 0;
    for(int i=0;i<N_ASSETS;i++) {
      communityId[i] = 0;
      clusterCoarse[i] = i % HCLUST_COARSE_K;
      clusterFine[i] = i % HCLUST_FINE_K;
    }
  }

  static int argmaxLabel(const fvar w[N_ASSETS], const int label[N_ASSETS], int node) {
    fvar acc[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) acc[i] = 0;
    for(int j=0;j<N_ASSETS;j++) {
      if(j == node) continue;
      int l = label[j];
      if(l < 0 || l >= N_ASSETS) continue;
      acc[l] += w[j];
    }
    int best = label[node];
    fvar bestV = -1;
    for(int l=0;l<N_ASSETS;l++) {
      if(acc[l] > bestV) { bestV = acc[l]; best = l; }
    }
    return best;
  }

  void update(const fvar* corrMatrix, const fvar* distMatrix) {
    if(!corrMatrix || !distMatrix) return;

    fvar W[N_ASSETS][N_ASSETS];
    fvar degree[N_ASSETS];
    int label[N_ASSETS];

    for(int i=0;i<N_ASSETS;i++) {
      degree[i] = 0;
      label[i] = i;
      for(int j=0;j<N_ASSETS;j++) {
        if(i == j) W[i][j] = 0;
        else {
          fvar w = (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          if(w < (fvar)COMM_W_MIN) w = 0;
          W[i][j] = w;
          degree[i] += w;
        }
      }
    }

    // Optional top-M pruning for determinism/noise control
    for(int i=0;i<N_ASSETS;i++) {
      int keep[N_ASSETS];
      for(int j=0;j<N_ASSETS;j++) keep[j] = 0;
      for(int k=0;k<COMM_TOPM;k++) {
        int best = -1;
        fvar bestW = 0;
        for(int j=0;j<N_ASSETS;j++) {
          if(i==j || keep[j]) continue;
          if(W[i][j] > bestW) { bestW = W[i][j]; best = j; }
        }
        if(best >= 0) keep[best] = 1;
      }
      for(int j=0;j<N_ASSETS;j++) if(i!=j && !keep[j]) W[i][j] = 0;
    }

    for(int it=0; it<COMM_ITERS; it++) {
      for(int i=0;i<N_ASSETS;i++) {
        label[i] = argmaxLabel(W[i], label, i);
      }
    }

    // compress labels
    int map[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) map[i] = -1;
    int nLab = 0;
    for(int i=0;i<N_ASSETS;i++) {
      int l = label[i];
      if(l < 0 || l >= N_ASSETS) l = 0;
      if(map[l] < 0) map[l] = nLab++;
      communityId[i] = map[l];
    }
    if(nLab < 1) nLab = 1;
    nCommunities = nLab;

    // modularity approximation
    fvar m2 = 0;
    for(int i=0;i<N_ASSETS;i++) for(int j=0;j<N_ASSETS;j++) m2 += W[i][j];
    if(m2 < (fvar)EPS) {
      modularityQ = 0;
    } else {
      fvar q = 0;
      for(int i=0;i<N_ASSETS;i++) {
        for(int j=0;j<N_ASSETS;j++) {
          if(communityId[i] == communityId[j]) {
            q += W[i][j] - (degree[i] * degree[j] / m2);
          }
        }
      }
      modularityQ = q / m2;
    }

    qSmooth = (fvar)(1.0 - COMM_Q_EMA) * qSmooth + (fvar)COMM_Q_EMA * modularityQ;

    for(int i=0;i<N_ASSETS;i++) {
      int c = communityId[i];
      if(c < 0) c = 0;
      clusterCoarse[i] = c % HCLUST_COARSE_K;
      clusterFine[i] = c % HCLUST_FINE_K;
    }
  }
};

class StrategyController {
public:
  UnsupervisedModel unsup;
  RLAgent rl;
  PCAModel pca;
  GMMRegimeModel gmm;
  HMMRegimeModel hmm;
  KMeansRegimeModel kmeans;
  int dynamicTopK;
  double scoreScale;
  int regime;
  double adaptiveGamma;
  double adaptiveAlpha;
  double adaptiveBeta;
  double adaptiveLambda;
  double riskScale;
  int cooldown;

  StrategyController()
  : dynamicTopK(TOP_K), scoreScale(1.0), regime(0),
    adaptiveGamma(1.0), adaptiveAlpha(1.0), adaptiveBeta(1.0), adaptiveLambda(1.0), riskScale(1.0), cooldown(0) {}

  static double clampRange(double x, double lo, double hi) {
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
  }

  void init() {
    unsup.init();
    rl.init();
    pca.init();
    gmm.init();
    hmm.init();
    kmeans.init();
    dynamicTopK = TOP_K;
    scoreScale = 1.0;
    regime = 0;
    adaptiveGamma = 1.0;
    adaptiveAlpha = 1.0;
    adaptiveBeta = 1.0;
    adaptiveLambda = 1.0;
    riskScale = 1.0;
    cooldown = 0;
  }

  void buildGMMState(const LearningSnapshot& snap, int reg, double conf, double x[GMM_DIM]) {
    x[0] = snap.meanScore;
    x[1] = snap.meanCompactness;
    x[2] = snap.meanVol;
    x[3] = pca.dom;
    x[4] = pca.rot;
    x[5] = (double)reg / 2.0;
    x[6] = conf;
    x[7] = snap.meanScore - snap.meanCompactness;
  }

  void buildHMMObs(const LearningSnapshot& snap, int reg, double conf, double x[HMM_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void buildKMeansState(const LearningSnapshot& snap, int reg, double conf, double x[KMEANS_DIM]) {
    x[0] = pca.latent[0];
    x[1] = pca.latent[1];
    x[2] = pca.latent[2];
    x[3] = snap.meanVol;
    x[4] = snap.meanScore;
    x[5] = snap.meanCompactness;
    x[6] = (double)reg / 2.0;
    x[7] = conf;
  }

  void onUpdate(const LearningSnapshot& snap, fvar* scores, int nScores, int updateCount) {
#if USE_ML
    double unsupConf = 0;
    unsup.update(snap, &regime, &unsupConf);
#if USE_PCA
    pca.update(snap, regime, unsupConf);
#else
    pca.dom = 0.5;
    pca.rot = 0.0;
#endif

#if USE_GMM
    double gx[GMM_DIM];
    buildGMMState(snap, regime, unsupConf, gx);
    gmm.infer(gx);
#if USE_HMM
    double hx[HMM_DIM];
    buildHMMObs(snap, regime, unsupConf, hx);
    hmm.filter(hx);
#if USE_KMEANS
    double kx[KMEANS_DIM];
    buildKMeansState(snap, regime, unsupConf, kx);
    kmeans.predictAndUpdate(kx);
#endif
#endif
    // regime presets: [gamma, alpha, beta, lambda]
    const double presets[GMM_K][4] = {
      {1.05, 1.00, 0.95, 1.00},
      {0.95, 1.05, 1.05, 0.95},
      {1.00, 0.95, 1.10, 1.05}
    };
    adaptiveGamma = 0;
    adaptiveAlpha = 0;
    adaptiveBeta  = 0;
    adaptiveLambda = 0;
    for(int k=0;k<GMM_K;k++) {
#if USE_HMM
      adaptiveGamma += hmm.posterior[k] * presets[k][0];
      adaptiveAlpha += hmm.posterior[k] * presets[k][1];
      adaptiveBeta  += hmm.posterior[k] * presets[k][2];
      adaptiveLambda += hmm.posterior[k] * presets[k][3];
#else
      adaptiveGamma += gmm.p[k] * presets[k][0];
      adaptiveAlpha += gmm.p[k] * presets[k][1];
      adaptiveBeta  += gmm.p[k] * presets[k][2];
      adaptiveLambda += gmm.p[k] * presets[k][3];
#endif
    }
#if USE_HMM
    double entNorm = hmm.entropy / log((double)HMM_K + EPS);
    riskScale = clampRange(1.0 - 0.45 * entNorm, HMM_MIN_RISK, 1.0);
    if(hmm.entropy > HMM_ENTROPY_TH || hmm.switchProb > HMM_SWITCH_TH) cooldown = HMM_COOLDOWN_UPDATES;
    else if(cooldown > 0) cooldown--;
#else
    double entNorm = gmm.entropy / log((double)GMM_K + EPS);
    riskScale = clampRange(1.0 - GMM_ENTROPY_COEFF * entNorm, GMM_MIN_RISK, 1.0);
#endif
#else
    adaptiveGamma = 1.0 + 0.35 * pca.dom - 0.25 * pca.rot;
    adaptiveAlpha = 1.0 + 0.30 * pca.dom;
    adaptiveBeta  = 1.0 + 0.25 * pca.rot;
    adaptiveLambda = 1.0 + 0.20 * pca.dom - 0.20 * pca.rot;
    riskScale = 1.0;
#endif

    adaptiveGamma = clampRange(adaptiveGamma, 0.80, 1.40);
    adaptiveAlpha = clampRange(adaptiveAlpha, 0.85, 1.35);
    adaptiveBeta  = clampRange(adaptiveBeta, 0.85, 1.35);
    adaptiveLambda = clampRange(adaptiveLambda, 0.85, 1.25);

#if USE_KMEANS
    const double kmPreset[KMEANS_K][4] = {
      {1.02, 1.00, 0.98, 1.00},
      {1.08, 0.96, 0.95, 1.02},
      {0.94, 1.08, 1.08, 0.92}
    };
    int kr = kmeans.regime;
    if(kr < 0) kr = 0;
    if(kr >= KMEANS_K) kr = KMEANS_K - 1;
    double wkm = clampRange(kmeans.stability, 0.0, 1.0);
    adaptiveGamma = (1.0 - wkm) * adaptiveGamma + wkm * kmPreset[kr][0];
    adaptiveAlpha = (1.0 - wkm) * adaptiveAlpha + wkm * kmPreset[kr][1];
    adaptiveBeta  = (1.0 - wkm) * adaptiveBeta  + wkm * kmPreset[kr][2];
    adaptiveLambda = (1.0 - wkm) * adaptiveLambda + wkm * kmPreset[kr][3];
    if(kmeans.stability < KMEANS_STABILITY_MIN) {
      riskScale *= 0.85;
      if(cooldown < 1) cooldown = 1;
    }
#endif

    rl.updateReward(snap.meanScore);
    rl.lastAction = rl.chooseAction(updateCount);

    int baseTopK = TOP_K;
    if(rl.lastAction == 0) baseTopK = TOP_K - 2;
    else if(rl.lastAction == 1) baseTopK = TOP_K;
    else if(rl.lastAction == 2) baseTopK = TOP_K;
    else baseTopK = TOP_K - 1;

    double profileBias[5] = {1.00, 0.98, 0.99, 0.97, 1.02};
    scoreScale = (1.0 + 0.06 * (adaptiveGamma - 1.0) + 0.04 * (adaptiveAlpha - 1.0) - 0.04 * (adaptiveBeta - 1.0))
               * profileBias[STRATEGY_PROFILE] * riskScale;

    if(pca.dom > 0.60) baseTopK -= 1;
    if(pca.rot > 0.15) baseTopK -= 1;
#if USE_HMM
    if(hmm.regime == 2) baseTopK -= 1;
    if(cooldown > 0) baseTopK -= 1;
#if USE_KMEANS
    if(kmeans.regime == 2) baseTopK -= 1;
#endif
#elif USE_GMM
    if(gmm.bestRegime == 2) baseTopK -= 1;
#endif

    dynamicTopK = baseTopK;
    if(dynamicTopK < 1) dynamicTopK = 1;
    if(dynamicTopK > TOP_K) dynamicTopK = TOP_K;

    for(int i=0; i<nScores; i++) {
      double s = (double)scores[i] * scoreScale;
      if(s > 1.0) s = 1.0;
      if(s < 0.0) s = 0.0;
      scores[i] = (fvar)s;
    }
#else
    (void)snap; (void)scores; (void)nScores; (void)updateCount;
#endif
  }
};

// ---------------------------- Strategy ----------------------------

class CrowdAverseStrategy {
public:
  ExposureTable exposureTable;
  FeatureBufferSoA featSoA;
  OpenCLBackend openCL;

  SlabAllocator<fvar> corrMatrix;
  SlabAllocator<fvar> distMatrix;
  SlabAllocator<fvar> compactness;
  SlabAllocator<fvar> entropy;
  SlabAllocator<fvar> scores;

  SlabAllocator<float> featLinear;
  SlabAllocator<float> corrLinear;

  int barCount;
  int updateCount;
  StrategyController controller;
  HierarchicalClusteringModel hclust;
  CommunityDetectionModel comm;

  CrowdAverseStrategy() : barCount(0), updateCount(0) {}

  void init() {
    printf("CrowdAverse_v11: Initializing...\n");

    exposureTable.init();
    featSoA.init(N_ASSETS, FEAT_WINDOW);

    corrMatrix.init(N_ASSETS * N_ASSETS);
    distMatrix.init(N_ASSETS * N_ASSETS);
    compactness.init(N_ASSETS);
    entropy.init(N_ASSETS);
    scores.init(N_ASSETS);

    featLinear.init(FEAT_N * N_ASSETS * FEAT_WINDOW);
    corrLinear.init(N_ASSETS * N_ASSETS);

    openCL.init();
    printf("CrowdAverse_v11: Ready (OpenCL=%d)\n", openCL.ready);
    controller.init();
    hclust.init();
    comm.init();

    barCount = 0;
    updateCount = 0;
  }

  void shutdown() {
    printf("CrowdAverse_v11: Shutting down...\n");

    openCL.shutdown();

    featSoA.shutdown();
    corrMatrix.shutdown();
    distMatrix.shutdown();
    compactness.shutdown();
    entropy.shutdown();
    scores.shutdown();

    featLinear.shutdown();
    corrLinear.shutdown();
  }

  void computeFeatures(int assetIdx) {
    asset((char*)ASSET_NAMES[assetIdx]);

    vars C = series(priceClose(0));
    vars V = series(Volatility(C, 20));

    if(Bar < 50) return;

    fvar r1 = (fvar)log(C[0] / C[1]);
    fvar rN = (fvar)log(C[0] / C[12]);
    fvar vol = (fvar)V[0];
    fvar zscore = (fvar)((C[0] - C[50]) / (V[0] * 20.0 + EPS));
    fvar rangeP = (fvar)((C[0] - C[50]) / (C[0] + EPS));
    fvar flow = (fvar)(r1 * vol);
    fvar regime = (fvar)((vol > 0.001) ? 1.0 : 0.0);
    fvar volOfVol = (fvar)(vol * vol);
    fvar persistence = (fvar)fabs(r1);

    featSoA.push(0, assetIdx, r1);
    featSoA.push(1, assetIdx, rN);
    featSoA.push(2, assetIdx, vol);
    featSoA.push(3, assetIdx, zscore);
    featSoA.push(4, assetIdx, rangeP);
    featSoA.push(5, assetIdx, flow);
    featSoA.push(6, assetIdx, regime);
    featSoA.push(7, assetIdx, volOfVol);
    featSoA.push(8, assetIdx, persistence);
  }

  fvar computeEntropy(int assetIdx) {
    fvar mean = 0;
    for(int t=0; t<FEAT_WINDOW; t++) mean += featSoA.get(0, assetIdx, t);
    mean /= FEAT_WINDOW;
    fvar var = 0;
    for(int t=0; t<FEAT_WINDOW; t++) { fvar d = featSoA.get(0, assetIdx, t) - mean; var += d*d; }
    return (fvar)(var / FEAT_WINDOW);
  }

  void computeCorrelationMatrixCPU() {
    for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = 0;

    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int b=a+1; b<N_ASSETS; b++){
          fvar mx = 0, my = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            mx += featSoA.get(f,a,t);
            my += featSoA.get(f,b,t);
          }
          mx /= (fvar)FEAT_WINDOW;
          my /= (fvar)FEAT_WINDOW;

          fvar sxx = 0, syy = 0, sxy = 0;
          for(int t=0; t<FEAT_WINDOW; t++){
            fvar dx = featSoA.get(f,a,t) - mx;
            fvar dy = featSoA.get(f,b,t) - my;
            sxx += dx*dx;
            syy += dy*dy;
            sxy += dx*dy;
          }

          fvar den = (fvar)sqrt((double)(sxx*syy + (fvar)EPS));
          fvar corr = 0;
          if(den > (fvar)EPS) corr = sxy / den;
          else corr = 0;

          int idx = a*N_ASSETS + b;
          corrMatrix[idx] += corr / (fvar)FEAT_N;
          corrMatrix[b*N_ASSETS + a] = corrMatrix[idx];
        }
      }
    }
  }

  void buildFeatLinear() {
    int idx = 0;
    for(int f=0; f<FEAT_N; f++){
      for(int a=0; a<N_ASSETS; a++){
        for(int t=0; t<FEAT_WINDOW; t++){
          featLinear[idx] = (float)featSoA.get(f, a, t);
          idx++;
        }
      }
    }
  }

  void computeCorrelationMatrix() {
    if(openCL.ready) {
      buildFeatLinear();

      for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrLinear[i] = 0.0f;

      int ok = openCL.computeCorrelationMatrixCL(
        featLinear.data,
        corrLinear.data,
        N_ASSETS,
        FEAT_N,
        FEAT_WINDOW
      );

      if(ok) {
        for(int i=0;i<N_ASSETS*N_ASSETS;i++) corrMatrix[i] = (fvar)0;

        for(int a=0; a<N_ASSETS; a++){
          corrMatrix[a*N_ASSETS + a] = (fvar)1.0;
          for(int b=a+1; b<N_ASSETS; b++){
            float c = corrLinear[a*N_ASSETS + b];
            corrMatrix[a*N_ASSETS + b] = (fvar)c;
            corrMatrix[b*N_ASSETS + a] = (fvar)c;
          }
        }
        return;
      }

      printf("OpenCL: runtime fail -> CPU fallback\n");
      openCL.ready = 0;
    }

    computeCorrelationMatrixCPU();
  }

  void computeDistanceMatrix() {
    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        if(i == j) {
          distMatrix[i*N_ASSETS + j] = (fvar)0;
        } else {
          fvar corrDist = (fvar)1.0 - (fvar)fabs((double)corrMatrix[i*N_ASSETS + j]);
          fvar expDist  = (fvar)exposureTable.getDist(i, j);
          fvar blended = (fvar)LAMBDA_META * corrDist + (fvar)(1.0 - (double)LAMBDA_META) * expDist;
          distMatrix[i*N_ASSETS + j] = blended;
        }
      }
    }
  }

  void floydWarshall() {
    fvar d[28][28];

    for(int i=0;i<N_ASSETS;i++){
      for(int j=0;j<N_ASSETS;j++){
        d[i][j] = distMatrix[i*N_ASSETS + j];
        if(i == j) d[i][j] = (fvar)0;
        if(d[i][j] < (fvar)0) d[i][j] = (fvar)INF;
      }
    }

    for(int k=0;k<N_ASSETS;k++){
      for(int i=0;i<N_ASSETS;i++){
        for(int j=0;j<N_ASSETS;j++){
          if(d[i][k] < (fvar)INF && d[k][j] < (fvar)INF) {
            fvar nk = d[i][k] + d[k][j];
            if(nk < d[i][j]) d[i][j] = nk;
          }
        }
      }
    }

    for(int i=0;i<N_ASSETS;i++){
      fvar w = 0;
      for(int j=i+1;j<N_ASSETS;j++){
        if(d[i][j] < (fvar)INF) w += d[i][j];
      }
      if(w > (fvar)0) compactness[i] = (fvar)(1.0 / (1.0 + (double)w));
      else compactness[i] = (fvar)0;
      entropy[i] = computeEntropy(i);
    }
  }

  void computeScores() {
    for(int i=0;i<N_ASSETS;i++){
      fvar coupling = 0;
      int count = 0;

      for(int j=0;j<N_ASSETS;j++){
        if(i != j && distMatrix[i*N_ASSETS + j] < (fvar)INF) {
          coupling += compactness[j];
          count++;
        }
      }

      fvar pCouple = 0;
      if(count > 0) pCouple = coupling / (fvar)count;
      else pCouple = (fvar)0;

      fvar C_A = compactness[i];
      fvar Ent = entropy[i];

      fvar rawScore = (fvar)ALPHA * Ent + (fvar)GAMMA * C_A - (fvar)BETA * pCouple;

      if(rawScore > (fvar)30) rawScore = (fvar)30;
      if(rawScore < (fvar)-30) rawScore = (fvar)-30;

      scores[i] = (fvar)(1.0 / (1.0 + exp(-(double)rawScore)));
    }
  }

  LearningSnapshot buildSnapshot() {
    LearningSnapshot s;
    s.meanScore = 0; s.meanCompactness = 0; s.meanVol = 0;
    for(int i=0;i<N_ASSETS;i++) {
      s.meanScore += (double)scores[i];
      s.meanCompactness += (double)compactness[i];
      s.meanVol += (double)featSoA.get(2, i, 0);
    }
    s.meanScore /= (double)N_ASSETS;
    s.meanCompactness /= (double)N_ASSETS;
    s.meanVol /= (double)N_ASSETS;
    s.regime = 0;
    s.regimeConfidence = 0;
    return s;
  }

  void onBar() {
    barCount++;

    for(int i=0;i<N_ASSETS;i++) computeFeatures(i);

    if(barCount % UPDATE_EVERY == 0) {
      updateCount++;

      computeCorrelationMatrix();
      computeDistanceMatrix();
#if USE_COMMUNITY
      hclust.update(distMatrix.data);
#endif
#if USE_COMMUNITY
      comm.update(corrMatrix.data, distMatrix.data);
#endif
      floydWarshall();
      computeScores();
      controller.onUpdate(buildSnapshot(), scores.data, N_ASSETS, updateCount);
      printTopK();
    }
  }

  void printTopK() {
    int indices[N_ASSETS];
    for(int i=0;i<N_ASSETS;i++) indices[i] = i;

    int topN = controller.dynamicTopK;
#if USE_COMMUNITY
    if(comm.qSmooth < (fvar)COMM_Q_LOW && topN > 2) topN--;
    if(comm.qSmooth > (fvar)COMM_Q_HIGH && topN < TOP_K) topN++;
#endif
    for(int i=0;i<topN;i++){
      for(int j=i+1;j<N_ASSETS;j++){
        if(scores[indices[j]] > scores[indices[i]]) {
          int tmp = indices[i];
          indices[i] = indices[j];
          indices[j] = tmp;
        }
      }
    }

    if(updateCount % 10 == 0) {
      printf("===CrowdAverse_v11 Top-K(update#%d,OpenCL=%d)===\n",
        updateCount, openCL.ready);
#if USE_COMMUNITY
      printf(" communities=%d Q=%.4f\n", comm.nCommunities, (double)comm.qSmooth);
#endif

      int selected[N_ASSETS];
      int selCount = 0;
#if USE_COMMUNITY
      int coarseUsed[HCLUST_COARSE_K];
      int fineTake[HCLUST_FINE_K];
      int fineCap = (topN + HCLUST_FINE_K - 1) / HCLUST_FINE_K;
      for(int c=0;c<HCLUST_COARSE_K;c++) coarseUsed[c] = 0;
      for(int c=0;c<HCLUST_FINE_K;c++) fineTake[c] = 0;

      for(int i=0;i<topN;i++){
        int idx = indices[i];
        int cid = comm.clusterCoarse[idx];
        if(cid < 0 || cid >= HCLUST_COARSE_K) cid = 0;
        if(coarseUsed[cid]) continue;
        coarseUsed[cid] = 1;
        selected[selCount++] = idx;
        int fid = comm.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        fineTake[fid]++;
      }

      for(int i=0;i<topN && selCount<topN;i++){
        int idx = indices[i];
        int dup = 0;
        for(int k=0;k<selCount;k++) if(selected[k]==idx){ dup=1; break; }
        if(dup) continue;
        int fid = comm.clusterFine[idx];
        if(fid < 0 || fid >= HCLUST_FINE_K) fid = 0;
        if(fineTake[fid] >= fineCap) continue;
        selected[selCount++] = idx;
        fineTake[fid]++;
      }
#else
      for(int i=0;i<topN;i++) selected[selCount++] = indices[i];
#endif
      for(int i=0;i<selCount;i++){
        int idx = selected[i];
        printf(" %d.%s: score=%.4f, C=%.4f, Ent=%.6f\n", i+1, ASSET_NAMES[idx], (double)scores[idx], (double)compactness[idx], (double)entropy[idx]);
      }
    }
  }
};

// ---------------------------- Zorro DLL entry ----------------------------

static CrowdAverseStrategy* S = NULL;

DLLFUNC void run()
{
  if(is(INITRUN)) {
    BarPeriod = 60;
    LookBack = max(LookBack, FEAT_WINDOW + 50);

    asset((char*)ASSET_NAMES[0]);

    if(!S) {
      S = new CrowdAverseStrategy();
      S->init();
    }
  }

  if(is(EXITRUN)) {
    if(S) {
      S->shutdown();
      delete S;
      S = NULL;
    }
    return;
  }

  if(!S || Bar < LookBack)
    return;

  S->onBar();
}

Page 19 of 20 1 2 17 18 19 20

Moderated by  Petra 

Powered by UBB.threads™ PHP Forum Software 7.7.1