Sending a large 2-dimensional array to R #475766
01/07/19 14:56
Joined: Aug 2016
Posts: 27
MaskOfZorro  Offline OP
Newbie
I need to run linear regressions in R on the time series of a universe of stocks, which is simple with lm(). I have a working solution that uses cbind to construct the data frame of prices from Zorro asset by asset, but it is painfully slow, and I can't get the approach I think should work to work.

I've tried two ways of doing this, each with its own problems.


1) I set up vars Prices[NUM_ASSETS]. While this works easily for a single asset i with Rset("y",Prices[i],DAYS), I'm not sure what Rset syntax to use to feed the entire array into an R object. Everything I've tried yields nonsense; when I check the R object's contents, it gives values like 9.22632259801047e-238.


2) I set up var Prices[NUM_ASSETS][DAYS] and fill it with the price series in a for loop, intending to call Rset("y",Prices,DAYS,NUM_ASSETS). With NUM_ASSETS = 505 this works for very small values of DAYS, but in all other scenarios the script invariably crashes in the for loop, either terminating with Error 111: Crash in function: run() at bar 0 or crashing Zorro outright.

The script

Quote:
#include <default.c>
#include <r.h>

#define NUM_ASSETS 505
#define DAYS 90

int asset_num;

function run()
{
	StartDate = 20180101;
	BarPeriod = 60*24;
	LookBack = DAYS;
	Commission = Spread = Slippage = 0;

	int i;
	vars Price;
	//vars Prices[NUM_ASSETS]; // for 1)
	var Prices[NUM_ASSETS][DAYS]; // for 2)

	assetList("History\\AssetsSP500.csv");

	asset_num = 0;

	while(loop(Assets))
	{
		asset(Loop1);

		if(priceClose(0) != 0 && priceClose(DAYS) != 0 && asset_num < NUM_ASSETS)
		{
			//Prices[asset_num] = series(priceClose(),DAYS); // for 1)
			Price = series(priceClose(),DAYS); // for 2)

			for(i=0; i<DAYS; i++) // for 2)
				Prices[asset_num][i] = Price[i];
		}
		asset_num++;
	}

	// if(!is(LOOKBACK)) // R code
	// {
	//	Rstart("",2);
	//	Rset("y",Prices,DAYS,NUM_ASSETS);
	//	Rx("y",3);
	// }
}

Re: Sending a large 2-dimensional array to R [Re: MaskOfZorro] #475776
01/07/19 18:54
Joined: Aug 2016
Posts: 27
MaskOfZorro  Offline OP
Newbie
This is the code I'm trying now. It yields a matrix in R of the right dimensions, but its contents are again garbage:


[2,] 2.29175545475632e-312 5.98860202721352e-67  7.38971488557103e-130
[3,] 1.74969279245251e-113 1.73618583701266e-218 1.03164333975554e-312
[4,] 1.66834019294894e-267 3.22670143784907e-94  1.85094398490310e-313
etc.

Is this a typecasting issue? How do I get the right price figures into the matrix?

Quote:
#include <default.c>
#include <r.h>

#define NUM_ASSETS 505
#define DAYS 50

int asset_num;
vars Prices[NUM_ASSETS];

function run()
{
	StartDate = 20100101;
	BarPeriod = 60*24;
	LookBack = DAYS;
	Commission = Spread = Slippage = 0;

	assetList("History\\AssetsSP500.csv");

	asset_num = 0;

	while(loop(Assets))
	{
		asset(Loop1);

		if(priceClose(0) != 0 && priceClose(DAYS) != 0 && asset_num < NUM_ASSETS)
			Prices[asset_num] = series(priceClose(),DAYS);
		else if(asset_num == NUM_ASSETS)
			break;
		asset_num++;
	}

	if(!is(LOOKBACK))
	{
		Rstart("",2);
		Rset("y",Prices,DAYS,NUM_ASSETS);
		Rx("y",3);
	}
}

Re: Sending a large 2-dimensional array to R [Re: MaskOfZorro] #475822
01/09/19 07:55
Joined: Jul 2000
Posts: 27,977
Frankfurt
jcl  Offline
Chief Engineer
Your Prices variable contains no prices, but pointers. The R language has no pointers. Send an actual array of prices to R, and check whether you need a row-major or column-major matrix.

Re: Sending a large 2-dimensional array to R [Re: MaskOfZorro] #478086
09/05/19 21:11
Joined: Jan 2019
Posts: 73
Berlin
laz  Offline
Junior Member
I can add something here, because I struggled with similar things. Some years (2-3) ago I rewrote the R-bridge (from https://github.com/micclly/mt4R) in Free Pascal and used the DLL in MT4 very often, so I still remember some details. But don't trust me, try and check it yourself...

1. I would not send that much data over the R-bridge. The bridge does change its transfer mode depending on the type/length of the data (less data = string transfer, more data = chunks or binary files), but no matter which mode, the overhead plus R's readBin into a matrix makes it slow, and you can see almost nothing of what happens inside the bridge (debugging only works with DebugView.exe). Just my experience; see TRConsole.AssignVector() / TRConsole.AssignMatrix() in the R-bridge source for details.

2. Use the R-bridge only for sending short commands, integers, doubles, and small strings - small amounts of data. Write everything else to CSV files and only send a short command to R to start reading them. Use data.table's fread/fwrite functions; they are very fast. This way you have full control and can see everything, and you can choose where the files are located. So, if it is still not fast enough, use a RAM disk to store/transfer the files (normally faster than an SSD).

3. Don't use data frames in R; they do copy-on-write (on modify), which is why they are slow and inefficient. Use data.table instead. data.table is (sometimes) harder to use, but it has a lot of advantages and is much faster, and it supports multi-core reading/writing. Use tracemem and microbenchmark to see what it does, and how fast.

Because of all these speed problems between Zorro <-> R, I have now changed the Zorro workflow for my system: I write my own *.csv files (not via advise(L/S)) for TRAIN and TEST, read them from a RAM disk into data.tables in R, and use caret (multi-core) to train models on them.

You can also gain some speed by changing how often you create/transfer data. Is it really necessary to export the same data again and again? Watch what Zorro normally does...

And check the R library memoise: it can cache any R function call and its result (on the RAM disk) and return the stored value, and multiple threads can share the cache. No need to calculate things twice. :)




Last edited by laz; 09/05/19 22:10.
