I went and did some tests on bmap_blit() vs bmap_copy(). I ran the following:

Code:
var test;   // time for the bmap_copy loop, displayed by info_panel
var test2;  // time for the bmap_blit loop, displayed by info_panel

PANEL* info_panel =
{   pos_x = 20;
    pos_y = 20;
    flags = SHOW;
    digits = 0,0,3.0,standard_font,1,test;
    digits = 0,10,3.0,standard_font,1,test2;
}

BMAP* original_bmap = "jug_icon.tga";
BMAP* clone_bmap;

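// Manual copy: duplicates the BMAP header fields, then memcpy's the pixel data.
// Note: allocates a fresh pixel buffer on every call and never frees the old one.
void bmap_copy(BMAP* target_bmp, BMAP* source_bmp)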
{   target_bmp.width = source_bmp.width;
    target_bmp.height = source_bmp.height;
    target_bmp.bytespp = source_bmp.bytespp;
    target_bmp.flags = source_bmp.flags;
    target_bmp.u1 = source_bmp.u1;
    target_bmp.v1 = source_bmp.v1;
    target_bmp.u2 = source_bmp.u2;
    target_bmp.v2 = source_bmp.v2;
    target_bmp.u = source_bmp.u;
    target_bmp.v = source_bmp.v;
    target_bmp.refcount = source_bmp.refcount;
    target_bmp.finalwidth = source_bmp.finalwidth;
    target_bmp.finalheight = source_bmp.finalheight;
    target_bmp.finalbytespp = source_bmp.finalbytespp;
    target_bmp.pitch = source_bmp.pitch;
    target_bmp.finalpitch = source_bmp.finalpitch;
    target_bmp.miplevels = source_bmp.miplevels;
    target_bmp.finalformat = source_bmp.finalformat;
    target_bmp.finalbits = NULL;
    target_bmp.d3dtex = NULL;
    target_bmp.pixels = malloc((source_bmp.width*source_bmp.height)*source_bmp.bytespp);
    memcpy(target_bmp.pixels, source_bmp.pixels, (source_bmp.width*source_bmp.height)*source_bmp.bytespp);
}

function main()
{   var count = 0;
    clone_bmap = bmap_createblack(32,32,24);
    timer();
    while(count<1000)
    {   bmap_copy(clone_bmap,original_bmap);
        count += 1;
    }
    test = timer();
    count = 0;
    timer();
    while(count<1000)
    {   bmap_blit(clone_bmap,original_bmap,NULL,NULL);
        count += 1;
    }
    test2 = timer();
}



Not the most accurate test in the world, I realize, but I just wanted a rough idea of how fast bmap_copy is compared to bmap_blit. I found that 1000 calls to bmap_copy took ~950 microseconds and 1000 calls to bmap_blit took ~947 microseconds, so roughly 0.95 microseconds per call either way.
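
For anyone who wants to sanity-check the memcpy side outside the engine, here is a rough sketch in plain C (not Lite-C). It assumes a 32x32 bitmap at 3 bytes per pixel to match the clone above, repeats the batch of copies several times, and keeps the fastest batch, which gives some feel for how much of a 3 microsecond gap is just measurement noise. time_copy_batch and best_of are names made up for this sketch; nothing here comes from the engine, and clock() measures CPU time rather than whatever timer() uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WIDTH   32   /* assumed size, matching bmap_createblack(32,32,24) */
#define HEIGHT  32
#define BYTESPP 3    /* 24 bits per pixel */

/* Time `calls` copies of one WIDTH x HEIGHT bitmap; returns CPU microseconds. */
static double time_copy_batch(unsigned char *dst, const unsigned char *src, int calls)
{
    size_t bytes = (size_t)WIDTH * HEIGHT * BYTESPP;
    volatile unsigned char sink = 0;   /* keeps the compiler from dropping the copies */
    clock_t start, end;
    int i;

    assert(calls > 0);
    start = clock();
    for (i = 0; i < calls; i++) {
        memcpy(dst, src, bytes);
        sink ^= dst[(size_t)i % bytes];
    }
    end = clock();
    (void)sink;
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}

/* Run the batch `runs` times and keep the fastest one; returns -1.0 on allocation failure. */
static double best_of(int runs, int calls)
{
    size_t bytes = (size_t)WIDTH * HEIGHT * BYTESPP;
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    double best = -1.0;
    int r;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x7f, bytes);          /* arbitrary pixel data */
    for (r = 0; r < runs; r++) {
        double t = time_copy_batch(dst, src, calls);
        if (best < 0.0 || t < best)
            best = t;
    }
    free(src);
    free(dst);
    return best;
}
```

Calling best_of(5, 1000) a few times and watching how much the result moves around gives a baseline for judging whether 950 vs 947 means anything.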

So bmap_blit really is as fast as (even slightly faster than) using memcpy directly. I find that more than a little surprising, considering that it has to be doing more than just a simple memcpy(). I also don't understand why the manual says that the bmap_blit command is slow. memcpy() is about as fast as it gets, and bmap_blit is at least as fast as that, so how is it a "slow" command?
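
I don't know what bmap_blit does internally, so take this as a guess at the kind of extra work a blit has to handle rather than a description of the engine. One thing a flat memcpy of width*height*bytespp ignores is the pitch (bytes per row), which can be larger than width*bytespp when rows are padded. A plain C sketch (not Lite-C) of a copy that respects pitch, using a made-up copy_rows helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` visible bytes each.
   src and dst may use different pitches (bytes from one row start to the next). */
static void copy_rows(unsigned char *dst, size_t dst_pitch,
                      const unsigned char *src, size_t src_pitch,
                      size_t row_bytes, size_t height)
{
    size_t y;

    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    for (y = 0; y < height; y++) {
        /* Only the visible part of each row is copied; padding is skipped. */
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

When the pitch equals width*bytespp on both sides, this loop moves exactly the same bytes as one big memcpy, which might be part of why the two timings come out so close for a small 32x32 image.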

Anyways, since bmap_blit *is* as fast as a memcpy() and doesn't have any of the problems that bmap_copy() has, I'm definitely using HeelX's solution. Thank you all for the advice.