mem_cpy, programmers' first choice for copying and scaling bitmaps since 1872.
Gamestudio quote of the day!

I wonder if all bmap_blit is doing is just a memcpy itself? I find the nearly identical run times for the two different commands to be a little suspicious.
Bitmaps are really just huge pieces of data. In most cases you are dealing with RGBA textures, which means 4 bytes per pixel, so a 512x512 bitmap takes 512 * 512 * 4 bytes = 1 MB. That is not much at all and should be really fast even with bmap_blit, but once you add scaling, offset copying and cropping (all of which bmap_blit and bmap_blitpart support), you get additional overhead. Multiply that by the number of copy instructions, and if you use even bigger bitmaps - good luck!
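To illustrate where the extra overhead of offset copying and cropping comes from, here is a minimal sketch in plain C. It is not the engine's implementation and it uses no Gamestudio calls: copy_rect and all of its parameters are made-up names, and the bitmaps are assumed to be plain arrays of 32-bit RGBA pixels. A full copy can be one big memcpy, but a cropped or offset copy has to go row by row, because the rows of the rectangle are not contiguous in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Gamestudio function): copies a w x h rectangle
   of 32-bit RGBA pixels from (src_x, src_y) in src to (dst_x, dst_y) in dst.
   src_w and dst_w are the full row widths of the two buffers in pixels.
   The caller must make sure both buffers have enough rows. */
void copy_rect(uint32_t *dst, int dst_w, int dst_x, int dst_y,
               const uint32_t *src, int src_w, int src_x, int src_y,
               int w, int h)
{
    /* The rectangle has to fit into a row of each buffer. */
    assert(w >= 0 && h >= 0);
    assert(src_x >= 0 && src_y >= 0 && src_x + w <= src_w);
    assert(dst_x >= 0 && dst_y >= 0 && dst_x + w <= dst_w);

    for (int row = 0; row < h; row++) {
        /* Start of the current row inside each buffer. */
        const uint32_t *s = src + (size_t)(src_y + row) * src_w + src_x;
        uint32_t *d = dst + (size_t)(dst_y + row) * dst_w + dst_x;

        /* One small memcpy per row instead of one big memcpy in total. */
        memcpy(d, s, (size_t)w * sizeof(uint32_t));
    }
}
```

Scaling is worse still, since a plain memcpy per row no longer works and every destination pixel has to be looked up (or filtered) individually.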
That is why it is not just marked as "slow", but as "slow (depends on scaling and on whether the bitmap is visible or used as a render target)". Another point is that mem_cpy is certainly single-threaded. You can actually make your own bmap_copy function faster if you start e.g. four threads, synchronize them with something like a countdown latch, and copy the memory in parallel.
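A rough sketch of that idea in plain C with POSIX threads follows. This is an assumption on my side, not engine code: parallel_copy, CopyJob and copy_worker are made-up names, and pthread_join takes the role of the countdown latch here, since the caller simply waits until every worker has finished. Whether it really beats a single memcpy depends on the bitmap size and on memory bandwidth, so measure before relying on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS 4

/* One slice of the buffer, handled by one worker thread. */
typedef struct {
    unsigned char *dst;
    const unsigned char *src;
    size_t len;
} CopyJob;

static void *copy_worker(void *arg)
{
    CopyJob *job = (CopyJob *)arg;
    memcpy(job->dst, job->src, job->len);
    return NULL;
}

/* Hypothetical helper: copies len bytes from src to dst using up to
   NUM_THREADS threads. The buffers must not overlap. */
void parallel_copy(void *dst, const void *src, size_t len)
{
    pthread_t threads[NUM_THREADS];
    CopyJob jobs[NUM_THREADS];
    size_t chunk = len / NUM_THREADS;
    int started = 0;

    assert(dst != NULL && src != NULL);

    for (int i = 0; i < NUM_THREADS; i++) {
        size_t offset = (size_t)i * chunk;
        jobs[i].dst = (unsigned char *)dst + offset;
        jobs[i].src = (const unsigned char *)src + offset;
        /* The last slice also takes the remainder of the division. */
        jobs[i].len = (i == NUM_THREADS - 1) ? len - offset : chunk;

        if (pthread_create(&threads[i], NULL, copy_worker, &jobs[i]) != 0) {
            /* Could not start a thread: copy everything that is left
               on the calling thread and stop creating workers. */
            memcpy(jobs[i].dst, jobs[i].src, len - offset);
            break;
        }
        started++;
    }

    /* Wait for all workers; this is the "latch" part. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
}
```

Creating threads on every call has its own cost, so for a real bmap_copy you would probably keep a few worker threads alive and only hand them new slices.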
That is, by the way, also the reason why processing via bmap_process takes only a fraction of the time: fragment shader processing is run in parallel on the GPU.
Nevertheless, I doubt that you gain anything by creating copies of a bitmap WITHOUT hooking them into the internal C_LINK structure.