I guess removing c_move is not really important; it was just a thought, and it's probably not worth the trouble.
I took a look at your code and was wondering whether embedding instructions into each other improves speed or not.
If it doesn't, then even though your code looks a lot more compact, it has the same instruction count as mine:
ITEM            MINE  YOURS
vec_set         3     3
vec_sub         3     3
vec_add         2     2
vec_length      2     2
vec_normalize   2     2
c_trace         1     1
*               1     1
-               1     0
+               0     1
local_vars      2     1
external_vars   2     3
Note: I didn't count the extra if and return in your function, since I think it's a nice safeguard and I will incorporate it into mine. I also didn't count the extra debug instructions in mine, since they will be removed.
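A rough sketch of the counting argument, in plain C rather than engine script. The helpers `v_set`, `v_sub` and `v_length_sq` are hypothetical stand-ins (not the engine's vec_ functions), and the counter only shows that nesting calls does not change how many calls are made. It says nothing about whether the engine's compiler runs the nested form any faster.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

/* Incremented by every helper call, so both styles can be compared. */
static int call_count = 0;

/* Hypothetical helpers: each returns its destination so calls can be nested. */
static Vec3 *v_set(Vec3 *dst, const Vec3 *src) {
    call_count++;
    *dst = *src;
    return dst;
}

static Vec3 *v_sub(Vec3 *dst, const Vec3 *src) {
    call_count++;
    dst->x -= src->x;
    dst->y -= src->y;
    dst->z -= src->z;
    return dst;
}

static double v_length_sq(const Vec3 *v) {
    call_count++;
    return v->x * v->x + v->y * v->y + v->z * v->z;
}

/* One call per statement. */
static double dist_sq_stepwise(const Vec3 *a, const Vec3 *b) {
    Vec3 tmp;
    v_set(&tmp, a);
    v_sub(&tmp, b);
    return v_length_sq(&tmp);
}

/* The same three calls embedded into each other. */
static double dist_sq_nested(const Vec3 *a, const Vec3 *b) {
    Vec3 tmp;
    return v_length_sq(v_sub(v_set(&tmp, a), b));
}

/* Runs fn on a fixed pair of points and returns how many helper calls it made. */
static int count_calls(double (*fn)(const Vec3 *, const Vec3 *), double *result) {
    const Vec3 a = {3.0, 4.0, 0.0};
    const Vec3 b = {0.0, 0.0, 0.0};
    assert(fn != 0 && result != 0);
    call_count = 0;
    *result = fn(&a, &b);
    return call_count;
}
```

Calling `count_calls` with either function reports the same number of helper calls and the same result, so any speed difference would have to come from how the script compiler treats nested calls, which can only be settled by timing it in the engine.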
I'm still thinking about how to reduce the number of instructions needed, though. It may not be that critical, but I'm usually very strict about my code's speed and efficiency: function by function, a little inefficient code can end up accumulating into something noticeable. I am used to getting things to work first and then revising their speed and efficiency (even where apparently not necessary). But it will have to wait; I have to see my gf now and go to my father's birthday later tonight.