vec_set or setting x,y,z manually

Posted By: Reconnoiter

vec_set or setting x,y,z manually - 07/20/14 12:30

Which is slower (as in fps): vec_set, or setting the x, y, z of an entity manually? Or is there no difference?

e.g.:

Code:
vec_set (pointer_blabla.x, my.x);



or

Code:
pointer_blabla.x = my.x; pointer_blabla.y = my.y; pointer_blabla.z = my.z;

Posted By: Kartoffel

Re: vec_set or setting x,y,z manually - 07/20/14 13:12

if one of these is faster, it should be vec_set.
Posted By: Reconnoiter

Re: vec_set or setting x,y,z manually - 07/20/14 16:40

Tnx, that would make sense. I was hesitating since 'speed: fast' is mentioned in the manual under vec_set, and I didn't really know how fast 'fast' is.
Posted By: WretchedSid

Re: vec_set or setting x,y,z manually - 07/20/14 17:29

It should be setting it directly, actually. There is no 12 byte move instruction, so it also has to do three moves, plus you get the method call overhead as well as having to push the values to the stack and probably storing the framepointer (if Acknex does that. Who knows anyways?!).

However, it doesn't matter. Whatever you think, this isn't and never will be your performance bottleneck. Use a profiler, see where it is slow and optimize these parts. These micro-optimizations are useless and you should pick whatever style you prefer, not what might get retired 30 μops earlier in the processor.
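
To make the comparison concrete, here is a minimal C sketch of the two variants. Vec3 and vec3_set are hypothetical stand-ins: the real lite-C VECTOR type and the engine's actual vec_set implementation aren't shown anywhere in this thread, so this only illustrates that both variants boil down to the same three 4-byte copies, with the function variant adding a call on top (unless the compiler inlines it).

```c
/* Hypothetical stand-in for an engine vector: three 4-byte components. */
typedef struct { float x, y, z; } Vec3;

/* Hypothetical stand-in for vec_set: copies src into dst via a function call. */
Vec3 *vec3_set(Vec3 *dst, const Vec3 *src)
{
    dst->x = src->x;
    dst->y = src->y;
    dst->z = src->z;
    return dst;
}

/* Copies the same source both ways and reports whether the results match. */
int variants_agree(void)
{
    Vec3 src = { 1.0f, 2.0f, 3.0f };
    Vec3 a = { 0.0f, 0.0f, 0.0f };
    Vec3 b = { 0.0f, 0.0f, 0.0f };

    vec3_set(&a, &src);   /* variant 1: one function call */

    b.x = src.x;          /* variant 2: three direct assignments */
    b.y = src.y;
    b.z = src.z;

    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

variants_agree() returns 1, since both variants produce the same vector. Which one is faster in a real project is something only a profiler can tell you.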
Posted By: CyberGhost

Re: vec_set or setting x,y,z manually - 07/20/14 17:39

I think they're nearly the same (same algorithm). I think setting directly is a bit faster, but it takes longer to write out in the code.
Posted By: DLively

Re: vec_set or setting x,y,z manually - 07/20/14 18:13

Just skip past this...

Think of it this way (and someone correct me if I'm wrong):

how long does it take for you to read:
vec_set(my.x,you.x);

now read this out:
my.x = you.x;
my.y = you.y;
my.z = you.z;

Which one took you longer to read? A CPU is obviously going to read both of these very, very quickly; however, less for it to read means it can move on to the next line of code faster.
Posted By: WretchedSid

Re: vec_set or setting x,y,z manually - 07/20/14 18:59

DLively, not sure if you are serious or not, but this isn't how computers work.

Even very simplified grammars like that of C (simplified compared to English) can't be run on a CPU directly; they're for human consumption only. You need a compiler to compile the code down to instructions that the CPU can actually understand, and these bear little to no resemblance to what you would write in a high-level language like C.

A CPU provides a so-called instruction set, a set of instructions that it is capable of executing. You will almost always find instructions to read bytes from memory into CPU registers, store CPU registers into memory, do arithmetic operations on register contents, and branching instructions to alter the flow of execution. The instruction set you are (unknowingly) working with is called x86, and it's THE desktop instruction set. It is old as fuck, grew over the years into an absolute beast and is supported by Intel and AMD. On mobile and embedded devices you will most commonly find ARM CPUs, which provide the ARM instruction set, completely different from the x86 one. Of course, newer generations of AMD and Intel CPUs also support the x86-64 instruction set, the 64-bit instruction set that is backwards compatible with x86 but adds 64-bit support and a slew of other instructions (and deprecations, when the CPU is run in 64-bit mode).
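
To give a feel for those four kinds of instructions (load, store, arithmetic, branch), here is a toy machine sketched in C. Everything in it is invented for illustration: the opcodes, the Insn layout and the four registers have nothing to do with real x86 encodings. The little program multiplies two numbers by repeated addition using nothing but those four instruction kinds.

```c
#include <stdint.h>

/* Toy instruction set, invented for illustration; it is NOT x86. */
enum { OP_LOAD, OP_STORE, OP_ADD, OP_JNZ, OP_HALT };

typedef struct { uint8_t op, a, b; } Insn;

/* Runs a program on a toy machine with 4 registers and the given memory. */
void run(const Insn *prog, int32_t *mem)
{
    int32_t reg[4] = { 0, 0, 0, 0 };
    int pc = 0; /* program counter: index of the next instruction */

    for (;;) {
        Insn i = prog[pc++];
        switch (i.op) {
        case OP_LOAD:  reg[i.a] = mem[i.b];            break; /* memory -> register */
        case OP_STORE: mem[i.b] = reg[i.a];            break; /* register -> memory */
        case OP_ADD:   reg[i.a] += reg[i.b];           break; /* arithmetic on registers */
        case OP_JNZ:   if (reg[i.a] != 0) pc = i.b;    break; /* branch if register != 0 */
        default:       return;                                /* OP_HALT or unknown */
        }
    }
}

/* Computes value * times by repeated addition. Assumes times > 0. */
int32_t multiply_demo(int32_t value, int32_t times)
{
    static const Insn prog[] = {
        { OP_LOAD,  0, 0 }, /* r0 = mem[0]  (value)             */
        { OP_LOAD,  1, 1 }, /* r1 = mem[1]  (loop counter)      */
        { OP_LOAD,  3, 3 }, /* r3 = mem[3]  (the constant -1)   */
        { OP_ADD,   2, 0 }, /* r2 += r0     (accumulate)        */
        { OP_ADD,   1, 3 }, /* r1 += r3     (counter--)         */
        { OP_JNZ,   1, 3 }, /* if r1 != 0, jump back to index 3 */
        { OP_STORE, 2, 2 }, /* mem[2] = r2  (result)            */
        { OP_HALT,  0, 0 },
    };
    int32_t mem[4] = { value, times, 0, -1 };

    run(prog, mem);
    return mem[2];
}
```

multiply_demo(6, 7) returns 42. A real CPU does conceptually the same fetch/decode/execute loop, just in hardware and with vastly more machinery around it.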

x86 is what is commonly called a CISC instruction set. CISC stands for "complex instruction set computer", and it's an idea from the 80s which basically translates to: let's add as many highly specialized instructions as possible onto the CPU. As a result, x86 CPUs have support for hardware random number generation, AES encryption and decryption and a shit ton of other stuff. But it's still very low level and a far cry from anything high level such as C and its standard library.

Let's assume a function that operates on two vectors and copies the x/y/z components from one to the other. The compiler will generate machine code that first loads the two vector pointers into two registers, and then code that moves 4 bytes, three times, from one address to the other.

In assembler, this looks like this (comments mine, the code was generated by the Clang compiler):
Code:
mov    0x11e004, %eax // Load the address of the source vector (the pointer stored at 0x11e004) into the EAX register
mov    (%eax), %eax   // Load the 4 bytes at the address found in the EAX register from RAM into the EAX register
mov    0x11e000, %ecx // Load the address of the destination vector (the pointer stored at 0x11e000) into the ECX register
mov    %eax, (%ecx)   // Store the contents of the EAX register at the address that the ECX register points to (aka transfer into memory)

// Same deal, but, the pointers are offset by 4 bytes (this is the y component)
mov    0x11e004, %eax 
mov    0x4(%eax), %eax
mov    0x11e000, %ecx
mov    %eax, 0x4(%ecx)

// And again, this time with an offset of 8 bytes (this is the z component)
mov    0x11e004, %eax
mov    0x8(%eax), %eax
mov    0x11e000, %ecx
mov    %eax, 0x8(%ecx)

ret // Return to the caller



You may or may not have noticed that despite x86 being a CISC instruction set, the mov instruction can't move data directly from one memory location to another. You first have to load it into a CPU register and then store it.

And no, this is NOT what the CPU sees. Assembler, again, is for human consumption only. It is closer to what the CPU will see eventually, but it's still not quite there. It does lack a lot of the high-level niceness of C though. In C, the very same would look like this (in fact, this is what I threw at the compiler to get the assembly from earlier):
Code:
void test()
{
	// The pointer vec1 is stored at 0x11e000 and the pointer vec2 at 0x11e004
	vec1->x = vec2->x;
	vec1->y = vec2->y;
	vec1->z = vec2->z;
}



If you have assembler code, you throw it at an assembler, which finally generates the actual machine code out of it. What the CPU will see eventually is the following (same layout as the assembler code above; each line is one instruction):
Code:
a1 04 e0 11 00 
8b 00
8b 0d 00 e0 11 00
89 01
a1 04 e0 11 00
8b 40 04
8b 0d 00 e0 11 00
89 41 04
a1 04 e0 11 00
8b 40 08
8b 0d 00 e0 11 00
89 41 08
c3



Except it's still not exactly what the CPU sees, because this is, again, for human consumption. It IS what the CPU sees in that each hex number represents the value of one byte, but the CPU doesn't see it as text; it consumes the bytes.
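
To make the "bytes, not text" point a bit more tangible, here is a small C sketch that holds the listing above as plain data. The byte values and per-instruction lengths are read straight off that listing; the function names are made up for this example, and nothing here executes the bytes, it only inspects them.

```c
#include <stddef.h>

/* The 13 instructions from the listing above, as raw bytes. */
static const unsigned char code[] = {
    0xa1, 0x04, 0xe0, 0x11, 0x00,
    0x8b, 0x00,
    0x8b, 0x0d, 0x00, 0xe0, 0x11, 0x00,
    0x89, 0x01,
    0xa1, 0x04, 0xe0, 0x11, 0x00,
    0x8b, 0x40, 0x04,
    0x8b, 0x0d, 0x00, 0xe0, 0x11, 0x00,
    0x89, 0x41, 0x04,
    0xa1, 0x04, 0xe0, 0x11, 0x00,
    0x8b, 0x40, 0x08,
    0x8b, 0x0d, 0x00, 0xe0, 0x11, 0x00,
    0x89, 0x41, 0x08,
    0xc3
};

/* Length in bytes of each instruction, read off the listing line by line. */
static const unsigned char lengths[] = { 5, 2, 6, 2, 5, 3, 6, 3, 5, 3, 6, 3, 1 };

/* Total size of the function in bytes. */
size_t code_size(void)
{
    return sizeof code;
}

/* Number of instructions in the listing. */
size_t instruction_count(void)
{
    return sizeof lengths / sizeof lengths[0];
}

/* Returns the first byte of the n-th instruction, counting from 0.
   Because the lengths vary, the offset has to be found by walking
   over all previous instructions. */
unsigned char first_byte_of(size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++)
        offset += lengths[i];
    return code[offset];
}
```

The whole function is 50 bytes spread over 13 instructions of between 1 and 6 bytes each. Note how first_byte_of() can't just jump to the n-th instruction: it has to know how long all the previous ones are, which is a tiny taste of why variable-length instructions make the CPU's fetching and decoding harder.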

And here is where things start to get immensely complex. Modern CPUs are absolute beasts in what they do. They are beyond fucked up, and the things that are done to allow as many instructions to retire as fast as possible are insane. You could fill books with one CPU generation alone. Things have come a long way since the first steps in microprocessors, and the worst thing that got in the way were the laws of physics. I'll spare you that for now, mostly because it would require writing at least five more paragraphs to explain a couple more things at a high level before even considering an actual CPU.

For completeness' sake though (you can skip everything from here on), ARM has what is commonly known as a RISC instruction set, where the R stands for reduced. It has a couple of very general instructions that you have to use together to get the specialized behaviour that you might have gotten out of a CISC instruction set. Usually RISC instruction sets are easier for the CPU to execute because instructions have a fixed length (note how above, the instructions have varying lengths), so things like instruction fetching can be done faster. There's also this whole micro-ops thing, which I'm not getting into today.

Questions? I have barely covered anything and left out a huge deal of information, so if anything seems incoherent, ask ahead.

Edit: Google keywords that might be interesting to get some deeper knowledge of how CPUs work and why they are the way they are (this is an incredibly deep rabbit hole. Beware):
- Pipelined CPU design
- Superscalar CPUs
- CISC
- RISC
- x86 instruction set
- MIPS instruction set
- Out of order execution
- Register renaming
- Signal propagation
- Contamination delay
Posted By: DLively

Re: vec_set or setting x,y,z manually - 07/20/14 20:12

Wow JustSid! I was serious, aha...
Thanks for the lesson!

I have a somewhat better understanding of how CPUs work now (not that your information wasn't in-depth enough, or helpful xD, but I'm still very new to that area of computers, so my mind is still wrapping around it as I read it over again). I do, however, plan on taking a leap down that rabbit hole when I have more time on my hands, which should help me create my games more efficiently and understand why they work the way they do.

I had an idea of how complex computers are, but wow... that's incredible. I can't believe you wrote all that! Your knowledge is astonishing! You're an awesome individual, Sid.

Thanks again bro
DLively.
Posted By: WretchedSid

Re: vec_set or setting x,y,z manually - 07/20/14 20:30

Originally Posted By: DLively
I had an idea of how complex computers are, but wow... that's incredible.

It barely touched the complex stuff yet, I'm afraid!
But I'm happy to clear up at least some misconceptions.

I always wanted to write a "How CPUs work, from a developer's perspective" kind of blog post which goes into the nitty-gritty details that one might want to be aware of for performance's sake, but I'm really bad at explaining stuff because I love to assume that people just know things already or pick up all the bits I skimmed over from the context. I might do that one day though, and it's going to be a very, very long blog post.

But I can only strongly suggest that everyone get at least a high-level overview of these things. There are a lot of things that may appear counterintuitive that result in better performance, and sometimes a small change will result in a heavy performance hit for no obvious reason at all until you take a look at it from the CPU's perspective. Of course, modern CPUs have enough power to just mask this, but especially games with a really tight performance budget will run into these things. Also, it's a ton of fun to learn new things.
Posted By: Reconnoiter

Re: vec_set or setting x,y,z manually - 07/21/14 08:17

Interesting read JustSid.

Quote:
this isn't and never will be your performance bottleneck.
I knew that about simple lines like pointer_blabla.x = my.x;. I think they should change 'fast' to 'very fast' in the manual, because (at least I think) 'fast' has a ring to it that suggests you still need to be careful with it; otherwise you would think they would have just omitted it from the manual entirely.

Quote:
but I'm really bad at explaining stuff because I love to assume that people just know things already or get all bits skimmed over from the context. I might do that one day though, and it's going to be a very very long blogpost.
I think you can explain quite well; at least I was able to follow it.
Posted By: WretchedSid

Re: vec_set or setting x,y,z manually - 07/21/14 11:25

Originally Posted By: Reconnoiter
I think they should change 'fast' to 'very fast' in the manual, because (at least I think) 'fast' has a ring to it that suggests you still need to be careful with it; otherwise you would think they would have just omitted it from the manual entirely.

Honestly, the performance mentioned in the manual is ambiguous at best and downright dangerous at worst.

Firstly, every function in there, not including blocking I/O operations, is blazing fast. Most will execute in the nanosecond range, with some maybe taking up to a microsecond. By themselves, you will never ever notice any kind of performance hit from calling them. What matters is context: you can very well write code that performs incredibly badly despite only using functions marked as "fast" if you deliberately, or unknowingly, play against the CPU's branch predictors and prefetchers. And of course you can do it the other way around as well, writing code that only uses "slow" functions but achieves much, much better performance because it has been tweaked so that the generated code plays perfectly with the CPU and is ordered in such a way that it gets the maximum throughput.

The thing is, context matters. A lot. You really ought to ignore the performance remarks in judging how fast your code will be, because 99% of the time, it will be blazing fast. If your game runs slow, get out a profiler to see what actually takes the time and then optimize these hotspots. Everything else is just wasted time.
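
As a small illustration of "context matters", here is a C sketch in which two functions do exactly the same work and return exactly the same result, but touch memory in a different order. The grid size, the fill pattern and all names are arbitrary choices made for this example. On typical hardware the row-by-row version tends to be faster for large grids, because it walks memory sequentially, which is what caches and prefetchers like; how big the difference is depends on the machine, so measure it with a profiler rather than trusting this comment.

```c
#define N 256

static int grid[N][N];

/* Fills the grid with an arbitrary but deterministic pattern. */
void fill_grid(void)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            grid[r][c] = (r * 31 + c) % 100;
}

/* Walks the grid row by row: consecutive accesses are adjacent in memory. */
long long sum_row_major(void)
{
    long long sum = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            sum += grid[r][c];
    return sum;
}

/* Walks the grid column by column: consecutive accesses are
   N * sizeof(int) bytes apart, so each one lands on a different cache line. */
long long sum_column_major(void)
{
    long long sum = 0;
    for (int c = 0; c < N; c++)
        for (int r = 0; r < N; r++)
            sum += grid[r][c];
    return sum;
}
```

After fill_grid(), both sum functions return the same value. Neither of them calls anything "slow", and yet they can differ noticeably in run time, which is exactly why the per-function speed remarks in the manual say so little about how fast your code ends up being.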
Posted By: Quad

Re: vec_set or setting x,y,z manually - 07/21/14 13:09

Originally Posted By: JustSid
If your game runs slow, get out a profiler to see what actually takes the time and then optimize these hotspots. Everything else is just wasted time.

And most of the time that "hotspot" will be rendering.
If your game is a relatively small project, then unless you do something immensely stupid you will probably hit GPU bottlenecks long before CPU bottlenecks. And that will probably also be something stupid or unnecessary (too many polygons or overly complex effects for scenery fillers/small props, missing LODs, etc.), or I/O causing hiccups when loading large shit at runtime.
© 2023 lite-C Forums