texcoord3 dx9 prototype #382296
09/06/11 22:40
8 Images
texcoord3 Offline OP
Newbie
Joined: Sep 2011
Posts: 13

Greetings,

I used to be pretty active on this forum under the alias "foxfire" - sadly I have forgotten my password and no longer have access to my old email so I made this new account.

Anyway, these are rendered in real-time dynamically with the A7 Gamestudio engine. This was the prototype for my own engine API and has served its purpose well. The prototype is now retired and I use my own proprietary API in dx11.

If you have any questions or need help coding please ask me! I am always glad to help =]

Also, sorry for the LONG absence from here - I've been VERY busy. DX

-Texcoord3-

Last edited by texcoord3; 09/06/11 23:23.
39 Comments
Re: texcoord3 dx9 prototype [Re: WretchedSid] #386924
11/11/11 07:40
FlorianP Offline
Serious User
Joined: Mar 2002
Posts: 1,774
Magdeburg
Sure you can write more optimized code in Assembler than in C - and that's kinda the point. You don't write a C snippet and go like 'hey now I'm gonna compile this by hand'...
Today's desktop CPUs (all CISCs anyway) still have ways of addressing and tricks in general you couldn't dream of in C. Needs a buttload of experience though - but fairly possible!

Last edited by FlorianP; 11/11/11 07:41.
Re: texcoord3 dx9 prototype [Re: FlorianP] #386934
11/11/11 10:32
WretchedSid Offline
Expert
Joined: Apr 2007
Posts: 3,751
Canada
I never said that you can't write slow C code, I said that it's nearly impossible to write better assembler than a compiler. Of course this depends on the compiler, I have no doubt that it's totally possible to write better assembler than the Lite-C compiler, but better assembler than GCC or LLVM/Clang?

But if by "better" you actually mean easier to read, you are right, that is possible.

Re: texcoord3 dx9 prototype [Re: WretchedSid] #386936
11/11/11 11:18
FlorianP Offline
Serious User
Joined: Mar 2002
Posts: 1,774
Magdeburg
Automated optimization has been a topic in theoretical computer science for a very long time - fact is, every optimization done by today's machines is obviously imperfect.
(In this case) every compiler has to make assumptions based on the higher-level language it's interpreting. I remember one of the first examples we had in theoretical CS was a floating-point operation in which the compiler has to assume that you need - let's say 32-bit precision - the compiler has no way of knowing that you might need less, and that is where the optimization ends. Though this very basic and very old example might not apply to every machine, there's tons of literature about this.
In fact this exact problem was the reason for (re-)inventing RISC CPUs, which try to minimize such problems.

If you're really interested in this topic I suggest you read some books about formal languages and numerical optimization, for instance.
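
A minimal sketch of one reason a compiler has to be this conservative with floats, assuming IEEE 754 single precision: addition is not associative, so the compiler may not regroup a sum unless it is told the result does not have to be bit-exact. The function names are hypothetical, and `volatile` is only there to force each intermediate to be rounded to `float`.

```c
/* Hypothetical helpers, not taken from the thread.
   volatile forces the intermediate to be stored as a 32-bit float,
   so the rounding step cannot be skipped or folded away. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;   /* (a + b), rounded to float */
    return t + c;
}

float sum_right(float a, float b, float c)
{
    volatile float t = b + c;   /* (b + c), rounded to float */
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left gives 1.0f but sum_right gives 0.0f, because -1e8f + 1.0f rounds back to -1e8f. A human who knows the lost digit does not matter can regroup freely; the compiler cannot know that.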

EDIT:
You might try this
http://en.wikipedia.org/wiki/Kahan_summation_algorithm

Code:
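float KahanSum
(
  const float *data,
  int n
)
{
   float
     sum = 0.0f,  // running total
     C = 0.0f,    // compensation: rounding error carried over from the last step
     Y,           // next input, corrected by C
     T;           // tentative new total

   for (int i = 0 ; i < n ; ++i)
   {
      Y = *data++ - C;   // apply the correction from the previous iteration
      T = sum + Y;       // low-order bits of Y may be lost here
      C = T - sum - Y;   // (T - sum) is what was really added; minus Y leaves the error
      sum = T;
   }

   return sum;
}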


Code:
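float AsmSum
(
  const float *data,
  int n
)
{
  float
    result = 0.0f;

  if (n <= 0) return result; // guard: 'loop' with ecx == 0 would wrap around

  _asm
  {
    mov esi,data          // esi -> current element
    mov ecx,n             // ecx = loop counter
    fldz                  // sum = 0
    fldz                  // C = 0            (st0 = C, st1 = sum)
l1:
    fsubr [esi]           // st0 = *data - C = Y
    add esi,4             // advance to the next float
    fld st(0)             // duplicate Y
    fadd st(0),st(2)      // st0 = Y + sum = T
    fld st(0)             // duplicate T
    fsub st(0),st(3)      // st0 = T - sum
    fsub st(0),st(2)      // st0 = (T - sum) - Y = new C
    fstp st(2)            // overwrite Y with C, pop    (T, C, sum)
    fstp st(2)            // overwrite sum with T, pop  (C, sum)
    loop l1
    fstp result           // pop C (overwritten by the next store)
    fstp result           // pop the final sum into result
  }

  return result;
}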
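
For anyone who wants to see what the compensation buys without an x86 inline assembler, here is a small portable sketch. The names naive_repeat and kahan_repeat are hypothetical; they add the same value `count` times so no array is needed, and the compensated loop follows the same steps as KahanSum above.

```c
/* Hypothetical helpers: add `value` to a float accumulator `count` times. */
float naive_repeat(float value, long count)
{
    float sum = 0.0f;
    for (long i = 0; i < count; ++i)
        sum += value;              /* rounding error accumulates unchecked */
    return sum;
}

float kahan_repeat(float value, long count)
{
    float sum = 0.0f;
    float c = 0.0f;                /* running compensation */
    for (long i = 0; i < count; ++i) {
        float y = value - c;       /* corrected input */
        float t = sum + y;         /* may lose low-order bits of y */
        c = (t - sum) - y;         /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

Calling both with 0.1f and 10,000,000 iterations, the compensated version should stay within one unit of 1,000,000, while the naive loop typically drifts far from it on a plain single-precision build. Note that options which let the compiler reassociate float math (for example GCC's -ffast-math) may optimize the compensation away.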



Last edited by FlorianP; 11/11/11 11:26.
Re: texcoord3 dx9 prototype [Re: FlorianP] #386937
11/11/11 11:59
WretchedSid Offline
Expert
Joined: Apr 2007
Posts: 3,751
Canada
Originally Posted By: FlorianP
Automated optimization has been a topic in theoretical computer science for a very long time - fact is, every optimization done by today's machines is obviously imperfect.

Of course, but this doesn't mean that humans can do it any better, and that's the reason why I still doubt that texcoord3 can produce better assembly.
You know, knowing that compilers aren't perfect and doing better yourself are two totally different pairs of shoes.
No doubt there are people who can do this, but what are the odds that one of them is here in the forum, where most can't even write performant C code? If he had said that he optimized a very few routines in assembly, I would actually believe him, but a complete project? Do you really believe this?

About your floating point example, imo the user should know what kind of data type s/he should use in which case, and the compiler should trust that the user knows what s/he is doing. Again, it's totally easy to write horribly slow C code that, even if optimized by the compiler, still performs very badly; however, that was never my point.

Last edited by JustSid; 11/11/11 12:05.
Re: texcoord3 dx9 prototype [Re: WretchedSid] #386940
11/11/11 12:42
FlorianP Offline
Serious User
Joined: Mar 2002
Posts: 1,774
Magdeburg
I admit I have no idea what this thread is actually about, nor do I have any idea what texcoord is capable of... sorry for that.

Of course you're totally right that it's bogus to write a whole project in Assembler or to think that you can even get close to the average power of compiler optimizations these days.
But my point is - there are actually real-life examples where assembler has a clear advantage over C - especially in computer graphics.
You are the local Apple-fanboy ;) right? So you might have already stumbled over this:
Let's say you want to multiply two 32-bit floats to a 64-bit result and then get the middle 32 bits. ARM processors can do that within one clock cycle ("(Prozessor-)Takt" in German, no idea if that's the correct translation), meaning using one assembler instruction. I don't know a single C compiler that recognizes this and optimizes it correctly.
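
For reference, a sketch of what such an operation looks like in plain C. This assumes the values are 16.16 fixed-point integers rather than IEEE floats, which is only one possible reading of the description above; the function name is hypothetical. Whether a given compiler turns this into a single multiply instruction depends on the target and the optimization flags.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 16.16 fixed-point numbers.
   The 64-bit product is in 32.32 format; shifting right by 16
   keeps the middle 32 bits, which is again a 16.16 value.
   Right-shifting a negative value is implementation-defined in C;
   common compilers use an arithmetic shift. */
int32_t fixmul_16_16(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* full 64-bit product */
    return (int32_t)(wide >> 16);             /* middle 32 bits */
}
```

In 16.16 format, 1.5 is 0x18000 and 2.0 is 0x20000, so fixmul_16_16(0x18000, 0x20000) returns 0x30000, which is 3.0.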

Last edited by FlorianP; 11/11/11 13:56.
Re: texcoord3 dx9 prototype [Re: FlorianP] #386947
11/11/11 15:23
mk_1 Offline
Expert
Joined: Dec 2000
Posts: 4,608
There are certainly compilers that have an ARM backend which use the power of specialized instructions. It is indeed a very daunting task for the compiler to identify several instructions that can be replaced by SSE instructions, though. It's all possible, but it's also NP-complete most of the time, which is why sometimes writing a little asm doesn't hurt instead of having a compile time of several days.

EDIT: clock cycle is correct

Last edited by mk_1; 11/11/11 15:24.
Re: texcoord3 dx9 prototype [Re: FlorianP] #386955
11/11/11 17:04
WretchedSid Offline
Expert
Joined: Apr 2007
Posts: 3,751
Canada
Originally Posted By: FlorianP
You are the local Apple-fanboy ;) right?

Guilty as charged.

Originally Posted By: FlorianP
Let's say you want to multiply two 32-bit floats to a 64-bit result and then get the middle 32 bits. ARM processors can do that within one clock cycle ("(Prozessor-)Takt" in German, no idea if that's the correct translation), meaning using one assembler instruction.

I took a quick look into the ARM ARM and couldn't find such an instruction in the NEON instruction set reference for any revision of the Cortex A8. Mind pointing me to the one you mean?

Btw, do you really mean one clock cycle or one assembler mnemonic?


Originally Posted By: mk_1
There are certainly compilers that have an ARM backend which use the power of specialized instructions. It is indeed a very daunting task for the compiler to identify several instructions that can be replaced by SSE instructions, though.

FWIW: The * operator of my color class uses something that can be optimized by NEON quite well, by just loading both single precision floating point vectors into NEON registers at once and then multiplying them all at once. In fact, LLVM/Clang 3.0 does this when compiling for armv7 in release mode. I'm not quite sure which parameter triggers this, but one certainly does it.

Used LLVM Version:
Quote:
noname:~ Sidney$ clang --version
Apple clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.2.0
Thread model: posix
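
A minimal C sketch of the kind of component-wise multiply described above. The Color struct and color_mul are hypothetical stand-ins, not the actual color class, which is C++ and is not shown in the thread. Four independent float multiplies on adjacent fields are the sort of pattern a vectorizing compiler can map to a single SIMD multiply, though whether it actually does so depends on the target and flags.

```c
/* Hypothetical four-channel color type, assumed for illustration only. */
typedef struct { float r, g, b, a; } Color;

/* Component-wise product: each channel is independent of the others,
   so all four multiplies can in principle run in one vector instruction. */
Color color_mul(Color x, Color y)
{
    Color out;
    out.r = x.r * y.r;
    out.g = x.g * y.g;
    out.b = x.b * y.b;
    out.a = x.a * y.a;
    return out;
}
```

Inspecting the generated assembly for the target in question (for example with the compiler's -S option) is the only reliable way to check what was emitted.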


Re: texcoord3 dx9 prototype [Re: WretchedSid] #386978
11/12/11 03:10
texcoord3 Offline OP
Newbie
Joined: Sep 2011
Posts: 13
Ok, well I am no God and I am not the most experienced programmer in the world.

BUT,
I have tested various code snippets against gcc and vc++ (yes it's c++ but anyway...).

I've noticed that I can code very complex optimizations in assembly that the compilers destroy, mainly when it comes to accessing memory.

For example, my prime optimization is a ray-triangle intersection routine that only reads each possible triangle from RAM and then does ALL of the math in registers. Now... maybe a compiler can do that, but neither gcc nor Visual Studio's compiler produced anything nearly as efficient.

To be honest, most, if not all, of the optimizations I have made/plan simply manage variables better than C/C++. While that doesn't always translate to huge gains, or even different code from the compiler, in many examples, such as my ray-tracer, it is a very significant (I am still testing for how significant) difference. I've observed that accessing memory is not particularly fast compared to basic add, mult, etc., and so the more I can keep in the cache and use registers, the faster the routines should be.
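
For readers who want something concrete to benchmark against, here is a C sketch of a ray-triangle test in the same spirit: the triangle is copied from memory into locals once and everything after that works on local values. This is the standard Moller-Trumbore algorithm swapped in for illustration, not the assembler routine described above, and all type and function names are hypothetical. Whether the locals end up in registers is up to the compiler.

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;          /* hypothetical */
typedef struct { Vec3 v0, v1, v2; } Triangle;    /* hypothetical */

static Vec3 v_sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static Vec3 v_cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float v_dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 and writes the hit distance to *t_out if the ray hits,
   0 otherwise. */
int ray_triangle(Vec3 orig, Vec3 dir, const Triangle *tri, float *t_out)
{
    const float eps = 1e-7f;

    /* Read the triangle from memory exactly once. */
    const Vec3 v0 = tri->v0, v1 = tri->v1, v2 = tri->v2;

    const Vec3 e1 = v_sub(v1, v0);
    const Vec3 e2 = v_sub(v2, v0);
    const Vec3 p = v_cross(dir, e2);
    const float det = v_dot(e1, p);
    if (fabsf(det) < eps) return 0;            /* ray parallel to triangle */

    const float inv = 1.0f / det;
    const Vec3 s = v_sub(orig, v0);
    const float u = v_dot(s, p) * inv;         /* first barycentric coord */
    if (u < 0.0f || u > 1.0f) return 0;

    const Vec3 q = v_cross(s, e1);
    const float v = v_dot(dir, q) * inv;       /* second barycentric coord */
    if (v < 0.0f || u + v > 1.0f) return 0;

    const float t = v_dot(e2, q) * inv;        /* distance along the ray */
    if (t < eps) return 0;                     /* triangle is behind the ray */

    *t_out = t;
    return 1;
}
```

Comparing the compiler's output for a routine like this against a hand-written version is a fair way to measure how much the manual register management actually gains.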

Also mk_1 noted about SIS (special instruction sets). Indeed I can use these and manage portability in the installer.

Again, I'm not an expert, just an obsessive coder. lol.

Re: texcoord3 dx9 prototype [Re: WretchedSid] #386980
11/12/11 03:22
texcoord3 Offline OP
Newbie
Joined: Sep 2011
Posts: 13
I find your comments about my coding ability mildly offensive. :{

Have I done something to suggest I am not able to achieve such a project?

Also, you must remember that the project will never be fully (or probably not even mostly) in assembler. At this point I'm just writing many routines in assembler language.

I'm very disappointed in your comment. I have only returned to contribute and ask nothing in return. I am not asking for money nor do I need any. I have been studying computer science since I was able to read. I take my work very seriously. While I might not be the best, I don't think I should be considered a novice.

I am a student at the University of Maryland and I take my honor very seriously. I would never exaggerate or forge results or code and to do so would mean my expulsion from my University. Please reconsider your comment as I apologize for any inconvenience.

Re: texcoord3 dx9 prototype [Re: texcoord3] #386991
11/12/11 10:28
WretchedSid Offline
Expert
Joined: Apr 2007
Posts: 3,751
Canada
I didn't mean to criticize you as a person, I mean, I don't even know you at all. All I was saying is that I have huge doubts that the average programmer here on the forum can write better assembly than a compiler which does optimizations.
And your post looked like "hey, I wrote the project in Assembler", at least that's what I understood from it, and this, and I hope you agree there, is really unlikely to perform better!

By all means, it's possible to boost performance by helping the compiler out in some parts. I do this too; for example, a lot of the matrix and vector calculation done in my engine (iOS) is written directly in assembler to get the most out of NEON. I have no doubt that you can do this too, but like I said, I read your post a bit differently. Sorry about that.


Shitlord by trade and passion. Graphics programmer at Laminar Research.
I write blog posts at feresignum.com