str_hash

str_hash #456304 11/17/15 21:41
HeelX (OP, Senior Expert), Joined: Jul 2001, Posts: 6,904

I needed a hash function for strings. I copied some code from stackoverflow.com, but it didn't work for strings such as "aa", "aaa", and so on, so I modified it a little. I don't know whether it is stable, but it works for me. Have fun:

Code:
long str_hash (STRING* str)
{
    int c;
    long hash = 5381;
    char* cstr = _chr(str);
    int index;

    while (c = *cstr++, index++)
    {
        hash = (((hash << str_len(str)) + hash) + c + str_len(str));
    }

    return hash;
}
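For anyone who wants to try the idea outside Lite-C, here is a minimal sketch in plain C, not a drop-in replacement for the function above. Assumptions: a NUL-terminated `char*` stands in for `STRING*`, `strlen` stands in for `str_len`, and the name `str_hash_sketch` is made up for this example. Two things differ on purpose. In standard C the condition `c = *cstr++, index++` takes its value from `index++` (comma operator) and `index` is never initialized, so the sketch tests the character itself. It also uses the fixed shift of 5 from the classic djb2 function (hash * 33 + c) instead of shifting by the string length, because shifting by the width of the type or more is undefined in standard C.

```c
#include <stddef.h>
#include <string.h>

/* djb2-style string hash with the string length mixed in.
   Assumption: a NUL-terminated char* stands in for Lite-C's STRING*. */
unsigned long str_hash_sketch(const char *s)
{
    unsigned long hash = 5381UL;
    unsigned long len = (unsigned long)strlen(s); /* computed once, not per character */
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        /* (hash << 5) + hash is hash * 33; the length is added per character
           as in the modified version above */
        hash = ((hash << 5) + hash) + (unsigned long)*p + len;
        p++;
    }
    return hash;
}
```

With this version an empty string hashes to the start value 5381, and "a", "aa" and "aaa" all get different values.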

Re: str_hash [Re: HeelX] #456308 11/17/15 23:47
WretchedSid (Expert), Joined: Apr 2007, Posts: 3,751, Canada

You could potentially save a lot of work by storing the result of the string length instead of computing it on every iteration. Anyway, hash functions seem to involve a lot of hit and miss and a lot of guesswork, so if the one above doesn't work out for someone, here is the function that we use in Rayne to hash strings:

Code:
void UTF8String::RecalcuateHash()
{
    _hash = 0;

    const uint8 *bytes = GetBytes();

    for(size_t i = 0; i < _length; i ++)
    {
        HashCombine(_hash, UTF8ToUnicode(bytes));
        bytes += (UTF8TrailingBytes[*bytes] + 1);
    }
}

And the HashCombine function looks like this:

Code:
template<class T>
void HashCombine(size_t &seed, const T &value)
{
    std::hash<T> hasher;
    seed ^= static_cast<size_t>(hasher(value)) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

The idea is to hash each Unicode character independently and then combine all of the hashes, to scramble the bits as much as possible. So you would also need a hash function for size_t, which for Rayne is std::hash<size_t>, which uses the cityhash64 function internally. It can be simplified quite a bit, but I'll leave that as an exercise for the reader.

Also, UTF8TrailingBytes gives the number of trailing bytes of the UTF-8 character. In Lite-C that would always be 0. UTF8ToUnicode simply converts the UTF-8 character to Unicode, which is dead simple for ASCII characters (it's just a cast).

If anyone wants to pick this up and use it for Unicode, I would suggest hashing the grapheme clusters instead of the Unicode code points.

Shitlord by trade and passion. Graphics programmer at Laminar Research. I write blog posts at feresignum.com
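As a rough illustration of the simplification hinted at above for ASCII-only strings, here is a minimal sketch in plain C. It assumes every character is a single byte, so the trailing-byte count is always 0 and the Unicode conversion is just a cast. The names `mix_char`, `hash_combine` and `ascii_hash` are made up for this example, and `mix_char` is only a hypothetical stand-in for `std::hash<T>`. It is a simple multiplicative mixer, not cityhash64 and not what Rayne does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for std::hash<T>: a simple multiplicative mixer.
   The multiplication wraps modulo 2^32. */
static size_t mix_char(uint32_t c)
{
    return (size_t)(c * 2654435761u);
}

/* Same combine step as HashCombine above, written for C. */
static void hash_combine(size_t *seed, uint32_t value)
{
    *seed ^= mix_char(value) + 0x9e3779b9u + (*seed << 6) + (*seed >> 2);
}

/* Hash a NUL-terminated ASCII string one byte at a time.
   Assumption: one byte per character, so no UTF-8 decoding is needed. */
size_t ascii_hash(const char *s)
{
    size_t hash = 0;
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; p++)
        hash_combine(&hash, (uint32_t)*p);

    return hash;
}
```

Because each step feeds the running seed back into the combine, the result depends on character order, so "ab" and "ba" hash differently.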
