str_hash #456304
11/17/15 21:41
Joined: Jul 2001
Posts: 6,904
HeelX Offline OP
Senior Expert
I needed a hash function for strings. I copied some code from stackoverflow.com, but it didn't work for strings like "aa", "aaa", and so on, so I modified it a little. I don't know whether it is stable, but it works for me.

Have fun:

Code:
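long str_hash (STRING* str) {
	
	int c;
	long hash = 5381; // djb2 start value
	
	char* cstr = _chr(str);
	
	// read one character per pass until the terminating zero is reached
	while ((c = *cstr++) != 0) {
		// djb2-style step with the string length mixed in
		hash = (((hash << str_len(str)) + hash) + c + str_len(str)); 
	}

	return hash;
}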


Re: str_hash [Re: HeelX] #456308
11/17/15 23:47
Joined: Apr 2007
Posts: 3,751
Canada
WretchedSid Offline
Expert
You could potentially save a lot of work by computing the string length once and storing it in a variable, instead of calling str_len twice for every character.
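
A minimal sketch of that idea, written in portable C++ rather than Lite-C so it can be compiled outside Gamestudio. The name str_hash_cached, the const char* parameter and the use of strlen are assumptions made for this illustration. The shift amount is also reduced modulo 32, which is not in the original function; shifting a 32-bit value by 32 or more bits is undefined in C and C++, so the sketch avoids it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical C++ port of str_hash with the length computed only once.
std::uint32_t str_hash_cached(const char* cstr)
{
    assert(cstr != nullptr);

    // Cache the length instead of recomputing it for every character.
    const std::uint32_t len = static_cast<std::uint32_t>(std::strlen(cstr));

    // Assumption of this sketch: keep the shift in the range 0..31.
    const std::uint32_t shift = len % 32u;

    std::uint32_t hash = 5381u;
    for (std::uint32_t i = 0; i < len; ++i) {
        const std::uint32_t c = static_cast<unsigned char>(cstr[i]);
        // Unsigned arithmetic wraps around on overflow, which is well defined.
        hash = ((hash << shift) + hash) + c + len;
    }
    return hash;
}
```

With this version the length is read once per call, so the cost per character no longer depends on how expensive the length lookup is.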

Anyway, hash functions involve a lot of trial and error and guesswork, so if the one above doesn't work out for someone, here is the function that we use in Rayne to hash strings:

Code:
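void UTF8String::RecalcuateHash()
{
	_hash = 0;
	
	const uint8 *bytes = GetBytes();
	
	for(size_t i = 0; i < _length; i ++)
	{
		// Decode the character at the current position and mix it into the hash
		HashCombine(_hash, UTF8ToUnicode(bytes));
		// Skip the lead byte plus its trailing bytes to reach the next character
		bytes += (UTF8TrailingBytes[*bytes] + 1);
	}
}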



And the HashCombine function looks like this:
Code:
template<class T>
void HashCombine(size_t &seed, const T &value)
{
	std::hash<T> hasher;
	seed ^= static_cast<size_t>(hasher(value)) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}



The idea is to hash each Unicode character independently and then combine all of the hashes, to scramble the bits as much as possible. That means you also need a hash function for the individual values; Rayne uses std::hash<size_t>, which uses the cityhash64 function internally.

It can be simplified quite a bit, but I'll leave that as an exercise for the reader.

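Also, UTF8TrailingBytes is a lookup table that gives the number of trailing bytes following the lead byte of a UTF-8 character, so adding 1 yields the byte length of the whole character. In Lite-C the trailing count would always be 0. UTF8ToUnicode simply converts the UTF-8 character to a Unicode code point, which is dead simple for ASCII characters (it's just a cast).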
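
For plain ASCII input, the simplification described above could look roughly like the following. This is a sketch and not Rayne's code: HashAsciiString is a hypothetical name, std::string stands in for the engine's string class, and std::hash<std::uint32_t> is assumed as the per-character hash. Its quality depends on the standard library in use.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Same combine step as the HashCombine shown above.
template<class T>
void HashCombine(std::size_t &seed, const T &value)
{
    std::hash<T> hasher;
    seed ^= static_cast<std::size_t>(hasher(value)) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical helper: hashes a string that is assumed to contain only ASCII.
std::size_t HashAsciiString(const std::string &text)
{
    std::size_t hash = 0;

    for (unsigned char byte : text) {
        // ASCII characters have no trailing bytes, so the code point
        // is simply the byte value (the "cast" mentioned above).
        HashCombine(hash, static_cast<std::uint32_t>(byte));
    }
    return hash;
}
```

An empty string hashes to 0 here, because the loop never runs and the seed is returned unchanged.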

If anyone wants to pick this up and use it for Unicode, I would suggest hashing the grapheme clusters instead of the Unicode code points.


