It's not a bug per se, it works as intended! The first character in a Unicode text file is the so-called BOM (byte order mark). It is put there because UTF-16 text can be encoded as either big endian or little endian, and the BOM signals the endianness of the encoded text.

Now, you can argue that the read function should just ignore the BOM, but it's actually part of the text, just like every other control character, so you can also argue that it should be there. As a solution: open the file, read the first character (16 bits) and check whether it's the BOM (some editors don't write one, for whatever reason). If it isn't, seek back one character; if it is, just continue. The BOM has the code point U+FEFF, but you should read it as two separate bytes and compare those (FE FF for big endian, FF FE for little endian), unless you know how to write a function that reverses the byte order of a 16-bit value.
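
Here's a rough sketch of that check in Python, assuming the file really is UTF-16 and you have it open as a seekable binary stream. The names `skip_utf16_bom` and `read_utf16_text` and the little-endian fallback are made up for illustration; pick whatever default makes sense for your files.

```python
import io

BOM_LE = b"\xff\xfe"  # U+FEFF encoded as UTF-16 little endian
BOM_BE = b"\xfe\xff"  # U+FEFF encoded as UTF-16 big endian


def skip_utf16_bom(stream, default="utf-16-le"):
    """Consume a leading UTF-16 BOM if present and return the codec to use.

    `stream` is a seekable binary file object positioned at the start of
    the text. `default` is an assumed fallback for files without a BOM.
    """
    start = stream.tell()
    head = stream.read(2)  # one 16-bit code unit, read as two raw bytes
    if head == BOM_LE:
        return "utf-16-le"
    if head == BOM_BE:
        return "utf-16-be"
    stream.seek(start)  # no BOM: seek back so the first character isn't lost
    return default


def read_utf16_text(stream, default="utf-16-le"):
    """Read the rest of the stream as text, without the BOM in the result."""
    codec = skip_utf16_bom(stream, default)
    return stream.read().decode(codec)


if __name__ == "__main__":
    with_bom = io.BytesIO(BOM_BE + "hi".encode("utf-16-be"))
    without_bom = io.BytesIO("hi".encode("utf-16-le"))
    print(read_utf16_text(with_bom))
    print(read_utf16_text(without_bom))
```

Comparing the two raw bytes against both orders means you never have to byte-swap anything yourself. If you only need this in Python, the built-in `utf-16` codec already consumes the BOM when decoding, so the manual check is mostly useful when you want to know which byte order the file used.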


Shitlord by trade and passion. Graphics programmer at Laminar Research.
I write blog posts at feresignum.com