I am probably wrong, but I think that one uses the brightness as heightmap, from which you can then calculate normals by taking always three pixels, calculating the direction of one to the other two and doing the cross product, which after dividing through the length gives you what you then just have to put into a texture, by multiplying with and adding 0.5.