The bilateral upsampling is now 2.3 times faster, thanks to fewer lookups. Previously, I calculated the upsampled result for each hi-res pixel by examining its 2x2 neighbourhood of coarse samples. Now I work at half the resolution and calculate, for each coarse sample, its 2x2 block of hi-res pixels. The coarse neighbourhood is still examined, but only once for all four hi-res pixels instead of once per hi-res pixel.
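
To make the saving concrete, here is a minimal CPU sketch of the two loop orders in plain Python. It is not the actual implementation. It assumes a depth-guided Gaussian weight, a hi-res image exactly twice the coarse size, and that all four hi-res pixels of a block share the 2x2 coarse neighbourhood anchored at their coarse sample. All function names and the `sigma` parameter are hypothetical.

```python
import math

def clamp(v, lo, hi):
    return max(lo, min(v, hi))

def fetch_neighbourhood(coarse_color, coarse_depth, cx, cy, counter):
    """Read the 2x2 coarse samples anchored at (cx, cy), clamped at the border.
    counter[0] is incremented once per coarse lookup."""
    h, w = len(coarse_color), len(coarse_color[0])
    samples = []
    for dy in (0, 1):
        for dx in (0, 1):
            x = clamp(cx + dx, 0, w - 1)
            y = clamp(cy + dy, 0, h - 1)
            samples.append((coarse_color[y][x], coarse_depth[y][x]))
            counter[0] += 1
    return samples

def blend(samples, hi_depth, sigma=0.1):
    """Weight each coarse colour by how close its depth is to the hi-res depth
    (assumed Gaussian range weight)."""
    total = 0.0
    weight_sum = 0.0
    for color, depth in samples:
        w = math.exp(-((depth - hi_depth) ** 2) / (2.0 * sigma * sigma))
        total += w * color
        weight_sum += w
    # Fall back to the first sample if every weight underflowed to zero.
    return total / weight_sum if weight_sum > 0.0 else samples[0][0]

def upsample_per_pixel(coarse_color, coarse_depth, hi_depth):
    """Old order: every hi-res pixel fetches the coarse neighbourhood itself."""
    counter = [0]
    height, width = len(hi_depth), len(hi_depth[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = fetch_neighbourhood(coarse_color, coarse_depth,
                                          x // 2, y // 2, counter)
            out[y][x] = blend(samples, hi_depth[y][x])
    return out, counter[0]

def upsample_per_block(coarse_color, coarse_depth, hi_depth):
    """New order: iterate at half resolution, fetch once per coarse sample,
    then write all four hi-res pixels of its 2x2 block."""
    counter = [0]
    height, width = len(hi_depth), len(hi_depth[0])
    out = [[0.0] * width for _ in range(height)]
    for cy in range(len(coarse_color)):
        for cx in range(len(coarse_color[0])):
            samples = fetch_neighbourhood(coarse_color, coarse_depth,
                                          cx, cy, counter)
            for dy in (0, 1):
                for dx in (0, 1):
                    y, x = 2 * cy + dy, 2 * cx + dx
                    out[y][x] = blend(samples, hi_depth[y][x])
    return out, counter[0]

if __name__ == "__main__":
    coarse_color = [[0.0, 1.0], [1.0, 0.0]]
    coarse_depth = [[0.2, 0.8], [0.8, 0.2]]
    hi_depth = [[0.2, 0.2, 0.8, 0.8] for _ in range(4)]
    _, old_lookups = upsample_per_pixel(coarse_color, coarse_depth, hi_depth)
    _, new_lookups = upsample_per_block(coarse_color, coarse_depth, hi_depth)
    print(old_lookups, new_lookups)
```

Under these assumptions both functions produce the same image, but the per-block version performs a quarter of the coarse lookups. The blend itself still runs once per hi-res pixel, which is one plausible reason the measured speed-up is below 4x.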

On my notebook with chipset graphics, I got an immediate performance boost of up to 10 fps (results may vary on faster machines, though).