Yeah, I've thought about something like that already. The problem with the dictionary, though, is that it's only a stochastic process: it gives you a likely guess, but there is no way of really knowing what the original looked like.

Another idea I might try: imagine filming something with your low-res (320x240) webcam. Let's say you take a movie of a textbook page, 100 frames in total. In principle (that is, ignoring noise) you have now recorded the same amount of information as if you had taken a single 3200x2400 high-resolution photo of the page, since 100 x 320 x 240 = 3200 x 2400 = 7,680,000 pixels. All the text should be readable, provided, of course, that you moved the camera a little between frames so that each frame samples the page at a slightly different offset. This is already being done in crime investigation, and I once saw a program that raised the resolution of video by taking consecutive frames and calculating new frames from them. It all depends on how well your motion detection works, though.
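
Here is a rough sketch of that idea in Python with NumPy, just to show the bookkeeping. It assumes a very idealized camera: no noise, no blur, and the shift of every frame is known exactly and happens to land on a 10x10 grid of sub-pixel offsets. The function names (`simulate_frames`, `shift_and_add`) are made up for this example, and the "scene" is a small random image standing in for the textbook page. Real footage would first need the motion detection step, which this skips entirely.

```python
import numpy as np

def simulate_frames(high_res, factor):
    """Fake a low-res camera: one frame per sub-pixel offset (dy, dx).

    Assumption: the camera point-samples the scene, with no blur and no noise.
    """
    frames = []
    for dy in range(factor):
        for dx in range(factor):
            frames.append(((dy, dx), high_res[dy::factor, dx::factor]))
    return frames

def shift_and_add(frames, factor):
    """Put every low-res sample back on the fine grid, averaging overlaps."""
    h, w = frames[0][1].shape
    acc = np.zeros((h * factor, w * factor))
    count = np.zeros_like(acc)
    for (dy, dx), frame in frames:
        acc[dy::factor, dx::factor] += frame
        count[dy::factor, dx::factor] += 1
    filled = count > 0              # fine-grid pixels that received a sample
    acc[filled] /= count[filled]
    return acc, filled

rng = np.random.default_rng(0)
scene = rng.random((240, 320))      # small stand-in for the high-res page
frames = simulate_frames(scene, factor=10)   # 100 frames of 24x32 pixels
restored, filled = shift_and_add(frames, factor=10)
```

With all 100 offsets present, every pixel of the fine grid gets exactly one sample, so in this toy setup the scene comes back unchanged. Drop some frames, or get the offsets wrong, and `filled` shows holes or the samples land in the wrong place, which is why the motion detection matters so much.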