A size sufficient to avoid recreating it when the resolution changes would be the largest possible screen resolution... I don't think that this is a good approach.
The idea with the target format array stems from the fact that you already recreate one 888(8?) target for views if view->bmap is not set. So, for me, the next logical step would be to let the user give the engine hints about which format to use. If the VIEW struct had a view->format attribute and view->format were != 0, this format would be used for the to-be-created render target (which is then stored in view->bmap). So if you just want to render the depth, it would be nice if we could set view->format to e.g. 12 or so. If view->format is != 0 and the resolution changes, the engine would auto-deallocate view->bmap and recreate it with the given format (all pointers to the old view->bmap would become invalid then).
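To make the single-target idea concrete, here is a rough sketch in plain C. It is not engine code: the BMAP and VIEW structs below are stripped-down stand-ins, and view->format, DEFAULT_TARGET_FORMAT, bmap_alloc and view_on_resolution_change are all hypothetical names I made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the engine types, NOT the real structs. */
typedef struct {
    int width;
    int height;
    int format;   /* format code of this bitmap */
} BMAP;

typedef struct {
    BMAP *bmap;   /* render target; engine-owned when format != 0 */
    int   format; /* proposed hint: 0 = no hint, keep current behaviour */
} VIEW;

/* Assumption: stands for the default target the engine creates today. */
#define DEFAULT_TARGET_FORMAT 8888

/* Hypothetical allocator; a real engine would create a GPU surface here. */
static BMAP *bmap_alloc(int width, int height, int format)
{
    BMAP *b = malloc(sizeof *b);
    if (b != NULL) {
        b->width = width;
        b->height = height;
        b->format = format;
    }
    return b;
}

/* What the engine could run whenever the screen resolution changes. */
void view_on_resolution_change(VIEW *view, int width, int height)
{
    assert(view != NULL);
    if (view->format == 0)
        return;            /* no hint given: leave view->bmap untouched */
    free(view->bmap);      /* old pointers to view->bmap are invalid now */
    view->bmap = bmap_alloc(width, height, view->format);
}
```

With this, setting view->format to 12 once would be enough: every resolution change frees the old target and stores a fresh one of the new size in view->bmap, and views with view->format == 0 behave exactly as before.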
The next logical step would then be to extend this thinking to the remaining targets. That is, if the engine already auto-creates one (1) target, it would be nice if it could do the same for more than one target in a single step. That is why I suggested using a format array. It would be very easy to use and, I suppose, would not require many changes in the engine code.
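The format-array variant could look roughly like this, again in plain C and only as a sketch. TARGET, MULTI_VIEW, MAX_TARGETS and multi_view_recreate_targets are hypothetical names, and the slot count of 4 is just an assumption for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: number of target slots per view, chosen for the example. */
#define MAX_TARGETS 4

/* Hypothetical stand-in for a render target bitmap. */
typedef struct {
    int width;
    int height;
    int format;
} TARGET;

/* Hypothetical view with one format hint per target slot. */
typedef struct {
    TARGET *targets[MAX_TARGETS];
    int     formats[MAX_TARGETS]; /* proposed hints; 0 = slot not managed */
} MULTI_VIEW;

/* Recreate every hinted target in one pass.
   Returns the number of targets that were (re)created. */
int multi_view_recreate_targets(MULTI_VIEW *view, int width, int height)
{
    int i;
    int created = 0;

    assert(view != NULL);
    for (i = 0; i < MAX_TARGETS; i++) {
        if (view->formats[i] == 0)
            continue;                  /* no hint: leave this slot alone */
        free(view->targets[i]);        /* free(NULL) is a no-op */
        view->targets[i] = malloc(sizeof *view->targets[i]);
        if (view->targets[i] == NULL)
            continue;                  /* allocation failed, slot stays empty */
        view->targets[i]->width = width;
        view->targets[i]->height = height;
        view->targets[i]->format = view->formats[i];
        created++;
    }
    return created;
}
```

The user would only fill in the format array once, e.g. { 8888, 12, 0, 0 }, and the engine would call something like the function above on every resolution change.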
Plus, all the cumbersome approaches for auto-detecting resolution changes and the complicated render-target recreation would vanish, because the targets would be recreated by the engine itself. I think this is a great idea.