Hmm... do I understand correctly that Penalty=-0.5*PIP will make the fill price more advantageous than the one set by the Entry limit? (And that it will allow setting a fixed 'slippage' in pips, rather than seconds, for stops?)

If so, then for limits this works counter to the 'realism' objective:
a) filling limits at better prices is a rare 'bonus' that should not be counted on in development/evaluation, and
b) the order will still fill (!), albeit possibly at a worse price, whereas in reality it might not fill at all. Avoiding that was the whole point of asking for a 'safety margin' on fill/no fill in testing, and in training such a system.
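
To make the 'safety margin' idea concrete, here is a minimal Python sketch of the fill rule I have in mind. It is only an illustration, not Zorro code, and it does not use Penalty; the function name `conservative_limit_fill` and the `margin` parameter are made up for this example. A limit order counts as filled only if price trades through the limit by at least the margin, and the fill price is never better than the limit itself.

```python
def conservative_limit_fill(side, limit, bar_low, bar_high, margin):
    """Return the fill price, or None if the order is treated as unfilled.

    side     -- "buy" or "sell" (hypothetical convention for this sketch)
    limit    -- the limit price of the order
    bar_low  -- lowest price seen while the order was pending
    bar_high -- highest price seen while the order was pending
    margin   -- how far price must trade THROUGH the limit to count as a fill
    """
    if side == "buy":
        # A buy limit needs price to drop below the limit by the margin.
        traded_through = bar_low <= limit - margin
    elif side == "sell":
        # A sell limit needs price to rise above the limit by the margin.
        traded_through = bar_high >= limit + margin
    else:
        raise ValueError("side must be 'buy' or 'sell'")

    # Fill exactly at the limit: no 'bonus' price improvement is assumed.
    return limit if traded_through else None


PIP = 0.0001  # assumed pip size for a typical 4-decimal FX pair

# Price only dipped 0.2 pip below the buy limit: treated as no fill.
print(conservative_limit_fill("buy", 1.1000, 1.09998, 1.1010, 0.5 * PIP))

# Price dipped a full pip below the buy limit: filled at the limit.
print(conservative_limit_fill("buy", 1.1000, 1.0999, 1.1010, 0.5 * PIP))
```

With this rule a marginal touch of the limit produces no trade at all, which is the pessimistic behaviour I would want in testing and training.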

What would be the way to use Penalty for this purpose?