How much overhead should we expect from enabling integer overflow checks? Using a compiler flag or built-in intrinsics, we should be able to do the check with a conditional branch that branches based on the overflow flag that add and sub set. Code that looks like
add %esi, %edi
should turn into something like
add %esi, %edi
jo <handle_overflow>
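As a rough source-level sketch of the same idea, assuming GCC or Clang (both provide `__builtin_add_overflow`), the check might be written as below. The names `checked_add`, `add_or_trap`, and `handle_overflow` are hypothetical and only for illustration. Whether the compiler really emits an add followed by a jo depends on the compiler version, target, and optimization level.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler standing in for <handle_overflow> in the assembly. */
static void handle_overflow(void) {
    fprintf(stderr, "integer overflow\n");
    abort();
}

/* Returns true and stores a + b in *sum when the addition does not overflow.
 * __builtin_add_overflow (GCC/Clang) returns true when the result overflowed. */
static bool checked_add(int a, int b, int *sum) {
    return !__builtin_add_overflow(a, b, sum);
}

/* Adds two ints and diverts to the handler on overflow. On the fast path,
 * this is the add plus one not-taken branch discussed in the text. */
static int add_or_trap(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        handle_overflow();
    }
    return sum;
}
```

With this sketch, `checked_add(INT_MAX, 1, &s)` reports failure instead of wrapping, while `add_or_trap(1, 2)` simply returns 3. The builtin is used rather than a manual comparison because it lets the compiler use the hardware overflow flag directly.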
Assuming that branch is always correctly predicted (which should be the case for most code), the costs of the branch are the cost of executing that correctly predicted not-taken branch, the pollution the branch causes in the branch history table, and the cost of decoding the branch (on x86, jo and jno don’t fuse with add or sub, which means that on the fast path, the branch will take up one of the 4 micro-ops that can come from the decoded instruction cache per cycle). That’s probably less than a 2x penalty per add or sub on front-end limited code in the worst case (which might happen in a tightly optimized loop, but should be rare in general), plus some nebulous penalty from branch history pollution, which is really difficult to measure in microbenchmarks. Overall, we can use 2x as a pessimistic guess for the total penalty.
2x sounds like a lot, but how much time do applications spend adding and subtracting? If we look at the most commonly used benchmark of “workstation” integer workloads, SPECint, the composition is maybe 40% load/store ops, 10% branches, and 50% other operations. Of the 50% “other” operations, maybe 30% of those are integer add/sub ops, which is about 15% of all operations. If we guesstimate that load/store ops are 10x as expensive as add/sub ops, and other ops are as expensive as add/sub, a 2x penalty on add/sub should result in a slowdown of (40*10 + 10 + 50 + 15) / (40*10 + 10 + 50) ≈ 1.03, or about a 3% penalty. That the penalty for a branch is 2x, that load/store ops are only 10x slower than add/sub ops, and that add/sub ops aren’t faster than other “other” ops are all pessimistic assumptions, so this estimate should be on the high end for most workloads.
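The back-of-the-envelope estimate can be reproduced with a small function. This is only a sketch of the cost model described above: the function name and parameter names are made up for illustration, and the inputs are the guessed instruction mix, not measurements. Note that 30% of the 50% "other" operations is 15 add/sub ops per 100 instructions.

```c
#include <assert.h>

/* Estimated relative run time after adding overflow checks.
 * Mix parameters are percentages of the dynamic instruction count.
 * Costs are relative to one add/sub, which costs 1 by definition. */
static double overflow_check_slowdown(double load_store_pct,
                                      double branch_pct,
                                      double other_pct,
                                      double add_sub_share_of_other,
                                      double load_store_cost,
                                      double add_sub_penalty) {
    /* Baseline cost: branches and "other" ops are assumed to cost 1 each. */
    double base = load_store_pct * load_store_cost + branch_pct + other_pct;
    assert(base > 0.0);

    /* Extra cost: each add/sub becomes add_sub_penalty times as expensive. */
    double add_sub_pct = other_pct * add_sub_share_of_other;
    double extra = add_sub_pct * (add_sub_penalty - 1.0);

    return (base + extra) / base;
}
```

Calling `overflow_check_slowdown(40, 10, 50, 0.30, 10, 2)` gives 475/460, roughly 1.03, which is the 3% figure. Lowering `add_sub_penalty` or raising `load_store_cost` shrinks the result, which is why the guesses in the text count as pessimistic.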