It is likely that the first instruction after the NOP is the target of a different branch/jump somewhere else. Jumping to aligned targets is normally preferable both for better i-cache utilization and for better BTB predictions:
11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16. Aligning by 8 will assure that at least 8 bytes of code can be loaded with the first instruction fetch, which may be sufficient if the instructions are small. We may align subroutine entries by the cache line size (typically 64 bytes) if the subroutine is part of a critical hot spot and the preceding code is unlikely to be executed in the same context.
http://agner.org/optimize/optimizing_assembly.pdf#page=86
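To make the arithmetic in the quote concrete, here is a minimal sketch in Python. The function names and the address 0x40101D are made up for illustration, and the 16-byte fetch block is the figure from the quote, not a property of every CPU.

```python
def useful_bytes_in_first_fetch(addr: int, block: int = 16) -> int:
    """Bytes from addr to the end of the aligned fetch block that contains it."""
    return block - (addr % block)


def padding_to_align(addr: int, alignment: int = 16) -> int:
    """Padding bytes needed to move addr up to the next multiple of alignment."""
    return (-addr) % alignment


label = 0x40101D  # hypothetical jump target, 13 bytes into its 16-byte block

print(useful_bytes_in_first_fetch(label))  # 3: only 3 bytes arrive with the first fetch
pad = padding_to_align(label)
print(pad)                                 # 3: bytes of NOP padding needed
print(useful_bytes_in_first_fetch(label + pad))  # 16: the whole block is useful
```

In this example three bytes of padding (for instance one 3-byte NOP) move the label to 0x401020, so the first fetch delivers a full block of useful code.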
If the instruction after the NOP is indeed a branch target, the NOP is just padding that aligns it. As pointed out elsewhere, such padding must be added carefully: padding blindly is likely to worsen i-cache usage and therefore decrease performance. Always measure.
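One quick way to test the padding hypothesis when reading a disassembly is to check whether the address right after the NOP falls on a common alignment boundary. The sketch below is only a heuristic under assumed values: the helper name, the candidate alignments, and the addresses are all hypothetical, and an aligned address is circumstantial evidence at best, since one address in eight is 8-byte aligned by chance.

```python
def alignment_after_nop(nop_addr: int, nop_len: int, candidates=(64, 32, 16, 8)):
    """Return the largest candidate alignment satisfied by the address
    immediately after the NOP, or None if none of them is satisfied."""
    end = nop_addr + nop_len
    for alignment in candidates:  # candidates are listed from largest to smallest
        if end % alignment == 0:
            return alignment
    return None


# Hypothetical one-byte NOP at 0x40101F: the next instruction starts at 0x401020.
print(alignment_after_nop(0x40101F, 1))  # 32
# Hypothetical one-byte NOP at 0x401012: the next instruction starts at 0x401013.
print(alignment_after_nop(0x401012, 1))  # None
```

If the result is None, alignment padding is an unlikely explanation for that NOP and another reason should be considered.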
Note: on other architectures (i.e. not x86/x86-64) NOPs after calls are sometimes required; since the question is about x86-64, this shouldn't apply here.
nop, it's probably a debugger breakpoint. This defeats debuggers that hide by using something else than the customary int 3 instruction for a breakpoint. – Guntram Blohm Jan 16 '15 at 15:21