Purpose of NOP immediately after CALL instruction

Question

There are a lot of

... code ...
call sub_...
nop
... code ...

patterns in an executable dump I am working on. They appear in the middle of subroutines, and I believe they don't serve alignment purposes. I am curious about the origin of this construct.

The program was packed, so I am not sure whether the call-nop pair was there originally or appeared after unpacking.

It might be part of an anti-debugger scheme: the function could check what's at its return address. If it isn't a nop, it's probably a debugger breakpoint. This defeats debuggers that hide by using something other than the customary int 3 instruction for a breakpoint. – Guntram Blohm, Jan 16 '15 at 15:21
Interesting trick, but it doesn't seem to be used here, since I spoof the return address as part of a hooking technique and the program works flawlessly. – uranix, Jan 16 '15 at 15:40
Since you already accepted the other answer, I thought the anti-debugging trick probably doesn't apply to you. But I wanted to add it in case someone googles the question in a year or two. – Guntram Blohm, Jan 16 '15 at 16:09

score 5 · Accepted Answer · answered Jan 16 '15 at 14:05

The packer may have replaced an indirect call to an imported function with a direct (relative) call to the function itself. The direct form is one byte shorter (5 bytes instead of 6), so the freed byte has to be padded with a single NOP:

FF 15 ?? ?? ?? ??  call cs:[__imp_foo] ; RIP-relative offs32 indirect
E8 ?? ?? ?? ??     call foo            ; RIP-relative offs32
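The rewrite described above can be sketched in a few lines of Python. This is only an illustration of the byte arithmetic, not the code of any particular packer or linker: the function name `patch_indirect_call` and the `iat_targets` mapping (IAT slot address to resolved function address) are hypothetical, and the 64-bit RIP-relative form of `FF 15` is assumed (in 32-bit code the four bytes after `FF 15` would be an absolute address instead).

```python
import struct

def patch_indirect_call(code: bytes, offset: int, insn_addr: int, iat_targets: dict) -> bytes:
    """Rewrite the 6-byte `FF 15 disp32` call at `offset` into `E8 rel32` + `90`.

    insn_addr   -- virtual address of the FF 15 instruction (assumed known)
    iat_targets -- hypothetical map: IAT slot address -> resolved function address
    """
    if code[offset:offset + 2] != b"\xff\x15":
        raise ValueError("not an FF 15 indirect call")
    (disp,) = struct.unpack_from("<i", code, offset + 2)
    # RIP-relative: disp32 is relative to the end of the 6-byte instruction.
    slot_addr = insn_addr + 6 + disp
    target = iat_targets[slot_addr]
    # rel32 of E8 is relative to the end of the 5-byte instruction.
    rel = target - (insn_addr + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target out of rel32 range; a thunk would be needed")
    patched = b"\xe8" + struct.pack("<i", rel) + b"\x90"  # trailing NOP keeps the length at 6
    return code[:offset] + patched + code[offset + 6:]

if __name__ == "__main__":
    original = bytes.fromhex("ff15fa0f0000")  # call [rip+0xFFA], placed at 0x1000
    print(patch_indirect_call(original, 0, 0x1000, {0x2000: 0x1500}).hex(" "))
```

The surrounding code keeps its addresses because the patched sequence has the same length as the original instruction, which is why the NOP ends up directly after the call.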

DarthGizka

But the call is within the same module. Why use import for that? – uranix Jan 16 '15 at 14:15
Some packers ('protectors') stay resident and offer an API to the wrapped executable. The executable is built against a DLL exporting the wrapper API, hence the indirect calls emitted by the compiler, through the address slots in the IAT. But the wrapper may choose to resolve those imports to direct calls during the loading/unpacking process. That way the loaded process won't have a treacherous IAT telling tales. 'Minimal rebuild' debug builds tend to use strange thunking/reserve mechanisms as well, but debug builds are rarely wrapped and shipped... – DarthGizka Jan 16 '15 at 14:28
In 32-bit mode all indirect calls to fixed targets can be converted to direct calls (again, to ditch the IAT) but in 64-bit mode that would require trampoline thunks for distances exceeding 2^31 bytes. – DarthGizka Jan 16 '15 at 14:32
Likely just the linker, not the packer. https://blogs.msdn.microsoft.com/russellk/2005/03/20/lnk4217/ – Igor Skochinsky May 12 '18 at 11:38

score 1 · Answer 2 · answered May 12 '18 at 02:03

It is likely that the first instruction after the NOP is the target of a different branch/jump somewhere else. Jumping to aligned targets is normally preferable both for better i-cache utilization and for better BTB predictions:

11.5 Alignment of code

Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.

If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.

Aligning by 8 will assure that at least 8 bytes of code can be loaded with the first instruction fetch, which may be sufficient if the instructions are small.

We may align subroutine entries by the cache line size (typically 64 bytes) if the subroutine is part of a critical hot spot and the preceding code is unlikely to be executed in the same context.

http://agner.org/optimize/optimizing_assembly.pdf#page=86

This would make the NOP mere padding that aligns the instruction following it. As pointed out elsewhere, such padding must be added carefully: padding blindly is likely to worsen i-cache usage and therefore performance. Always measure.
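One way to test the alignment hypothesis on a dump is to check where the instruction after each NOP actually lands. The following is a rough sketch under stated assumptions: it matches the raw byte pattern `E8 ?? ?? ?? ?? 90` without disassembling, so it can report false positives, and the function name `call_nop_sites`, the `base` address and the 16-byte default are illustrative choices, not something taken from the question.

```python
def call_nop_sites(code: bytes, base: int, align: int = 16):
    """Return (call address, address after the NOP, aligned?) for each raw match.

    Matches E8 rel32 followed by 90 on raw bytes only; no real disassembly is done.
    """
    sites = []
    i = 0
    while i + 6 <= len(code):
        if code[i] == 0xE8 and code[i + 5] == 0x90:
            after = base + i + 6          # first byte following the NOP
            sites.append((base + i, after, after % align == 0))
            i += 6                        # skip past the matched pair
        else:
            i += 1
    return sites

if __name__ == "__main__":
    sample = b"\x55" + bytes.fromhex("e8fb04000090") + b"\xc3"
    for call_addr, after, aligned in call_nop_sites(sample, 0x1000):
        print(hex(call_addr), hex(after), aligned)
```

If most of the reported sites are not aligned, the alignment explanation becomes unlikely for that binary, since a single one-byte NOP can only align the next instruction when the call ends exactly one byte before the boundary.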

Note: on other architectures (i.e. not x86/x86-64), NOPs after calls are sometimes required; since the question is about x86-64, this shouldn't apply.
