Introduction
============
This project aims to give a simple overview of how well various x64 hooking
engines (on Windows) cope with difficult targets. I'll write various functions
that are hard to patch and then see how each hooking engine does.

I'll test:

* [EasyHook](https://easyhook.github.io/)
* [PolyHook](https://github.com/stevemk14ebr/PolyHook)
* [MinHook](https://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x-x-API-Hooking-Libra)
* [Mhook](http://codefromthe70s.org/mhook24.aspx)

(I'd like to test Detours, but I'm not willing to pay for it, so it isn't
tested :( )

There are multiple things that make hooking difficult. Maybe you want to patch
while the application is running -- in that case you might get race conditions,
as the application may execute your half-finished hook. Maybe the software has
some self-protection features (or other software on the system provides them,
e.g. Trusteer Rapport).

Evaluating how the hooking engines stack up against that is not the goal here.
Neither are non-functional criteria, like how fast an engine is or how much
memory it needs for each hook. This is just about the challenges posed by the
function to be hooked itself.

Namely:

* Are jumps relocated?
* What about RIP-relative addressing?
* If there's a loop at the beginning / if it's a tail-recursive function, does
  the hooking engine handle it?
* How good is the disassembler, how many instructions does it know?
* Can it hook already hooked functions?

First I will give a short walkthrough of the architecture, then quickly go
over the test cases. After that come the results and an evaluation for each
engine.

I think I found a flaw in all of them; I'll publish a small POC which should at
least detect the existence of problematic code.

**A word of caution**: my results are worse than expected, so do assume I have
made a mistake in using the libraries. I went into this expecting that at least
some engines would try to detect e.g. loops back into the first few bytes. But
none did? That's gotta be wrong.

**Another word of caution**: parts of this are rushed and/or ugly. Please
double-check parts that seem suspicious. And I'd love to get patches, even for
the most trivial things -- spelling mistakes? Yes please.

Architecture
============
This project is made up of two parts: a .dll with the test cases and an .exe
that hooks those, tests whether they still work and prints the results.

(I could have done it all in the .exe, but this makes it trivial to (at some
point) force the function to be hooked and the target function to be more than
2GB apart. Just set fixed image bases in the project settings and you're
done.)

My main concern was automatically identifying whether the hook worked. I
consider a hook to work if: a) the original function can still execute
successfully *and* b) the hook was called.

Criterion a) is really similar to a unit test: verify that a function
returns what is expected. So for a) the .exe just runs unit tests after all the
hooks have been applied. Each failing function is reported (or the program
crashes and I can look at the call stack), so I can correlate that with the
hooking engine I'm currently testing and see where it fails. I've used
Catch2 for the unit tests, because I wanted to try it anyway.

From the get-go it was clear that I wanted to test multiple hooking engines.
And they all needed to do the same steps in the same order -- so I implemented
a basic `AbstractHookingEngine` with a boolean for every test case and made a
child class for each engine. The child classes have to override `hook_all`
and `unhook_all`. In between those two calls, the unit tests run.

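The hook/test/unhook cycle described above can be sketched roughly as follows. This is not the project's code (that is C++ with Catch2); it is a Python illustration of the structure only. `TEST_CASES`, `evaluate` and `DummyEngine` are hypothetical names made up for this sketch; only `AbstractHookingEngine`, `hook_all` and `unhook_all` come from the text.

```python
from abc import ABC, abstractmethod

# Hypothetical test-case names, loosely following the section headings.
TEST_CASES = ("small", "branch", "rip_relative", "avx", "rdrand", "loop", "tail_recursion")


class AbstractHookingEngine(ABC):
    """One subclass per hooking engine."""

    def __init__(self, name):
        self.name = name
        # One boolean per test case: True once the hook was actually called (criterion b).
        self.hook_called = {case: False for case in TEST_CASES}

    @abstractmethod
    def hook_all(self):
        """Install a hook on every test-case function."""

    @abstractmethod
    def unhook_all(self):
        """Remove all hooks again."""


def evaluate(engine, unit_tests):
    """Hook everything, run the unit tests, unhook, then combine a) and b).

    `unit_tests` maps a test-case name to a callable returning True when the
    hooked function still returns the expected value (criterion a).
    """
    engine.hook_all()
    try:
        still_works = {case: bool(test()) for case, test in unit_tests.items()}
    finally:
        engine.unhook_all()
    return {case: still_works[case] and engine.hook_called[case] for case in unit_tests}


class DummyEngine(AbstractHookingEngine):
    """Stand-in engine that 'hooks' Python callables held in a dict (illustration only)."""

    def __init__(self, functions):
        super().__init__("dummy")
        self.functions = functions
        self._originals = {}

    def _make_hook(self, case, original):
        def hook(*args):
            self.hook_called[case] = True  # the hook was reached
            return original(*args)         # forward to the original, like a trampoline
        return hook

    def hook_all(self):
        for case, original in list(self.functions.items()):
            self._originals[case] = original
            self.functions[case] = self._make_hook(case, original)

    def unhook_all(self):
        self.functions.update(self._originals)
        self._originals.clear()
```

A test case only counts as passed when both flags agree: the function still returned the right value and the engine's boolean for that case was set.
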
Test case: Small
================
This is just a very small function; it is smaller than the hook code will be --
so how does the library react?

    _small:
        xor eax, eax
        ret

Test case: Branch
=================
Instead of the FASM code I'll show the disassembled version, so you can see the
instruction lengths & offsets.

    0026 | 48 83 E0 01 | and rax,1
    002A | 74 17       | je test_cases.0043  ----+
    002C | 48 31 C0    | xor rax,rax             |
    002F | 90          | nop                     |
    0030 | 90          | nop                     |
    0031 | 90          | nop                     |
    0032 | 90          | nop                     |
    0033 | 90          | nop                     |
    0034 | 90          | nop                     |
    0035 | 90          | nop                     |
    0036 | 90          | nop                     |
    0037 | 90          | nop                     |
    0038 | 90          | nop                     |
    0039 | 90          | nop                     |
    003A | 90          | nop                     |
    003B | 90          | nop                     |
    003C | 90          | nop                     |
    003D | 90          | nop                     |
    003E | 90          | nop                     |
    003F | 90          | nop                     |
    0040 | 90          | nop                     |
    0041 | 90          | nop                     |
    0042 | 90          | nop                     |
    0043 | C3          | ret  <------------------+

This function has a branch in the first 5 bytes. Hooking it detour-style isn't
possible without fixing that branch in the trampoline. The NOP sled is just so
the hooking engine can't cheat and just put the whole function into the
trampoline. Instead the jump in the trampoline needs to be modified so it jumps
back to the original destination.

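To make "fixing that branch" concrete, here is a minimal sketch of the arithmetic, written in Python purely for illustration. It assumes the engine widens the 2-byte `Jcc rel8` into the 6-byte `Jcc rel32` form when copying it to the trampoline; real engines may do it differently. `relocate_short_jcc` and the addresses in the usage lines are hypothetical.

```python
import struct


def relocate_short_jcc(opcode, rel8, old_addr, new_addr):
    """Re-encode a 2-byte `Jcc rel8` at `old_addr` as a 6-byte `Jcc rel32` at `new_addr`."""
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional jump")
    if rel8 >= 0x80:                 # rel8 is a signed byte
        rel8 -= 0x100
    target = old_addr + 2 + rel8     # displacement counts from the end of the instruction
    rel32 = target - (new_addr + 6)  # same target, seen from the end of the new instruction
    if not -2**31 <= rel32 <= 2**31 - 1:
        raise OverflowError("trampoline is more than 2GB away from the branch target")
    # Short form 0x7x and near form 0x0F 0x8x share the same condition code.
    return bytes([0x0F, opcode + 0x10]) + struct.pack("<i", rel32)


# The `je` from the listing: opcode 0x74, rel8 0x17, located at offset 0x2A.
# Both base addresses are made up for the example.
encoded = relocate_short_jcc(0x74, 0x17, old_addr=0x1000002A, new_addr=0x20000004)
print(encoded.hex())
```

The check on `rel32` matters: if the trampoline is allocated more than 2GB away from the function, the widened jump cannot reach its original destination either.
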
Test case: RIP relative
=======================
One of the new things in AMD64 is RIP-relative addressing. I guess the reason
to include it was to make it easier to generate position-independent code
(PIC) -- all references to data can now be made relative instead of absolute.
So it doesn't matter anymore where the program is loaded into memory and
there's less need for the relocation table.

A quick and dirty[1] test for this is re-implementing the well-known C `rand`
function.

    public _rip_relative
    _rip_relative:
        mov rax, qword[seed]
        mov ecx, 214013
        mul ecx
        add eax, 2531011
        mov [seed], eax

        shr eax, 16
        and eax, 0x7FFF
        ret

    seed dd 1

The very first instruction uses RIP-relative addressing, thus it needs to be
fixed in the trampoline.

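The fix itself is a small piece of arithmetic. The sketch below (Python, for illustration only) assumes the instruction is copied to the trampoline with its length unchanged, so only the 32-bit displacement has to be adjusted. `rip_target` and `relocate_rip_disp32` are hypothetical helper names, and the addresses are made up.

```python
def rip_target(addr, length, disp32):
    """Address referenced by a RIP-relative operand: RIP points past the instruction."""
    return addr + length + disp32


def relocate_rip_disp32(disp32, old_addr, new_addr):
    """Displacement that keeps the same target after moving the instruction."""
    new_disp = disp32 + (old_addr - new_addr)
    if not -2**31 <= new_disp <= 2**31 - 1:
        raise OverflowError("target is out of disp32 range from the new location")
    return new_disp


# `mov rax, qword[seed]` is assumed to be 7 bytes long here (REX.W + opcode + ModRM + disp32).
old_addr, new_addr, length, disp = 0x140001000, 0x140100000, 7, 0x20
new_disp = relocate_rip_disp32(disp, old_addr, new_addr)
print(hex(rip_target(new_addr, length, new_disp)))  # same target as before the move
```

If the trampoline ends up more than 2GB away from the data, no displacement fits and the engine would have to rewrite the instruction into a different form.
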
Test case: AVX & RDRAND
=======================

The AMD64 instruction set is extended with every CPU generation. Because the
hooking engines need to know the instruction lengths and their side effects to
properly apply their hooks, they need to keep up.

The actual code in the test case is boring and doesn't matter. I'm sure there
are disagreements on whether I've picked good candidates for "exotic" or new
instructions, but those were the first that came to mind.

Test case: loop and TailRec
===========================

My hypothesis before starting this evaluation was that those two cases would
make most hooking engines fail. Back in the good ol' days of x86, detour hooking
didn't require any special thought because the prologue was exactly as big as
the hook itself -- 5 bytes for `MOV EDI, EDI; PUSH EBP; MOV EBP, ESP` (the
hotpatchable form of the prologue) and 5 bytes for `JMP +-2GB`[2]. That isn't
so easy for AMD64: a) the hook sometimes needs to be *way* bigger, and b) due
to changes in the calling convention and the general architecture of AMD64
there just isn't a common prologue, used for almost all functions, anymore.

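As a rough illustration of why the hook can get "way bigger" on AMD64, the sketch below (Python, illustration only) picks between two well-known jump encodings: the 5-byte `JMP rel32` when the destination is within +-2GB, and otherwise a 14-byte `JMP QWORD PTR [RIP+0]` followed by the absolute address. This is one common approach, not necessarily what any of the tested engines does; `build_jump` is a hypothetical name.

```python
import struct

REL32_MIN, REL32_MAX = -2**31, 2**31 - 1


def build_jump(src, dst):
    """Return the bytes of a jump placed at address `src` that lands on `dst`."""
    rel = dst - (src + 5)  # rel32 is measured from the end of the 5-byte JMP
    if REL32_MIN <= rel <= REL32_MAX:
        return b"\xE9" + struct.pack("<i", rel)  # JMP rel32: 5 bytes
    # JMP QWORD PTR [RIP+0], immediately followed by the 64-bit target: 14 bytes
    return b"\xFF\x25\x00\x00\x00\x00" + struct.pack("<Q", dst)


print(len(build_jump(0x140001000, 0x140002000)))     # near target
print(len(build_jump(0x140001000, 0x7FF000000000)))  # far target
```

The longer the hook, the more original instructions it destroys, and the more likely it is that one of them is a branch target.
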
Those by themselves aren't a problem, since the hooking engines can fix all the
instructions they would overwrite. However, I hypothesized that only a few would
check whether the function contains a loop that jumps back into the
instructions that have been overwritten. Consider this:

    public _loop
    _loop:
        mov rax, rcx
    @loop_loop:
        mul rcx
        nop
        nop
        nop
        loop @loop_loop ; lol
        ret

There are only 3 bytes that can be safely overwritten. Right after that is the
destination of the backwards jump. This is a very simple (and kinda pointless)
function, so detecting that the loop might lead to problems shouldn't be hard.
Basically the same applies to the next example:

    public _tail_recursion
    _tail_recursion:
        test ecx, ecx
        je @is_0
        mov eax, ecx
        dec ecx
    @loop:
        test ecx, ecx
        jz @tr_end

        mul ecx
        dec ecx

        jnz @loop
        jmp @tr_end
    @is_0:
        mov eax, 1
    @tr_end:
        ret

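The kind of check I expected the engines to perform can be sketched like this. It is not the POC mentioned in the introduction, just a Python illustration that works on an already disassembled function. The instruction lengths for `_loop` are assumptions based on the usual encodings (3 bytes each for `mov rax, rcx` and `mul rcx`, 2 bytes for `loop`), and both function names are hypothetical.

```python
def overwritten_length(instruction_lengths, hook_size):
    """Bytes a hook really destroys: whole instructions until `hook_size` is covered."""
    total = 0
    for length in instruction_lengths:
        if total >= hook_size:
            break
        total += length
    if total < hook_size:
        raise ValueError("function is smaller than the hook")
    return total


def branches_into_overwritten(instructions, hook_size):
    """Offsets of branches outside the overwritten bytes that jump back into them.

    `instructions` is a list of (offset, length, branch_target_or_None) tuples.
    A jump to offset 0 is not reported, since it simply re-enters the hook.
    """
    end = overwritten_length([length for _, length, _ in instructions], hook_size)
    return [
        offset
        for offset, _, target in instructions
        if target is not None and offset >= end and 0 < target < end
    ]


# `_loop` with assumed instruction lengths; offsets are relative to the function start.
LOOP_FUNCTION = [
    (0, 3, None),   # mov rax, rcx
    (3, 3, None),   # mul rcx          <- @loop_loop
    (6, 1, None),   # nop
    (7, 1, None),   # nop
    (8, 1, None),   # nop
    (9, 2, 3),      # loop @loop_loop
    (11, 1, None),  # ret
]

print(branches_into_overwritten(LOOP_FUNCTION, hook_size=5))  # -> [9]
```

With a 5-byte hook the first two instructions (6 bytes) are moved to the trampoline, yet the `loop` at offset 9 still jumps to offset 3, which now lies in the middle of the hook's jump.
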
(Preliminary) Results
=====================

    +----------+-----+------+------------+---+------+----+-------+
    |      Name|Small|Branch|RIP Relative|AVX|RDRAND|Loop|TailRec|
    +----------+-----+------+------------+---+------+----+-------+
    |  PolyHook|  X  |   X  |      X     | X |      |    |       |
    |   MinHook|  X  |   X  |      X     |   |      |    |   X   |
    |     MHook|     |      |      X     |   |      |    |       |
    +----------+-----+------+------------+---+------+----+-------+

[1] This is one of the things that could easily be improved, but hasn't been
because I just couldn't motivate myself. Putting the data right after the
function meant that a section containing code needed to be writable. Which is
bad. Also I load the seed DWORD as a QWORD -- which only works because the upper
half is then ignored by the multiplication. It's shitty code is what I'm saying.

In retrospect I should have used a jump table like the one a switch-case could
be compiled into. That would be read-only data. Oh well.

[2] And Microsoft decided at some point to make it even easier for their code
with the advent of hotpatching.
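
As a footnote to [1]: the claim that the QWORD load is harmless can be checked with a small model of `_rip_relative`. This is a Python sketch of the register arithmetic only, under the assumption that `mul ecx` reads just EAX and that the 32-bit operations wrap modulo 2^32; `rip_relative_step` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def rip_relative_step(loaded_qword):
    """Model `_rip_relative` after `mov rax, qword[seed]` loaded 8 bytes into RAX.

    Returns (value returned in EAX, new seed stored by `mov [seed], eax`).
    """
    eax = loaded_qword & MASK32      # `mul ecx` only reads EAX, not the upper half of RAX
    eax = (eax * 214013) & MASK32    # low 32 bits of the EDX:EAX product
    eax = (eax + 2531011) & MASK32   # `add eax, 2531011`
    new_seed = eax
    return (eax >> 16) & 0x7FFF, new_seed


value, seed = rip_relative_step(1)
print(value)  # -> 41
# Whatever bytes follow the seed in memory end up in the upper half and are ignored:
print(rip_relative_step(0xDEADBEEF00000001) == (value, seed))  # -> True
```

Starting from a seed of 1 this yields 41 and then 18467, which gives the unit test a fixed sequence to compare against.
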