add my code and shit
141
README.md
@@ -152,6 +152,11 @@ The actual code in the test case is boring and doesn't matter. I'm sure there
are disagreements on whether I've picked good candidates of "exotic" or new
instructions, but those were the first that came to mind.

(It's also doubtful whether you'll ever encounter functions where the first
instructions are of this category, because most probably there's some setup
needed before, e.g. checking that addresses are aligned, initializing loop
counters, yadda, yadda)

Test case: loop and TailRec
===========================

@@ -235,6 +240,133 @@ Which isn't right and will crash horribly.
|     MHook|     |      |     X      |   |      |    |       |
+----------+-----+------+------------+---+------+----+-------+

As expected nothing could correctly hook the loop. In fact I had to comment out
those parts because even Catch2 couldn't recover from the crashes generated by
the botched hooks. Some hooking engines are a bit lacking in their support for
newer instruction sets, but a simple update of the disassembler library should
fix that.

I was pleasantly surprised by MinHook, both by the general API and because it
managed to build a trampoline that worked perfectly even for the tail
recursion case. I'd recommend it, even though it seems there's no chance that
the disassembler will ever be updated.

Detecting tail recursive functions / loops into overwritten code
================================================================

Back in 2015 I wanted to write my own hooking engine which would be able to
hook ALL THE FUNCTIONS! And I did actually start to write it and then
abandoned it before I got to the interesting part. However, since then I've had
the basic idea down:

1) Find out how long the function is
2) Analyze it, by checking whether some jump could jump into the overwritten
   instructions
3) Somehow fix that

Fixing that code probably means putting the whole function in the trampoline,
since by definition there is no space in the original location for the
additional/longer instructions.

However I think that hooking engines should at least fail fast if they can't
hook that function and give the user the ability to handle that error at that
stage instead of waiting for unpredictable crashes. I'll post example code
[here](https://git.free-hack.com/wacked/x64hook) and outline the general
technique below.

(My x64hook hooking engine doesn't work. There are literally two interesting
functions in it, and I give pseudocode for them below)

Estimate the length of a function
---------------------------------

Note: This is an estimation of the function length. There are various ways to go
about it. One way would be to search for the prologue and epilogue, which would
fail for all functions that -- for whatever reason -- don't have them. I'm sure
this way also isn't perfect, but maybe it could be used as another source of
information[5].

Over the years I've seen various attempts at estimating the function length.
One of the top hits in my Google history is a question on Stack Overflow[3]
which uses the same technique that I've seen in various malware strains:
checking byte for byte until the RET opcode is found. That won't work if any
of the following applies:

1) The `RET imm16` opcode is used, which is often the case for __stdcall funcs.
2) There are multiple returns
3) The function doesn't actually return with the RET instruction. For example
   if a function A at its end calls another function B, with A and B sharing the
   same parameters and either A or B not modifying the stack pointer, it is
   perfectly possible to just jump to function B. Execution will continue in B,
   which ends with a normal RET.
4) The value 0xC3 appears for some other reason in the function.

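Point 4) is easy to reproduce. The following is a minimal sketch in C, not the
code from the Stack Overflow question; `naive_func_len` and `sample` are
hypothetical names made up for this illustration. The scan stops at the first
0xC3 byte, which here is part of the immediate of a `mov`, so it reports 2
bytes for a function that is 6 bytes long:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan byte by byte for 0xC3 (near RET) and report
   the length up to and including it. Returns 0 if no 0xC3 is found. */
size_t naive_func_len(const uint8_t *code, size_t max)
{
    for (size_t i = 0; i < max; i++)
        if (code[i] == 0xC3)
            return i + 1;
    return 0;
}

/* mov eax, 0xC3 ; ret
   The 32 bit immediate of the mov contains the byte 0xC3. */
const uint8_t sample[] = { 0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3 };
```
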
Point 4) can easily be solved by using a length disassembler engine and just
checking the actual opcode byte of each instruction. 1) and 3) aren't that hard
either, you'll just need to check for some additional opcodes. What about 2)?

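For points 1) and 3) the additional opcodes are few. A sketch, assuming the
length disassembler hands back the primary opcode byte (after any prefixes)
and the ModRM byte of the instruction; the helper names are hypothetical, and
far jumps (`FF /5`, `EA`) are left out to keep it short:

```c
#include <stdbool.h>
#include <stdint.h>

/* Near and far returns, with and without the imm16 stack adjustment. */
bool is_ret_opcode(uint8_t op)
{
    return op == 0xC3 || op == 0xC2    /* RET,  RET imm16  */
        || op == 0xCB || op == 0xCA;   /* RETF, RETF imm16 */
}

/* Unconditional near jumps: JMP rel8 (EB), JMP rel32 (E9) and
   JMP r/m (FF /4). `modrm` is only inspected for opcode 0xFF, where
   bits 5..3 select the operation. */
bool is_uncond_jmp(uint8_t op, uint8_t modrm)
{
    if (op == 0xEB || op == 0xE9)
        return true;
    return op == 0xFF && ((modrm >> 3) & 7) == 4;
}
```
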
The key insight I had was why a function might have multiple returns -- because
it needed to do additional work in some cases. That means there has to be
branching, to sometimes skip some instructions or get to them.

If there is a branch backwards it's a loop. But a branch forwards means that
the function extends at least up to there[4]. Or in pseudocode:

    offsetOfInstr = 0
    funcLen = 0
    furthestJump = 0
    while(can disassemble instruction at offsetOfInstr)
    {
        // the function extends at least to the end of this instruction
        funcLen = offsetOfInstr + length(instruction);

        op = getOpcode(instruction);
        if(is_jump(op))
        {
            // jump target, relative to the start of the function
            off = get_jump_offset(instruction);
            if(off > furthestJump)
                furthestJump = off;
        }

        if(is_end_of_function(op, furthestJump, offsetOfInstr))
        {
            break;
        }

        offsetOfInstr = funcLen;
    }

    bool is_end_of_function(opc, furthestJump, instrOffset)
    {
        // a RET only ends the function if no forward jump reaches past it
        if(opc == RET && furthestJump <= instrOffset)
            return true;
        // an unconditional jump is treated as a tail call (point 3 above)
        else if(opc == UD_Ijmp)
        {
            if(destination is IMM || destination is register)
                return true;
        }

        return false;
    }


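The same idea as a compilable C sketch. It assumes the disassembler has already
produced an array of decoded instructions; `struct insn`, `enum kind` and
`estimate_func_len` are hypothetical names for illustration and not the code
from x64hook. Like the pseudocode, it treats every unconditional JMP as the end
of the function. For `two_returns`, which has an early RET at offset 7 that a
forward branch jumps over, it returns 11:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-decoded instruction; a real engine would fill this
   in from its disassembler. */
enum kind { K_OTHER, K_JCC, K_JMP, K_RET };

struct insn {
    enum kind kind;
    size_t    len;     /* encoded length in bytes */
    size_t    target;  /* branch target as offset from the function start,
                          only meaningful for K_JCC and K_JMP */
};

/* Walk the instructions, remember the furthest branch target seen so far
   and stop at the first RET that no branch reaches past, or at the first
   unconditional JMP. Returns the estimated length in bytes. */
size_t estimate_func_len(const struct insn *ins, size_t count)
{
    size_t offset = 0;    /* offset of the current instruction */
    size_t furthest = 0;  /* furthest branch target seen so far */

    for (size_t i = 0; i < count; i++) {
        enum kind k = ins[i].kind;

        if ((k == K_JCC || k == K_JMP) && ins[i].target > furthest)
            furthest = ins[i].target;

        bool last = (k == K_RET && furthest <= offset) || k == K_JMP;
        offset += ins[i].len;   /* now the end of this instruction */
        if (last)
            break;
    }
    return offset;
}

/* Made-up function with two returns; offsets are 0, 2, 4, 7, 8, 10. */
const struct insn two_returns[] = {
    { K_OTHER, 2, 0 },
    { K_JCC,   2, 8 },  /* forward branch over the first RET */
    { K_OTHER, 3, 0 },
    { K_RET,   1, 0 },  /* offset 7: not the end, furthest is 8 */
    { K_OTHER, 2, 0 },
    { K_RET,   1, 0 },  /* offset 10: end of function */
};
```
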
Detecting loops to the start of a function
------------------------------------------

    firstJumpOffset = MAX_INT
    foreach(instruction in function)
        if(instruction is a jump)
            jumpOffset = getOffset(instruction) // target, relative to function start

            /* jumps to exactly the start of a function are fine, since that is
               where our overwritten code starts. Thus it doesn't jump into the
               middle of an instruction */
            if(jumpOffset == 0)
                continue

            if(jumpOffset < firstJumpOffset)
                firstJumpOffset = jumpOffset

    // true if some jump lands inside the bytes overwritten by the hook
    return firstJumpOffset < lengthNeededForHook

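A compilable C sketch of the same check, assuming the jump targets have already
been collected as offsets relative to the function start. The name
`jumps_into_hook_area` and its parameters are hypothetical. One deviation from
the pseudocode, stated plainly: negative targets (jumps to code before the
function) are skipped as well, since they cannot land in the overwritten bytes.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any jump target lies strictly inside the first
   `hook_len` bytes of the function, i.e. inside the code that the hook
   overwrites. A target of exactly 0 is fine, because that is where the
   hook's own jump starts. */
bool jumps_into_hook_area(const long *targets, size_t count, long hook_len)
{
    long first = LONG_MAX;  /* smallest positive target seen so far */

    for (size_t i = 0; i < count; i++) {
        long t = targets[i];

        if (t <= 0)         /* function start, or outside the function */
            continue;
        if (t < first)
            first = t;
    }
    return first < hook_len;
}
```
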
------------

[1] This is one of the things that could easily be improved, but hasn't been
because I just couldn't motivate myself. Putting the data right after the func
meant that a section containing code needed to be writable. Which is bad. Also
@@ -246,3 +378,12 @@ compiled into. That would be read only data. Oh well.

[2] And Microsoft decided at some point to make it even easier for their code
with the advent of hotpatching.

[3] https://stackoverflow.com/questions/8705215/get-the-size-length-of-a-c-function

[4] With some caveats, e.g. one could assume that no function is longer than
512 bytes. And obviously keeping in mind point 3.

[5] Another heuristic would be to check for the next slide of filler
instructions, such as INT3 or NOP. Some compilers align functions on 16-byte
boundaries and fill the gaps with those.