In memory patching: three approaches
(how to introduce breakpoints in an automated debugger and other marvels)

by Stone

(20 March 1998)

Courtesy of reverser's page of reverse engineering

A very good essay by Stone, a great cracker and one of the few fine reversers around that produces his own VERY GOOD TOOLS.
This essay has a very high theoretical value and should IMO be read by ALL reversers: you'll find inside it matters like "how it's possible to introduce breakpoints in an automated debugger", "making the target load a DLL for me"... and other marvels. Stone intends to update this work in fieri, therefore your contributions on all these matters are welcomed. Enjoy! (Beginners shouldn't touch this stuff IMO)

In memory patching
Three approaches
by Stone 20 March 1998
After reading MadMax's essay on kernel patching I decided that perhaps it was time for an essay on "in memory patching". Contrary to the general +HCU philosophy my approach will be purely theoretical - the source code I provide will serve as an example for you to build on.

Is something preventing a patch? Is your target encrypted, packed or CRC'ed, or do you need the program to run sometimes with the patch applied and sometimes without (a game trainer, for instance)? Wouldn't you just love it if you could patch the program in memory after it loaded, unpacked, did the CRC checks etc.? You can. In the DOS days we had TSRs to do this job. In the Windows world it's a bit more difficult, as the programming interface (Win32 API) is dynamic in contrast to DOS's static interrupt system. However, new methods which in many ways are similar to TSRs are now available.

Kernel patching, as MadMax pointed out, is generally a bad idea. We need a more gentle approach. Which criteria would we like our solution to conform to? The criteria I'll use are:

1) The approach should perform OK in terms of compatibility. That is, it should work on both NT and 95 and hopefully on future versions as well.
2) The operating system should not suffer any long term effects of the crack. That is, after termination of the target the OS should be left unchanged.
3) Only ring 3 measures should be used. (Some of the API functions I'll use from ring 3 will actually switch to ring 0, but at least there will be no foreign code introduced at ring 0.)

Common ground

Our immediate problem is that in a preemptive operating system like Windows each process runs in its own addressing space. Each time the operating system switches to another process, the virtual mapping is changed to fit that of the current process. The whole idea with memory patching is providing means of patching the target in its addressing space at a certain time (after unpacking, CRC'ing or whatever is done).
However, since a criterion of the memory patch is that we can patch neither the operating system nor the program itself, we need to find a way of gaining access to the target's addressing space from another process.

The next problem we've got is one of timing. Obviously the target needs to be patched after the CRC check has been performed or after it is unpacked in memory. And possibly it needs to be unpatched again to pass later checks. In other words we need a reliable trigger mechanism. It is in this respect that the three methods I'll present here differ.

The loader approach

The critical assumption I'll make here is that the USER of the program can tell us how to time the patch through another program. This basically means we assume that the user can:

1) Identify when patching is appropriate.
2) Switch to another program to activate the patch.

About the first assumption it can be said that if it's a trainer this will never be a problem. Obviously the user will know when he wants to have infinite lives. Often a messagebox or some other visible sign shows itself when a patch is needed, e.g. a messagebox saying "Insert correct CD in drive and press OK". It'd be easy to write a doc saying that when this occurs the dear user should press OK in another window first, and then in the target's obnoxious messagebox. However, this is a serious shortcoming. Who said the program will actually let the user retry? Most 30-day trials tell the user the program has expired and then just exit or go into trial mode or whatever. Perhaps many different locations have to be patched at many different times, making user-controlled patching a cumbersome solution.

On assumption 2 it can be said that many games don't like switching tasks, and it's not likely that users will enjoy having to switch out of their game to get a new handful of bullets or whatever.

Let's get a bit more technical. Windows is so nice as to provide us with an interface to write in other processes' addressing space.
The API needed is:

    kernel32!WriteProcessMemory

Taking a closer look at this, you'll find that what it actually does is utilize Windows's int 2eh interface to switch to ring 0, meaning that it has ring 0 privileges and thus is able to override the page protection. However, the interface has a built-in security feature so you cannot overwrite ring 0 data/code. (The int 2eh interface is for NT - I figure Windows 95 does something similar but I haven't checked it. Anyway, the result is the same.)

For WriteProcessMemory to work we need to identify by handle which process we want patched. IMHO the best way to find such a handle is to create the target process yourself - that is, do a good old fashioned EXEC from within your patch/trainer code. The API is:

    kernel32!CreateProcessA

Of course there are other means of finding process handles. To synthesize an in-memory patcher of this kind:

    CreateProcessA (Target)
    Wait for the user to say apply patch - e.g. a messagebox
    WriteProcessMemory

Source code at: http://www.one.se/~stone/general/trainnt.zip (or something)

----------------------------------------------------------------------

The API-Hook/Debug Approach

Obviously the assumptions made for the loader approach can be too restrictive. For instance, 30-day trials often exit prior to offering the user any obvious point of introducing a patch. So does a dongle. Players might not like to switch task out of their beloved game to get another 10 bullets or whatever. What we really need is for the target to trigger the patch, and this section shows a way of doing this.

The whole idea here is to hook an API call and make it perform to our desire. That can be to return fake values under certain circumstances, or it could be to patch the main program or its DLLs in memory. In short, what we wish to do is to let the API call the program performs be surrounded by our code so that we can make it perform in every way we wish. Certain side benefits will come along as well.
The code I present will show how it's possible to introduce breakpoints in an automated debugger, which is indeed something very useful for the creation of, for instance, unpackers.

Again, let's get down to it. A PE-file "imports" the functions it wishes to make use of. Because MS developers decided on a dynamic structure for APIs, it's obviously necessary for each program to declare what functions it uses. This is done in a so-called import table. Let's now take a deeper look into what takes place between the import table in the PE-file and the execution of an API call by the target.

Three basic types of information are stored in the import table. The first is DLL names, the second is function names and the third is a Thunk-RVA. The information is stored in a structure that looks something like this:

    DLL1-Name
        Function1-from-dll1 name or ordinal
        Thunk-RVA of Function 1 of DLL 1
        Function2-from-dll1 name or ordinal
        Thunk-RVA of Function 2 of DLL 1
        ....
    DLL2-Name
        Function1-from-dll2 name or ordinal
        Thunk-RVA of Function 1 of DLL 2
        Function2-from-dll2 name or ordinal
        Thunk-RVA of Function 2 of DLL 2
        ....
    ...

What Windows does while loading the PE-file is traverse this table following this "pseudo code":

    While more DLLs do
    {
        Load DLL into process addressing space
        While more functions imported from current DLL do
        {
            Find address of function and write this to the Thunk-VA for this function
        }
    }
    END Load Imports

The function may be listed by name or by something called an ordinal. In every DLL, each function that it exports for use by other programs is listed in an export directory (which is where Windows finds the address of the imported function). In this list each function is assigned a number and usually a name too. The number is called the ordinal. Importing can be done either by referencing this ordinal value or by using the name.
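The loader loop in the pseudo code can be sketched as a toy model. This is not PE parsing and not the author's source: the export directory and import table are plain Python dictionaries, and every DLL layout, ordinal and address below is made up for illustration. It only shows the two lookups (by name or by ordinal) and the write to each Thunk-VA.

```python
# Toy model of the import-resolution loop; not real PE parsing.
# All ordinals and addresses below are invented for illustration.

# Export directory of each "DLL": ordinal -> (name, address).
EXPORTS = {
    "kernel32.dll": {1: ("CreateFileA", 0x77E01000), 2: ("ReadFile", 0x77E02000)},
    "user32.dll": {1: ("MessageBoxA", 0x77D03000)},
}

# Import table of the "target": DLL name -> list of (name or ordinal, Thunk-VA).
IMPORTS = {
    "kernel32.dll": [("ReadFile", 0x00405000), (1, 0x00405004)],
    "user32.dll": [("MessageBoxA", 0x00405010)],
}


def find_export(dll, wanted):
    """Look up an export by ordinal (int) or by name (str)."""
    exports = EXPORTS[dll]
    if isinstance(wanted, int):
        return exports[wanted][1]
    for name, address in exports.values():
        if name == wanted:
            return address
    raise KeyError(f"{dll} does not export {wanted}")


def load_imports(imports):
    """Walk the import table and fill every thunk with the resolved address."""
    memory = {}  # stands in for the target's address space: VA -> dword
    for dll, functions in imports.items():   # while more DLLs
        for wanted, thunk_va in functions:    # while more functions
            memory[thunk_va] = find_export(dll, wanted)
    return memory


memory = load_imports(IMPORTS)
for va in sorted(memory):
    print(f"[{va:08X}] = {memory[va]:08X}")
```

After the loop every Thunk-VA holds the address of the function it stands for, whichever way the import was declared.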
What the program then does when it's in need of the API function is this:

    CALL Dword ptr [Thunk VA of needed function]

Let's for a second imagine that we could stop execution of the target process right before it started and then inject our own code into its addressing space. Then we could simply replace the value at any Thunk-VA with a pointer to our own code, and our code would be executed every time the program decided to use this API. We could even save the old pointer and use this to chain to the originally intended API code. Weeeeeee.. "Isn't this just great?" as Oprah Winfrey would say. "No, it is not", as I would reply. We are left with a new problem. Or rather two. The first is stopping execution of the target process before the program runs its first instruction so that we can be sure that our new pointers are in order. Second, we're left with the great problem of getting code into the target's addressing space.

Solving one problem at a time, we start by examining how we can stop our target process. Many people always state that Windows is bloated and perhaps they are right - but in this case I'd say that it's damn convenient that MS engineers made a full-featured debug interface while designing the API calls, so that we can with the greatest of ease program a debugger without having to do the low-level work ourselves. In fact they made it so that not one line of ring 0 code has to be written to make an application debugger. "Isn't this just great?" as Oprah would phrase it. "Yes it is, ma'am", as I would reply. Because it gets even better. Windows engineers must actually have been thinking the day they made Windows. What good is a full-featured debug interface if the poor programmer has to make a PE-loader before he can even start debugging? Hey, after all they had already made a loader and they decided to be helpful. CreateProcessA can open a process in debug mode.
This means that inside most of Windows's procedures hide status breakpoints that'll turn control over to our debugger through that interface. One of these status breakpoints triggers just before Windows is about to turn over control to the just loaded PE-file. Convenient!

Obviously, if a process is in debug mode, execution is suspended every time a debug event occurs. A debug event is any non-handled exception: page faults, breakpoints, division overflows, etc. And there are 6 different types of status breakpoints inside Windows that'll be triggering like Rambo in Iraq. So basically we need to send a message from our debugger process that it's OK to continue every time we have encountered such an event. Of course, if it's the event we've been looking for, we need to do whatever it is we wish to do before giving the green light to run on. This is the reason behind the loop of kernel32!WaitForDebugEvent and kernel32!ContinueDebugEvent in my code.

So now we know how to stop the program before it has actually started. If you read the previous section you'll know how to exchange pointers. This leaves us with a grave problem: injecting our code into the target's addressing space. Now this can be done in many ways indeed. We'll just be looking at the one I chose. What I'll try to obtain is making the program load a DLL for me. This of course isn't something the program is willing to do without force. Fortunately, for the moment I'm President Clinton and the Security Council has agreed to bomb the target until it conforms to my ideas.

The scene is set at the status breakpoint just before the target is about to start execution. It is fully loaded and ready to go. However, we're sitting comfortably with it suspended far, far away in our own addressing space. The first thing we've got to agree on is what it is we actually want the target to do: load OUR DLL, find the process address of OUR function, and replace the one found at the Thunk-VA of the original.
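The last of those three steps, overwriting the Thunk-VA while keeping the old pointer for chaining, can be shown in a toy model. Nothing here touches a real process: Python callables stand in for code addresses, a dictionary stands in for the thunk table, and the names `original_get_drive_type`, `target_checks_cd` and the address `0x00405000` are hypothetical, chosen only to echo the "Insert correct CD" example.

```python
# Toy model of replacing a thunk entry and chaining to the original.
# Python callables stand in for code addresses; all names are hypothetical.

def original_get_drive_type(path):
    """Stands in for the real API the target imported."""
    return "FIXED"


THUNK_VA = 0x00405000  # made-up address of the thunk entry
thunks = {THUNK_VA: original_get_drive_type}  # thunk table after loading


def target_checks_cd(path):
    """The target's CALL dword ptr [Thunk VA]: always goes through the table."""
    return thunks[THUNK_VA](path) == "CDROM"


def install_hook(table, thunk_va, make_hook):
    """Save the old pointer, then overwrite the thunk with our function."""
    old = table[thunk_va]
    table[thunk_va] = make_hook(old)
    return old


def make_fake_cd_hook(old):
    def hook(path):
        if path == "D:\\":      # return a fake value under certain circumstances
            return "CDROM"
        return old(path)        # otherwise chain to the original API
    return hook


before = target_checks_cd("D:\\")
saved = install_hook(thunks, THUNK_VA, make_fake_cd_hook)
after = target_checks_cd("D:\\")
print(before, after)
```

The target's code is never modified; only the pointer it calls through changes, which is why the hook has to be in place before the first instruction of the target runs.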
We now construct code that will do just that, written in delta-offset style (that is, position independent) so that it can be inserted anywhere. Prior to actually running the program we find a page within the target that allows execution. Most pages in the target allow execution but we just need one. We now read the page out of the process space of the target into our own and store it safely. This is done through another subfunction of INT 2eh, which of course also overrides page protection etc. The API is:

    kernel32!ReadProcessMemory

See Natzgul's essay for a more thorough breakdown of this function. Now we write our own code, which loads a DLL, finds the address of our function and replaces the Thunk-VA entry of the function with ours, into that page.

Now we're ready to go? No. We're left with the problem that execution should otherwise be left unchanged, so the fact that we've overwritten a page somewhere is bad news. So in addition to the code we wrote we append an INT 3, which when executed will cause a debug event and once more suspend the target, allowing us to restore the page. Unfortunately the EIP of the target does not necessarily point to our page; further, we use all the registers and those need to be restored too. So where do we turn? Windows internal knowledge.

Upon creation, that is prior to running any actual program code, any process has one and only one thread. Further, Windows allows debuggers to fetch the context of a thread, that is, all relevant information about the thread's current status. Such a context was originally intended for preemptive multitasking: whenever the OS suspended execution of one thread to run another, the context was saved, the address space swapped, the other thread's context restored with its process's address space swapped in place, and it was allowed to continue. One should be aware that while a thread indeed has a full context, it's partly shared with that of the other threads in the process. E.g. the FPU is shared between threads in a process.
Since we've only got one thread in our process, the terms thread and process are interchangeable here. We of course now read the context of our target's single thread, save it, then change the EIP in it and set the context again so that it points to our page of code in the target's process space. Of course our code will now execute until the INT 3 we inserted is reached, then it's suspended and control is back with us. We now reset the context of the thread and restore the page we abused for our code. Then we simply let it run.

There is one last unfortunate thing about letting it run. If a process was created in debug mode it stays in debug mode until it's terminated. That means that we need to stay in a loop of WaitForDebugEvent/ContinueDebugEvent until the time when the process is actually terminated, or the program will suspend itself and wait for our instructions. This wasn't too smart, MS!

Practical notes on the debug approach

A last side note should be mentioned here. While I was doing this code I encountered a bug in Windows NT Workstation 4.0 build 1381. It might exist on other versions too. Code inside Windows looks like this:

    mov eax, [offset of context storing space in debugger code] ; this is obviously a parameter
    mov ebx, [temporary variable containing ring level of debugger]
    test eax, ebx
    jnz insufficient_security
    ; everything OK

Obviously this is wrong. To overcome this bug make sure that the offset where you store your context and'ed with 3 is 0, in other words keep the context buffer dword aligned.

Further, finding the Thunk-VA of an imported function can easily be done by dumping the PE-file with Matt Pietrek's PE-dump or similar. It gives the first thunk for each DLL; if your function isn't the first, you add 4 bytes for each line you need to move down to find your function.

The source code for this can be found at: http://www.one.se/~stone/general/stnapih.arj

---------------------------------------------------------------------

The MessageHook Approach

Forthcoming - source is forthcoming

---------------

Literature

MadMax! (1998) - madmasu.htm: Cracking useing kernel32??, by MadMax, Feb 1998, @ http://fravia.org
Natzgul (1998) - natz_mp2.htm: How to access the memory of a process, a Tutorial, by Natzgul, Feb 1998, @ http://fravia.org
Pietrek, Matt - Windows 95 System Programming Secrets, IDG Books, 1995.
Various source code by me :).. all can be found on my page http://www.one.se/~stone

Thanks must go to:
Patriarch / PWA, friend, roommate and local expert.
Random / Xforce, God of the PE-format.
Net Walker / Brazil United Cracking Force, my personal benefactor.
All of whom I had many enlightening discussions with.

email: stone(at)one(point)se
http://www.one.se
Stone/UCF'98 2nd&mi!

----- doc end

kind regards
Stone / United Cracking Force '98
(c) 1998 Stone All rights reversed