Monday, September 15, 2014

NoConName 2014 - inBINcible Reversing 400 Writeup

We (Spiderz) have finished 26th at the NCN CTF this year with 2200 points and we really did enjoy playing. I was able to solve both cannaBINoid (300p) and (inBINcible 400p) .I have actually found 2 solutions for inBINcible that I'll describe separately later in this write-up.

We were given an 32-bit ELF binary compiled from Golang (GO Programming Language).
The main function is "text" (also called main.main) and it is where interesting stuff happens. While digging through this routine the os_args array caught my attention, around address 0x08048DB3 it will access the memory location pointed by os_args+4 and compare its content to 2. This value is nothing but the number of command line arguments given to the executable (argc in C), so the binary is expecting a command line argument which is in fact the key.
By looking at the next lines , I saw an interesting check :
.text:08048EFE                 mov     ebx, dword ptr ds:os_args
.text:08048F04                 add     ebx, 8
.text:08048F07                 mov     esi, [ebx+4]
.text:08048F0A                 mov     ebx, [esp+0C0h+var_54]
.text:08048F0E                 mov     ebp, [ebx+4]
.text:08048F11                 cmp     esi, ebp
.text:08048F13                 jz      loc_8049048
As it's the first time I encounter GOlang I though that it was better to use GDB alongside with IDA so I fired up my linux machine , put a breakpoint on  0x08048F11 , gave the binary a 4 bytes long command line argument (./inbincible abcd) and then I examined both ESI and EBP .
esi = 0x4
ebp = 0x10
You can easily notice tha abcd length is 4 and the right length that should be given is 16 , we can safely assume now that the flag length is 16 bytes.
Note :
The instruction which initializes the flag length to 0x10 is this one (examine instructions before it)
.text:08048D05                 mov     [ebx+4], edi

If the length check is true runtime_makechan function will be called (0x08049073) which creates a channel as its name implies. After that we'll enter immediately a loop that will get to initializing some structure fields then calling  runtime_newproc for each character in the flag (16 in total). One of the variables is initialized with  "main_func_001" and it can be seen as the handler that is called when chanrecv is called.
After breaking out of the loop, ECX is set to 1 and then we'll enter another loop (0x080490DD). This loop calls chanrecv for each character in the input (under certain circumstances). chanrecv is supplied the current index of the input and a pointer to a local variable which I named success_bool. Basically our main routine will supply a pointer to success_bool to the channel which will assign another thread (probably the one created using runtime_newproc) executing the main_func_001 to do some checks then write a TRUE or FALSE value into the variable. After returning from chanrecv the boolean will be checked. If it's TRUE ecx will keep its value ( 1 ) and we'll move to the next character. However if main_func_001 has set the boolean to false ecx will be zeroed and the other characters of the user input won't be checked (fail).
I have actually found 2 methods to approach this challenge (with bruteforcing and without bruteforcing) :

1 - Getting the flag using bruteforce :

This solution consists of automating the debugger (GDB) to supply a 16 bytes length string as an argument , the current character (we basically start with the first character) will be changed during each iteration using a charset and the next character will be left unchanged until we've found the right current character of the key and so on. To find the right character we must break after returning from chanrecv then read the local variable (boolean) value , if it's 1 then we've got the right character and we shall save it then move to the next one, else we'll keep looking until finding the right one.

Here's a python GDB script that explains it better :

flag : G0w1n!C0ngr4t5!!

2 - Getting the key by analyzing main_func_001 :

As main_func_001 is the one responsible for setting the boolean value, analyzing this routine will give us the possibility to get the flag without any bruteforcing. Let's see what it does :
main_func_001 expects the boolean variable pointer and the index of the character to be tested. This index , as mentionned earlier, is the iterator of the loop which has called chanrecv.
For the purpose of checking each character main_func_001 uses 2 arrays , I called the first 5 bytes sized array Values_Array. The second array size is 16 bytes , same length as the password.
So here's how we can get the flag using the 2 arrays :

flag : G0w1n!C0ngr4t5!!

The final key to validate the challenge is NcN_sha1(G0w1n!C0ngr4t5!!)

Binary download : Here

Follow me on Twitter : Here

See you soon :>

Friday, September 5, 2014

Windows Internals - A look into SwapContext routine

Here I am really taking advantage of my summer vacations and back again with a second part of the Windows thread scheduling articles. In the previous blog post I discussed the internals of quantum end context switching (a flowchart). However, the routine responsible for context switching itself wasn't discussed in detail and that's why I'm here today.

Here are some notes that'll help us through this post :
 1 - The routine which contains code that does context switching is SwapContext and it's called internally by KiSwapContext. There are some routines that prefer to call SwapContext directly and do the housekeeping that KiSwapContext does themselves.
 2 - The routines above (KiSwapContext and SwapContext) are implemented in ALL context switches that are performed no matter what is the reason of the context switch (preemption,wait state,termination...).
 3 - SwapContext is originally written in assembly and it doesn't have any prologue or epilogue that are normally seen in ordinary conventions, imagine it like a naked function.
 4 - Neither SwapContext or KiSwapContext is responsible for setting the CurrentThread and NextThread fields of the current KPRCB. It is the responsibility of the caller to store the new thread's KTHREAD pointer into pPrcb->CurrentThread and queue the current thread (we're still running in its context) in the ready queue before calling KiSwapContext or SwapContext which will actually perform the context-switch.
 Usually before calling KiSwapContext, the old irql (before raising it to DISPATCH_LEVEL) is stored in CurrentThread->WaitIrql , but there's an exception discussed later in this article.

So buckle up and let's get started :
Before digging through SwapContext let's first start by examining what its callers supply to it as arguments.
SwapContext expects the following arguments:
- ESI : (PKTHREAD) A pointer to the New Thread's structure.
- EDI : (PKTHREAD) A pointer to the old thread's structure.
- EBX : (PKPCR) A pointer to PCR (Processor control region) structure of the current processor.
- ECX : (KIRQL) The IRQL in which the thread was running before raising it to DISPATCH_LEVEL.
By callers, I mean the KiSwapContext routine and some routines that call SwapContext directly (ex : KiDispatchInterrupt).
Let's start by seeing what's happening inside KiSwapContext :
This routine expects 2 arguments the Current thread and New thread KTHREAD pointers in ECX and EDX respectively (__fastcall).
Before storing both argument in EDI and ESI, It first proceeds to save these and other registers in the current thread's (old thread soon) stack:
EBP : The stack frame base pointer (SwapContext only updates ESP).
EDI : The caller might be using EDI for something else ,save it.
ESI : The caller might be using ESI for something else ,save it too.
EBX : The caller might be using EBX for something else ,save it too.
Note that these registers will be popped from this same thread's stack when the context will be switched from another thread to this thread again at a later time (when it will be rescheduled to run).
After pushing the registers, KiSwapContext stores the self pointer to the PCR in EBX (fs:[1Ch]).Then it stores the CurrentThread->WaitIrql value in ECX, now that everything is set up KiSwapContext is ready to call SwapContext.

Again, before going through SwapContext let me talk about routines that actually call SwapContext directly and exactly the KiDispatchInterrupt routine that was referenced in my previous post.
Why doesn't KiDispatchInterrupt call KiSwapContext ?
Simply because it just needs to push EBP,EDI and ESI onto the current thread's stack as it already uses EBX as a pointer to PCR.

Here, we can see a really great advantage of software context switching where we just save the registers that we really need to save, not all registers.

Now , we can get to SwapContext and explain what it does in detail.
The return type of SwapContext is a boolean value that tells the caller (in the new thread's stack) whether the new thread has any APCs to deliver or not.

Let's see what SwapContext does in these 15 steps:

1 - The first thing that SwapContext does is verify that the new thread isn't actually running , this is only right when dealing with a multiprocessor system where another processor might be actually running the thread.If the new thread is running SwapContext just loops until the thread stops running. The boolean value checked is NewThread->Running and after getting out of the loop, the Running boolean is immediately set to TRUE.

2 - The next thing SwapContext does is pushing the IRQL value supplied in ECX. To spoil a bit of what's coming in the next steps (step 13) SwapContext itself pops ECX later, but after the context switch. As a result we'll be popping the new thread's pushed IRQL value (stack switched).

3 - Interrupts are disabled, and PRCB cycle time fields are updated with the value of the time-stamp counter. After the update, Interrupts are enabled again.

4 - increment the count of context switches in the PCR (Pcr->ContextSwitches++;) , and push Pcr->Used_ExceptionList which is the first element of PCR (fs:[0]). fs:[0] is actually a pointer to the last registered exception handling frame which contains a pointer to the next frame and also a pointer to the handling routine (similar to usermode), a singly linked list simply. Saving the exception list is important as each thread has its own stack and thus its own exception handling list.

5 - OldThread->NpxState is tested, if it's non-NULL, SwapContext proceeds to saving the floating-points registers and FPU related data using fxsave instruction. The location where this data is saved is in the initial stack,and exactly at (Initial stack pointer - 528 bytes) The fxsave output is 512 bytes long , so it's like pushing 512 bytes onto the initial stack , the other 16 bytes are for stack-alignment I suppose.The Initial stack is discussed later during step 8.

6 - Stack Swapping : Save the stack pointer in OldThread->KernelStack and load NewThread->KernelStack into ESP. We're now running in the new thread's stack, from now on every value that we'll pop was previously pushed the last time when the new thread was preparing for a context-switch.

7 - Virtual Address Space Swapping : The old thread process is compared with the new thread's process if they're different CR3 register (Page directory pointer table register) is updated with the value of : NewThread->ApcState.Process->DirectoryTableBase. As a result, the new thread will have access to a valid virtual address space. If the process is the same, CR3 is kept unchanged. The local descriptor table is also changed if the threads' processes are different.

8 -  TSS Esp0 Switching : Even-though I'll dedicate a future post to discuss TSS (task state segment) in detail under Windows , a brief explanation is needed here. Windows only uses one TSS per processor and uses only (another field is also used but it is out of the scope of this article) ESP0 and SS0 fields which stand for the kernel stack pointer and the kernel stack segment respectively. When a usermode to kernelmode transition must be done as a result of an interrupt,exception or system service call... as part of the transition ESP must be changed to point to the kernel stack, this kernel stack pointer is taken from TSS's ESP0 field. Logically speaking, ESP0 field of the TSS must be changed on every context-switch to the kernel stack pointer of the new thread. In order to do so, SwapContext takes the kernel stack pointer at NewThread->InitialStack (InitialStack = StackBase - 0x30) ,it substrats the space that it has used to save the floating-point registers using fxsave instruction and another additional 16 bytes for stack alignment, then it stores the resulted stack pointer in the TSS's Esp0 field : pPcr->TssCopy.Esp0 (TSS can be also accessed using the TR segment register).

9 - We've completed the context-switch now and the old thread can be finally marked as "stopped running" by setting the previously discussed boolean value "Running" to FALSE. OldThread->Running = FALSE.

10 - If fxsave was previously executed by the new thread (the last time its context was switched), the data (floating-point registers...) saved by it is loaded again using xrstor instruction.

11 - Next the TEB (Thread environment block) pointer is updated in the PCR :
pPcr->Used_Self = NewThread->Teb . So the Used_Self field of the PCR points always to the current thread's TEB.

12 - The New thread's context switches count is incremented (NewThread->ContextSwitches++).

13 - It's finally the time to pop the 2 values that SwapContext pushed , the pointer to the exception list and the IRQL from the new thread's stack. the saved IRQL value is restored in ECX and the exception list pointer is popped into its field in the PCR.

14 - A check is done to see if the context-switch was performed from a DPC routine (Entering a wait state for example) which is prohibited. If pPrcb->DpcRoutineActive boolean is TRUE this means that the current processor is currently executing a DPC routine and SwapContext will immediately call KeBugCheck which will show a BSOD : ATTEMPTED_SWITCH_FROM_DPC.

15 - This is the step where the IRQL (NewThread->WaitIrql) value stored in ECX comes to use. As mentionned earlier SwapContext returns a boolean value telling the caller if it has to deliver any pending APCs. During this step SwapContext will check the new thread's ApcState to see if there are any kernel APCs pending. If there are : a second check is performed to see if special kernel APCs are disabled , if they're not disabled ECX is tested to see if it's PASSIVE_LEVEL, if it is above PASSIVE_LEVEL an APC_LEVEL software interrupt is requested and the function returns FALSE. Actually the only case that SwapContext returns TRUE is if ECX is equal to PASSIVE_LEVEL so the caller will proceed to lowering IRQL to APC_LEVEL first to call KiDeliverApc and then lower it to PASSIVE_LEVEL afterwards.

Special Case :
This special case is actually about the IRQL value supplied to SwapContext in ECX. The nature of this value depends on the caller in such way that if the caller will lower the IRQL immediately upon returning from SwapContext or not.
Let's take 2 examples : KiQuantumEnd and KiExitDispatcher routines. (KiQuantumEnd is the special case)

If you disassemble KiExitDispatcher you'll notice that before calling KiSwapContext it stores the OldIrql (before it was raised to DISPATCH_LEVEL) in the WaitIrql of the old thread so when the thread gains execution again at a later time SwapContext will decide whether there any APCs to deliver or not. KiExitDispatcher makes use of the return value of KiSwapContext (KiSwapContext returns the same value returned by SwapContext) to lower the IRQL. (see step 15 last sentence).
However, by disassembling KiQuantumEnd you'll see that it's storing APC_LEVEL at the old thread's WaitIrql without even caring about in which IRQL the thread was running before. If you refer back to my flowchart in the previous article you'll see that KiQuantumEnd always insures that SwapContext returns FALSE , first of all because KiQuantumEnd was called as a result of calling KiDispatchInterrupt which is meant to be called when a DISPATCH_LEVEL software interrupt was requested.Thus, KiDispatchInterrupt was called by HalpDispatchSoftwareInterrupt which is normally called by HalpCheckForSoftwareInterrupt. HalpDispatchSoftwareInterrupt is the function responsible for raising the IRQL to the software interrupt level (APC_LEVEL or DISPATCH_LEVEL) and upon returning from it HalpCheckForSoftwareInterrupt recovers back the IRQL to its original value (OldIrql). So the reason why KiQuantumEnd doesn't care about KiSwapContext return value because it won't proceed to lowering the IRQL (not its responsibility) nor to deliver any APCs that's why it's supplying APC_LEVEL as an old IRQL value to SwapContext so that it will return FALSE. However, a software interrupt might be requested by SwapContext if there are any pending APCs.
KiDispatchInterrupt which calls SwapContext directly uses the same approach as KiQuantumEnd, instead of storing the value at OldThread->WaitIrql it just moves it into ECX.

Post notes :
- Based on Windows 7 32 bit :>
- For any questions or suggestions feel free to leave a comment below or send me an email :

See you again soon :)


Saturday, August 30, 2014

Windows Internals - Quantum end context switching

Lately I decided to start sharing the notes I gather , almost daily , while reverse engineering and studying Windows. As I focused in the last couple of days on studying context switching , I was able to decompile the most involved functions and study them alongside with noting the important stuff. The result of this whole process was a flowchart.

Before getting to the flowchart let's start by putting ourselves in the main plot :
As you might know, each thread runs for a period of time before another thread is scheduled to run, excluding the cases where the thread is preempted ,entering a wait state or terminated. This time period is called a quantum. Everytime a clock interval ends (mostly 15 ms) the system clock issues an interrupt.While dispatching the interrupt, the thread current cycle count is verified against its cycle count target (quantum target) to see if it has reached or exceeded its quantum so the context would be switched the next thread scheduled to run.
Note that a context-switch in Windows doesn't happen only when a thread has exceeded its quantum, it also happens when a thread enters a wait state or when a higher priority thread is ready to run and thus preempts the current thread.

As it will take some time to organize my detailed notes and share them here as an article (maybe for later),consider the previous explanation as a small introduction into the topic. However ,the flowchart goes through the details involved in quantum end context switching.

Please consider downloading the pdf  to be able to zoom as much as you like under your PDF reader because GoogleDocs doesn't provide enough zooming functionality to read the chart.

Preview (unreadable) :

PDF full size Download  : GoogleDocs Link

P.S :
- As always , this article is based is on : Windows 7 32-bit
- Note that details concerning the routine that does the context switching (SwapContext) aren't included in the chart and are left it for a next post.

See you again soon.


Wednesday, July 9, 2014

OkayToCloseProcedure callback kernel hook

Hi ,

During the last few weeks I was busy exploring the internal working of Handles under Windows , by disassembling and decompiling certain kernel (ntoskrnl.exe) functions under my Windows 7 32-bit machine.In the current time I am preparing a paper to describe and explain what I learned about Handles. But today I’m here to discuss an interesting function pointer hook that I found while decompiling and exploring the ObpCloseHandleEntry function. (Source codes below).

A function pointer hook consists of overwriting a callback function pointer so when a kernel routine will call the callback function, the hook function will be called instead . The function pointer that we will be hooking in this article is the OkayToCloseProcedure callback that exists in the _OBJECT_TYPE_INITIALIZER structure which is an element of the OBJECT_TYPE struct.

Every object in Windows has an OBJECT_TYPE structure which specifies the object type name , number of opened handles to this object type ...etc OBJECT_TYPE also stores a type info structure (_OBJECT_TYPE_INITIALIZER) that has a group of callback functions (OpenProcedure ,CloseProcedure…) . All OBJECT_TYPE structures pointers are stored in the unexported ObTypeIndexTable array.

As I said earlier , the OkayToCloseProcedure is called inside ObpCloseHandleEntry function.In general this function (if the supplied handle is not protected from being closed) frees the handle table entry , decrements the object’s handle count and reference count.
Another case when the handle will not be closed is if the OkayToCloseProcedure returned 0 , in this case the ObpCloseHandleTableEntry returns STATUS_HANDLE_NOT_CLOSABLE.
I will discuss handles in more details in my future blog posts.

So how the OkayToCloseProcedure is called ?

ObpCloseHandleTableEntry function actually gets the Object (which the handle is opened to) header (_OBJECT_HEADER). A pointer to the object type structure (_OBJECT_TYPE) is then obtained by accessing the ObTypeIndexTable array using the Object Type Index from the object header (ObTypeIndexTable[ObjectHeader->TypeIndex]).

The function will access the OkayToCloseProcedure field and check if it’s NULL , if that’s true the function will proceed to other checks (check if the handle is protected from being closed). If the OkayToCloseProcedure field isn’t NULL , the function will proceed to call the callback function. If the callback function returns 0 the handle cannot be closed and ObpCloseHandleTableEntry will return STATUS_HANDLE_NOT_CLOSABLE. If it returns a value other than 0 we will proceed to the other checks as it happens when the OkayToCloseProcedure is NULL.

An additional point is that For some reason , the OkayToCloseProcedure must always run within the context of the process that opened the handle in the first place (a call to KeStackAttachProcess). I don’t think that this would be a problem if ObpCloseHandleTableEntry is called as a result of calling ZwClose because we’ll be running in the context of the process that opened the handle.
But if ObpCloseHandleTableEntry was called from another process context and tried to close another process’s handle table entry the OkayToCloseProcedure must run in that process context. That’s why ObpCloseHandleTableEntry takes a pointer to the process object (owner of the handle) as a parameter.

Applying the hook :

Now after we had a quick overview of what’s happening , let’s try and apply the hook on the OBJECT_TYPE_INITIALIZER’s OkayToCloseProcedure field.
I applied the hook on the Process object type , we can obtain a pointer to the process object type by taking advantage of the exported PsProcessType , it’s actually a pointer to a pointer to the process’s object type.

Here’s a list containing the exported object types :
POBJECT_TYPE *ExEventObjectType;
POBJECT_TYPE *ExSemaphoreObjectType;
POBJECT_TYPE *IoFileObjectType;
POBJECT_TYPE *SeTokenObjectType;
POBJECT_TYPE *PsProcessType;
POBJECT_TYPE *TmEnlistmentObjectType;
POBJECT_TYPE *TmResourceManagerObjectType;
POBJECT_TYPE *TmTransactionManagerObjectType;
POBJECT_TYPE *TmTransactionObjectType;

A second way to get an object’s type is by getting an existing object’s pointer and then pass it to the exported kernel function ObGetObjectType which will return a pointer to the object’s type.

A third way is to get a pointer to the ObTypeIndexTable array, it’s unexported by the kernel but there are multiple functions using it including the exported ObGetObjectType function.So the address can be extracted from the function's opcodes , but that will introduce another compatibility problem. After getting the pointer to the ObTypeIndexTable you'll have to walk through the whole table and preform a string comparison to the target's object type name ("Process","Thread" ...etc) against the Name field in each _OBJECT_TYPE structure.

In my case I hooked the Process object type , and I introduced in my code the 1st and the 2nd methods (second one commented).
My hook isn’t executing any malicious code !! it’s just telling us (using DbgPrint) that an attempt to close an open handle to a process was made.
“An attempt” means that we’re not sure "yet" if the handle will be closed or not because other checks are made after a successful call to the callback.And by a successful call , I mean that the callback must return a value different than 0 that’s why the hook function is returning 1. I said earlier that the ObpCloseHandleTableEntry will proceed to check if the handle is protected from being closed  (after returning from the callback) if the OkayToCloseProcedure is null or if it exists and returns 1 , that's why it’s crucial that our hook returns 1.One more thing , I’ve done a small check to see if the object type’s OkayToCloseProcedure is already NULL before hooking it (avoiding issues).

Example :
For example when closing a handle to a process opened by OpenProcess a debug message will display the handle value and the process who opened the handle.
As you can see "TestOpenProcess.exe" just closed a handle "0x1c" to a process that it opened using OpenProcess().

P.S : The hook is version specific.

Source codes :
Decompiled ObpCloseHandleTableEntry :
Driver Source Code

Your comments are welcome.

Souhail Hammou.


Monday, June 9, 2014

MCSC CTF 2014 - Reverse 500 Write-up

Hello everyone,
Sorry for the late writeup , MCSC (Moroccan Cyber Security Challenge) was really great challenge and we really enjoyed it.
This challenge was worth 500 points and we were the only team to solve it.
Given my limited skills at linux revesing I guess my method at solving this isn't the best out there :D  but let's get to it anyway :

We were given 2 files "easyHW" and "opcodes.bin" , easyHW is an x86 ELF and opcodes.bin was a raw file which has "opcodes" as its name implies.
You can download them from this temporary link :
The binary (easyHW) takes a file name as an argument which is meant to be "opcodes.bin" for sure in this crackme.

I disassembled easyHW under IDA and started exploring its main function.
When supplying an argument the binary will call a function named "read_opcodes" which will open the supplied file , copies its content into an uninitialized 4KB space in the bss section. The uninitialized page is defined as "buffer". When returning from this function we guaranteed that the file is now copied into the buffer.
The loop can be simply decompiled to :
for(i = 0;i < 4096;i++){
    buffer[i] ^= 0xAA;
    buffer[i] -= 10;
/*call to mmap ...etc*/
All I did is replicate the loop and decrypt the "opcodes.bin" file.

Well now all we need to know is where the execution will be given to this code.
The call to mmap isn't given any file to map .If the returned virtual address is valid (mmap returned success) the returned chunk will be zeroed and then the decrypted file will be copied to (chunk_start_address + 0x100).

After doing so , the subsequent code appears to fill in a structure which pointer will be supplied to the vm86 function. This is actually the key function to solve this crackme.

vm86 will makes access possible to virtual 8086 mode which actually runs 16-bit code (used by realmode application that can't run in protected mode).
more here :
Digging a little bit more in google, this function appears to be also implemented in anti-debugging. I didn't give many importance to this as I'm doing everything in static :).

As this function executes 16-bit code , thus the opcodes that we just decrypted are meant to be executed as 16-bit opcodes. So I directly dragged the file under IDA and disassembled it as 16-bit code.

Interesting parts of the decrypted opcodes :
So the 16-bit code will write to screen the string at offset 0x200 in the mapped file, which is at offset 0x100 in our decrypted 16-bit file.
"[~] Password : "
And then it will read from the user and this input will be stored at address 0x500.
The input length must be 20 character (length calculated at sub_DF) , if it is right we will enter the following loop :

A quick manual decompilation and interpretation C :

// user_pass at 0x500
// enrypted_pass at 0x263 (file offset 0x163)
int i = 0;
while(i != 20){
    if(encrypted_pass[i] != ( (user_pass[i] ^ 0xCD) - 1 )){
        /*string at offset 0x12C in our decrypted file*/
        printf("[x] Password Invalid x(\n");
        /*exit we're done*/
/*string at offset 0x145 in our decrypted file*/
printf("[+] Congratz .... you WIN !!\n");
/*exit we're done*/
So we're taking the userpass enrypt it the  compare it to the enrypted pass at offset 0x163 in the decrypted opcode.bin file.
The enrypted pass in hex : "87 9F B7 A0 F8 B8 FC BE BD F8 BE FD AE F8 A8 F8 BD BD E2 EB"
To decrypt it we simply increment each byte and then XOR it with 0xCD.
And we'll have our flag : Emul4t0rs4r3b4d4ss.!

500 points to victory :) !!

Cheers everyone.

Souhail from Spiderz.

Sunday, May 4, 2014

Windows Heap Overflow Exploitation

Hi ,

In this article I will be talking about exploiting a custom heap : which is a big chunk of memory allocated by the usermode application using VirtualAlloc for example . The application will then work on managing 'heap' block allocations and frees (in the allocated chunk) in a custom way with complete ignorance of the Windows's heap manager. This method gives the software much more control over its custom heap, but it can result in security flaws if the manager doesn't do it's job properly , we'll see that in detail later.

To see an implementation of a custom heap manager in C/C++ please refer to my previous blog post :
Heap Manager Source code :

The vulnerability that we'll exploit together today is a 'heap' overflow vulnerability that's occuring in a custom heap built by the application. The vulnerable software is : ZipItFast 3.0 and we'll be exploiting it today and gaining code execution under Windows 7 . ASLR , DEP , SafeSEH aren't enabled by default in the application which makes it even more reliable to us . Even though , there's still some painful surprises waiting for us ...

Let's just start :
The Exploit :
I've actually got the POC from exploit-db , you can check it right here :

Oh  , and there's also a full exploit here :

Unfortunately , you won't learn much from the full exploitation since it will work only on Windows XP SP1. Why ? simply because it's using a technique that consists on overwriting the vectored exception handler node that exists in a static address under windows XP SP1. Briefly , all you have to do is find a pointer to your shellcode (buffer) in the stack. Then take the stack address which points to your pointer and after that substract 0x8 from that address and then perform the overwrite. When an exception is raised , the vectored exception handlers will be dispatched before any handler from the SEH chain, and your shellcode will be called using a CALL DWORD PTR DS: [ESI + 0x8] (ESI = stack pointer to the pointer to your buffer - 0x8). You can google the _VECTORED_EXCEPTION_NODE and check its elements.

And why wouldn't this work under later versions of Windows ? Simply because Microsoft got aware of the use of this technique and now EncodePointer is used to encode the pointer to the handler whenever a new handler is created by the application, and then DecodePointer is called to decode the pointer before the handler is invoked.

Okay, let's start building our exploit now from scratch. The POC creates a ZIP file with the largest possible file name , let's try it :
N.B : If you want to do some tests , execute the software from command line as follows :
    Cmd :> C:\blabla\ZipItFast\ZipItFast.exe C:\blabla\
  Then click on the Test button under the program.

Let's try executing the POC now :
An access violation happens at 0x00401C76 trying to access an invalid pointer (0x41414141) in our case. Let's see the registers :

Basically the FreeList used in this software is a circular doubly linked lists similar to Windows's . The circular doubly linked list head is in the .bss section at address 0x00560478 and its flink and blink pointers are pointing to the head (self pointers) when the custom heap manager is initialized by the software.
I also didn't check the full implementation of the FreeList and the free/allocate operations in this software to see if they're similar to Windows's (bitmap , block coalescing ...etc).

It's crucial also to know that in our case , the block is being unlinked from the FreeList because the manager had a 'request' to allocate a new block , and it was chosen as best block for the allocation.

Let's get back to analysing the crash :
- First I would like to mention that we'll be calling the pointer to the Freelist Entry struct : "entry".

Registers State at 0x00401C76 :
EAX = entry->Flink
EDX = entry->Blink
[EAX] = entry->Flink->Flink
[EAX+4] = entry->Flink->Blink (Next Block's Previous block)
[EDX] = entry->Blink->Flink (Previous Block's Next Block)
[EDX+4] =entry->Blink->Blink

Logically speaking : Next Block's Previous Block and Previous Block's Next Block are nothing but the current block.
So the 2 instructions that do the block unlinking from the FreeList just :
- Set the previous freelist entry's flink to the block entry's flink.
- Set the next freelist entry's blink to the block entry's blink.
By doing so , the block doesn't belong to the freelist anymore and the function simply returns after that. So it'll be easy to guess what's happening here , the software allocates a static 'heap' block to store the name of the file and it would have best to allocate the block based on the filename length from the ZIP header (this could be a fix for the bug , but heap overflows might be found elsewhere , I'll propose a better method to fix ,but not fully, this bug later in this article).

Now , we know that we're writing past our heap block and thus overwriting the custom metadata of the next heap block (flink and blink pointers). So, We'll need to find a reliable way to exploit this bug , as the 2 unlinking instructions are the only available to us and we control both EAX and EDX. (if it's not possible in another case you can see if there are other close instructions that might help), you can think of overwriting the return address or the pointer to the structured exception handler as we have a stack that won't be rebased after reboot.
This might be a working solution in another case where your buffer is stored in a static memory location.
But Under Windows 7 , it's not the case , VirtualAlloc allocates a chunk of memory with a different base in each program run. In addition , even if the address was static , the location of the freed block that we overwrite varies. So in both cases we'll need to find a pointer to our buffer.

The best place to look is the stack , remember that the software is trying to unlink (allocate) the block that follows the block where we've written the name , so likely all near pointers in the stack (current and previous stack frame) are poiting to the newly allocated block (pointer to metadata) . That's what we don't want because flink and blink pointers that we might set might not be valid opcodes and might cause exceptions , so all we need to do is try to find a pointer to the first character of the name and then figure out how to use this pointer to gain code execution , this pointer might be in previous stack frames.

And here is a pointer pointing to the beginning of our buffer : 3 stack frames away

 Remember that 0x01FB2464 will certainly be something else when restarting the program , but the pointer 0x0018F554 is always static , even when restarting the machine.

So when I was at this stage , I started thinking and thinking about a way that will help me redirect execution to my shellcode which is for sure at the address pointed by 0x0018F554 , and by using only what's available to me :
- Controlled registers : EAX and EDX.
- Stack pointer to a dynamic buffer pointer.
- 2 unlinking instructions.
- No stack rebase.

Exploiting the vulnerability and gaining code execution:

And Then I thought , why wouldn't I corrupt the SEH chain and create a Fake frame ? Because when trying to corrupt an SEH chain there are 3 things that you must know :

- SafeSEH and SEHOP are absent.
- Have a pointer to an exisiting SEH frame.
- Have a pointer to a pointer to the shellcode.

The pointer to the shellcode will be treated as the handler,and the value pointed by
((ptr to ptr to shellcode)-0x4) will be treated as the pointer to the next SEH frame.
Let's illustrate the act of corrupting the chain : (with a silly illustration , sorry)

Let me explain :

we need to achieve our goal by using these 2 instructions , right ? :


We'll need 2 pointers and we control 2 registers , but which pointer give to which register ? This must not be a random choice because you might overwrite the pointer to the shellcode if you chose EAX as a pointer to your fake SEH frame.
So we'll need to do the reverse , but with precaution of overwriting anything critical.
In addition we actually don't care about the value of "next SEH frame" of our fake frame.

So our main goal is to overwrite the "next SEH frame" pointer of an exisiting frame , to do so we need to have a pointer to our fake frame in one of the 2 registers. As [EAX+4] will overwrite the pointer to the buffer if used as a pointer to the fake SEH frame , we will use EDX instead. We must not also overwrite the original handler pointer because it will be first executed to try to handle the exception , if it fails , then our fake handler (shellcode) will be invoked then.
So : EDX = &(pointer to shellcode) - 0x4  = Pointer to Fake "Next SEH frame" element.
EDX must reside in the next frame field of the original frame which is : [EAX+4].
And EAX = SEH Frame - 0x4.

Original Frame after overwite :
Pointer to next SEH : Fake Frame
Exception Handler   : Valid Handler

Fake Frame :
Pointer to next SEH : (Original Frame) - 0x4 (we just don't care about this one)
Exception Handler   : Pointer to shellcode

The SEH frame I chose is at : 0x0018F4B4

So : EAX = 0x0018F4B4 - 0x4 = 0x0018F4B0   and EDX =0x0018F554 - 0x4 = 0x0018F550

When the overwrite is done the function will return normally to its caller , and all we have to do now is wait for an exception to occur . An exception will occur after a dozen of instructions as the metadata is badly corrupted. The original handler will be executed but it will fail to handle the access violation and then our fake handler will be called which is the shellcode .

Making the exploit work :

Now all we need to do is calculate the length between the 1st character of the name and the flink and blink pointers , and then insert our pointers in the POC.

Inserting the shellcode :
The space between the starting address of the buffer and the heap overwritten metadata is not so large , so it's best to put an unconditional jump at the start of our buffer to jump past the overwritten flink and blink pointers and then put the shellcode just after the pointers. As we can calculate the length , this won't cause any problem.

Final exploit here :

I chose a bind shellcode , which opens a connection to ( Let's try opening the ZIP file using ZipItFast and then check "netstat -an | find "4444" :

 Bingo !

A Fix for this vulnerability ??

The method I stated before which consists on allocating the block based on the filename length from the ZIP headers can be valid only to fix the vulnerability in this case , but what if the attackers were also able to cause an overflow elsewhere in the software ?
The best way to fix the bug is that : when a block is about to be allocated and it's about to be unlinked from the Freelist the first thing that must be done is checking the validity of the doubly linked list , to do so : safe unlinking must be performed and which was introduced in later versions of Windows.
Safe unlinking is done the following way :

if ( entry->flink->blink != entry->blink->flink || entry->blink->flink != entry){
    //Fail , Freelist corrupted , exit process
else {
    //Unlink then return the block to the caller

Let's see how safe unlinking is implemented under Windows 7 :
The function is that we'll look at is : RtlAllocateHeap exported by ntdll

Even if this method looks secure , there is some research published online that provides weaknesses of this technique and how can it be bypassed. I also made sure to implement this technique in my custom heap manager (Line 86) , link above.

I hope that you've enjoyed reading this paper .

See you again soon ,

Souhail Hammou.

Sunday, April 6, 2014

Retrieving an exported function address within a loaded module

Today I came with a shellcode friendly blogpost , I didn't have time to write the code in ASM but I'll do it as soon as I have some time.
The goal of this small blogpost is retrieving a pointer to an exported function of choice within a loaded module and certainly without the use of any API calls. 
This method can be really effective against ASLR, the only condition that needs to be met is that the function must be exported by the loaded module.
The process of applying this technique consists of retrieving the base address of the loaded module by walking through a doubly linked list. The linked list head pointer can be found using : &(PEB->Ldr.InMemoryOrderModuleList)
To walk the linked list we'll be just interested in the Flink pointer. Besides being a pointer to another list entry it's also a pointer to a _LDR_DATA_TABLE_ENTRY structure which contains ,among other elements, the Module name and the Module Base.
After running the search : if the Base Address was successfully found , we must get the address of the Export directory within the loaded module . In order to do that we perform some basic PE parsing to find the DataDirectory Array.
The first element of this array "DataDirectory[0]" contains the RVA to the export directory within the module , so all we have to do is add the RVA to the base to get the address.
The key elements of the Export directory that we'll need are the following :

AddressOfNames : This is an array of  RVAs to ASCII strings which we'll use to search for the target function's RVA to name.

AddressOfNameOrdinals : This is an array of exported functions ordinals used as indexes to the AddressOfFunctions array.We will use the index found from AddressOfNames to retrieve the ordinal.

AddressOfFunctions : This is an array of function's RVAs , we will access this array using the ordinal that we retrieved previously in order to get the RVA to the exported function.
Finally All we have to do is Add the base address to the RVA and we'll get a valid pointer to the exported function.

Here's an example of getting a pointer to LoadLibraryA :

See you soon ;