Saturday, November 29, 2014

Windows Thread Suspension Internals Part 2

In the last blog post I talked about both NtSuspendThread and PsSuspendThread kernel routines. If you didn't check the first part I recommend to check it first : here
This part is dedicated to KeSuspendThread and KiSuspendThread routines (fun stuff).
Let's get started by looking a KeSuspendThread : (Windows 7 32-bit SMP as usual)
(pseudo-C) :
A quick overview of KeSuspendThread shows that it's actually the one responsible of calling KiInsertQueueApc in order to queue the target thread's suspend APC in its kernel APC queue. But that's not the only thing happening here , so take it slow and go step by step into what routine does.

As you can notice we start first by raising the irql to DISPATCH_LEVEL, this means we're running in the same irql where the thread dispatcher does so our thread is guaranteed to be running on this processor until the irql drops below DISPATCH_LEVEL. As I'm on a multiprocessor machine this doesn't protect from accessing shared objects safely as another thread executing on another processor might access the object simultaneously. That's why a couple of locks must be acquired in order to continue the execution of the routine , the first lock that KeSuspendThread tries to acquire is the APC queue lock (Thread->ApcQueueLock). After acquiring the lock, execution continues and the thread's previous suspend count is saved , then it is compared with the maximum value that a suspend count might reach (0x7F). The irql is lowered to it's old value and a fatal exception is raised with status (STATUS_SUSPEND_COUNT_EXCEEDED) if the SuspendCount is equal to that value. As I mentioned in the last part PsSuspendThread calls KeSuspendThread within a try-except statement so the machine won't bugcheck as a result of that exception.
If the target thread's suspend count is lower that 0x7F (general case), a second check is done against Thread->ApcQueuable bit to check whether APCs can be queued to that thread or no. Here I want to mention that if you patch that bit using windbg or a driver of a given thread object that thread becomes immune to suspending and even termination as it is done also using an APC.
If the bit is set (generally the case also), the target thread's suspend count is incremented. Next , the routine checks if the thread isn't suspended nor frozen.
If that's also true a third check is done :
line 29 : if(Thread->SuspendApc.Inserted == TRUE) { ....
The SuspendApc is a KAPC stucture , and the Inserted field is a boolean that represents whether the APC was inserted in the APCs queue or not.
Let's start by seeing the else statement at line 38 first and get back to this check. So basically we'll be in the else statement if (SuspendApc.Inserted == FALSE) , it will simply set the APC's Inserted boolean to TRUE and then call KiInsertQueueApc to insert the suspend APC in the target's thread kernel APCs queue. KiInsertQueueApc is internally called by the exported KeInsertQueueApc.

The check at line 29 is confusing, since if the SuspendApc.Inserted is TRUE this already means that the suspend count is different than 0 so we won't even reach this if statement.As we'll see in a later article KeResumeThread is the routine that actually decrements the SuspendCount but it doesn't proceed to do so until it acquires the ApcQueue lock , so this eliminates the fact that KeResumeThread and KeSuspendThread are operating simultaneously on the same target thread (SMP machine). If this check turns out true for a reason , we acquire a lock to safely access and modify the SuspendSemaphore initialized previously by &Thread->SuspendSemaphore and then decrement the Semaphore Count to turn it into the non-signaled state apparently.
If the SuspendApc is now queued , its kernel and normal routines (KiSuspendNop and KiSuspendThread respectively) will be executed as soon as KiDeliverApc is called in the context of the target thread.
The SuspendApc is initialized in KeInitThread  this way :
Let's now take a look at KiSuspendThread normal APC routine :
It simply calls KeWaitForSingleObject to make the thread wait for the SuspendSemaphore to be in the signaled state.
The Suspend semaphore is also initialized in KeInitThread routine :
As you can see the count limit is set to 2 and the initial semaphore is 0. As we'll see later when talking about thread resumption : each synchronization object has a header structure defined as : _DISPATCHER_HEADER, this structure contains the synchronization object's Type (mutant , thread , semaphore ...) , Lock , SignalState fields and some other flags.
The SignalState field in a semaphore is the same as the semaphore count and the semaphore count must not exceed the limit. Semaphores ,when in signaled state (semaphore count > 0) , satisfy the wait for semaphore count threads and unsignal the semaphore. Means if 4 threads are waiting on a semaphore and it became in a signaled state with a semaphore count of 2 , 2 threads will satisfy the wait and the semaphore will become non-signaled. The next waiting thread won't get a chance to run until one of the released threads releases the semaphore , resulting in its semaphore count being incremented (signaled state). 

Let's get back to the SuspendSemaphore now. As I said earlier, it is initialized as non-signaled in the first place so when a thread is suspended it'll stay in the wait state until the semaphore becomes signaled. In fact KeResumeThread is the responsible routine for turning the semaphore into the signaled state and then calling KiSignalSynchronizationObject to unlink the wait block and signal the suspended thread (future posts).

As we discovered together what happens when suspending a thread in detail , the next blog posts will be dedicated to talking about what happens when we call ResumeThread or ZwResumeThread. Stay tuned.

Follow me on twitter : here
- Souhail

Thursday, November 27, 2014

Windows Thread Suspension Internals Part 1

It's been a while since I haven't shared anything concerning Windows internals and I'm back to talk in detail about how Windows thread suspension and resumption works. I'm going to discuss the mentioned topics in this blog post and incoming ones. Even though it can be discussed in one or two entries but I'm quite busy with studies.

As you might already know Windows uses APCs (Asynchronous Procedure Calls) to perform thread suspension. This may form an incomplete image of what's going on in detail as other tasks are being performed besides queuing the suspend APC. I will share throughout this article the details about what's happening and some pseudo code snippets of the reversed routines (Windows 7 32-bit SMP).

Let's say that a usermode thread 'A' wanted to suspend a second usermode thread 'B' , it has to simply call SuspendThread with a previously opened handle to the target thread.
DWORD WINAPI SuspendThread(HANDLE hThread);
Upon the call we'll be taken to kernel32.dll then to kernelbase.dll which simply provides a supplementary argument to NtSuspendThread and calls it from ntdll.dll .
NTSTATUS NtSuspendThread(HANDLE ThreadHandle,PULONG PreviousSuspendCount );
The thread's previous suspend count is basically copied from kernel to *PreviousSuspendCount.
Ntdll then takes us to kernel land where we'll be executing NtSuspendThread.

- NtSuspendThread :
 If we came from usermode (CurrentThread->PreviousMode == UserMode), probing the PreviousSuspendCount pointer for write is crucial. Next, a pointer to the target thread object is obtained by calling ObReferenceObjectByHandle , if we succeed PsSuspendThread is called ; its return type is NTSTATUS and that is the status code returned to the caller (in PreviousMode) after calling ObDereferenceObject and storing the previous count value in the OUT (PreviousSuspendCount) argument if it's not NULL.

- PsSuspendThread :
Prototype : NTSTATUS PsSuspendThread(PETHREAD Thread,PULONG PreviousSuspendCount)
Here's a pseudo C manual decompilation of the routine code :

As you can see, PsSuspendThread starts with entering a critical region and then it tries to acquire run-down protection of the target thread to suspend , acquiring run-down protection for the thread guarantees that we can access and operate on the thread object safely without it being deleted. As you might already know a present thread object in memory doesn't mean that the thread isn't terminating or wasn't terminated simply because an object isn't deleted until all the references on that object are released (reference count reaches zero). The next check of the Terminated bit explains it , so if the thread is actually terminating or was terminated PsSuspendProcess return STATUS_THREAD_IS_TERMINATING. Let's suppose that our thread is up and running. KeSuspendThread will be called as a result and ,unlike the previous routines, will returns the previous count that we've previously spoken about. As we'll see later on KeSuspendThread raises a critical exception (by calling RtlRaiseStatus) if the thread suspend limit was exceeded (0x7F) that causes a BSOD if no exception handler is in place , so the kernel calls this function within a try-except statement. Upon returning from KeSuspendThread successfully , a recheck of the target thread is done to see if the thread was terminating while suspending , if that's true the thread is forced to resume right away by calling KeForceResumeThread (we'll see this routine in detail later when talking about thread resumption) and the previous suspend count is zeroed. Finally the executing thread leaves the critical region and dereferences the PreviousSuspendCount pointer with the value returned from KeSuspendThread or 0 in the case where KeForceResumeThread was called.

That's all for this short entry , in the next parts about thread suspension I'll talk about KeSuspendThread , the suspend semaphore and the KiSuspendThread kernel APC routine.

Follow me on twitter : Here

- Souhail.

Monday, October 13, 2014

ASIS CTF Finals 2014 - Satellite Reloaded Reverse 250 Writeup

I really enjoyed playing this CTF with Spiderz team and we ended at position 23.
This reversing challenge was for 250 points , and here's a brief write-up about it :
The binary expects a string as a command line argument and it starts in the beginning by decrypting a string stored statically at .data section. If the position of the character is an even number the character is xored by 0xD4, if it's an odd number the character is xored with 0xD5.
After decrypting , we get the following large equation :

As you can see, each element a[x] is actually the character of position 'x' in the string that we'll supply. Similarly to the Satellite task we're deal with the boolean satisfiability problem .
If you take a second look at the long decrypted string we've got , you'll see that each character (with the exception of a[220] which not referenced anyway) of the string is referenced exactly 3 times and it is always tested against another character which is static in the 3 cases. So to solve this we'll just rely on studying each 2 characters couples alone. 
For example to make this statement true :
(a[12] | a[20]) & ( ! a[12] | !a[20]) & (!a[12] | a[20])
a[12] must equal :  0
a [20] must equal : 1
Another example :
(!a[22] | a[150]) & (a[22] | a[150]) & (a[22] | !a[150])
In this case both a[22] and a[150] must be equal to 1 to make this statement true.
In the string shared in pastebin above you'll see that in some cases there's a double NOT (! !) , we'll just remove it as it doesn't change anything.

So to script this we don't basically need to study the SAT algorithm any longer, we can just separate this long string into 2 arrays. Each element of the 2 arrays is a logical equation (ex : "( ! a[22]  |  a[50]  )" ).
The first array will only have the elements that have a NOT ('!') on the both chars or that doesn't have any NOTs in them (ex : ( a[55] | a[60] ) and this one ( ! a[70] | ! a[95] ) )
The other array will have all the equations that have a NOT preceding either one of the chars. (ex : ( ! a[22] | a[50] ).
The reason why I did this because in the case of the first example I gave (here it is : 
(a[12] | a[20]) & ( ! a[12] | !a[20]) & (!a[12] | a[20]) ) there will be 2 occurrences of a[12] in the first array which makes it hard to decide whether it's equal to a 0 or 1 , here comes the 2nd array that I called "decide" that will decide by this equation : (!a[12] | a[20]) whether a[12] is 0 or 1 , which is 0 in this case.
So If only one instance of a given a[x] is found in the first array we can decide it's value directly , but if we have 2 instances we'll need to rely on the decide array.

Oh , I almost forgot , there's a character which isn't referenced in this string and which is a[220] . As the flag is generated based on our input, I tested the flag with this character set to 0 and 1. And it basically worked for 0.

Here's the script I wrote and described in this write-up (got us some bonus points though :D ) :

Cheers :).

Monday, September 22, 2014

CSAW CTF 2014 - "saturn" Exploitation 400 Write-up


The description for this task was :

    You have stolen the checking program for the CSAW Challenge-Response-Authentication-Protocol system. Unfortunately you forgot to grab the challenge-response keygen algorithm ( Can you still manage to bypass the secure system and read the flag?

    nc 8888

I grabbed the binary , threw it in IDA and then started looking at the main routine. The first function that was called in main was _fillChallengeResponse and it takes two arguments . I named them : fill_arg0 and fill_arg4.
A quick check reveals that this function is imported from an external library (the one we 'forgot' to grab). Also by checking the arguments passed to the function they appear to be pointers , each pointer points to a 32 bytes array in the bss section.We can also see that the first array is directly followed by the next one.

As fillChallengeResponse is given 2 pointers , we can safely guess that its mission is to fill them with the right data.

Let's carry on :

Next, we will enter this loop. Its was previously initialized to 0 and we'll quit the loop only if the iterator is strictly above 0. In this loop, we are first prompted to supply an input in which only the first byte is read , the byte is saved at [esp+1Bh] and the switch statement only uses the highest order nibble of the read byte.
If the switch statement was supplied 0xA0 , it will lead to retrieving the original read byte (0xA2 for example) and then call a function that will access the Array1 and print the dword at the index described by the lowest order nibble of the read byte multiplied by 4 ((0xA2 & 0xF)*4 = 8 for example).
If the switch statement was supplied 0xB0 , the executed block of code will retrieve the original read byte and then call a function that will wait for user input and then compare that input to the dword indexed by the lowest orded nibble of the original byte multiplied by 4 in Array2. If the 2 values are equal another 8 sized array of bytes will be accessed and 1 is written into the same index indicated by the lowest order nibble.
If the switch statement was supplied 0x80 , it will call a function that walk through the array of bytes checking if all the elements are equal to 1. If it's the case , the function will print the contents of "flag.txt".

The trick here is to take advantage of the read_array1 function , to make it print the Array2 and then pass each dword read from Array2 to the check_array2 function. As we already know Array1 and Array2 are sticked to each other and each ones size is 16 bytes this means that supplying 0xA8 will make us read the first dword of the Array2 . So all we need to do is supply 0xA8 as an input , save the printed value from read_array1 function , supply 0xE0 as an input (switch) then supply the saved printed value as a input (in check_array2) , this will result in setting the first byte of the 8 bytes sized array to 1.
We have to  basically repeat the same 8 times , 0xA8 -> 0xAF and 0xE0 -> 0xE8. When done , we'll supply 0x80 as an input and the "target" function will print the flag for us.
Here's an automated python script which prints the flag :

Binary download : Here

Follow me on twitter : Here
- Souhail

CSAW CTF 2014 - Ish Exploitation 300 Write-up

This time with a quick writeup . Well , I took some time to reverse the binary under IDA and I soon discovered that the vulnerability was a memory leak which leaks 16 bytes from the stack and the vulnerable function was cmd_lotto, here's the full exploit :

I'll publish a writeup for exploitation 400 ( saturn ) as soon as possible.

Download binary : Here
Follow me on Twitter : Here

See you soon :).

- Souhail

Monday, September 15, 2014

NoConName 2014 - inBINcible Reversing 400 Writeup

We (Spiderz) have finished 26th at the NCN CTF this year with 2200 points and we really did enjoy playing. I was able to solve both cannaBINoid (300p) and (inBINcible 400p) .I have actually found 2 solutions for inBINcible that I'll describe separately later in this write-up.

We were given an 32-bit ELF binary compiled from Golang (GO Programming Language).
The main function is "text" (also called main.main) and it is where interesting stuff happens. While digging through this routine the os_args array caught my attention, around address 0x08048DB3 it will access the memory location pointed by os_args+4 and compare its content to 2. This value is nothing but the number of command line arguments given to the executable (argc in C), so the binary is expecting a command line argument which is in fact the key.
By looking at the next lines , I saw an interesting check :
.text:08048EFE                 mov     ebx, dword ptr ds:os_args
.text:08048F04                 add     ebx, 8
.text:08048F07                 mov     esi, [ebx+4]
.text:08048F0A                 mov     ebx, [esp+0C0h+var_54]
.text:08048F0E                 mov     ebp, [ebx+4]
.text:08048F11                 cmp     esi, ebp
.text:08048F13                 jz      loc_8049048
As it's the first time I encounter GOlang I though that it was better to use GDB alongside with IDA so I fired up my linux machine , put a breakpoint on  0x08048F11 , gave the binary a 4 bytes long command line argument (./inbincible abcd) and then I examined both ESI and EBP .
esi = 0x4
ebp = 0x10
You can easily notice tha abcd length is 4 and the right length that should be given is 16 , we can safely assume now that the flag length is 16 bytes.
Note :
The instruction which initializes the flag length to 0x10 is this one (examine instructions before it)
.text:08048D05                 mov     [ebx+4], edi

If the length check is true runtime_makechan function will be called (0x08049073) which creates a channel as its name implies. After that we'll enter immediately a loop that will get to initializing some structure fields then calling  runtime_newproc for each character in the flag (16 in total). One of the variables is initialized with  "main_func_001" and it can be seen as the handler that is called when chanrecv is called.
After breaking out of the loop, ECX is set to 1 and then we'll enter another loop (0x080490DD). This loop calls chanrecv for each character in the input (under certain circumstances). chanrecv is supplied the current index of the input and a pointer to a local variable which I named success_bool. Basically our main routine will supply a pointer to success_bool to the channel which will assign another thread (probably the one created using runtime_newproc) executing the main_func_001 to do some checks then write a TRUE or FALSE value into the variable. After returning from chanrecv the boolean will be checked. If it's TRUE ecx will keep its value ( 1 ) and we'll move to the next character. However if main_func_001 has set the boolean to false ecx will be zeroed and the other characters of the user input won't be checked (fail).
I have actually found 2 methods to approach this challenge (with bruteforcing and without bruteforcing) :

1 - Getting the flag using bruteforce :

This solution consists of automating the debugger (GDB) to supply a 16 bytes length string as an argument , the current character (we basically start with the first character) will be changed during each iteration using a charset and the next character will be left unchanged until we've found the right current character of the key and so on. To find the right character we must break after returning from chanrecv then read the local variable (boolean) value , if it's 1 then we've got the right character and we shall save it then move to the next one, else we'll keep looking until finding the right one.

Here's a python GDB script that explains it better :

flag : G0w1n!C0ngr4t5!!

2 - Getting the key by analyzing main_func_001 :

As main_func_001 is the one responsible for setting the boolean value, analyzing this routine will give us the possibility to get the flag without any bruteforcing. Let's see what it does :
main_func_001 expects the boolean variable pointer and the index of the character to be tested. This index , as mentionned earlier, is the iterator of the loop which has called chanrecv.
For the purpose of checking each character main_func_001 uses 2 arrays , I called the first 5 bytes sized array Values_Array. The second array size is 16 bytes , same length as the password.
So here's how we can get the flag using the 2 arrays :

flag : G0w1n!C0ngr4t5!!

The final key to validate the challenge is NcN_sha1(G0w1n!C0ngr4t5!!)

Binary download : Here

Follow me on Twitter : Here

See you soon :>

Friday, September 5, 2014

Windows Internals - A look into SwapContext routine

Here I am really taking advantage of my summer vacations and back again with a second part of the Windows thread scheduling articles. In the previous blog post I discussed the internals of quantum end context switching (a flowchart). However, the routine responsible for context switching itself wasn't discussed in detail and that's why I'm here today.

Here are some notes that'll help us through this post :
 1 - The routine which contains code that does context switching is SwapContext and it's called internally by KiSwapContext. There are some routines that prefer to call SwapContext directly and do the housekeeping that KiSwapContext does themselves.
 2 - The routines above (KiSwapContext and SwapContext) are implemented in ALL context switches that are performed no matter what is the reason of the context switch (preemption,wait state,termination...).
 3 - SwapContext is originally written in assembly and it doesn't have any prologue or epilogue that are normally seen in ordinary conventions, imagine it like a naked function.
 4 - Neither SwapContext or KiSwapContext is responsible for setting the CurrentThread and NextThread fields of the current KPRCB. It is the responsibility of the caller to store the new thread's KTHREAD pointer into pPrcb->CurrentThread and queue the current thread (we're still running in its context) in the ready queue before calling KiSwapContext or SwapContext which will actually perform the context-switch.
 Usually before calling KiSwapContext, the old irql (before raising it to DISPATCH_LEVEL) is stored in CurrentThread->WaitIrql , but there's an exception discussed later in this article.

So buckle up and let's get started :
Before digging through SwapContext let's first start by examining what its callers supply to it as arguments.
SwapContext expects the following arguments:
- ESI : (PKTHREAD) A pointer to the New Thread's structure.
- EDI : (PKTHREAD) A pointer to the old thread's structure.
- EBX : (PKPCR) A pointer to PCR (Processor control region) structure of the current processor.
- ECX : (KIRQL) The IRQL in which the thread was running before raising it to DISPATCH_LEVEL.
By callers, I mean the KiSwapContext routine and some routines that call SwapContext directly (ex : KiDispatchInterrupt).
Let's start by seeing what's happening inside KiSwapContext :
This routine expects 2 arguments the Current thread and New thread KTHREAD pointers in ECX and EDX respectively (__fastcall).
Before storing both argument in EDI and ESI, It first proceeds to save these and other registers in the current thread's (old thread soon) stack:
EBP : The stack frame base pointer (SwapContext only updates ESP).
EDI : The caller might be using EDI for something else ,save it.
ESI : The caller might be using ESI for something else ,save it too.
EBX : The caller might be using EBX for something else ,save it too.
Note that these registers will be popped from this same thread's stack when the context will be switched from another thread to this thread again at a later time (when it will be rescheduled to run).
After pushing the registers, KiSwapContext stores the self pointer to the PCR in EBX (fs:[1Ch]).Then it stores the CurrentThread->WaitIrql value in ECX, now that everything is set up KiSwapContext is ready to call SwapContext.

Again, before going through SwapContext let me talk about routines that actually call SwapContext directly and exactly the KiDispatchInterrupt routine that was referenced in my previous post.
Why doesn't KiDispatchInterrupt call KiSwapContext ?
Simply because it just needs to push EBP,EDI and ESI onto the current thread's stack as it already uses EBX as a pointer to PCR.

Here, we can see a really great advantage of software context switching where we just save the registers that we really need to save, not all registers.

Now , we can get to SwapContext and explain what it does in detail.
The return type of SwapContext is a boolean value that tells the caller (in the new thread's stack) whether the new thread has any APCs to deliver or not.

Let's see what SwapContext does in these 15 steps:

1 - The first thing that SwapContext does is verify that the new thread isn't actually running , this is only right when dealing with a multiprocessor system where another processor might be actually running the thread.If the new thread is running SwapContext just loops until the thread stops running. The boolean value checked is NewThread->Running and after getting out of the loop, the Running boolean is immediately set to TRUE.

2 - The next thing SwapContext does is pushing the IRQL value supplied in ECX. To spoil a bit of what's coming in the next steps (step 13) SwapContext itself pops ECX later, but after the context switch. As a result we'll be popping the new thread's pushed IRQL value (stack switched).

3 - Interrupts are disabled, and PRCB cycle time fields are updated with the value of the time-stamp counter. After the update, Interrupts are enabled again.

4 - increment the count of context switches in the PCR (Pcr->ContextSwitches++;) , and push Pcr->Used_ExceptionList which is the first element of PCR (fs:[0]). fs:[0] is actually a pointer to the last registered exception handling frame which contains a pointer to the next frame and also a pointer to the handling routine (similar to usermode), a singly linked list simply. Saving the exception list is important as each thread has its own stack and thus its own exception handling list.

5 - OldThread->NpxState is tested, if it's non-NULL, SwapContext proceeds to saving the floating-points registers and FPU related data using fxsave instruction. The location where this data is saved is in the initial stack,and exactly at (Initial stack pointer - 528 bytes) The fxsave output is 512 bytes long , so it's like pushing 512 bytes onto the initial stack , the other 16 bytes are for stack-alignment I suppose.The Initial stack is discussed later during step 8.

6 - Stack Swapping : Save the stack pointer in OldThread->KernelStack and load NewThread->KernelStack into ESP. We're now running in the new thread's stack, from now on every value that we'll pop was previously pushed the last time when the new thread was preparing for a context-switch.

7 - Virtual Address Space Swapping : The old thread process is compared with the new thread's process if they're different CR3 register (Page directory pointer table register) is updated with the value of : NewThread->ApcState.Process->DirectoryTableBase. As a result, the new thread will have access to a valid virtual address space. If the process is the same, CR3 is kept unchanged. The local descriptor table is also changed if the threads' processes are different.

8 -  TSS Esp0 Switching : Even-though I'll dedicate a future post to discuss TSS (task state segment) in detail under Windows , a brief explanation is needed here. Windows only uses one TSS per processor and uses only (another field is also used but it is out of the scope of this article) ESP0 and SS0 fields which stand for the kernel stack pointer and the kernel stack segment respectively. When a usermode to kernelmode transition must be done as a result of an interrupt,exception or system service call... as part of the transition ESP must be changed to point to the kernel stack, this kernel stack pointer is taken from TSS's ESP0 field. Logically speaking, ESP0 field of the TSS must be changed on every context-switch to the kernel stack pointer of the new thread. In order to do so, SwapContext takes the kernel stack pointer at NewThread->InitialStack (InitialStack = StackBase - 0x30) ,it substrats the space that it has used to save the floating-point registers using fxsave instruction and another additional 16 bytes for stack alignment, then it stores the resulted stack pointer in the TSS's Esp0 field : pPcr->TssCopy.Esp0 (TSS can be also accessed using the TR segment register).

9 - We've completed the context-switch now and the old thread can be finally marked as "stopped running" by setting the previously discussed boolean value "Running" to FALSE. OldThread->Running = FALSE.

10 - If fxsave was previously executed by the new thread (the last time its context was switched), the data (floating-point registers...) saved by it is loaded again using xrstor instruction.

11 - Next the TEB (Thread environment block) pointer is updated in the PCR :
pPcr->Used_Self = NewThread->Teb . So the Used_Self field of the PCR points always to the current thread's TEB.

12 - The New thread's context switches count is incremented (NewThread->ContextSwitches++).

13 - It's finally the time to pop the 2 values that SwapContext pushed , the pointer to the exception list and the IRQL from the new thread's stack. the saved IRQL value is restored in ECX and the exception list pointer is popped into its field in the PCR.

14 - A check is done to see if the context-switch was performed from a DPC routine (Entering a wait state for example) which is prohibited. If pPrcb->DpcRoutineActive boolean is TRUE this means that the current processor is currently executing a DPC routine and SwapContext will immediately call KeBugCheck which will show a BSOD : ATTEMPTED_SWITCH_FROM_DPC.

15 - This is the step where the IRQL (NewThread->WaitIrql) value stored in ECX comes to use. As mentionned earlier SwapContext returns a boolean value telling the caller if it has to deliver any pending APCs. During this step SwapContext will check the new thread's ApcState to see if there are any kernel APCs pending. If there are : a second check is performed to see if special kernel APCs are disabled , if they're not disabled ECX is tested to see if it's PASSIVE_LEVEL, if it is above PASSIVE_LEVEL an APC_LEVEL software interrupt is requested and the function returns FALSE. Actually the only case that SwapContext returns TRUE is if ECX is equal to PASSIVE_LEVEL so the caller will proceed to lowering IRQL to APC_LEVEL first to call KiDeliverApc and then lower it to PASSIVE_LEVEL afterwards.

Special Case :
This special case is actually about the IRQL value supplied to SwapContext in ECX. The nature of this value depends on the caller in such way that if the caller will lower the IRQL immediately upon returning from SwapContext or not.
Let's take 2 examples : KiQuantumEnd and KiExitDispatcher routines. (KiQuantumEnd is the special case)

If you disassemble KiExitDispatcher you'll notice that before calling KiSwapContext it stores the OldIrql (before it was raised to DISPATCH_LEVEL) in the WaitIrql of the old thread so when the thread gains execution again at a later time SwapContext will decide whether there any APCs to deliver or not. KiExitDispatcher makes use of the return value of KiSwapContext (KiSwapContext returns the same value returned by SwapContext) to lower the IRQL. (see step 15 last sentence).
However, by disassembling KiQuantumEnd you'll see that it's storing APC_LEVEL at the old thread's WaitIrql without even caring about in which IRQL the thread was running before. If you refer back to my flowchart in the previous article you'll see that KiQuantumEnd always insures that SwapContext returns FALSE , first of all because KiQuantumEnd was called as a result of calling KiDispatchInterrupt which is meant to be called when a DISPATCH_LEVEL software interrupt was requested.Thus, KiDispatchInterrupt was called by HalpDispatchSoftwareInterrupt which is normally called by HalpCheckForSoftwareInterrupt. HalpDispatchSoftwareInterrupt is the function responsible for raising the IRQL to the software interrupt level (APC_LEVEL or DISPATCH_LEVEL) and upon returning from it HalpCheckForSoftwareInterrupt recovers back the IRQL to its original value (OldIrql). So the reason why KiQuantumEnd doesn't care about KiSwapContext return value because it won't proceed to lowering the IRQL (not its responsibility) nor to deliver any APCs that's why it's supplying APC_LEVEL as an old IRQL value to SwapContext so that it will return FALSE. However, a software interrupt might be requested by SwapContext if there are any pending APCs.
KiDispatchInterrupt which calls SwapContext directly uses the same approach as KiQuantumEnd, instead of storing the value at OldThread->WaitIrql it just moves it into ECX.

Post notes :
- Based on Windows 7 32 bit :>
- For any questions or suggestions feel free to leave a comment below or send me an email :

See you again soon :)


Saturday, August 30, 2014

Windows Internals - Quantum end context switching

Lately I decided to start sharing the notes I gather , almost daily , while reverse engineering and studying Windows. As I focused in the last couple of days on studying context switching , I was able to decompile the most involved functions and study them alongside with noting the important stuff. The result of this whole process was a flowchart.

Before getting to the flowchart let's start by putting ourselves in the main plot :
As you might know, each thread runs for a period of time before another thread is scheduled to run, excluding the cases where the thread is preempted ,entering a wait state or terminated. This time period is called a quantum. Everytime a clock interval ends (mostly 15 ms) the system clock issues an interrupt.While dispatching the interrupt, the thread current cycle count is verified against its cycle count target (quantum target) to see if it has reached or exceeded its quantum so the context would be switched the next thread scheduled to run.
Note that a context-switch in Windows doesn't happen only when a thread has exceeded its quantum, it also happens when a thread enters a wait state or when a higher priority thread is ready to run and thus preempts the current thread.

As it will take some time to organize my detailed notes and share them here as an article (maybe for later),consider the previous explanation as a small introduction into the topic. However ,the flowchart goes through the details involved in quantum end context switching.

Please consider downloading the pdf  to be able to zoom as much as you like under your PDF reader because GoogleDocs doesn't provide enough zooming functionality to read the chart.

Preview (unreadable) :

PDF full size Download  : GoogleDocs Link

P.S :
- As always , this article is based is on : Windows 7 32-bit
- Note that details concerning the routine that does the context switching (SwapContext) aren't included in the chart and are left it for a next post.

See you again soon.


Wednesday, July 9, 2014

OkayToCloseProcedure callback kernel hook

Hi ,

During the last few weeks I was busy exploring the internal working of Handles under Windows , by disassembling and decompiling certain kernel (ntoskrnl.exe) functions under my Windows 7 32-bit machine.In the current time I am preparing a paper to describe and explain what I learned about Handles. But today I’m here to discuss an interesting function pointer hook that I found while decompiling and exploring the ObpCloseHandleEntry function. (Source codes below).

A function pointer hook consists of overwriting a callback function pointer so when a kernel routine will call the callback function, the hook function will be called instead . The function pointer that we will be hooking in this article is the OkayToCloseProcedure callback that exists in the _OBJECT_TYPE_INITIALIZER structure which is an element of the OBJECT_TYPE struct.

Every object in Windows has an OBJECT_TYPE structure which specifies the object type name , number of opened handles to this object type ...etc OBJECT_TYPE also stores a type info structure (_OBJECT_TYPE_INITIALIZER) that has a group of callback functions (OpenProcedure ,CloseProcedure…) . All OBJECT_TYPE structures pointers are stored in the unexported ObTypeIndexTable array.

As I said earlier , the OkayToCloseProcedure is called inside ObpCloseHandleEntry function.In general this function (if the supplied handle is not protected from being closed) frees the handle table entry , decrements the object’s handle count and reference count.
Another case when the handle will not be closed is if the OkayToCloseProcedure returned 0 , in this case the ObpCloseHandleTableEntry returns STATUS_HANDLE_NOT_CLOSABLE.
I will discuss handles in more details in my future blog posts.

So how the OkayToCloseProcedure is called ?

ObpCloseHandleTableEntry function actually gets the Object (which the handle is opened to) header (_OBJECT_HEADER). A pointer to the object type structure (_OBJECT_TYPE) is then obtained by accessing the ObTypeIndexTable array using the Object Type Index from the object header (ObTypeIndexTable[ObjectHeader->TypeIndex]).

The function will access the OkayToCloseProcedure field and check if it’s NULL , if that’s true the function will proceed to other checks (check if the handle is protected from being closed). If the OkayToCloseProcedure field isn’t NULL , the function will proceed to call the callback function. If the callback function returns 0 the handle cannot be closed and ObpCloseHandleTableEntry will return STATUS_HANDLE_NOT_CLOSABLE. If it returns a value other than 0 we will proceed to the other checks as it happens when the OkayToCloseProcedure is NULL.

An additional point is that For some reason , the OkayToCloseProcedure must always run within the context of the process that opened the handle in the first place (a call to KeStackAttachProcess). I don’t think that this would be a problem if ObpCloseHandleTableEntry is called as a result of calling ZwClose from usermode because we’ll be running in the context of the process that opened the handle.However, if ZwClose was called from kernel land and was supplied a kernel handle KeStackAttachProcess will attach the thread to the system process.
But if ObpCloseHandleTableEntry was called from another process context and tried to close another process’s handle table entry the OkayToCloseProcedure must run in that process context. That’s why ObpCloseHandleTableEntry takes a pointer to the process object (owner of the handle) as a parameter.

Applying the hook :

Now after we had a quick overview of what’s happening , let’s try and apply the hook on the OBJECT_TYPE_INITIALIZER’s OkayToCloseProcedure field.
I applied the hook on the Process object type , we can obtain a pointer to the process object type by taking advantage of the exported PsProcessType , it’s actually a pointer to a pointer to the process’s object type.

Here’s a list containing the exported object types :
POBJECT_TYPE *ExEventObjectType;
POBJECT_TYPE *ExSemaphoreObjectType;
POBJECT_TYPE *IoFileObjectType;
POBJECT_TYPE *SeTokenObjectType;
POBJECT_TYPE *PsProcessType;
POBJECT_TYPE *TmEnlistmentObjectType;
POBJECT_TYPE *TmResourceManagerObjectType;
POBJECT_TYPE *TmTransactionManagerObjectType;
POBJECT_TYPE *TmTransactionObjectType;

A second way to get an object’s type is by getting an existing object’s pointer and then pass it to the exported kernel function ObGetObjectType which will return a pointer to the object’s type.

A third way is to get a pointer to the ObTypeIndexTable array, it’s unexported by the kernel but there are multiple functions using it including the exported ObGetObjectType function.So the address can be extracted from the function's opcodes , but that will introduce another compatibility problem. After getting the pointer to the ObTypeIndexTable you'll have to walk through the whole table and preform a string comparison to the target's object type name ("Process","Thread" ...etc) against the Name field in each _OBJECT_TYPE structure.

In my case I hooked the Process object type , and I introduced in my code the 1st and the 2nd methods (second one commented).
My hook isn’t executing any malicious code !! it’s just telling us (using DbgPrint) that an attempt to close an open handle to a process was made.
“An attempt” means that we’re not sure "yet" if the handle will be closed or not because other checks are made after a successful call to the callback.And by a successful call , I mean that the callback must return a value different than 0 that’s why the hook function is returning 1. I said earlier that the ObpCloseHandleTableEntry will proceed to check if the handle is protected from being closed  (after returning from the callback) if the OkayToCloseProcedure is null or if it exists and returns 1 , that's why it’s crucial that our hook returns 1.One more thing , I’ve done a small check to see if the object type’s OkayToCloseProcedure is already NULL before hooking it (avoiding issues).

Example :
For example when closing a handle to a process opened by OpenProcess a debug message will display the handle value and the process who opened the handle.
As you can see "TestOpenProcess.exe" just closed a handle "0x1c" to a process that it opened using OpenProcess().

P.S : The hook is version specific.

Source codes :
Decompiled ObpCloseHandleTableEntry :
Driver Source Code

Your comments are welcome.

Souhail Hammou.


Sunday, May 4, 2014

Windows Heap Overflow Exploitation

Hi ,

In this article I will be talking about exploiting a heap overflow in a custom heap. A custom heap is a huge chunk of memory allocated by a usermode application and managed by it.In other words, the application manage 'heap' block allocations and frees (in the allocated chunk) in a custom way while completely ignoring the Windows's heap manager. This method gives the software much more control over its custom heap, but can result in critical security flaws.

The vulnerability that we'll be exploiting together is a heap overflow vulnerability occurring in a custom heap built by the application. The vulnerable software is : ZipItFast 3.0 and our is to gain code execution under Windows 7. ASLR , DEP , SafeSEH aren't enabled by default in the application which makes it even more reliable to us. However, there are still some painful surprises waiting for us ...

Let's start :
The Exploit :
I've actually got the POC from exploit-db , you can check it right here :

Oh  , and there's also a full exploit here :

Unfortunately , you won't learn much from the full exploitation since it will work only on Windows XP SP1. Why ? simply because it's using a technique that consists on overwriting the vectored exception handler node that exists in a static address under windows XP SP1. Briefly , all you have to do is find a pointer to your shellcode (buffer) in the stack. Then take the stack address which points to your pointer and after that substract 0x8 from that address and then perform the overwrite. When an exception is raised , the vectored exception handlers will be dispatched before any handler from the SEH chain, and your shellcode will be called using a CALL DWORD PTR DS: [ESI + 0x8] (ESI = stack pointer to the pointer to your buffer - 0x8). You can google the _VECTORED_EXCEPTION_NODE and check its elements.

And why wouldn't this work under later versions of Windows ? Simply because Microsoft got aware of the use of this technique and now EncodePointer is used to encode the pointer to the handler whenever a new handler is created by the application, and then DecodePointer is called to decode the pointer before the handler is invoked.

Okay, let's start building our exploit now from scratch. The POC creates a ZIP file with the largest possible file name , let's try it :
N.B : If you want to do some tests , execute the software from command line as follows :
    Cmd :> C:\blabla\ZipItFast\ZipItFast.exe C:\blabla\
  Then click on the Test button under the program.

Let's try executing the POC now :
An access violation happens at 0x00401C76 trying to access an invalid pointer (0x41414141) in our case. Let's see the registers :

Basically the FreeList used in this software is a circular doubly linked lists similar to Windows's . The circular doubly linked list head is in the .bss section at address 0x00560478 and its flink and blink pointers are pointing to the head (self pointers) when the custom heap manager is initialized by the software.
I also didn't check the full implementation of the FreeList and the free/allocate operations in this software to see if they're similar to Windows's (bitmap , block coalescing ...etc).

It's crucial also to know that in our case , the block is being unlinked from the FreeList because the manager had a 'request' to allocate a new block , and it was chosen as best block for the allocation.

Let's get back to analysing the crash :
- First I would like to mention that we'll be calling the pointer to the Freelist Entry struct : "entry".

Registers State at 0x00401C76 :
EAX = entry->Flink
EDX = entry->Blink
[EAX] = entry->Flink->Flink
[EAX+4] = entry->Flink->Blink (Next Block's Previous block)
[EDX] = entry->Blink->Flink (Previous Block's Next Block)
[EDX+4] =entry->Blink->Blink

Logically speaking : Next Block's Previous Block and Previous Block's Next Block are nothing but the current block.
So the 2 instructions that do the block unlinking from the FreeList just :
- Set the previous freelist entry's flink to the block entry's flink.
- Set the next freelist entry's blink to the block entry's blink.
By doing so , the block doesn't belong to the freelist anymore and the function simply returns after that. So it'll be easy to guess what's happening here , the software allocates a static 'heap' block to store the name of the file and it would have best to allocate the block based on the filename length from the ZIP header (this could be a fix for the bug , but heap overflows might be found elsewhere , I'll propose a better method to fix ,but not fully, this bug later in this article).

Now , we know that we're writing past our heap block and thus overwriting the custom metadata of the next heap block (flink and blink pointers). So, We'll need to find a reliable way to exploit this bug , as the 2 unlinking instructions are the only available to us and we control both EAX and EDX. (if it's not possible in another case you can see if there are other close instructions that might help), you can think of overwriting the return address or the pointer to the structured exception handler as we have a stack that won't be rebased after reboot.
This might be a working solution in another case where your buffer is stored in a static memory location.
But Under Windows 7 , it's not the case , VirtualAlloc allocates a chunk of memory with a different base in each program run. In addition , even if the address was static , the location of the freed block that we overwrite varies. So in both cases we'll need to find a pointer to our buffer.

The best place to look is the stack , remember that the software is trying to unlink (allocate) the block that follows the block where we've written the name , so likely all near pointers in the stack (current and previous stack frame) are poiting to the newly allocated block (pointer to metadata) . That's what we don't want because flink and blink pointers that we might set might not be valid opcodes and might cause exceptions , so all we need to do is try to find a pointer to the first character of the name and then figure out how to use this pointer to gain code execution , this pointer might be in previous stack frames.

And here is a pointer pointing to the beginning of our buffer : 3 stack frames away

 Remember that 0x01FB2464 will certainly be something else when restarting the program , but the pointer 0x0018F554 is always static , even when restarting the machine.

So when I was at this stage , I started thinking and thinking about a way that will help me redirect execution to my shellcode which is for sure at the address pointed by 0x0018F554 , and by using only what's available to me :
- Controlled registers : EAX and EDX.
- Stack pointer to a dynamic buffer pointer.
- 2 unlinking instructions.
- No stack rebase.

Exploiting the vulnerability and gaining code execution:

And Then I thought , why wouldn't I corrupt the SEH chain and create a Fake frame ? Because when trying to corrupt an SEH chain there are 3 things that you must know :

- SafeSEH and SEHOP are absent.
- Have a pointer to an exisiting SEH frame.
- Have a pointer to a pointer to the shellcode.

The pointer to the shellcode will be treated as the handler,and the value pointed by
((ptr to ptr to shellcode)-0x4) will be treated as the pointer to the next SEH frame.
Let's illustrate the act of corrupting the chain : (with a silly illustration , sorry)

Let me explain :

we need to achieve our goal by using these 2 instructions , right ? :


We'll need 2 pointers and we control 2 registers , but which pointer give to which register ? This must not be a random choice because you might overwrite the pointer to the shellcode if you chose EAX as a pointer to your fake SEH frame.
So we'll need to do the reverse , but with precaution of overwriting anything critical.
In addition we actually don't care about the value of "next SEH frame" of our fake frame.

So our main goal is to overwrite the "next SEH frame" pointer of an exisiting frame , to do so we need to have a pointer to our fake frame in one of the 2 registers. As [EAX+4] will overwrite the pointer to the buffer if used as a pointer to the fake SEH frame , we will use EDX instead. We must not also overwrite the original handler pointer because it will be first executed to try to handle the exception , if it fails , then our fake handler (shellcode) will be invoked then.
So : EDX = &(pointer to shellcode) - 0x4  = Pointer to Fake "Next SEH frame" element.
EDX must reside in the next frame field of the original frame which is : [EAX+4].
And EAX = SEH Frame - 0x4.

Original Frame after overwite :
Pointer to next SEH : Fake Frame
Exception Handler   : Valid Handler

Fake Frame :
Pointer to next SEH : (Original Frame) - 0x4 (we just don't care about this one)
Exception Handler   : Pointer to shellcode

The SEH frame I chose is at : 0x0018F4B4

So : EAX = 0x0018F4B4 - 0x4 = 0x0018F4B0   and EDX =0x0018F554 - 0x4 = 0x0018F550

When the overwrite is done the function will return normally to its caller , and all we have to do now is wait for an exception to occur . An exception will occur after a dozen of instructions as the metadata is badly corrupted. The original handler will be executed but it will fail to handle the access violation and then our fake handler will be called which is the shellcode .

Making the exploit work :

Now all we need to do is calculate the length between the 1st character of the name and the flink and blink pointers , and then insert our pointers in the POC.

Inserting the shellcode :
The space between the starting address of the buffer and the heap overwritten metadata is not so large , so it's best to put an unconditional jump at the start of our buffer to jump past the overwritten flink and blink pointers and then put the shellcode just after the pointers. As we can calculate the length , this won't cause any problem.

Final exploit here :

I chose a bind shellcode , which opens a connection to ( Let's try opening the ZIP file using ZipItFast and then check "netstat -an | find "4444" :

 Bingo !

A Fix for this vulnerability ??

The method I stated before which consists on allocating the block based on the filename length from the ZIP headers can be valid only to fix the vulnerability in this case , but what if the attackers were also able to cause an overflow elsewhere in the software ?
The best way to fix the bug is that : when a block is about to be allocated and it's about to be unlinked from the Freelist the first thing that must be done is checking the validity of the doubly linked list , to do so : safe unlinking must be performed and which was introduced in later versions of Windows.
Safe unlinking is done the following way :

if ( entry->flink->blink != entry->blink->flink || entry->blink->flink != entry){
    //Fail , Freelist corrupted , exit process
else {
    //Unlink then return the block to the caller

Let's see how safe unlinking is implemented under Windows 7 :
The function is that we'll look at is : RtlAllocateHeap exported by ntdll

Even if this method looks secure , there is some research published online that provides weaknesses of this technique and how can it be bypassed. I also made sure to implement this technique in my custom heap manager (Line 86) , link above.

I hope that you've enjoyed reading this paper .

See you again soon ,

Souhail Hammou.

Sunday, April 6, 2014

Retrieving an exported function address within a loaded module

Today I came with a shellcode friendly blogpost , I didn't have time to write the code in ASM but I'll do it as soon as I have some time.
The goal of this small blogpost is retrieving a pointer to an exported function of choice within a loaded module and certainly without the use of any API calls. 
This method can be really effective against ASLR, the only condition that needs to be met is that the function must be exported by the loaded module.
The process of applying this technique consists of retrieving the base address of the loaded module by walking through a doubly linked list. The linked list head pointer can be found using : &(PEB->Ldr.InMemoryOrderModuleList)
To walk the linked list we'll be just interested in the Flink pointer. Besides being a pointer to another list entry it's also a pointer to a _LDR_DATA_TABLE_ENTRY structure which contains ,among other elements, the Module name and the Module Base.
After running the search : if the Base Address was successfully found , we must get the address of the Export directory within the loaded module . In order to do that we perform some basic PE parsing to find the DataDirectory Array.
The first element of this array "DataDirectory[0]" contains the RVA to the export directory within the module , so all we have to do is add the RVA to the base to get the address.
The key elements of the Export directory that we'll need are the following :

AddressOfNames : This is an array of  RVAs to ASCII strings which we'll use to search for the target function's RVA to name.

AddressOfNameOrdinals : This is an array of exported functions ordinals used as indexes to the AddressOfFunctions array.We will use the index found from AddressOfNames to retrieve the ordinal.

AddressOfFunctions : This is an array of function's RVAs , we will access this array using the ordinal that we retrieved previously in order to get the RVA to the exported function.
Finally All we have to do is Add the base address to the RVA and we'll get a valid pointer to the exported function.

Here's an example of getting a pointer to LoadLibraryA :

See you soon ;

Tuesday, March 4, 2014

DEFKTHON CTF 2014 - Reversing 300 WriteUp

Hi Again,
The CTF just ended (less than 15 min) and here's the write-up , actually I prepared it yesterday.
Yesterday was a hard day at university , but I was happy to see that a CTF was taking place that night and there were some reversing tasks.
The CTF was DEFKTHON 2014 CTF , I was able to solve Reverse300 , and the challenge was quite cool.
They offered 'me' a 32-bit executable (300.exe) which will display "Password!!" when opening it
(I figured later that you can provide a command line argument and hope to get the flag if you have the right key : D which is really hard to predict btw).

The first step I took was to figure out where exactly the process will write to the console and I figured that it will create a child process that will do the job. Before that , and while stepping I saw that the parent will create a temporary folder under %temp% in which multiple "dll"s and "pyd" files will be stored (python27.dll for example) that folder name looks like _MEIxxxx where "x"s are random numbers . This gave me a clear idea about what this challenge is for, the executable was actually generated from a Py2Exe software and sooner or later the python source code will be in memory or in disk.

Now that I knew that the child process is the one that will execute the interpreted code, I tried to debug that child process. It is actually possible to do this under WinDbg (if you know a way to do it under Olly/Immunity please leave a comment below).
To switch the debugger to the Child Process whenever it's created all you have to do is type this command " .childdbg 1 " this will enable child process debugging for the current session.

eax=00000000 ebx=00000000 ecx=a1240000 edx=0009e138 esi=fffffffe edi=00000000
eip=776e103b esp=0036f9f8 ebp=0036fa24 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
776e103b cc              int     3
1:001> |
   0    id: 2fa8    create    name: image01340000
   1    id: 1e30    child    name: image01340000
(After Calling CreateProcess by the parent here we are breaking before executing the child proc)

Now all I had to do is determine what API is used to write to the console , so I've put a breakpoint on kernel32!WriteFile and kernel32!WriteConsole (there are high odds that would be one of these two), And It was WriteFile which writes the message to the console.
Now I'm at WriteFile and I need to go to the call from the child process right ? So I have kept returning from a function to its caller until reaching the original call from the ChildProcess.

Stack Trace :
0036d89c 729699ad kernel32!WriteFile
0036f3a8 72969d37 MSVCR90!lseeki64+0x56b
0036f3ec 7292f4c6 MSVCR90!write+0x9f
0036f410 729313ff MSVCR90!flsbuf+0x143
0036f438 729314ac MSVCR90!fwrite_nolock+0x11b
0036f47c 1e0b1ad8 MSVCR90!fwrite+0x61
0036f4a8 1e0abd64 python27!PyString_InsertThousandsGrouping+0x2c8
0036f4c8 1e0abdb6 python27!PyTrash_thread_destroy_chain+0x144
0036f4dc 1e094a55 python27!PyObject_Print+0x16
0036f638 0134352c python27!PyRun_SimpleString+0xc
00000000 00000000 image01340000+0x352c

And here's the call to the function from python27.dll (in the folder under %temp%)

01343525 53              push    ebx
01343526 ff1564c23501    call    dword ptr [image01340000+0x1c264 (0135c264)]

The function is supplied one argument in EBX. And You can see from the way the function is called (the function pointer is accessed from a memory location) that this call could be used to call different functions from python27.dll depending on the task that the file needs to do.
Let's rerun the crackme switch to the child proc and put a breakpoint at (push ebx) , you'll notice that we'll hit the breakpoint many times.
Taking a further look to this function I found that it's always copying each character of the interpreted script filename to the stack, leaving the target file to be executed visible for us. So each time after hitting the breakpoint at <push ebx> , we can display the stack and look for our file which is obviously ( There's a manifest file at the folder in the %temp% giving us a little hint about the filename.
Here's a snippet showing the stack just before starting to execute the interpreted code of :
Now after recognizing the file, we'll need the source to it so maybe it is stored on disk and maybe in memory . I tried to look at the folder stored at %temp% but there was no trace , hopefully the argument passed to the function in EBX is a pointer to a string (in other words pointer to the python source code) so if you dump EBX you'll see the source code right there :)

So the python source is :
from passlib.hash import cisco_pix as pix
import sys,base64
     hashus = pix.encrypt("DEFCON14-CTF-IS", user=x)
     for zz in z:
          xx+= chr(zz+(275*100001-275*1000-27225274))
     hashgen = pix.encrypt("DEFCON14-CTF-IS", user=base64.decodestring(xx))
          print "Oh Man You got it! But :( ===>    " + str(base64.encodestring(base64.decodestring(xx)))
          print "Naaaaaaa !! You are screweD"
     print "Password !!"
As it seems to be "Password!!" is written to the console because no command line arguments were given when executing the file.
I made simple changes to the code to be able to see the flag directly :) as it will be generated only based on the array "z" (so no need for the passlib.hash) :==============================================
import sys,base64
for zz in z:
    xx+= chr(zz+(275*100001-275*1000-27225274))
print "The Flag : " + str((base64.decodestring(xx)))
This will print the flag to validate on the platform.
And there it is :  300 points + fun.
See you soon,
Souhail Hammou.

Sunday, February 23, 2014

CodeGate CTF 2014 - Reverse 250 Clone Technique Write-up

The concept of this Crackme is really simple but reaching the right Track took me hours and hours because I was looking deeper into it and the answer was right in front of me :) ...
The challenge is named "Clone Technique" used in Naruto Manga ,  the process creates many instances of it (using CreateProcessW) with different command line arguments (3 values).
Example :
2nd process : clone_technique.exe 3026539702 3580248161 2
3rd process : clone_technique.exe 466510610 2867152813 3
4th process : clone_technique.exe 2580226910 609694577 4

The wrong way that I took was fortunately of some benefit because what I did is analyze how the values are generated and I also simulated the algorithm in ASM (check links below).
In the first place I didn't know how the flag would be and I couldn't get any hints so I tried to look and look until I found this weird string in the .data section :
I've put a breakpoint on access to this string and ran the program, then the debugger breaks at a function at : 00401070 (you can also check all the references to this string and you'll find the "push 00407030" to our function).
This function takes 3 arguments : - the first value in the cmdline , the second value and a pointer to the string.
The function will use both values in a loop to decrypt each byte alone of the string and store it in a seperated memory location pointed by EDX using the instruction :
004010E8  |. 8802           |MOV BYTE PTR DS:[EDX],AL
The function returns a pointer to the decrypted string in EAX (the decryption result will be non-ASCII until the used values -keys- are right).

If you have a clear idea now , you will notice that each process will be created with its own cmd line arguments , thus different keys to decrypt the string , thus different result. So we'll need to show the final decrypted string for each process until finding a readable ASCII string.
We'll need to patch the executable and create a code cave which will pop up a MessageBox whenever the string is decrypted (we can also display only the ASCII compatible string which will only be the flag) . I was lazy and tired so , I just put a MessageBox call and kept pressing enter until seeing the flag :

Jump to code cave

 We don't have to forget to NOP a "REP STOS" instruction that will cause an access violation after redirecting the execution to our cave code.
Now save the executable , run it and you should see MessageBoxes with total gibberish (invalid key decrypting the string) , keep clicking and clicking and clicking until this will popup :

The Flag.

Script which will generate the same values passed in the command line (complete waste of time :) ) :


Saturday, February 1, 2014

Anti-debugging trick - Checking for the Low Fragmentation Heap

Hi everyone,

I'll introduce you today a Anti-debugging trick which the idea came across my mind while debugging Windows Heap, I don't know if it was used before anywhere but here I am showing it today.

Check the C/C++ source code :

Short introduction to the Windows front end allocator :

Yesterday I promised writing something related to the windows heap manager and an opportunity came now to discuss some of the front end allocator aspects.
First of all let me define what a LFH (low fragmentation heap) is :
The LFH was introduced in Windows XP and Windows Server 2003 but it wasn't used as a default front end allocator until Windows Vista. The default front end allocator were the lookaside lists (LAL) , each of these 2 is a singly linked list with 128 lists.

The LFH as its name describes is implemented to guarantee that heap fragmentation will be reduced and it's strongly recommended to use for application that allocate a big number of small size blocks.
When the LFH is created first, predetermined sizes of memory will be allocated and put into buckets (LFH lists), when the application will call for an allocation the LFH will provide the smallest available block to satisfy the allocation , otherwise the request will be passed into the heap manager then to the Freelists (check explanation later).

When LAL is used as a front end allocator, a block won't reside in a list of its list until it was allocated either from the FreeLists or by committing memory then freed. All that won't apply until a list of the lookaside table can handle the freed block otherwise it will passed to the Heap manager to perform coalescing if two adjacent blocks are free, update the FreeList bitmap values (if necessary), invalidate coalesced blocks entries then insert the coalesced block into its valid list in the FreeLists. If no block coalescing is possible the block is inserted directly in the FreeLists.

Anti-Debugging Trick :

I noticed that when the executable is run under a debugger no Low Fragmentation Heap (LFH) is created for it so the pointer to the LFH equals NULL.
So we'll just have to check if the pointer to LFH is null to detect if the process was created inside a debugger.
I tried after to run the process outside the debugger then attach it and I noticed that a LFH is created for the heap and the pointer to the LFH is valid.
 The pointer to the LFH is located at "heap_handle+0xd4" under Windows 7 for WOW64 executables and at "heap_handle+0x178" for 64-bit executable.

I attached the debugger to the process :
0:001> dt _HEAP 00460000
   +0x000 Entry            : _HEAP_ENTRY
   +0x008 SegmentSignature : 0xffeeffee
   +0x00c SegmentFlags     : 0
   +0x0d0 CommitRoutine    : 0x5b16148e

   +0x0d4 FrontEndHeap     : 0x00468cf0 Void
   +0x0d8 FrontHeapLockCount : 0
   +0x0da FrontEndHeapType : 0x2 <-- Type : LFH
   +0x0dc Counters         : _HEAP_COUNTERS
   +0x130 TuningParameters : _HEAP_TUNING_PARAMETERS

When running the process from the debugger LFH won't be enabled :
0:001> dt _HEAP 00320000
   +0x000 Entry            : _HEAP_ENTRY
   +0x008 SegmentSignature : 0xffeeffee
   +0x00c SegmentFlags     : 0

   +0x0cc LockVariable     : 0x00320138 _HEAP_LOCK
   +0x0d0 CommitRoutine    : 0x6d58ec0e     long  +6d58ec0e
   +0x0d4 FrontEndHeap     : (null)
   +0x0d8 FrontHeapLockCount : 0
   +0x0da FrontEndHeapType : 0
   +0x0dc Counters         : _HEAP_COUNTERS
   +0x130 TuningParameters : _HEAP_TUNING_PARAMETERS

Remember that, The LFH isn't used by default until Windows Vista and posterior versions , so to implement the anti-debugging technique under Windows XP we'll need to enable the LFH as it's not used by default, to do so you'll simply need to call HeapSetInformation with the HEAP_INFORMATION_CLASS set to 0 and with the pointer to the information buffer pointing to "0x2" which will enable the LFH for the heap passed as the first argument.

A simple way to bypass this technique is simply by attaching the debugger to the application instead of running it from a debugger :) .

More details on the LFH : here

Thanks for your time :)
Souhail Hammou. @Dark_Puzzle

Thursday, January 30, 2014

Creating and using your own 'heap' manager

First of all, I didn't want to write a long title so I've put word heap between apostrophes, simply because in this example we're not creating or managing a heap, but a (zeroed) chunk of  memory which we reserve from the virtual address space of our application. (we'll commit from the reserved chunk as needed)

The differences between a heap and our custom heap are huge.
The only thing that you have to keep in mind for now is that a Heap is divided into Segments each segment contains heap blocks within the committed memory range in which the user/application accessible part resides.
Another thing that I want to mention is that the Windows heap manager keeps track of free blocks using the BEA (back-end-allocator or FreeLists) and the LFH/LAL (low fragmentation heap or lookaside lists).
In addition each block in use by the application or that is in the LAL is considered busy (A Flag withing each block describes its status) and each block within a free list is considered Free.
Maybe I'll write a detailed post soon about Windows Heap Management.

Back to our topic : I spent much time in the previous months trying to understand the concepts behind the Windows Heap Manager and also some of its exploitation techniques. Thus, I also added a small security check in the custom manager that does safe unlinking from the FreeList.
Our custom manager will clone only a small part of Windows heap management which is implementing a circular FreeList structure and will also use a singly linked list to keep track of all allocated and free blocks. I programmed it so no new portions will be committed from reserved memory until there are no free blocks to allocate from.
You will be able also to display details about the MAP (singly linked list) and the FreeList used while allocating and freeing to see the changes.

Full commented C++ source here :

Execution example :
I've allocated some blocks then freed some of them and displayed the map :

And after that displayed the FreeList :
P.S : 0x002A4D10 is the FreeList's head.
You can play with allocations and frees as you like then display details :)
See you soon,

Souhail Hammou. @Dark_Puzzle .

Monday, January 27, 2014

HackIM CTF 2014 - Reverse100 write-up

The challenge is a 32-bit executable : when executing it displays the following :
   Flag : )T(+,*'))$&T(Y)*#(+&#+)$%'T+&#(T
I found that  ")T(+,*'))$&T(Y)*#(+&#+)$%'T+&#(T" is passed to 3 functions, only one of them is executed when running the program (the one that will concatenate 'Flag :' and the encrypted flag then pass them to WriteFile which will write the output to the console) .
So there might be a piece of code that may decrypt the encrypted flag somewhere and was keeped there but never refered to anywhere.
When I saw where the other pushes are it turned out to be the decryption stub which is never executed.
Disassembly here :

As you can see it takes every character from  ")T(+,*'))$&T(Y)*#(+&#+)$%'T+&#(T" then adds "0xD" to it , after that it concatenates the result with another string "5F 41 4E 44 5F 4D 4F 4F 4F 4F" which is  "_AND_MOOOO" in ASCII.

Thus the flag would be : 6a589746613a5f670583086124a8305a_AND_MOOOO
Easy right ? :)

Souhail Hammou.

Sunday, January 26, 2014

Introduction To Windows API Hooking.

Today I wrote a piece of code that hooks MessageBoxA API to redirect execution to another function that will call MessageBoxA with different arguments.
This is the traditional way to hook APIs , there are other methods used for instance IAT hooking.

C++ Visual Studio Express 2010 :

The 1st step was to Hotpatch MessageBoxA function. In other words some functions in Windows DLLs are ready to be hooked that's why a "mov edi,edi" (which represent a 2 bytes NOP) which will be patched to a short jump is the first instruction executed in each of these functions.

The question that you may ask is why not simply using 2 NOPs instead of "mov edi,edi" , the answer is related to performance : each NOP takes 1 clock cycle so 2 NOPs will take 2 clock cycles. But "mov edi,edi" instruction will take only 1 clock cycle.

Each "mov edi,edi" is preceded by 5 NOPs which are never being executed. Hotpatching consists of overwriting "mov edi,edi" by a short 5 bytes jump back. And then inserting our hook by overwriting the 5 NOPs with a long jump instruction that will execute our desired function, this requires us to calculate a relative address.

When the hook is no longer needed it is sufficient to restore only "mov edi,edi" instruction as the 5 bytes before it are never executed.

See you soon :)

Souhail Hammou.

Saturday, January 25, 2014

A Quick Dive Into Windows Kernel-Mode Debugging - Protected Processes.

Hey :) ,
This is maybe a first write-up of many which will discuss windows kernel debugging.
I've mentioned in my previous post "protected processes" and I decided to share with you this writeup discussing them and showing some applied kernel debugging.
Protected processes are digitally signed processes that were first introduced in Windows Vista in order to protect Media contents (HD/Blu-ray).
The use of Protected processes wasn't hindered by Media use only . Processes such as System and Werfault.exe (Displays information about crashes and needs access to protected processes in case one of them crashes) are protected. So you can see that protected processes can interact with each other normally. In addition, even a normal process has the ability to create a Protected process BUT it will certainly fail because a special digital signature isn't present in the created process
Another point is that Protected Processes forbid users even with an administrator account to debug them,dump memory,edit memory. Furthermore, access rights to a protected process are limited for instance PROCESS_QUERY_LIMITED_INFORMATION / PROCESS_SUSPEND_RESUME / PROCESS_TERMINATE accesses are granted while PROCESS_ALL_ACCESS is not.

As you may know, each process has an executive process (EPROCESS) structure that resides in Kernel mode and which can be accessed only from it. The EPROCESS structure contains detailed information about the process and points to many other structures like KTHREAD structures list, and it also contains structures like pcb which is a KPROCESS structure and the first element of EPROCESS.

Speaking about Protected Processes , EPROCESS has a special bit that defines the nature of a process (protected or not).
In our example we will try to attach (audiodg.exe) which is a protected process to OllyDbg. The result is the following:
We have 2 choices here , the first is to make Ollydbg a protected process and then Attach it to audiodg.exe or zeroing the bit in audiodg.exe EPROCESS structure to make it unprotected then attach it normally.

We'll do the first one, we can similarly do this with "Task Manager" to create a dump of audiodg.exe or any protected process then examine that dump.
So let's set up the kernel debugger and display the EPROCESS structure for ollydbg.exe :

lkd> dt _EPROCESS 846d8030
   +0x000 Pcb              : _KPROCESS
   +0x098 ProcessLock      : _EX_PUSH_LOCK
   +0x0a0 CreateTime       : _LARGE_INTEGER 0x01cf1a21`239824ec
   +0x0a8 ExitTime         : _LARGE_INTEGER 0x0
   +0x0b0 RundownProtect   : _EX_RUNDOWN_REF
   [...CUT __ CUT...]
   +0x26c Flags2           : 0x2d014
   +0x26c JobNotReallyActive : 0y0
   +0x26c AccountingFolded : 0y0
   +0x26c NewProcessReported : 0y1
   +0x26c ExitProcessReported : 0y0
   +0x26c ReportCommitChanges : 0y1
   +0x26c LastReportMemory : 0y0
   +0x26c ReportPhysicalPageChanges : 0y0
   +0x26c HandleTableRundown : 0y0
   +0x26c NeedsHandleRundown : 0y0
   +0x26c RefTraceEnabled  : 0y0
   +0x26c NumaAware        : 0y0
   +0x26c ProtectedProcess : 0y0        <-- bit in which we are interested.
   +0x26c DefaultPagePriority : 0y101
   +0x26c PrimaryTokenFrozen : 0y1
   +0x26c ProcessVerifierTarget : 0y0
   +0x26c StackRandomizationDisabled : 0y1
   +0x26c AffinityPermanent : 0y0
   +0x26c AffinityUpdateEnable : 0y0
   +0x26c PropagateNode    : 0y0
   +0x26c ExplicitAffinity : 0y0
   [...CUT __ CUT...]
   +0x2b0 RequestedTimerResolution : 0
   +0x2b4 ActiveThreadsHighWatermark : 5
   +0x2b8 SmallestTimerResolution : 0
   +0x2bc TimerResolutionStackRecord : (null)

So all we need to do now is set that bit , in order to so we'll need to edit Flags2.
You can see that the "Protected Process" bit is the 12th bit and we'll need to set it to 1.
0x2d014 -> 101101  0  00000010100
Setting this bit will need us to set Flags2 to 0x2d814 ->101101  1  00000010100 .

lkd> ed 846d8030+0x26c 0x2d814
lkd> dt _EPROCESS 846d8030
   +0x000 Pcb              : _KPROCESS
   +0x098 ProcessLock      : _EX_PUSH_LOCK

   +0x26c Flags2           : 0x2d814
   +0x26c JobNotReallyActive : 0y0
   +0x26c AccountingFolded : 0y0
   +0x26c NewProcessReported : 0y1
   +0x26c ExitProcessReported : 0y0
   +0x26c ReportCommitChanges : 0y1
   +0x26c LastReportMemory : 0y0
   +0x26c ReportPhysicalPageChanges : 0y0
   +0x26c HandleTableRundown : 0y0
   +0x26c NeedsHandleRundown : 0y0
   +0x26c RefTraceEnabled  : 0y0
   +0x26c NumaAware        : 0y0
   +0x26c ProtectedProcess : 0y1
       <-- Ollydbg.exe is a protected process now
   +0x26c DefaultPagePriority : 0y101
   +0x26c PrimaryTokenFrozen : 0y1
   +0x26c ProcessVerifierTarget : 0y0
   +0x26c StackRandomizationDisabled : 0y1
   +0x26c AffinityPermanent : 0y0
   +0x26c AffinityUpdateEnable : 0y0
   +0x26c PropagateNode    : 0y0
   +0x26c ExplicitAffinity : 0y0

Now we'll simply try to attach audiodg.exe and it will work as expected :

I hope you enjoyed reading this short writeup.
See you later :).


Souhail Hammou . @Dark_Puzzle