Saturday, May 30, 2009

SWF Exploit Analysis - Part 2

Overview

After obtaining the shellcode from the exploited SWF in the previous post, we can now work out its payload (the action the malware performs). Before that, however, we have to deobfuscate the shellcode; otherwise we cannot continue with the malware's behavior analysis.


Analysis of shellcode

Again, this is the file extracted from the exploit:


You can load the file in IDA Pro; remember to use the 32-bit disassembler mode. Do a full code analysis (highlight all the code > press C > select Force analysis). Here is the result, with its decryption routine:



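The reason I think this is the decryption routine is very simple: the code makes sense to me ;) More concretely, a CALL to the very next instruction followed by a POP is the classic "GetPC" trick that shellcode uses to find its own address in memory, and a short XOR loop over the bytes that follow is a typical decoder stub.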

The encrypted code has to call the decryption routine before it can execute the payload. By looking at the CALL instructions, I found that this portion of the code makes sense:

000000EB call loc_F0 ; CALL pushes the return address (000000F0) onto the stack
000000F0
000000F0 loc_F0: ; DATA XREF: 000000EB
000000F0 pop ebp ; GetPC trick: pop that return address, so EBP = 000000F0
000000F1 add ebp, 14h ; EBP = F0h + 14h = 104h, the start of the encrypted code
000000F4 mov ecx, 18Bh ; ECX = 18Bh, the number of bytes to decrypt (loop counter)
000000F9 mov al, 3Dh ; '=' ; AL = 3Dh, the single-byte XOR key
000000FB
000000FB loc_FB: ; CODE XREF: 00000100
000000FB xor [ebp+0], al ; XOR the byte at [EBP] with the key in AL
000000FE inc ebp ; Advance EBP to the next byte
000000FF dec ecx ; Decrement the counter ECX by 1
00000100 jnz short loc_FB ; Loop back to the XOR at offset FB until ECX reaches zero
00000102 jmp short loc_104 ; Decryption finished: jump to the now-decrypted code at offset 104
00000104 ; ---------------------------------------------------------------------------
00000104
00000104 loc_104: ; DATA XREF: 00000102
00000104 lodsd
00000105 lodsd
00000106 lodsd
00000107 lodsd
00000108 lodsd
00000109 lodsd
0000010A lodsd
0000010B lodsd
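The decoder loop above is simple enough to mirror in Python, which is handy for checking the Hiew result or for decoding similar samples in bulk. This is a minimal sketch that assumes the dump is loaded at base 0, so the IDA addresses in the listing equal file offsets; the function name `xor_decode` and the synthetic self-check data are illustrative and not part of the sample.

```python
# Constants taken from the decoder stub in the listing above.
KEY = 0x3D            # mov al, 3Dh
COUNT = 0x18B         # mov ecx, 18Bh
START = 0xF0 + 0x14   # pop ebp (EBP = F0h) ; add ebp, 14h  ->  104h


def xor_decode(image: bytes, start: int = START, count: int = COUNT, key: int = KEY) -> bytes:
    """Mirror of the loop at loc_FB: XOR `count` bytes beginning at `start` with `key`.

    Assumes file offset == IDA address (dump loaded at base 0).
    """
    buf = bytearray(image)
    ebp, ecx = start, count
    while ecx:                 # jnz short loc_FB
        buf[ebp] ^= key        # xor [ebp+0], al
        ebp += 1               # inc ebp
        ecx -= 1               # dec ecx
    return bytes(buf)


# The decoded range is [104h, 28Fh): 104h + 18Bh = 28Fh.
assert START == 0x104 and START + COUNT == 0x28F

# Self-check on synthetic data (no real sample involved): XOR is its own inverse.
plain = b"http://example.invalid/x.exe\x00".ljust(COUNT, b"\x90")
image = bytes(START) + bytes(b ^ KEY for b in plain)
assert xor_decode(image)[START:START + COUNT] == plain
```

To decode the real dump you would pass the bytes of the extracted file to `xor_decode`; the bytes before offset 104 (the decoder stub itself) are left untouched.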

I would be more than happy if anyone could suggest other ways to confirm that this is the decryptor code, in case my own assumption is wrong :)

So I will use Hiew again to decrypt the code. Note offset 102, where the instruction jumps to offset 104; offset 104 is where the decryption should start. Then:

Press F3 > F8 and set the XOR key to "3D":



Based on the ECX counter, the code that needs to be decrypted is 18Bh bytes long, i.e. it runs from offset 104 up to, but not including, offset 28F (104h + 18Bh = 28Fh):




Save the file (F9) and you can see the URL that the malware tries to connect to in order to download an additional malicious file.
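If you prefer not to eyeball the hex dump, a simple byte-pattern search over the decoded buffer finds such strings. This is a rough heuristic sketch, assuming the URL is stored as a plain NUL-terminated ASCII string; the `find_urls` helper and the sample bytes are hypothetical.

```python
import re

# Heuristic: a URL scheme followed by printable, non-space ASCII.
# Shellcode strings are usually NUL-terminated, so the match stops at the first
# non-printable byte (e.g. the terminating 0x00).
URL_RE = re.compile(rb"(?:https?|ftp)://[\x21-\x7e]+")


def find_urls(decoded: bytes) -> list[str]:
    """Return every URL-looking ASCII string found in a decoded shellcode buffer."""
    return [m.group().decode("ascii") for m in URL_RE.finditer(decoded)]


# Synthetic buffer standing in for the decrypted shellcode.
sample = b"\x90\x90\xeb\x10http://example.invalid/payload.exe\x00\xcc\xcc"
print(find_urls(sample))  # ['http://example.invalid/payload.exe']
```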


Shellcode Static Analysis

The shellcode is designed to be as small as possible, so it normally contains only the actual malicious payload and does not import any API functions itself. So how does the shellcode operate without the necessary APIs?

As we are performing static analysis, we make the following assumptions about what the shellcode does:

  • It first dynamically retrieves the base address of kernel32.dll
  • It finds kernel32.dll->LoadLibrary() and uses it to load other necessary DLLs
  • It loads urlmon.dll and uses URLDownloadToFile() to download additional malicious files
  • It uses kernel32.dll->WinExec() to execute the downloaded malicious files


First of all, you might need a copy of Windows Memory Layout, User-Kernel Address Spaces and PE Format Diagram as references for the PEB structure and the PE format.


seg000:0000000D pop edi
seg000:0000000E mov eax, fs:[30h] ; FS points to the TEB; TEB+30h holds the PEB address, so EAX = PEB
seg000:00000014 js short loc_22
seg000:00000016 mov eax, [eax+0Ch] ; EAX = PEB->Ldr (pointer to PEB_LDR_DATA)
seg000:00000019 mov esi, [eax+1Ch] ; ESI = Ldr->InInitializationOrderModuleList.Flink (first module, ntdll.dll on Windows 2000/XP)
seg000:0000001C lodsd ; EAX = [ESI] = Flink of the first entry, i.e. the second module (kernel32.dll on Windows 2000/XP)
seg000:0000001D mov ebp, [eax+8] ; Skip the 8-byte LIST_ENTRY (Flink + Blink); the next DWORD is DllBase, so EBP = image base of kernel32.dll
seg000:00000020 jmp short loc_2B
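To make the pointer chasing in the listing concrete, here is a small Python emulation of those instructions against a fake 32-bit address space. All addresses (TEB, PEB, the module entries and the two DllBase values) are made up for illustration; only the offsets 30h, 0Ch, 1Ch and +8 come from the listing, while the +10h/+18h offsets of the initialization-order links and DllBase inside an LDR_MODULE entry follow the 32-bit Windows XP layout.

```python
import struct


def read_u32(mem: bytes, addr: int) -> int:
    """Read a little-endian DWORD from the simulated address space."""
    return struct.unpack_from("<I", mem, addr)[0]


def write_u32(mem: bytearray, addr: int, value: int) -> None:
    struct.pack_into("<I", mem, addr, value)


def build_fake_process() -> tuple[bytearray, int]:
    """Hypothetical memory image: TEB -> PEB -> PEB_LDR_DATA -> two modules."""
    mem = bytearray(0x6000)
    teb, peb, ldr = 0x1000, 0x2000, 0x3000
    ntdll, k32 = 0x4000, 0x5000           # made-up LDR_MODULE addresses
    head = ldr + 0x1C                     # InInitializationOrderModuleList head
    write_u32(mem, teb + 0x30, peb)       # TEB+30h -> PEB
    write_u32(mem, peb + 0x0C, ldr)       # PEB+0Ch -> Ldr
    write_u32(mem, head, ntdll + 0x10)    # head.Flink -> first module (ntdll)
    write_u32(mem, head + 4, k32 + 0x10)  # head.Blink -> last module
    # ntdll.dll entry: links at +10h, DllBase at +18h (made-up base)
    write_u32(mem, ntdll + 0x10, k32 + 0x10)
    write_u32(mem, ntdll + 0x14, head)
    write_u32(mem, ntdll + 0x18, 0x7C900000)
    # kernel32.dll entry (made-up base)
    write_u32(mem, k32 + 0x10, head)
    write_u32(mem, k32 + 0x14, ntdll + 0x10)
    write_u32(mem, k32 + 0x18, 0x7C800000)
    return mem, teb


def find_kernel32_base(mem: bytes, fs_base: int) -> int:
    eax = read_u32(mem, fs_base + 0x30)   # mov eax, fs:[30h]  -> PEB
    eax = read_u32(mem, eax + 0x0C)       # mov eax, [eax+0Ch] -> PEB_LDR_DATA
    esi = read_u32(mem, eax + 0x1C)       # mov esi, [eax+1Ch] -> 1st entry's links
    eax = read_u32(mem, esi)              # lodsd              -> 2nd entry's links
    return read_u32(mem, eax + 8)         # mov ebp, [eax+8]   -> DllBase


mem, teb = build_fake_process()
print(hex(find_kernel32_base(mem, teb)))  # 0x7c800000
```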



Open Windows Memory Layout, User-Kernel Address Spaces and refer to struct_TEB. On x86 the FS segment register points to the current thread's TEB, and offset 0x030 of the TEB holds a pointer to the _PEB structure.


From struct_PEB, move to offset 0x0c, which holds a pointer (Ldr) to the _PEB_LDR_DATA structure. That structure contains 7 elements:


typedef struct _PEB_LDR_DATA {
ULONG Length; //0x000
BOOLEAN Initialized; //0x004
PVOID SsHandle; //0x008
LIST_ENTRY InLoadOrderModuleList; //0x00c
LIST_ENTRY InMemoryOrderModuleList; //0x014
LIST_ENTRY InInitializationOrderModuleList; //0x01c
PVOID EntryInProgress; //0x024
} PEB_LDR_DATA, *PPEB_LDR_DATA;
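The offsets in the comments can be double-checked with ctypes. This sketch declares 32-bit versions of the two structures, storing pointers as 4-byte integers so the layout matches a 32-bit process even when Python itself is 64-bit; the `LIST_ENTRY32` and `PEB_LDR_DATA32` class names are my own and do not come from the Windows headers. It also shows that a LIST_ENTRY is 8 bytes (two pointers), which is why `[eax+8]` in the listing skips exactly one LIST_ENTRY.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    # 32-bit LIST_ENTRY: two 4-byte pointers, stored here as plain DWORDs
    _fields_ = [("Flink", ctypes.c_uint32),
                ("Blink", ctypes.c_uint32)]


class PEB_LDR_DATA32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint32),                        # 0x000
        ("Initialized", ctypes.c_ubyte),                    # 0x004 (BOOLEAN, then padding)
        ("SsHandle", ctypes.c_uint32),                      # 0x008
        ("InLoadOrderModuleList", LIST_ENTRY32),            # 0x00c
        ("InMemoryOrderModuleList", LIST_ENTRY32),          # 0x014
        ("InInitializationOrderModuleList", LIST_ENTRY32),  # 0x01c
        ("EntryInProgress", ctypes.c_uint32),               # 0x024
    ]


# Print each field's offset as computed by ctypes' own layout rules.
for name, *_ in PEB_LDR_DATA32._fields_:
    print(f"{name:35} {getattr(PEB_LDR_DATA32, name).offset:#05x}")
print("sizeof(LIST_ENTRY32):", ctypes.sizeof(LIST_ENTRY32))
```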


InInitializationOrderModuleList is a doubly linked list; each Flink/Blink points to the LIST_ENTRY embedded inside the next/previous LDR_MODULE structure:

typedef struct _LIST_ENTRY {
struct _LIST_ENTRY *Flink; //0x000
struct _LIST_ENTRY *Blink; //0x004
} LIST_ENTRY, *PLIST_ENTRY;

Using this chain, we can walk every DLL module loaded by the process and therefore find kernel32.dll. On Windows 2000/XP, ntdll.dll is always the first entry in InInitializationOrderModuleList and kernel32.dll the second, which is why the shellcode executes one LODSD (following a single Flink) before reading the image base.
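Walking the whole list means following Flink from the list head until it points back to the head again, since the list is circular. The sketch below models that with plain Python objects instead of raw memory; the `Node` class and the module names are hypothetical, with the order chosen to match Windows 2000/XP, where ntdll.dll comes first and kernel32.dll second.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison only; the ring is cyclic
class Node:
    name: str
    flink: Optional["Node"] = field(default=None, repr=False)
    blink: Optional["Node"] = field(default=None, repr=False)


def make_ring(names: list[str]) -> Node:
    """Build a circular doubly linked list with a dedicated head node."""
    head = Node("<list head>")
    prev = head
    for n in names:
        node = Node(n, blink=prev)
        prev.flink = node
        prev = node
    prev.flink = head   # close the ring forward
    head.blink = prev   # and backward
    return head


def walk(head: Node) -> list[str]:
    """Follow Flink from the head until we arrive back at the head."""
    out = []
    node = head.flink
    while node is not head:
        out.append(node.name)
        node = node.flink
    return out


ring = make_ring(["ntdll.dll", "kernel32.dll"])
print(walk(ring))  # ['ntdll.dll', 'kernel32.dll']
```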


After the necessary DLLs are found, the shellcode parses each DLL's PE header to find its export table, and locates the RVA of the exported function that matches the hardcoded hash calculated at 0x000000DC.

[Note: To understand how the shellcode parses the PE header, refer to the PE Format Diagram.]
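The post does not show which hash algorithm the code at 0x000000DC implements. A very common choice in Windows shellcode of this era is a ROR 13 additive hash over the export name, so the sketch below assumes that; treat it as an illustration rather than a description of this sample. `find_export_by_hash` is a hypothetical helper that scans a list of export names the way the shellcode scans the export table's AddressOfNames array.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """ASSUMED algorithm: h = ror(h, 13) + byte, for each byte of the name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def find_export_by_hash(export_names: list[str], target: int):
    """Return (index into AddressOfNames, name) of the first match, or None."""
    for index, name in enumerate(export_names):
        if ror13_hash(name) == target:
            return index, name
    return None


exports = ["GetProcAddress", "LoadLibraryA", "WinExec"]  # hypothetical name list
print(hex(ror13_hash("LoadLibraryA")))           # 0xec0e4e8e
print(find_export_by_hash(exports, 0xEC0E4E8E))  # (1, 'LoadLibraryA')
```

In a real export table, the matching index is then used to read AddressOfNameOrdinals, the resulting ordinal indexes AddressOfFunctions to obtain the function's RVA, and that RVA is added to the DLL's base address.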



Final

I hope to show a more detailed analysis later, such as how the hardcoded addresses are generated and how the exported functions are actually loaded and called by the shellcode. This requires dynamic analysis using OllyDbg with a "bait" file. I hope there is a guide on the Internet, as I am too lazy to look for one now ;)

I really hope to share how it can be done. Hopefully I can motivate myself and share the knowledge here ASAP.


References

http://zarestel.blogspot.com/2008/06/swf-exploit-cve-2007-0071.html
http://zarestel.blogspot.com/2008/06/swf-exploit-cve-2007-0071-part-2-how-to.html
http://blog.threatexpert.com/2008/05/flash-exploit-goes-wild.html


Signing off
~x9090


