Saturday, May 30, 2009

SWF Exploit Analysis - Part 2


Having extracted the shellcode from the exploited SWF in the previous post, we can now work out its payload (what the malware actually does). Before that, we must deobfuscate the shellcode; otherwise we cannot continue analysing the malware's behaviour.

Analysis of shellcode

Again this is the extracted file from the exploit:

You can load the file with IDA Pro and remember to use 32-bit disassembler mode. Do a full code analysis (Highlight all the code > Press C > Select Force analysis) and here is the result and its decryption routine:

The reason I think this is the decryption routine is simple: the code makes sense to me ;)

Encrypted code has to call its decryption routine before executing the payload. Looking at the CALL instructions, I found that this portion of the code makes sense:

000000EB call loc_F0 ; Push the return address (F0) and jump to the decryptor code
000000F0 loc_F0: ; DATA XREF: 000000EB
000000F0 pop ebp ; Pop the return address (F0) into EBP to learn the current position
000000F1 add ebp, 14h ; EBP = F0 + 14h = 104h, the first encrypted byte
000000F4 mov ecx, 18Bh ; Set ECX to 18Bh, the number of bytes to decrypt
000000F9 mov al, 3Dh ; '=' ; Load the XOR key 3Dh into AL
000000FB loc_FB: ; CODE XREF: 00000100
000000FB xor [ebp+0], al ; XOR the byte that EBP points to with the key in AL
000000FE inc ebp ; Advance EBP to the next byte
000000FF dec ecx ; Decrement the counter ECX by 1
00000100 jnz short loc_FB ; Loop back to the XOR instruction (offset FB) until the counter reaches zero
00000104 ; ---------------------------------------------------------------------------
00000104 loc_104: ; DATA XREF: 00000102
00000104 lodsd
00000105 lodsd
00000106 lodsd
00000107 lodsd
00000108 lodsd
00000109 lodsd
0000010A lodsd
0000010B lodsd

I will be more than happy if anyone can suggest other ways to explain why this is the decryptor code, or tell me if my assumption is wrong :)
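One way to double-check the assumption is to redo the routine's arithmetic by hand. The sketch below uses Python purely as a calculator; the constant names are made up for illustration, and the values come from the listing above. It also hints that the run of lodsd at offset 104 is probably not real code: lodsd encodes as the single byte AD, and AD XOR 3D gives 90, the opcode for NOP, so those "instructions" plausibly decrypt to a NOP sled.

```python
# Values read from the disassembly above; the names are illustrative only.
CALL_RETURN_ADDRESS = 0xF0   # pushed by "call loc_F0", popped into EBP
SKIP = 0x14                  # "add ebp, 14h"
COUNT = 0x18B                # "mov ecx, 18Bh"
KEY = 0x3D                   # "mov al, 3Dh"
LODSD_OPCODE = 0xAD          # one-byte encoding of "lodsd"

start = CALL_RETURN_ADDRESS + SKIP   # first encrypted byte
end = start + COUNT                  # one past the last encrypted byte

print(hex(start), hex(end))          # 0x104 0x28f
print(hex(LODSD_OPCODE ^ KEY))       # 0x90, the opcode for NOP
```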

So I will use Hiew again to decrypt the code. Note offset 102, where an instruction jumps to offset 104; that is where the decryption should start:

Press F3 > F8 and set the XOR key as "3D":

Based on the ECX counter, the code that needs to be decrypted is 18Bh bytes long, so it runs from offset 104 up to offset 28F (104h + 18Bh):

Save the file with F9 and you can see the URL that the malware tries to connect to in order to download an additional malicious file:
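The Hiew steps above can also be scripted. This is a minimal sketch rather than a replacement for the manual method: it assumes the same start offset (104h), length (18Bh) and key (3Dh) as the listing, and the function name is hypothetical.

```python
def xor_decrypt(data: bytes, start: int, count: int, key: int) -> bytes:
    """Return a copy of data with count bytes from start XORed with key."""
    if start < 0 or count < 0 or start + count > len(data):
        raise ValueError("region lies outside the buffer")
    out = bytearray(data)
    for i in range(start, start + count):
        out[i] ^= key   # same operation as "xor [ebp+0], al"
    return bytes(out)
```

For the file analysed here, the call would be xor_decrypt(raw, 0x104, 0x18B, 0x3D), where raw holds the bytes of the extracted shellcode file. Because XOR is its own inverse, running it twice restores the original bytes.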

Shellcode Static Analysis

Shellcode is designed to be as small as possible, so it normally contains only the actual malicious payload and does not import any API functions itself. So how does the shellcode operate without the necessary APIs?

As we are performing a static analysis, the following assumptions have been made:

  • It will first dynamically retrieve the base address of kernel32.dll
  • Find kernel32.dll->LoadLibrary() to load other necessary DLLs
  • Resolve urlmon.dll->URLDownloadToFile() to download additional malicious files
  • Resolve kernel32.dll->WinExec() to execute the downloaded malicious files

First of all, you might need a copy of "Windows Memory Layout, User-Kernel Address Spaces" and "PE Format Diagram" as a reference for the PEB structure and the PE format.

seg000:0000000D pop edi
seg000:0000000E mov eax, [fs:30] ; Load the PEB pointer (TEB offset 30h) into EAX
seg000:00000014 js short loc_22
seg000:00000016 mov eax, [eax+0Ch] ; Get the pointer to PEB_LDR_DATA (PEB offset 0Ch) and save it to EAX
seg000:00000019 mov esi, [eax+1Ch] ; ESI = InInitializationOrderModuleList.Flink (offset 1Ch), which points to the first entry in the linked list of loaded modules
seg000:0000001C lodsd ; EAX = [ESI], the Flink of the first entry, i.e. the next module in the list
seg000:0000001D mov ebp, [eax+8] ; Offset 8 comes right after the 8-byte LIST_ENTRY (two 4-byte pointers) and holds the module's image base. This is the image base of kernel32.dll, which is stored in EBP
seg000:00000020 jmp short loc_2B

Open Windows Memory Layout, User-Kernel Address Spaces and refer to struct_TEB. Locate offset 0x030 which points to _PEB structure.

From struct_PEB, move to offset 0x0c, which points to the _PEB_LDR_DATA structure that contains 7 elements:

typedef struct _PEB_LDR_DATA {
ULONG Length; //0x000
BOOLEAN Initialized; //0x004
PVOID SsHandle; //0x008
LIST_ENTRY InLoadOrderModuleList; //0x00c
LIST_ENTRY InMemoryOrderModuleList; //0x014
LIST_ENTRY InInitializationOrderModuleList; //0x01c
PVOID EntryInProgress; //0x024
} PEB_LDR_DATA;

InInitializationOrderModuleList is a doubly linked list whose entries point to LDR_MODULE structures:

typedef struct _LIST_ENTRY {
struct _LIST_ENTRY *Flink; //0x000
struct _LIST_ENTRY *Blink; //0x004
} LIST_ENTRY;

Using this chain, we can walk through every DLL module loaded by the process and therefore find kernel32.dll. The shellcode relies on kernel32.dll being the second entry in InInitializationOrderModuleList, right after ntdll.dll, on the Windows versions of that time (for example Windows XP), which is why a single lodsd is enough to reach it.
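To make the pointer chasing concrete, here is a small simulation of the walk over a fake 32-bit memory image. Everything in it is an assumption for illustration: the buffer layout, the tiny addresses and the two image base values are invented, and only the offsets (0Ch, 1Ch, 8) come from the listing above.

```python
import struct


def u32(mem: bytes, addr: int) -> int:
    """Read a little-endian 32-bit value, like a 32-bit MOV from memory."""
    return struct.unpack_from("<I", mem, addr)[0]


def find_second_module_base(mem: bytes, peb: int) -> int:
    ldr = u32(mem, peb + 0x0C)       # mov eax, [eax+0Ch]  -> PEB_LDR_DATA
    first = u32(mem, ldr + 0x1C)     # mov esi, [eax+1Ch]  -> first list entry
    second = u32(mem, first)         # lodsd               -> second list entry
    return u32(mem, second + 0x08)   # mov ebp, [eax+8]    -> image base


def build_fake_memory():
    """Build an invented memory image with two modules in the list."""
    mem = bytearray(0x400)
    peb, ldr, ntdll, k32 = 0x000, 0x100, 0x200, 0x300   # made-up addresses

    def put(addr, value):
        struct.pack_into("<I", mem, addr, value)

    put(peb + 0x0C, ldr)             # PEB.Ldr
    put(ldr + 0x1C, ntdll)           # list head Flink -> first entry
    put(ldr + 0x20, k32)             # list head Blink -> last entry
    put(ntdll + 0x00, k32)           # first entry Flink -> second entry
    put(ntdll + 0x04, ldr + 0x1C)    # first entry Blink -> list head
    put(ntdll + 0x08, 0x7C900000)    # example image base for the first module
    put(k32 + 0x00, ldr + 0x1C)      # second entry Flink -> list head
    put(k32 + 0x04, ntdll)           # second entry Blink -> first entry
    put(k32 + 0x08, 0x7C800000)      # example image base for the second module
    return bytes(mem), peb
```

Calling find_second_module_base(*build_fake_memory()) returns the base stored for the second module, which is the value the shellcode keeps in EBP.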

After the necessary DLL is found, the shellcode parses its PE header to find the export table and locates the RVA of each exported function whose name matches a hardcoded hash, which is calculated by the code at 0x000000DC.

[Note: In order to understand how the shellcode parses the PE header, you need to refer to PE Format Diagram]
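The hash routine at 0x000000DC is not shown in this post, so the sketch below is only a guess at the general idea. It assumes the rotate-right-by-13 additive hash that is common in shellcode of this kind; the real routine may differ, and all function names here are hypothetical.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Assumed hash: rotate right by 13, then add each character."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_by_hash(export_names, wanted):
    """Return the index of the first export whose hash matches, else None."""
    for index, name in enumerate(export_names):
        if ror13_hash(name) == wanted:
            return index
    return None
```

In a real export table the returned index would then be used to look up the function's ordinal and RVA. Storing hashes instead of names keeps the shellcode small and hides the API names from a simple strings search.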


I hope to show a more detailed analysis, such as how the hardcoded addresses are generated and how the export function is actually loaded and called by the shellcode. This requires dynamic analysis using OllyDbg with a "bait" file. I hope there is a guideline on the Internet, as I am too lazy to find it now ;)

I really hope to share how it can be done. Hopefully I can motivate myself and share the knowledge here asap.


Signing off

Sunday, May 24, 2009

SWF Exploit Analysis - Part 1

It has been a while since my last post on 4 April 2009. Today I'm going to talk about the technical analysis of an SWF (Flash) exploit file.

Tools Needed

The analysis is divided into 2 parts. The first part explains how to decompress the SWF file and extract the embedded exploit SWF file, followed by how to locate and extract the obfuscated shellcode.

The second part explains how to deobfuscate the shellcode and analyse its payload.

Analysis of SWF

This is a screenshot of the original exploit SWF file to give you an idea of what the file looks like:

It is totally unreadable huh! ;) That is because the SWF file is compressed, as we can tell from its first 3 bytes, CWS. We can dump the tags using swfdump.exe from SWFTools:

C:\bin\swftools\swfdump.exe -atpdu flash.$wf > flash.swf.swfdump
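As a side note, a CWS file is just an 8-byte header followed by a zlib stream, so it can also be inflated by hand if you want to look at the raw bytes. A minimal Python sketch, assuming the standard SWF header layout (3-byte signature, 1-byte version, 4-byte length); the function name is made up for illustration:

```python
import zlib


def decompress_swf(data: bytes) -> bytes:
    """Turn a CWS (zlib-compressed) SWF into its FWS (uncompressed) form."""
    signature = data[:3]
    if signature == b"FWS":
        return data                      # already uncompressed
    if signature != b"CWS":
        raise ValueError("not a zlib-compressed SWF")
    header_rest = data[3:8]              # version byte + 4-byte length field
    body = zlib.decompress(data[8:])     # everything after the header
    return b"FWS" + header_rest + body
```

The length field is copied unchanged because it already describes the uncompressed size. A malformed exploit file may carry a wrong value there, so the sketch does not validate it.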



Notice that there are a number of pushstring commands whose operands are the hex encoding of the exploit SWF file. They all generate the same exploit SWF file. Extract one of the hex strings from the pushstring commands into UltraEdit, like this:

Copy and paste the hex string and press the SPACEBAR (yes, the SPACEBAR!!!) to create a byte with hex code 20. After that press Ctrl + H to switch to hex mode:

Double click 20 and press Ctrl + R to replace the space with hex string:

Click Replace All button and the result would be:

Under Hiew:

It is now more readable right ;)
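The UltraEdit steps above boil down to turning a string of hex digits into the bytes it spells out. If you prefer a script, the conversion could look something like this (the function name is hypothetical, and the input is assumed to contain only hex digits and whitespace):

```python
def hex_string_to_bytes(hex_text: str) -> bytes:
    """Convert text such as "435753" into the raw bytes it encodes."""
    cleaned = "".join(hex_text.split())   # drop spaces, tabs and newlines
    return bytes.fromhex(cleaned)
```

Writing the result to a file opened in binary mode gives the same kind of file that is shown under Hiew.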

Now we can use swfdump.exe again to see the tags; we need to find the DEFINEBITS section, where the shellcode is located:

C:\bin\swftools\swfdump.exe -atpdu exploit_swf > exploit_swf.swfdump


From the DEFINEBITS section, we can find the starting offset of the shellcode as well as its end offset:


So we should extract the bytes from aa 02 34 d1 to 11 67 8a 37 using any hex editor you like:


And the obfuscated shellcode looks like this:
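The extraction step can also be scripted instead of done in a hex editor. The sketch below assumes the two byte sequences mentioned above mark the first and last four bytes of the shellcode and that the first occurrence of each is the right one; the names are made up for illustration.

```python
START_MARKER = bytes.fromhex("aa0234d1")
END_MARKER = bytes.fromhex("11678a37")


def extract_between(data: bytes,
                    start_marker: bytes = START_MARKER,
                    end_marker: bytes = END_MARKER) -> bytes:
    """Return the bytes from start_marker through end_marker, inclusive."""
    begin = data.find(start_marker)
    if begin == -1:
        raise ValueError("start marker not found")
    finish = data.find(end_marker, begin + len(start_marker))
    if finish == -1:
        raise ValueError("end marker not found")
    return data[begin:finish + len(end_marker)]
```

If either marker occurs more than once in the file, check the result against the offsets reported by swfdump.exe before trusting it.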


We now have the shellcode from the exploit SWF, but we still do not know what its payload is. In the next part, I will explain how to deobfuscate the shellcode by looking for the "key", and cover some common shellcode techniques, such as using the PEB to find kernel32.dll and then looking up the address of LoadLibrary to load the APIs needed to execute the payload.

To be continued...