Overview
After obtaining the shellcode from the exploited SWF in the previous post, we can work out its payload (what the malware actually does). Before that, however, the shellcode must be deobfuscated; otherwise it is impossible to continue with the behavior analysis.
Analysis of shellcode
Again, this is the file extracted from the exploit:
Load the file in IDA Pro and remember to use 32-bit disassembler mode. Do a full code analysis (highlight all the code > press C > select Force analysis). Here is the result, including the decryption routine:
The reason I think this is the decryption routine is simple: the code makes sense to me ;)
The encrypted code has to call the decryption routine before executing the payload. Looking at the CALL instructions, I found that this portion of code fits:
000000EB                 call    loc_F0          ; Call the decryptor code
000000F0
000000F0 loc_F0:                                 ; DATA XREF: 000000EB
000000F0                 pop     ebp             ; Pop the return address pushed by the CALL (the address of loc_F0) into EBP
000000F1                 add     ebp, 14h        ; Add 14h to EBP, so it now points to offset 104
000000F4                 mov     ecx, 18Bh       ; Set ECX to 18Bh as a counter
000000F9                 mov     al, 3Dh ; '='   ; Save the XOR key "3D" to AL
000000FB
000000FB loc_FB:                                 ; CODE XREF: 00000100
000000FB                 xor     [ebp+0], al     ; XOR the byte that EBP points to with AL ("3D" as the key)
000000FE                 inc     ebp             ; Move EBP to the next byte
000000FF                 dec     ecx             ; Decrement the counter ECX by 1
00000100                 jnz     short loc_FB    ; Go back to the XOR instruction (offset FB) if the counter is not zero
00000102                 jmp     short loc_104   ; AFTER THE DECRYPTION ROUTINE FINISHES, IT JUMPS TO THE DECRYPTED CODE
00000104 ; ---------------------------------------------------------------------------
00000104
00000104 loc_104:                                ; DATA XREF: 00000102
00000104                 lodsd
00000105                 lodsd
00000106                 lodsd
00000107                 lodsd
00000108                 lodsd
00000109                 lodsd
0000010A                 lodsd
0000010B                 lodsd
I will be more than happy if anyone can tell me other ways to explain why this is the decryptor code if my own assumption is wrong :)
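One way to double-check this reading of the listing is to redo its arithmetic outside the disassembler. Below is a minimal Python sketch that mirrors the loop instruction by instruction. It assumes the CALL at offset EB is the usual 5-byte near call, so the return address popped into EBP is F0. The function name and the sample buffer are made up for illustration; the real shellcode file is not used here.

```python
def emulate_decryptor(image: bytes, call_offset: int = 0xEB) -> bytes:
    """Emulate the loop at loc_FB on a copy of the shellcode image."""
    buf = bytearray(image)
    ebp = call_offset + 5   # pop ebp: return address of the (assumed) 5-byte CALL = 0xF0
    ebp += 0x14             # add ebp, 14h  -> 0x104
    ecx = 0x18B             # mov ecx, 18Bh (byte counter)
    al = 0x3D               # mov al, 3Dh   (XOR key)
    while True:
        buf[ebp] ^= al      # xor [ebp+0], al
        ebp += 1            # inc ebp
        ecx -= 1            # dec ecx
        if ecx == 0:        # jnz short loc_FB
            break
    return bytes(buf)


# Hypothetical stand-in for the real file: 0x300 bytes that are all 0x3D,
# so every byte touched by the loop becomes 0x00 and is easy to spot.
sample = bytes([0x3D]) * 0x300
decoded = emulate_decryptor(sample)
first_plain = decoded.index(0x00)
last_plain = decoded.rindex(0x00)
print(hex(first_plain), hex(last_plain))
```

With these assumptions the loop touches offsets 104 through 28E inclusive, so 28F is the first byte after the decrypted region. That the region begins exactly at the target of the final JMP is a good sign that this really is the decryptor.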
So I will use Hiew again to decrypt the code. Take note of offset 102, where the instruction jumps to offset 104; that is the offset where the decryption should start. Press F3 > F8 and set the XOR key to "3D":
Based on the ECX counter, we know the size of the code that needs to be decrypted is 18B bytes, that is, up to offset 28F:
Save the file with F9 and you can see the URL that the malware tries to connect to in order to download an additional malicious file:
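If Hiew is not at hand, the same patch-and-look step can be approximated with a short script. This is only a sketch: the helper names are my own, and the buffer is synthetic, built around a placeholder URL (`example.invalid`) rather than the real sample. The start offset 104, the length 18B and the key 3D are the values read from the listing.

```python
import re


def xor_range(data: bytes, start: int, length: int, key: int) -> bytes:
    """Return a copy of data with `length` bytes from `start` XORed with `key`."""
    out = bytearray(data)
    for i in range(start, start + length):
        out[i] ^= key
    return bytes(out)


def find_urls(data: bytes) -> list:
    """Collect printable-ASCII runs that begin with http:// or https://."""
    pattern = rb"https?://[\x21-\x7e]+"
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


# Synthetic demo buffer: a placeholder URL inside the 0x18B-byte region.
placeholder = b"http://example.invalid/payload.exe"
plain = bytes(0x104) + placeholder + bytes(0x18B - len(placeholder))
encrypted = xor_range(plain, 0x104, 0x18B, 0x3D)

print(find_urls(encrypted))                                  # nothing readable yet
print(find_urls(xor_range(encrypted, 0x104, 0x18B, 0x3D)))   # URL visible after XOR
```

For a real sample, the bytes would be read from the extracted file with `open(path, "rb").read()` instead of being built by hand.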
Shellcode Static Analysis
Shellcode is designed to be as small as possible, so it normally contains only the actual malicious payload and does not import any API functions itself. So how does the shellcode operate without the necessary APIs?
As we are performing a static analysis, the following assumptions have been made:
- It will first dynamically retrieve the base address of kernel32.dll
- Find kernel32.dll->LoadLibrary() to load the other DLLs it needs
- Find urlmon.dll->URLDownloadToFile() to download additional malicious files
- Find kernel32.dll->WinExec() to execute the downloaded malicious files
First of all, you might need a copy of Windows Memory Layout, User-Kernel Address Spaces and PE Format Diagram as references for the PEB structure and the PE format:
seg000:0000000D                 pop     edi
seg000:0000000E                 mov     eax, fs:[30h]   ; Save the pointer to the PEB in EAX
seg000:00000014                 js      short loc_22
seg000:00000016                 mov     eax, [eax+0Ch]  ; Get the pointer to PEB_LDR_DATA and save it in EAX
seg000:00000019                 mov     esi, [eax+1Ch]  ; PEB_LDR_DATA has 7 members; InInitializationOrderModuleList (offset 1Ch) links the loaded modules together
seg000:0000001C                 lodsd                   ; EAX = Flink of the first entry, i.e. the next module in the list
seg000:0000001D                 mov     ebp, [eax+8]    ; A LIST_ENTRY holds two 4-byte pointers; the dword right after it is the image base of kernel32.dll, which is stored in EBP
seg000:00000020                 jmp     short loc_2B
Open Windows Memory Layout, User-Kernel Address Spaces and refer to struct_TEB. Locate offset 0x030, which points to the _PEB structure.
From struct_PEB, move to offset 0x0c, which points to the _PEB_LDR_DATA structure that contains 7 elements:
typedef struct _PEB_LDR_DATA {
    ULONG Length;                               //0x000
    BOOLEAN Initialized;                        //0x004
    PVOID SsHandle;                             //0x008
    LIST_ENTRY InLoadOrderModuleList;           //0x00c
    LIST_ENTRY InMemoryOrderModuleList;         //0x014
    LIST_ENTRY InInitializationOrderModuleList; //0x01c
    PVOID EntryInProgress;                      //0x024
} PEB_LDR_DATA, *PPEB_LDR_DATA;
InInitializationOrderModuleList is a doubly linked list whose links point into LDR_MODULE structures:
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink; //0x000
    struct _LIST_ENTRY *Blink; //0x004
} LIST_ENTRY, *PLIST_ENTRY;
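The offsets in the comments can be cross-checked without a Windows machine. The sketch below rebuilds both structures with Python's ctypes, using fixed 32-bit integers in place of pointers so the layout matches a 32-bit process on any host. The class names ending in `32` are mine, not from any Windows header.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    # Two 4-byte pointers, modelled as 32-bit integers.
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class PEB_LDR_DATA32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),   # BOOLEAN is one byte, then 3 bytes of padding
        ("SsHandle", ctypes.c_uint32),
        ("InLoadOrderModuleList", LIST_ENTRY32),
        ("InMemoryOrderModuleList", LIST_ENTRY32),
        ("InInitializationOrderModuleList", LIST_ENTRY32),
        ("EntryInProgress", ctypes.c_uint32),
    ]


# Map each member name to the offset ctypes computed for it.
offsets = {name: getattr(PEB_LDR_DATA32, name).offset
           for name, _ in PEB_LDR_DATA32._fields_}
for name, off in offsets.items():
    print(f"{name:<32} 0x{off:03x}")
```

The value to look for is the offset of InInitializationOrderModuleList, since that is the constant the shellcode uses in `mov esi, [eax+1Ch]`.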
Using this chain, we can walk through every DLL module loaded by the process and therefore find kernel32.dll. The shellcode relies on a fixed load order: on the Windows 2000/XP systems targeted at the time, ntdll.dll is the first entry on the InInitializationOrderModuleList and kernel32.dll comes right after it, which is why a single lodsd is enough to reach it.
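To make the pointer chasing easier to follow, here is a small Python simulation of the instructions at offsets 0E to 1D. Memory is modelled as a dictionary of address to dword. Every address and image base in it is invented for the demo; only the offsets (30h, 0Ch, 1Ch, 8) come from the listing. It assumes the Windows 2000/XP initialization order, with ntdll.dll first and kernel32.dll second.

```python
# Invented addresses for the demo; only the offsets are taken from the listing.
TEB, PEB, LDR = 0x7FFDE000, 0x7FFDF000, 0x00251E90
NTDLL_LINKS, K32_LINKS = 0x00251F28, 0x00251FD0   # InInitializationOrder links of each module
NTDLL_BASE, K32_BASE = 0x7C900000, 0x7C800000

memory = {
    TEB + 0x30: PEB,                 # TEB -> PEB
    PEB + 0x0C: LDR,                 # PEB -> PEB_LDR_DATA
    LDR + 0x1C: NTDLL_LINKS,         # list head Flink -> first module (ntdll.dll)
    NTDLL_LINKS + 0x00: K32_LINKS,   # first module Flink -> second module (kernel32.dll)
    NTDLL_LINKS + 0x08: NTDLL_BASE,  # image base sits 8 bytes after the links
    K32_LINKS + 0x00: LDR + 0x1C,    # last Flink wraps back to the list head
    K32_LINKS + 0x08: K32_BASE,
}


def find_kernel32_base(mem: dict, fs_base: int) -> int:
    eax = mem[fs_base + 0x30]   # mov eax, fs:[30h]
    eax = mem[eax + 0x0C]       # mov eax, [eax+0Ch]
    esi = mem[eax + 0x1C]       # mov esi, [eax+1Ch]
    eax = mem[esi]              # lodsd: EAX = dword at ESI ...
    esi += 4                    # ... and ESI advances by 4
    ebp = mem[eax + 0x08]       # mov ebp, [eax+8]
    return ebp


print(hex(find_kernel32_base(memory, TEB)))
```

Dropping the `lodsd` step from the function would return the image base of the first module (ntdll.dll in this model) instead, which shows why the shellcode needs that one extra hop.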
After the necessary DLLs are found, the shellcode parses each DLL's PE header to find its export table and locates the RVA of the exported function whose name matches the hardcoded hash; the hash is calculated by the routine at 0x000000DC.
[Note: In order to understand how the shellcode parses the PE header, you need to refer to PE Format Diagram]
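The hash routine at 0x000000DC is not reproduced in this post, so the following is only an illustration of the general idea of resolving an export by hash instead of by name. It assumes a rotate-right-by-13 additive hash, a scheme often seen in shellcode of this era; the sample may well use something different. The function names and the short export list are hypothetical.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Assumed hash: for each character, rotate right by 13, then add the character."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def resolve_by_hash(export_names, wanted: int):
    """Return (index, name) of the first export whose hash equals `wanted`."""
    for index, name in enumerate(export_names):
        if ror13_hash(name) == wanted:
            return index, name
    return None


# Hypothetical slice of an export name table.
exports = ["GetProcAddress", "LoadLibraryA", "WinExec"]
wanted = ror13_hash("WinExec")   # stands in for a hash hardcoded in the shellcode
print(resolve_by_hash(exports, wanted))
```

In real shellcode the returned index would then be used to look up the function's RVA through the export directory tables, and the RVA would be added to the DLL image base to get the callable address.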
Final
I hope to show a more detailed analysis, such as how the hardcoded hashes are generated and how the export function is actually loaded and called by the shellcode. This requires dynamic analysis using OllyDbg with a "bait" file. I hope there is a guideline on the Internet, as I am too lazy to look for it now ;)
I really hope to share how it can be done. Hopefully I can motivate myself and share the knowledge here asap.
References
http://zarestel.blogspot.com/2008/06/swf-exploit-cve-2007-0071.html
http://zarestel.blogspot.com/2008/06/swf-exploit-cve-2007-0071-part-2-how-to.html
http://blog.threatexpert.com/2008/05/flash-exploit-goes-wild.html
Signing off
~x9090