PMA - Lab 07-03
Questions
Background from the book:
For this lab, we obtained the malicious executable, Lab07-03.exe, and DLL, Lab07-03.dll, prior to executing. This is important to note because the malware might change once it runs. Both files were found in the same directory on the victim machine. If you run the program, you should ensure that both files are in the same directory on the analysis machine. A visible IP string beginning with 127 (a loopback address) connects to the local machine. (In the real version of this malware, this address connects to a remote machine, but we’ve set it to connect to localhost to protect you.)
WARNING: This lab may cause considerable damage to your computer and may be difficult to remove once installed. Do not run this file without a virtual machine with a snapshot taken prior to execution.
Questions:
1. How does this program achieve persistence to ensure that it continues running when the computer is restarted?
2. What are two good host-based signatures for this malware?
3. What is the purpose of this program?
4. How could you remove this malware once it is installed?
Answers
Analyzing Lab07-03.exe
Let’s start by analyzing the exe file.
We can see at the start that this file expects 1 parameter in order to run:
Right after that, it performs a comparison between argv[1] (our parameter) and a hard-coded string: "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"
- Note this comparison routine!
So to run this exe, we will need to provide this string as an argument.
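The check described above can be sketched in C as follows. This is a minimal sketch, assuming an exact, case-sensitive match; the helper name `has_valid_argument` is hypothetical and does not come from the binary.

```c
#include <string.h>

/* The hard-coded string the sample compares its parameter against. */
static const char *const EXPECTED_ARG = "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE";

/* Hypothetical helper: returns 1 when exactly one parameter was supplied
   and it matches the hard-coded string, 0 otherwise. */
int has_valid_argument(int argc, char **argv)
{
    if (argc != 2) {            /* program name + one parameter */
        return 0;
    }
    return strcmp(argv[1], EXPECTED_ARG) == 0;
}
```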
After that we have this interesting section of code:
We can see the following calls:
- CreateFileA
- CreateFileMappingA
- MapViewOfFile
Important parameters for those 3 calls:
- (CreateFileA) Open the file: “C:\Windows\System32\Kernel32.dll”
- (CreateFileA) dwDesiredAccess = 0x80000000 => GENERIC_READ - So it can only read from the file.
- (CreateFileMappingA) flProtect = 2 => PAGE_READONLY - The memory mapping can only be read from.
- (MapViewOfFile) dwDesiredAccess = 4 => FILE_MAP_READ = SECTION_MAP_READ - So it maps the view of the file mapping into the address space of the calling process as read only.
And the same calls for “Lab07-03.dll”:
We can see the following calls:
- CreateFileA
- CreateFileMappingA
- MapViewOfFile
Important parameters for those 3 calls:
- (CreateFileA) Open the file: “Lab07-03.dll”
- (CreateFileA) dwDesiredAccess = 0x10000000 => GENERIC_ALL - Read, write and execute access.
- (CreateFileMappingA) flProtect = 4 => PAGE_READWRITE - The memory mapping can be read from and written to.
- (MapViewOfFile) dwDesiredAccess = 0xF001F => FILE_MAP_ALL_ACCESS = SECTION_ALL_ACCESS - So it maps the view of the file mapping into the address space of the calling process with full access.
Based on those API calls, we can infer that both files are mapped into memory: “kernel32.dll” is mapped read-only, so it can only serve as a source of data, while “Lab07-03.dll” is mapped with write access, so it is the file that this program will modify.
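To make the difference between the two sets of parameters concrete, here is a small sketch that decides whether a combination of flags allows writing through the mapped view. The constant values are the documented Windows SDK values, redefined with a `K_` prefix so the sketch compiles without windows.h; the helper `mapping_is_writable` is hypothetical, and for simplicity it only recognizes GENERIC_ALL as a writable file access.

```c
#include <stdint.h>

#define K_GENERIC_READ        0x80000000u
#define K_GENERIC_ALL         0x10000000u
#define K_PAGE_READONLY       0x02u
#define K_PAGE_READWRITE      0x04u
#define K_FILE_MAP_WRITE      0x0002u
#define K_FILE_MAP_READ       0x0004u
#define K_FILE_MAP_ALL_ACCESS 0xF001Fu

/* Hypothetical helper: returns 1 only if the file handle, the mapping
   object and the view all allow writing. */
int mapping_is_writable(uint32_t desired_access, uint32_t protect, uint32_t view_access)
{
    int file_writable = (desired_access & K_GENERIC_ALL) != 0;
    int page_writable = (protect == K_PAGE_READWRITE);
    int view_writable = (view_access & K_FILE_MAP_WRITE) != 0;
    return file_writable && page_writable && view_writable;
}
```

With the values above, the kernel32.dll mapping evaluates to not writable and the Lab07-03.dll mapping to writable.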
Right after that, we enter a section with a lot of calls to “sub_401040” and “sub_401070”, mixed with memory operations and arithmetic, without any API calls.
Those 2 subroutines are almost the same and both call “sub_401000”. Because they are called after both DLLs are opened, they are probably related to calculating something or modifying the files in some way.
For now we can skip this section because of how long it is and check it dynamically later. We can always come back to it if we need to dig deeper and know exactly what it does.
Right at the end, after all of those sub calls, we find another section with Windows API calls that looks interesting:
First, it calls CloseHandle for both files, so we know the malware has finished modifying them.
We can then see a call to CopyFileA with the following parameters:
- lpExistingFileName: “Lab07-03.dll” - The malicious DLL file.
- lpNewFileName: “C:\windows\system32\kerne132.dll” - Note that “kerne132.dll” is spelled with a “1” and not an “l”! So it copies the malicious DLL to this path.
Let’s check this at the end, when we run the malware, and compare the DLL before and after to see what was changed in the section that we skipped.
Right after that, we can see it passes this string “C:\\*” as an argument, as well as 0 to “sub_4011E0”, let’s understand what this sub does:
First it uses FindFirstFileA - Searches a directory for a file or subdirectory with a name that matches a specific name (or partial name if wildcards are used) so this will start from the “C:\”.
After that it checks if the file found is a directory, based on “dwFileAttributes” returned from the call:
And as we can see, 0x10 stands for FILE_ATTRIBUTE_DIRECTORY (From: File Attribute Constants):
We already recognize the next chunk of code from the argument comparison (where our argument was checked against "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"): this is strcmp:
Here it compares the file name to “.”, and after that to “..”:
Based on those 3 checks, if the entry is a directory and its name is neither “.” nor “..” (which reference the current directory and the parent directory), it goes down one path; otherwise it goes down the other. That means we take one path for directories and another for files, so let’s see what happens on each path.
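The three checks can be summarized in a short C sketch. It assumes the attribute check is a bit test on 0x10; the helper name `is_real_subdirectory` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define K_FILE_ATTRIBUTE_DIRECTORY 0x10u

/* Hypothetical helper: returns 1 if the entry is a directory that should be
   descended into, 0 if it should be handled on the file path. */
int is_real_subdirectory(uint32_t attributes, const char *name)
{
    if ((attributes & K_FILE_ATTRIBUTE_DIRECTORY) == 0) {
        return 0;                       /* not a directory */
    }
    if (strcmp(name, ".") == 0) {
        return 0;                       /* current directory */
    }
    if (strcmp(name, "..") == 0) {
        return 0;                       /* parent directory */
    }
    return 1;
}
```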
Before we start to check each path, we can see that both paths lead to this section:
It calls FindNextFileA to continue the file search started by the earlier call to FindFirstFileA, and after that we see a jump back to the start, so this function runs in a while loop until it has gone over all the files and directories it can reach.
If it’s a file
It checks for an .exe file extension.
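A minimal sketch of such an extension check, assuming it looks at the last four characters of the name and ignores case (the helper name `has_exe_extension` is hypothetical):

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the name ends with ".exe",
   compared case-insensitively, 0 otherwise. */
int has_exe_extension(const char *name)
{
    static const char ext[] = ".exe";
    size_t len = strlen(name);
    if (len < 4) {
        return 0;
    }
    const char *tail = name + len - 4;
    for (size_t i = 0; i < 4; i++) {
        if (tolower((unsigned char)tail[i]) != ext[i]) {
            return 0;
        }
    }
    return 1;
}
```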
Right after the check for an exe file, we can see there is a call to “sub_4010A0”:
Let’s check what it does. We can see it starts by calling the following APIs:
- CreateFileA
- CreateFileMappingA
- MapViewOfFile
Important parameters for those 3 calls:
- (CreateFileA) Open the file currently found.
- (CreateFileA) dwDesiredAccess = 0x10000000 => GENERIC_ALL - Read, write and execute access.
- (CreateFileMappingA) flProtect = 4 => PAGE_READWRITE - The memory mapping can be read from and written to.
- (MapViewOfFile) dwDesiredAccess = 0xF001F => FILE_MAP_ALL_ACCESS = SECTION_ALL_ACCESS - So it maps the view of the file mapping into the address space of the calling process with full access.
Based on that, it is probably modifying the files. Let’s continue and see what is being modified.
PE Modification
The next section of code accesses addresses by offset without calling any WinAPI functions, so let’s go from the start and understand what is happening here.
It starts by retrieving the starting address of the PE mapped into memory, moving it to the esi register and then reading the value at offset 0x3C:
To understand what is at offset 0x3C, let’s take a look at the PE format and then see what the code checks.
The format says that the field at offset 0x3C is “e_lfanew”, which holds the file offset of the NT Headers.
After that, it compares the value at that location (the pointer we moved to ebp) to 0x4550, the bytes “PE”, which is the signature at the start of the NT Headers.
Note: PE fields are stored in little-endian order, which is why the value looks reversed when it is read from memory for the comparison: 0x45=”E”, 0x50=”P”.
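The same two steps (follow e_lfanew, then check the signature) can be sketched in portable C over a byte buffer. This is only an illustration: the function names are hypothetical, and the bounds checks are added for safety rather than taken from the sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value byte by byte, so the result does not
   depend on the endianness of the machine running this code. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the offset of the NT Headers, or -1 if the
   buffer does not look like a PE file. */
long find_nt_headers(const uint8_t *base, size_t size)
{
    if (size < 0x40) {
        return -1;                          /* too small to hold e_lfanew */
    }
    uint32_t e_lfanew = read_le32(base + 0x3C);
    if (e_lfanew > size - 4) {
        return -1;                          /* signature would be out of bounds */
    }
    if (read_le32(base + e_lfanew) != 0x00004550u) {
        return -1;                          /* bytes are not 'P','E',0,0 */
    }
    return (long)e_lfanew;
}
```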
Next, it adds another 0x80 to the offset:
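We can work out where that lands: 0x18 (PE signature + file header) + 0x60 (offset of the data directories inside the optional header) + 0x08 (the first data directory entry, the export directory) = 0x80.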
So as we can see we get to IMAGE_DIRECTORY_ENTRY_IMPORT.
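The 0x80 offset can be reproduced from the structure sizes. The sketch below assumes a 32-bit (PE32) image, where the data directories start 0x60 bytes into the optional header; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum {
    PE_SIGNATURE_SIZE      = 4,     /* "PE\0\0" */
    FILE_HEADER_SIZE       = 20,    /* IMAGE_FILE_HEADER */
    DATA_DIRECTORY_OFFSET  = 0x60,  /* inside the PE32 optional header */
    DATA_DIRECTORY_ENTRY   = 8,     /* one RVA + one size */
    IMPORT_DIRECTORY_INDEX = 1      /* export is 0, import is 1 */
};

/* Hypothetical helper: file offset of the Import Directory RVA field,
   given the offset of the NT Headers (e_lfanew). */
uint32_t import_directory_field_offset(uint32_t e_lfanew)
{
    return e_lfanew
         + PE_SIGNATURE_SIZE
         + FILE_HEADER_SIZE
         + DATA_DIRECTORY_OFFSET
         + IMPORT_DIRECTORY_INDEX * DATA_DIRECTORY_ENTRY;
}
```

Passing 0 as `e_lfanew` returns the bare 0x80 offset that the code adds.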
If we want to avoid those calculations, we can use a program like “CFF Explorer” that parses PE headers, where we can easily calculate offsets like this:
Example File
We can see that “e_lfanew” points to 0x110
And as we can see if we move to the NT Headers, the PE signature is at offset 0x110 where “e_lfanew” pointed to.
So now we can move quickly and search for offset 0x110 + 0x80 = 0x190 in the next sections, and we can see we get to here:
So our calculation was right, we get to IMAGE_DIRECTORY_ENTRY_IMPORT, let’s see what it does next.
Now we can see that we are passing the IMAGE_DIRECTORY_ENTRY_IMPORT to “sub_401040”.
Note: Because the value we have now is an RVA, a reasonable guess is that this subroutine translates the RVA to a file offset so the code can access the structure it points to. Trusting that guess would make the analysis faster, but because we want to learn, let’s reverse that subroutine anyway.
RVA Translation
Before we continue our analysis, it is important to understand what an RVA is and how to convert it to an offset into the file.
An RVA (“Relative Virtual Address”) is an address expressed as an offset from the image base, so it describes a location in the loaded image without knowing where the image will actually be loaded.
In order to translate the RVA into an offset in the file, we need to do the following:
First we need to find the section table, to do that, we need to find the first section which is located right after the optional header.
To calculate that address, we can start by finding the size of the optional header (found in the _IMAGE_FILE_HEADER) and add it to the offset to the start of the optional header (start of _IMAGE_OPTIONAL_HEADER).
Example calculation:
So based on those, we add 0x128 + 0xE0 = 0x208
And as we can see this is indeed the right address:
Now we take the RVA (for example the RVA of our Import Directory) and find which section contains it:
And the sections:
To do that, we go over each section and check whether the RVA is greater than or equal to the “Virtual Address” and smaller than the “Virtual Address” + “Virtual Size”.
To know the number of sections, we can use the NumberOfSections field found in the PE Header, and each time add 0x28 (the size of a section entry in the table) until we have gone over all sections.
When we find the section that contains the RVA, we need to do the following calculation to get the linear offset in the file:
Offset In File = RVA - “Virtual Address” + “Raw Address”
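The lookup and the formula can be sketched together in C. The `section_info` struct is a simplified, hypothetical stand-in for a section table entry that keeps only the three fields the formula needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one section table entry (hypothetical type). */
typedef struct {
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t raw_address;       /* PointerToRawData */
} section_info;

/* Returns 1 and stores the file offset in *offset_out if some section
   contains the RVA; returns 0 otherwise. */
int rva_to_file_offset(const section_info *sections, size_t count,
                       uint32_t rva, uint32_t *offset_out)
{
    for (size_t i = 0; i < count; i++) {
        const section_info *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size) {
            /* Offset In File = RVA - Virtual Address + Raw Address */
            *offset_out = rva - s->virtual_address + s->raw_address;
            return 1;
        }
    }
    return 0;
}
```

For example, with a made-up section at Virtual Address 0x2000, Virtual Size 0x800 and Raw Address 0x1400, the RVA 0x2040 maps to file offset 0x1440.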
Now that we understand how to calculate an offset in the file using the RVA, we can continue to disassemble and understand that subroutine.
Understanding the Subroutine
Now let’s continue to understand what sub_401040 does, and check whether our guess that it translates RVAs was right.
Let’s start by checking what parameters are passed to this subroutine:
We can see that 3 parameters are passed: “Import Directory RVA”, “PE Header Address”, “EXE Base Address”
Now we can take a look at that subroutine. I added some comments and renamed the arguments (based on what we pass) to make things clearer:
We can see a call to sub_401000 at the start where we pass: “Import Directory RVA”, “PE Header Address”.
After that call we get some offset from this, and we then do the following calculation:
Return Value = RVA - (sub_401000 return value +0x0C)??? + (sub_401000 return value +0x14)??? + “EXE Base Address”
We know that we are missing the “Virtual Address” and the “Raw Address” based on the formula we saw before. (the “EXE Base Address” is the base address of the mapped exe, so it can be added to the formula above)
So we can guess that sub_401000 finds the “Virtual Address” and the “Raw Address”, but let’s step into it and check if we are right.
Let’s take a look at the start of the subroutine, I added some comments for the important parts.
We can see that it accesses the fields at offsets 0x14 and 0x6, which stand for the “Size Of Optional Header” and “Number Of Sections”.
Note: PE header start offset is 0x110
Next, it XORs esi with itself (esi will be used as a counter in this subroutine), and we can see a calculation at the end:
eax(Size Of Optional Header) + edx(Start Of PE Header) + 0x18(PE Header Size) = Start of the Section Table.
So this calculates the “Start of the Section Table”, presumably used to check in which section the RVA is in.
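As a quick arithmetic check of that calculation, here is a small sketch; the function names are hypothetical, and the sample values in the usage note are the ones from the example file used earlier.

```c
#include <stdint.h>

/* Start of the section table = PE header offset + 0x18 (signature and
   file header) + Size Of Optional Header. */
uint32_t section_table_offset(uint32_t pe_header_offset,
                              uint16_t size_of_optional_header)
{
    return pe_header_offset + 0x18u + size_of_optional_header;
}

/* Offset of the n-th entry; each section table entry is 0x28 bytes long. */
uint32_t section_entry_offset(uint32_t table_offset, uint16_t index)
{
    return table_offset + (uint32_t)index * 0x28u;
}
```

With a PE header at 0x110 and an optional header size of 0xE0, the table starts at 0x208 and the second entry at 0x230.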
After that we can see the following loop, I’ve put some comments to explain what happens:
We can see it accesses 2 offsets, 0x8 and 0xC, which are the “Virtual Size” and “Virtual Address” of each section respectively, as we can see here: (0x208 + 0x8 = 0x210 | 0x208 + 0xC = 0x214)
Then it checks whether it has gone over every section. If not, it adds 0x28 (the size of a section entry in the table) and runs the same check on the next section.
Example decompilation of the subroutine (to make it easier to understand):
#include <stdint.h>
#include <string.h>

// Returns a pointer to the section table entry whose virtual range contains
// IT_RVA, or NULL if there are no sections or no section contains it.
// Fields are copied with memcpy, so this assumes a little-endian host.
const uint8_t *sub_401000(uint32_t IT_RVA, const uint8_t *PE_header_address)
{
    uint16_t number_of_sections;
    uint16_t size_of_optional_header;
    uint32_t virtual_address;
    uint32_t virtual_size;
    const uint8_t *result;

    // Offset 0x6 into the PE Header to get the Number of Sections.
    memcpy(&number_of_sections, PE_header_address + 0x6, sizeof number_of_sections);
    // Offset 0x14 into the PE Header to get the Size of Optional Header.
    memcpy(&size_of_optional_header, PE_header_address + 0x14, sizeof size_of_optional_header);
    // First entry in Section Table = PE Header Address + PE Header Size + Size of Optional Header.
    result = PE_header_address + 0x18 + size_of_optional_header;

    // The loop body never runs if there are no sections, so NULL is returned.
    for (uint16_t counter = 0; counter < number_of_sections; counter++)
    {
        // Offset 0xC into the current Section Table Entry: Virtual Address.
        memcpy(&virtual_address, result + 0xC, sizeof virtual_address);
        // Offset 0x8 into the current Section Table Entry: Virtual Size.
        memcpy(&virtual_size, result + 0x8, sizeof virtual_size);
        // If the IT_RVA falls inside the virtual range of the current section.
        if (IT_RVA >= virtual_address && IT_RVA < virtual_address + virtual_size)
        {
            // Return the current section table entry.
            return result;
        }
        // Move to the next entry in the section table.
        result += 0x28; // 0x28 = Section Table Entry length.
    }
    // Not found in any section.
    return NULL;
}
Now we know that this subroutine returns the address of the section table entry for the section that contains the RVA.
That confirms the guess we made before analyzing the subroutine. The calculation in sub_401040 is indeed:
Return Value = RVA - value at (sub_401000 return value + 0x0C) + value at (sub_401000 return value + 0x14) + “EXE Base Address”
And if we translate that:
Return Value = RVA - “Virtual Address” + “Raw Address” + “EXE Base Address”
So all those functions do is translate the RVA to the address of that data inside the file mapped in memory.
Continue with the Analysis of sub_4010A0
We left off at the call to sub_401040 with the Import Directory RVA. Let’s rename sub_401040 to something more readable and continue our analysis:
Right after that, it takes the return value of the call and adds 0x0C to it, which gets us to the “Name” field in the _IMAGE_IMPORT_DESCRIPTOR struct (the structure pointed to by the IMAGE_DIRECTORY_ENTRY_IMPORT RVA that we just translated):
After that, it checks that the TimeDateStamp field in _IMAGE_IMPORT_DESCRIPTOR is not zero and that we have a valid “Name” RVA value. If so, it converts the “Name” RVA to its address in the mapped file.
Next, it checks whether “Name” is equal to “kernel32.dll”, “Name” being the name of the imported DLL for the current descriptor in the file’s import table. If it is “kernel32.dll”, it will do the following:
- Get the length of the import found (will be “kernel32.dll”)
- Overwrite that name with the value at “dword_403010” (first in chunks of 4 bytes at a time, then the remainder 1 byte at a time)
- After that, it will go over to the next “Name” in the next _IMAGE_IMPORT_DESCRIPTOR.
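The “4 bytes at a time, then the remainder” pattern is what a compiler-inlined copy built from “rep movsd” followed by “rep movsb” looks like. A rough C equivalent, as a sketch only (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes the way an inlined copy does: whole dwords first
   (the "rep movsd" part), then the leftover bytes ("rep movsb"). */
void copy_dwords_then_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t dwords = len / 4;
    size_t rest   = len % 4;
    for (size_t i = 0; i < dwords; i++) {
        memcpy(dst + 4 * i, src + 4 * i, 4);
    }
    for (size_t i = 0; i < rest; i++) {
        dst[4 * dwords + i] = src[4 * dwords + i];
    }
}
```

Because the replacement string has exactly the same length as “kernel32.dll”, the overwrite fits in place and nothing else in the file has to move.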
I provided some comments to explain the flow of the instructions.
What we need to understand right now is what is found at “dword_403010”. Based on my comments we can see that it is “kerne132.dll”, but let’s see how I found this value. When we double-click the dword we can see those values:
Because we know those values are written over the name of the import “kernel32.dll”, let’s convert them to a string and see what they stand for. To do that, we click “dword_403010” and press “A”:
After confirming the conversion to a string, the result is:
“kerne132.dll” - With “1” instead of an “l”.
That’s the end of the subroutine, so we can summarize what happens when the file found is an executable:
- It maps the file to memory.
- Reads the offset at 0x3C to get to the PE Header (NT Headers).
- Checks that it reached 0x4550 (“PE”).
- Adds 0x80 to get to the “Import Directory RVA”.
- Converts the RVA to the memory address (File Memory Address = RVA - “Virtual Address” + “Raw Address” + “EXE Base Address”).
- Adds 0x0C to that address to get to the “Name” RVA.
- Now does the following in a loop:
  - Makes sure the current “Name” and “TimeDateStamp” are set; if not, it exits the loop.
  - Converts the “Name” RVA to the memory address in the file (it will point to the current import DLL name).
  - Compares that to “kernel32.dll”.
  - If it matches, overwrites it with “kerne132.dll”.
  - Adds 0x14 to get to the next “Name” RVA, and loops again.
If it’s a directory
Because this part has a lot of arithmetic operations, including “repne scasb” and “rep movsd/b” instructions, we can debug it in x64dbg instead of analyzing every instruction, to make our analysis faster.
But before debugging it, let’s quickly see where we should look. We can see that at the start it moves the folder name to edi:
After a bit, we can see a call to malloc:
We can see the return value is stored in edx, and at the end it is passed as a parameter to sub_4011E0 (the subroutine we are in, so this is a recursive call).
Another thing we can see right before the call to sub_4011E0 is that it adds 1 to a parameter (arg_4) that is also passed to the sub. If we remember, it was 0 in the original call to the sub:
So let’s go over it quickly in x64dbg and see what it does. Let’s go to the start, after it found a folder at 0x004012A0, where we can see the name of the folder found, “Documents and Settings”:
Now we can run until we see the call to sub_4011E0 and see what it is passing:
As we can see, it builds the new path for the directory, adds “\*” to the end of the path and pushes it as a parameter. The second parameter is a counter that started at 0 and is incremented by 1 before each new call to sub_4011E0 (here it’s 1).
If we think about it logically, this parameter (arg_4) tracks the depth we are currently at, because for each folder we add one and call the subroutine recursively.
If we look at where else it is used, we can see it is compared to 7 right at the start, and the subroutine skips everything and returns immediately once that limit is reached, meaning it doesn’t go past a depth of 7.
So we can summarize what happens here: if the entry is a folder, it calls the subroutine recursively with the new path, but it will not go more than 7 folders deep.
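The depth-limited recursion can be illustrated on an in-memory tree instead of a real file system. The `dir_node` type and `walk` function are hypothetical, and the exact comparison against 7 (here “greater than or equal”) is an assumption that follows the reading above.

```c
#include <stddef.h>

#define MAX_DEPTH 7

/* Hypothetical in-memory directory: just a list of subdirectories. */
typedef struct dir_node {
    const struct dir_node *children;
    size_t child_count;
} dir_node;

/* Visits dir and its subdirectories recursively, starting at the given
   depth, and returns how many directories were visited. */
size_t walk(const dir_node *dir, int depth)
{
    if (depth >= MAX_DEPTH) {       /* assumed form of the "compare to 7" check */
        return 0;
    }
    size_t visited = 1;
    for (size_t i = 0; i < dir->child_count; i++) {
        visited += walk(&dir->children[i], depth + 1);
    }
    return visited;
}
```

The initial call passes a depth of 0, just like the original call to sub_4011E0 with “C:\\*”.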
Analyzing Lab07-03.dll
The DLL starts by checking for the existence of the mutex “SADFHUHF”. If it already exists, the DLL exits; if not, it continues.
Right after that we can see a call to WSAStartup, the function that initiates use of the Winsock DLL by a process.
Then we can see the usage of sockets, including an IP address and port, probably those of the C&C server.
It then sends the string “hello”.
Then, after receiving data from the socket, we can see 3 options:
- If “sleep” is sent: it sleeps for 393.216 seconds.
- If “exec” is sent: it calls CreateProcessA.
The interesting argument for the call to CreateProcessA is the CommandLine, but if we xref it to see its value:
There is no write to it, only the read in the call to CreateProcessA. Weird. Let’s look at the local variables and function arguments:
We can see it sits after the “buf” local variable and after “var_FFF”, but if we look at where the code writes to “buf” when it receives the data from the network, we can see:
So it writes up to 4096 bytes to the buffer, a lot more than the size IDA gave “buf”. Let’s convert it to an array to make it easier to understand:
And set the size to the one we see in the “recv” call:
And if we take a look at the local variables and function arguments again:
So now, when we look at the section where it provides lpCommandLine, we can see it is at offset 5 in the “buf” variable:
That means that to execute something, the operator sends “exec FULL_PATH_TO_FILE”; the “+5” skips “exec ” so it is left out of lpCommandLine.
- If “q” is sent: it shuts down the C2 connection and exits the loop.
So the DLL is just a C&C client that connects to the remote IP “127.26.152.13” on port 80 and can get 3 commands over the socket:
- “sleep” - sleep for 393 seconds.
- “exec” - execute something based on the parameter provided after it.
- “q” - shut down the C2 connection and exit the loop.
- Anything else will also sleep for 393 seconds.
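The command handling can be sketched as a small parser. This is an illustration under assumptions: prefix comparison for “sleep” and “exec” and a first-character check for “q” are guesses about how the DLL matches commands, and all names here are hypothetical.

```c
#include <string.h>

typedef enum { CMD_SLEEP, CMD_EXEC, CMD_QUIT, CMD_UNKNOWN } command;

/* 393.216 seconds expressed in milliseconds (0x60000). */
#define SLEEP_MS 393216

/* Classifies a received buffer. For CMD_EXEC, *arg_out points at the
   command line that follows "exec " (the "+5" seen in the disassembly). */
command parse_command(const char *buf, const char **arg_out)
{
    *arg_out = NULL;
    if (strncmp(buf, "sleep", 5) == 0) {
        return CMD_SLEEP;
    }
    if (strncmp(buf, "exec", 4) == 0) {
        /* Stay inside the string if nothing follows "exec". */
        *arg_out = (buf[4] != '\0') ? buf + 5 : buf + 4;
        return CMD_EXEC;
    }
    if (buf[0] == 'q') {
        return CMD_QUIT;
    }
    return CMD_UNKNOWN;     /* the caller sleeps SLEEP_MS in this case too */
}
```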
Dynamic Validation
We can run this malware to validate some of our findings and see how they look after execution.
Here is an example of an executable before and after the malware ran on the machine. We can take a look at the imports and see that “KERNEL32.DLL” was replaced with “kerne132.dll”, as we saw in our analysis.
We can also see that “kerne132.dll” is in “C:\Windows\System32”:
Now let’s take a look at the DLL before and after; we can see the difference in exports between those two files.
If we open the new one in a tool like PE-Bear or pestudio and go to the exports, we can see that every export of KERNEL32.DLL was added to kerne132.dll (the renamed Lab07-03.dll), each one forwarded to KERNEL32.DLL.
That means that the functionality of the executables on the system won’t be harmed: although they load kerne132.dll instead of KERNEL32.DLL, it offers the same exports because of the forwarding, but it also runs its malicious code in DllMain when the DLL is loaded.
1. How does this program achieve persistence to ensure that it continues running when the computer is restarted?
The program achieves persistence by changing the imports of every executable file it finds (no deeper than 7 folders) to import the malicious DLL “C:\windows\system32\kerne132.dll” (a renamed Lab07-03.dll) instead of the legitimate “C:\windows\system32\kernel32.dll”.
2. What are two good host-based signatures for this malware?
- The mutex “SADFHUHF”
- The existence of the file “C:\windows\system32\kerne132.dll”
3. What is the purpose of this program?
Based on our analysis:
EXE File - “Lab07-03.exe”
This program starts by opening and mapping to memory “kernel32.dll” and “Lab07-03.dll”. It then performs a series of operations, including comparing memory, calculating offsets and writing to memory, and after that it copies “Lab07-03.dll” to “C:\windows\system32\kerne132.dll”. Finally, it goes over the files on the system, finds the .exe files, and replaces the import name “kernel32.dll” with “kerne132.dll”, meaning that probably every exe on the system will load our malicious DLL instead of the real “kernel32.dll”.
DLL File - “Lab07-03.dll”
The DLL is basically just a C&C client that connects to the IP “127.26.152.13” on port 80 and can take 3 commands sent over the socket:
- “sleep” - sleep for 393 seconds.
- “exec” - execute something based on the parameter provided after it.
- “q” - shut down the C2 connection and exit the loop.
- Anything else will also sleep for 393 seconds.
4. How could you remove this malware once it is installed?
This malware is difficult to remove because it changes all the exes it can reach on the system, but there are some ways we can deal with it:
- Restore a backup of the machine from before the malware ran.
- Back up the important files and reinstall Windows with a clean installation.
- We can patch the malicious part out of “kerne132.dll” (the command execution / C&C part) and keep everything else as is.
- We can delete “kerne132.dll” and put a copy of the legitimate “kernel32.dll” in its place under that name.
- We can write a program that goes over all exes on the system and changes the import back to what it was (kernel32.dll), then delete “kerne132.dll” from the system.
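The core of the last option can be sketched as a function that reverts the patched name inside a file that has already been read into memory. This is a simplified illustration: it replaces every occurrence of the string in the buffer, whereas a careful tool would walk the import table the same way the malware did and touch only the descriptor names. The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replaces every case-insensitive occurrence of "kerne132.dll" in the
   buffer with "kernel32.dll" and returns how many were fixed. */
size_t restore_import_names(uint8_t *data, size_t size)
{
    static const char bad[]  = "kerne132.dll";
    static const char good[] = "kernel32.dll";
    const size_t n = sizeof bad - 1;    /* length without the NUL */
    size_t fixed = 0;

    for (size_t i = 0; i + n <= size; i++) {
        size_t j = 0;
        while (j < n && tolower(data[i + j]) == bad[j]) {
            j++;
        }
        if (j == n) {
            memcpy(data + i, good, n);  /* same length, so it fits in place */
            fixed++;
            i += n - 1;                 /* continue after the replaced name */
        }
    }
    return fixed;
}
```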