A few months ago a friend asked me to look at a malicious Word document. He had already run it through the tools available to him and, based on what he told me, nothing relevant or of significant use was found. (Reading between the lines, there was nothing for him to look for on affected hosts.) Malicious document analysis is not one of my strong points, but I like a challenge and had an hour to kill during my lunch break, so I decided to give it a try. WTF.py is the reusable, problem-solving piece of code that came out of the exercise.

Initial Analysis

Given the time limit, I wanted to find anything that really stuck out and that might be of use. For me, tackling an unknown problem works best by trying the simple things first: looking for strings, patterns, large voids in the data, etc. First I ran the Unix/Linux program strings against the .doc file to see if anything stuck out. This simple yet effective tool more often than not leads me in the right direction, or gives me an idea of what I need to start looking at.

Converting the Data

That's where WTF.py comes into play. Deciphering hexadecimal is usually straightforward: a hex editor is all you need, then instant ASCII. But sometimes (in situations like this) each byte needs to be assembled first. The resulting string is winlogin.exe, which is important because winlogon.exe, not winlogin.exe, is part of the standard Microsoft Windows installation.

For the first test, assume the 0x00 bytes are just padding; the easiest sanity check is that the hex string should contain an even number of characters. The table below is broken down into four columns:

File Hex - The two bytes of data in the payload.
File ASCII - The ASCII string the two hex bytes represent.
Converted Hex - The hex formatted ASCII content.
Converted ASCII - The decoded ASCII character based on the previous data.
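The four columns described above can be reproduced with a short sketch. This is my reading of the column definitions, not the original table: the function name table_rows is hypothetical, and the sample hex string is built backwards from the result "winlogin.exe" for illustration only, not copied from the malicious document.

```python
def table_rows(hex_text):
    """Return (file_hex, file_ascii, converted_hex, converted_ascii) rows.

    Assumes hex_text is the ASCII text found in the file, two hex digits
    per decoded character, with any 0x00 padding already removed.
    """
    if len(hex_text) % 2 != 0:
        raise ValueError("hex string must have an even number of characters")
    rows = []
    for i in range(0, len(hex_text), 2):
        pair = hex_text[i:i + 2]                            # File ASCII, e.g. "77"
        file_hex = " ".join("%02x" % ord(c) for c in pair)  # File Hex, e.g. "37 37"
        value = int(pair, 16)                               # Converted Hex as a number
        rows.append((file_hex, pair, "0x%02x" % value, chr(value)))
    return rows


# Hypothetical input, derived from the known result for illustration.
sample = "77696e6c6f67696e2e657865"
for row in table_rows(sample):
    print("%-8s  %-4s  %-6s  %s" % row)
print("".join(row[3] for row in table_rows(sample)))
```

Each row walks one character through all four stages, so a slip in any single stage is easy to spot when checking by hand.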
Doing something like this manually, although fun (depending on your definition), is time-consuming and overflowing with opportunities to miss something or make a mistake; automating the process seems a lot more efficient. There are tools available that can do this for you, and a little googling and trial and error may find what you are looking for. (Do you have the time? Is it faster to write something?) The purpose of this exercise is to create my own tool. For this I am using Python and the binascii module. The binascii module is feature-rich and can do most of the conversions for you in about four lines of code (which a friend showed me afterwards). For the sake of argument, assume we do not, or did not, know about binascii's cleverness. To fully grasp the problem, I broke it apart into the individual steps or problems I needed to solve. The final solution had to be written in a way that made it scriptable.

Individual Problems
Long before writing WTF I had searched for methods to read a string into individual characters and reassemble them in different formats. The answers were either far too complicated for me to understand, or did not make logical sense. Thinking in terms of assembly language (a simple approach), I wanted to develop each individual step. (This was the only way for me to understand how to solve this problem.)

Program Design and Code

WTF works by reading in a string from the command line. This gives me the flexibility to add wrappers and other scripts if needed. Suppose I have a list of hex strings to convert: Bash or another scripting language can be used to feed each string to WTF until it can be updated to read from a file. Given the time constraints and the situation, I wanted to make something reusable rather than a perfect end-all-be-all solution. Perfection can be added later as time permits; for now we have something very efficient and usable.

    wtf.py <string>

The payload is read in from the command line and its length is checked. If the length is not even, an error message is printed and the script exits (I probably forgot a digit in the copy-paste process). Three lists or arrays are used to store the data:

payloadArray - Holds the read-in value.
payloadBytes - Holds the hex byte representation of the value.
payloadAscii - Holds the ASCII byte representation of the value.

    import sys
    import binascii

    # The payload comes from the command line (sys.argv[1] is assumed here).
    payload = sys.argv[1]

    if len(payload) % 2 == 0:
        payloadArray = []
        payloadBytes = []
        payloadAscii = []

        # Break the payload into a list of single characters.
        for x in payload:
            payloadArray.append(x)

This next portion of the code splits the payload (the read-in value) into individual digits (d1 and d2) by popping them off the front of the list. The digits are combined to form a byte and the byte is added to an array.

        while len(payloadArray) > 0:
            d1 = payloadArray.pop(0)
            d2 = payloadArray.pop(0)
            byte = d1 + d2
            payloadBytes.append(byte)

The last portion of the code uses binascii to convert the bytes to ASCII and adds them to an array, skipping any pair that cannot be converted. (Strictly speaking, unhexlify only rejects pairs that are not valid hex; a byte such as 0x00 is still decoded and ends up in the output.)

        for byte in payloadBytes:
            try:
                asciiValue = binascii.unhexlify(byte)
                payloadAscii.append(asciiValue)
            except (TypeError, ValueError):
                # Not a valid hex pair; skip it.
                pass

        print ''.join(payloadAscii)

    else:
        print "Cannot process, sample is missing or has too much data"
        sys.exit()
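For anyone on Python 3, where print is a function and unhexlify returns bytes, the same steps might look something like the sketch below. The names decode_stepwise and decode_short are hypothetical, and the sample strings are made up for illustration. decode_short is only a guess at the kind of binascii shortcut mentioned earlier, since that version was never shown. Decoding with latin-1 is also an assumption, chosen so that every byte maps to some character.

```python
import binascii


def decode_stepwise(payload):
    """Pair up hex digits one step at a time, then convert each pair."""
    if len(payload) % 2 != 0:
        raise ValueError("Cannot process, sample is missing or has too much data")
    digits = list(payload)          # one character per list element
    pairs = []
    while digits:
        # Pop two digits off the front and join them into one hex byte.
        pairs.append(digits.pop(0) + digits.pop(0))
    chars = []
    for pair in pairs:
        try:
            chars.append(binascii.unhexlify(pair).decode("latin-1"))
        except ValueError:
            # binascii.Error is a ValueError subclass: not valid hex, skip it.
            pass
    return "".join(chars)


def decode_short(payload):
    """The same conversion, letting binascii handle the whole string."""
    return binascii.unhexlify(payload).decode("latin-1")


# Hypothetical inputs, standing in for a list fed to the tool by a wrapper.
samples = ["77696e6c6f67696e2e657865", "776f7264"]
for sample in samples:
    print(decode_stepwise(sample))
```

The stepwise version tolerates a bad pair in the middle of a string and keeps going, while the short version fails on the whole string, which may or may not be what you want when triaging a sample.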