JFDI: WTF.py Hexadecimal Conversion

posted Jan 29, 2012, 10:37 AM by Ramece Cave
A few months ago a friend asked me to look at a malicious Word document. He had already run it through the tools available to him, and based on what he told me, nothing relevant or of significant use was found (reading between the lines, there was nothing for him to look for on affected hosts). Malicious document analysis is not one of my strong points, but I like a challenge and had an hour to kill during my lunch break, so I decided to give it a try. WTF.py is the reusable, problem-solving piece of code that came out of the exercise.

Initial Analysis

Given the time limit, I wanted to find anything that really stuck out and might be of use. For me, tackling an unknown problem works best by trying the simple things first: looking for strings, patterns, large voids in the data, etc. First I ran the Unix/Linux program strings against the .doc file to see if anything stood out. This simple yet effective tool more often than not leads me in the right direction, or at least gives me an idea of what I need to start looking at.

The image above is a simplified version of the document content. Based on its structure it was a Microsoft Word document, but a few extra "goodies" had been added for that special user. We see lots of overflow content (0x40 : @) and possible shellcode (more on that later), but what really interested me was the seemingly random string 00077696e6c6f67696e2e657865000. As you may already know, strings only displays the printable ASCII content in a file (that's plain old English for the uninitiated). Since this block showed up only as a run of hex digits rather than readable text, it suggested the string served another purpose in the document and needed a closer look (the preceding flurry of @ symbols also helped). What sticks out to me is: why the padding of zeros? Is it important? I think so.
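A rough Python 3 sketch of this first pass: pull printable ASCII runs out of the file roughly the way strings does (simplified to a minimum length of 4 and ASCII only, with none of GNU strings' options), then flag long runs made up entirely of hex digits. The script name, the file name and the 16-digit threshold are placeholders, and stripping the zeros assumes they are padding.

```python
import re
import sys

# Runs of at least 4 printable ASCII bytes (0x20-0x7e), roughly what strings prints.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Hypothetical threshold: only flag runs of 16+ hex digits as worth a closer look.
HEX_RUN = re.compile(r"[0-9a-fA-F]{16,}")


def extract_strings(data):
    """Return every run of at least 4 printable ASCII bytes in data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def hex_candidates(strings):
    """Yield (raw_run, unpadded_run) for long hex-digit runs in strings output."""
    for s in strings:
        for match in HEX_RUN.finditer(s):
            raw = match.group()
            # Assume leading/trailing '0' characters are padding, not data.
            yield raw, raw.strip("0")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python hexhunt.py sample.doc
    with open(sys.argv[1], "rb") as handle:
        for raw, core in hex_candidates(extract_strings(handle.read())):
            parity = "even" if len(core) % 2 == 0 else "odd"
            print(raw, "->", core, "(%s length)" % parity)
```

Stripping zeros is crude: a real byte such as 0x30 at either end would lose a digit, so treat the output as a lead to investigate, not a decode.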

Converting the Data

That's where WTF.py comes into play. Deciphering hexadecimal is usually straightforward: a hex editor is all you need, and you get instant ASCII. Sometimes, though (situations like this one), each byte first has to be assembled from two hex-digit characters. The resulting string is winlogin.exe, which is important because winlogon.exe, not winlogin.exe, is part of a standard Microsoft Windows installation.

For the first test, assume the leading and trailing zeros are just padding. The easiest way to sanity-check this is that the remaining hex characters should total an even number, since each byte takes two hex digits. The table below is broken down into four columns:

File Hex - The two bytes of data in the payload.
File ASCII - The ASCII string the two hex bytes represent.
Converted Hex - The hex formatted ASCII content.
Converted ASCII - The decoded ASCII character based on the previous data.

File Hex   File ASCII   Converted Hex   Converted ASCII
0x3737     77           0x77            w
0x3639     69           0x69            i
0x3665     6e           0x6e            n
0x3663     6c           0x6c            l
0x3666     6f           0x6f            o
0x3637     67           0x67            g
0x3639     69           0x69            i
0x3665     6e           0x6e            n
0x3265     2e           0x2e            .
0x3635     65           0x65            e
0x3738     78           0x78            x
0x3635     65           0x65            e
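The table can also be generated programmatically, which makes the two layers explicit: each pair of characters in the file is itself two ASCII bytes ('7' is stored as 0x37), and only the text those bytes spell is the real hex. A minimal Python 3 sketch, assuming the zero padding has already been removed; table_rows is a hypothetical helper name.

```python
def table_rows(payload):
    """Return (file hex, file ASCII, converted hex, converted ASCII) per byte."""
    rows = []
    for i in range(0, len(payload), 2):
        pair = payload[i:i + 2]                        # text in the file, e.g. "77"
        file_hex = "0x" + pair.encode("ascii").hex()   # how it is stored: 0x3737
        value = int(pair, 16)                          # the byte it spells: 0x77
        rows.append((file_hex, pair, "0x%02x" % value, chr(value)))
    return rows


payload = "77696e6c6f67696e2e657865"  # zero padding already stripped
print(f"{'File Hex':<11}{'File ASCII':<13}{'Converted Hex':<16}Converted ASCII")
for row in table_rows(payload):
    print(f"{row[0]:<11}{row[1]:<13}{row[2]:<16}{row[3]}")
```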

Doing something like this manually, although fun (depending on your definition), is time consuming and full of opportunities to miss something or make a mistake; automating the process is far more efficient. There are tools available that can do this for you, and a little googling and trial and error may find what you are looking for (but do you have the time, or is it faster to write something?). The purpose of this exercise is to create my own tool. For this I am using Python and the binascii module. The binascii module is feature rich and can do most of the conversion for you in about four lines of code (which a friend showed me afterwards). For the sake of argument, let's pretend we did not know about binascii's cleverness. To fully grasp the problem, I broke it apart into the individual steps, or problems, I needed to solve. The final solution had to be scriptable.
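The friend's short version is not reproduced here, so the following is only a guess at what a compact binascii conversion could look like in Python 3, with the zeros stripped first on the assumption that they are padding.

```python
import binascii

payload = "00077696e6c6f67696e2e657865000"
core = payload.strip("0")                           # assume the zeros are padding
decoded = binascii.unhexlify(core).decode("ascii")  # hex text -> bytes -> str
print(decoded)                                      # winlogin.exe
```

binascii.unhexlify raises binascii.Error on an odd-length string, so it performs the even-length check on its own; in Python 3, bytes.fromhex(core) produces the same bytes without an import.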

Individual Problems

  1. Read the hex string into a buffer.
  2. Determine if it's an even length.
  3. Construct each individual hex byte.
  4. Convert the bytes into a readable string.
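The four steps map naturally onto four small functions. This Python 3 sketch is one way to lay them out; the function names are hypothetical and do not come from WTF.py itself.

```python
import sys


def read_payload(argv):
    """Step 1: read the hex string from the command line."""
    return argv[1].strip()


def is_even_length(payload):
    """Step 2: two hex digits per byte, so the length must be even."""
    return len(payload) % 2 == 0


def build_bytes(payload):
    """Step 3: pair up the digits, e.g. '7769' -> ['77', '69']."""
    return [payload[i:i + 2] for i in range(0, len(payload), 2)]


def to_text(pairs):
    """Step 4: turn each pair into a character, skipping non-printable ASCII."""
    chars = (chr(int(pair, 16)) for pair in pairs)
    return "".join(c for c in chars if " " <= c <= "~")


if __name__ == "__main__" and len(sys.argv) == 2:
    payload = read_payload(sys.argv)
    if is_even_length(payload):
        print(to_text(build_bytes(payload)))
    else:
        print("Cannot process, sample is missing or has too much data")
```

Keeping each step separate makes it easy to test or swap one piece (for example, step 4) without touching the others.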

Long before writing WTF I had searched for methods to read a string into individual characters and reassemble them in different formats. The answers were either far too complicated for me to understand or did not make logical sense. Thinking in terms of assembly language (a simple approach), I wanted to develop each individual step myself; this was the only way for me to understand how to solve the problem.
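In that assembly-language spirit, a single pair can be converted with no library help at all: map each hex digit to its 4-bit value with character-code arithmetic, then shift the high nibble left by four and OR in the low nibble. A Python 3 sketch of the idea; nibble and pair_to_byte are hypothetical helper names.

```python
def nibble(digit):
    """Map one hex digit character to its value 0-15 using character codes."""
    code = ord(digit.lower())
    if ord("0") <= code <= ord("9"):
        return code - ord("0")
    if ord("a") <= code <= ord("f"):
        return code - ord("a") + 10
    raise ValueError("not a hex digit: %r" % digit)


def pair_to_byte(d1, d2):
    """Combine high and low nibbles: ('7', '7') -> 0x77."""
    return (nibble(d1) << 4) | nibble(d2)


print(chr(pair_to_byte("7", "7")))  # w
print(chr(pair_to_byte("2", "E")))  # .
```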

Program Design and Code

WTF works by reading in a string from the command line. This gives me the flexibility to add wrappers and other scripts if needed. Suppose I have a list of hex strings to convert: BASH or another scripting language can feed each string to WTF until it is updated to read from a file. Given the time constraints and the situation, I wanted a reusable tool, not a perfect end-all-be-all solution. Perfection can be added later as time permits; for now we have something efficient and usable.
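As an illustration of the "read from a file" idea, here is a Python 3 driver in place of a BASH loop. It assumes a hypothetical file with one hex string per line (hex_strings.txt and batch_wtf.py are placeholder names) and decodes inline with bytes.fromhex rather than calling wtf.py, so it is a stand-in sketch, not part of the original tool.

```python
import sys


def decode_hex(payload):
    """Decode an even-length hex string, keeping printable ASCII only."""
    if len(payload) % 2:
        raise ValueError("odd number of hex digits")
    data = bytes.fromhex(payload)  # also raises ValueError on non-hex characters
    return "".join(chr(b) for b in data if 0x20 <= b <= 0x7E)


def decode_lines(lines):
    """Yield (input, result) per non-blank line; bad lines are reported, not fatal."""
    for line in lines:
        payload = line.strip()
        if not payload:
            continue
        try:
            yield payload, decode_hex(payload)
        except ValueError as err:
            yield payload, "ERROR: %s" % err


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python batch_wtf.py hex_strings.txt
    with open(sys.argv[1]) as handle:
        for payload, result in decode_lines(handle):
            print(payload, "->", result)
```

Reporting a bad line and moving on, instead of stopping, suits batch use where one mangled copy-and-paste should not hide the rest of the results.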

wtf.py <string>

The payload is read in from the command line and its length is checked. If the length is not even, the script prints an error message instead of converting (most likely a digit went missing in the copy-and-paste process). Three lists (Python's version of arrays) are used to store the data:

payloadArray - Holds the individual hex digits of the read-in value.
payloadBytes - Holds the two-digit hex strings (one per byte) built from those digits.
payloadAscii - Holds the ASCII characters decoded from those bytes.

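import binascii
import sys

payload = sys.argv[1]  # the hex string passed on the command line

if len(payload) % 2 == 0:
    payloadArray = []
    payloadBytes = []
    payloadAscii = []

    for x in payload:
        payloadArray.append(x)  # one hex digit per element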

This next portion of the code takes the digits (d1 and d2) off the front of payloadArray two at a time, so the list is used as a queue. The two digits are combined to form a byte, and the byte is added to an array.

    # Take digits off the front two at a time and join them into one hex byte
    while len(payloadArray) > 0:
        d1 = payloadArray.pop(0)
        d2 = payloadArray.pop(0)
        byte = d1 + d2
        payloadBytes.append(byte)
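A note on that design choice: list.pop(0) shifts every remaining element, which is harmless for a short payload but makes the loop quadratic in general. This Python 3 sketch shows three alternatives that produce the same pairs: slicing, zipping one iterator with itself, and the same pop-from-the-front loop on a collections.deque, whose popleft() is constant time.

```python
from collections import deque

payload = "77696e6c6f67696e2e657865"

# Option 1: slice every two characters.
pairs_by_slice = [payload[i:i + 2] for i in range(0, len(payload), 2)]

# Option 2: zip one iterator with itself so each tuple takes the next two digits.
digits = iter(payload)
pairs_by_zip = [d1 + d2 for d1, d2 in zip(digits, digits)]

# Option 3: keep the original queue-style loop, but on a deque.
queue = deque(payload)
pairs_by_deque = []
while queue:
    pairs_by_deque.append(queue.popleft() + queue.popleft())

print(pairs_by_slice == pairs_by_zip == pairs_by_deque)  # True
```

The deque version stays closest to the step-by-step, assembly-style reading of the original loop.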

The last portion of the code uses binascii to convert the bytes to ASCII and adds them to an array, ignoring non-printable content.

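    for byte in payloadBytes:
        asciiValue = binascii.unhexlify(byte)
        # Keep printable ASCII (0x20-0x7e) only; drop nulls and control bytes
        if 0x20 <= asciiValue[0] <= 0x7e:
            payloadAscii.append(chr(asciiValue[0]))
    print(''.join(payloadAscii))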

else:
    print("Cannot process, sample is missing or has too much data")
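Assembled into one file, the fragments amount to a script along these lines. This is a Python 3 reconstruction built from the snippets above, not the original file; wrapping the logic in a function named wtf (so it can be imported as well as run) and the exit codes are my additions.

```python
#!/usr/bin/env python3
"""wtf.py: convert a string of hex digits to printable ASCII."""
import binascii
import sys


def wtf(payload):
    """Return the printable ASCII decoded from payload, or None if its length is odd."""
    if len(payload) % 2 != 0:
        return None

    payloadArray = list(payload)   # individual hex digits
    payloadBytes = []              # two-digit hex strings, e.g. "77"
    payloadAscii = []              # decoded printable characters

    # Take digits from the front two at a time to rebuild each byte.
    while payloadArray:
        d1 = payloadArray.pop(0)
        d2 = payloadArray.pop(0)
        payloadBytes.append(d1 + d2)

    # Convert each byte, ignoring anything outside printable ASCII.
    for byte in payloadBytes:
        value = binascii.unhexlify(byte)[0]
        if 0x20 <= value <= 0x7E:
            payloadAscii.append(chr(value))

    return "".join(payloadAscii)


def main(argv):
    if len(argv) != 2:
        print("usage: wtf.py <string>")
        return 2
    result = wtf(argv[1])
    if result is None:
        print("Cannot process, sample is missing or has too much data")
        return 1
    print(result)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Running python3 wtf.py 77696e6c6f67696e2e657865 prints winlogin.exe. Passing the string with its padding (00077696e6c6f67696e2e657865000) still has an even length, but three zeros on each side shifts every pair by one digit and the output becomes vvWP, which is one concrete reason the padding deserved attention.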