Sunday, 23 November 2014

Automated DEX Decompilation using Androguard part II: Dex2Java

So I Googled Java Construction...
This is the next post in the Androguard tutorial series (by the way, here's part one). Here we are going to see how to construct a script that decompiles an APK into full Java code using Androguard and Python.

Working from the previous post about decompiling a dex file into Dalvik bytecode (which is little more than interpreting the contents of the dex file -__-), we're going to use Androguard here to take the analysis one step further and produce readable Java code! With one or two caveats (like import statements :P).

Okay cool, so you have Androguard installed and you'd like to start dumping Java source automatically, straight from APKs? Well, here's how you do that.

My Second Androguard Script: Dumping Java Code

So as with the previous tutorial, here's the script:

I should probably explain that this script does a lot of things that it doesn't need to. For instance, I subsequently learned that I don't need to convert the type descriptors myself: Androguard has a couple of methods hidden away in the Dalvik handling classes mentioned in the previous post that can do this and many other cool things for you. This script was meant as a quick and dirty solution for dumping Java source from an APK ;)
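To give an idea of what "converting the type descriptors myself" involves, here is a minimal sketch of my own (not the code from the script, and not Androguard's built-in helpers). The function name descriptor_to_java is hypothetical; the descriptor letters themselves come from the Dalvik executable format.

```python
# Primitive type codes defined by the Dalvik executable format.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def descriptor_to_java(descriptor):
    """Convert one Dalvik type descriptor into a Java-style type name.

    Hypothetical helper for illustration only.
    """
    # Each leading '[' adds one array dimension.
    dims = len(descriptor) - len(descriptor.lstrip("["))
    base = descriptor[dims:]
    if base.startswith("L") and base.endswith(";"):
        # Class types look like Lpackage/Name; so drop the L and the ';'.
        name = base[1:-1].replace("/", ".")
    elif base in PRIMITIVES:
        name = PRIMITIVES[base]
    else:
        raise ValueError("not a type descriptor: %r" % descriptor)
    return name + "[]" * dims
```

So Ljava/lang/String; becomes java.lang.String and [I becomes int[]. Inner class names keep their $ separator, which is probably not what you want in finished Java source, but it is good enough for a quick dump.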

Let's look at what's going on here...
Well, I've added a number of methods to make this script practical. We have some code that checks that the correct arguments have been given, some code that actually writes the output to a file, and some other code (quite inefficiently, I might add) that makes sure we only write paths we haven't written before. Those aren't very interesting if you've come here to learn about Androguard, though! So let's start from a line of code that introduces something about Androguard I haven't covered before.
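For what it's worth, the "only write paths we haven't written before" part can be done cheaply with a set. This is a rough sketch rather than the code from the script: class_to_path and write_once are names I made up, and the one-.java-file-per-class layout is an assumption.

```python
import os


def class_to_path(out_dir, class_name):
    """Map a Dalvik class name such as Lcom/example/Foo; to out_dir/com/example/Foo.java."""
    # Strip the leading 'L' and trailing ';' if they are present.
    if class_name.startswith("L") and class_name.endswith(";"):
        class_name = class_name[1:-1]
    parts = class_name.split("/")
    parts[-1] += ".java"
    return os.path.join(out_dir, *parts)


def write_once(path, source, seen):
    """Write source to path unless this run has already written that path."""
    if path in seen:                      # set lookup, no list scanning
        return False
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as handle:
        handle.write(source)
    seen.add(path)
    return True
```

You would create seen = set() once at the start of the run and pass it to every write_once call.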

I'm going to start with line 48. Here we see a call to the constructor of a class called VMAnalysis. Whaaaaat the hell does this call do, and why do I need to make it? Well, judging from the source code of this class, it looks like it structures information about the DalvikVMFormat object: it collects the methods and the fields and passes these objects off to calls that actually analyze them, so it's a bit of a wrapper class for more labor-intensive calls. For instance, if you dig into the VMAnalysis class you should see a call to MethodAnalysis, which runs through the method and collects details like the exceptions registered by the method. This is all so that you can access these details by making calls to stuff like get_tags, get_vm and get_local_variables. In short, VMAnalysis is a class that quickly runs through some attributes of methods and classes and gives you access to some of them through convenient calls. You could craft a call straight to the MethodAnalysis class yourself, though there might be some other admin you need to take care of (like properly preparing methods to present to the MethodAnalysis class), which is exactly what the VMAnalysis class is for!

Just before I get into this code a little more, I should mention again that the really cool thing about Androguard is that it works with the Dalvik format on its own terms; it doesn't really do much to obscure the actual format to make it more manageable or human friendly. The reason this is good (for reverse engineers and security researchers) is that it really gets you used to working with the format. For instance, if you take a quick read through this script you will see a basic pattern: grab the classes from the dex, then for each class grab the static fields (interpreting the access rights for each of them), and then grab the methods and process them. If you're just getting into automatically processing Dalvik format executables, then learning this structure is quite useful. You don't need to read through tutorials over and over again, because once you know that basic structure you can pretty much navigate the file (with Androguard) and start solving problems on your own! :)
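As an example of working with the raw format, "interpreting the access rights" of a field boils down to testing bits in its access flags value. Below is a small sketch under that assumption; the bit values are the ones defined by the dex format, while ACCESS_FLAGS and field_modifiers are names of my own and not part of Androguard.

```python
# Access flag bits from the dex format that map onto Java field modifiers.
# Flags with no Java keyword (synthetic, enum) are deliberately left out.
ACCESS_FLAGS = [
    (0x0001, "public"),
    (0x0002, "private"),
    (0x0004, "protected"),
    (0x0008, "static"),
    (0x0010, "final"),
    (0x0040, "volatile"),
    (0x0080, "transient"),
]


def field_modifiers(flags):
    """Return the Java modifiers encoded in a field's access flags integer."""
    return " ".join(name for bit, name in ACCESS_FLAGS if flags & bit)
```

A field with flags 0x19 (0x01 | 0x08 | 0x10) therefore comes out as "public static final". Note that bits 0x40 and 0x80 mean something different on methods, so this table is only meant for fields.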

Okay, back to the code. One of the calls VMAnalysis provides is get_method, which we use to pull out methods in line 61 so we can decompile them later. The get_method call returns a MethodAnalysis object, which is the type the decompiler expects when it looks for methods to decompile. This is probably because the decompiler uses calls like get_length and get_basic_blocks, among others, to structure the decompilation process.

So now we have an object that holds information about a method's analysis, and we use this to determine whether it's useful to decompile the method. This is done by making sure that the get_code call (which again is probably used by the decompiler, hopefully) doesn't return None, which would indicate that the method doesn't have any code to decompile (this being Python, the script will crash if we allow None values to sneak into other calls).

Once we know that the method has some code, we pass this MethodAnalysis object to the decompiler's DvMethod class, which preps the MethodAnalysis object for actual decompilation and basically collects all the information the MethodAnalysis class presents through its many methods.
We then call process() on the resulting object in line 65, which does the real hard work and actually decompiles the Dalvik executable code to Java.
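Put together, the steps described so far look roughly like the sketch below. This is my reconstruction of the flow, not the script itself. The import paths in decompile_apk are assumptions based on the Androguard release current at the time of writing and may have moved in later versions, calling get_code on the method object rather than on the MethodAnalysis object is also an assumption, and decompile_methods is a hypothetical name.

```python
def decompile_methods(vm, vmx, method_decompiler):
    """Yield (class name, method name, Java source) for every method with code.

    vm is expected to behave like DalvikVMFormat, vmx like VMAnalysis and
    method_decompiler like the decompiler's DvMethod class.
    """
    for cls in vm.get_classes():
        for method in cls.get_methods():
            # Abstract and native methods carry no code, so skip them.
            if method.get_code() is None:
                continue
            analysed = vmx.get_method(method)      # a MethodAnalysis object
            decompiled = method_decompiler(analysed)
            decompiled.process()                   # the actual decompilation
            yield cls.get_name(), method.get_name(), decompiled.get_source()


def decompile_apk(apk_path):
    """Wire the loop up to Androguard (import paths are assumptions)."""
    from androguard.core.bytecodes import apk, dvm
    from androguard.core.analysis import analysis
    from androguard.decompiler.dad import decompile

    vm = dvm.DalvikVMFormat(apk.APK(apk_path).get_dex())
    vmx = analysis.VMAnalysis(vm)
    return decompile_methods(vm, vmx, decompile.DvMethod)
```

Keeping the loop separate from the Androguard imports means you can exercise it with simple stand-in objects before pointing it at a real APK.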

Once we have our source code decompiled, we can then call the get_source method in line 66. Here I've decided to stick a tab character in front of each line so it looks a little prettier :).
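The tab trick is nothing fancy. One way to do it, assuming the decompiled source is a single string with newline-separated lines (indent_source is a name I made up for this sketch):

```python
def indent_source(source, prefix="\t"):
    """Put prefix in front of every non-empty line of source."""
    return "\n".join(
        prefix + line if line else line   # leave blank lines untouched
        for line in source.splitlines()
    )
```

Skipping blank lines is my own choice here, it just avoids trailing whitespace in the output.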

And that's pretty much all you need to know really. Here's a screenshot of the script in action: 

Dumping code with
And here's a quick look at the result.
The result.