In this very brief post I'm going to share a tool I've built that does binary taint analysis using Angr. There isn't much to explain since the code is readable and not complex, but I will also walk through a quick introduction to the concept and why it's cool. The post includes links to all the scripts used. I should mention that the tools used here are research tools: they have bugs, they don't always run smoothly, and there are plenty of cases they can't handle. They do, however, give you access to some pretty nifty technology: symbolic execution and taint analysis!
What is Taint Analysis?
Taint analysis is a program analysis method that computer scientists and other researchers use to track the flow of data through a program. Essentially, one does taint analysis to see which points in the program's execution are influenced by user input. This is nifty because it helps narrow source code analysis down to the most relevant sections of code. It also helps guide fuzzing toward more fruitful areas of the code!
The script we're going to develop here simply prints out any dangerous C functions whose symbolic state is tainted by our input. This means that some part of the symbolic state (a register, a memory value, a file descriptor, etc.) was at some point dependent on our input.
Taint analysis comes in two variants: static, which is based purely on code and definition analysis, and dynamic, which relies on actual execution and instrumentation to collect information. Each approach has its own drawbacks. Dynamic analysis, or any analysis that works purely by collecting live execution data, risks under-approximating behavior because it only sees the paths exercised by the inputs it was given. Static methods have the opposite problem: because they work only on source code or static definitions, without running the program, they can often report more bugs or events than are practically possible. Some research works to prune and whittle down these results through various tricks and schemes, or by blending the two methods together.
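To make the under- versus over-approximation point concrete, here is a toy sketch in plain Python. The statement format, `PROGRAM`, `SINKS` and `find_tainted_sinks` are all made up for illustration and have nothing to do with how Angr represents code. With no concrete environment the walker explores every branch (the static flavour); with one, it follows only the branch that input would take (the dynamic flavour).

```python
# Toy model only: a "program" is a list of tuples, not real binary code.
SINKS = {"strcpy", "system"}

# Statement forms:
#   ("assign", dst, [srcs])              dst depends on every name in srcs
#   ("call", func, [args])               call func with the given variables
#   ("if", cond, then_block, else_block)
PROGRAM = [
    ("assign", "buf", ["argv1"]),
    ("assign", "n", []),  # a constant, not derived from input
    ("if", "flag",
        [("call", "strcpy", ["dst", "buf"])],
        [("call", "system", ["buf"])]),
    ("call", "puts", ["n"]),
]


def find_tainted_sinks(block, tainted, env=None):
    """Return sink calls that receive a tainted argument.

    env=None  -> explore both sides of every branch (static style).
    env=dict  -> follow only the branch chosen by concrete values (dynamic style).
    """
    hits = []
    for stmt in block:
        kind = stmt[0]
        if kind == "assign":
            _, dst, srcs = stmt
            if any(src in tainted for src in srcs):
                tainted.add(dst)
        elif kind == "call":
            _, func, args = stmt
            if func in SINKS and any(arg in tainted for arg in args):
                hits.append(func)
        elif kind == "if":
            _, cond, then_block, else_block = stmt
            if env is None:
                branches = [then_block, else_block]
            else:
                branches = [then_block if env[cond] else else_block]
            merged = set()
            for branch in branches:
                branch_taint = set(tainted)  # each branch gets its own copy
                hits += find_tainted_sinks(branch, branch_taint, env)
                merged |= branch_taint
            tainted |= merged  # join the taint sets after the branch
    return hits


print(find_tainted_sinks(PROGRAM, {"argv1"}))                      # ['strcpy', 'system']
print(find_tainted_sinks(PROGRAM, {"argv1"}, env={"flag": True}))  # ['strcpy']
```

The static-style run reports `system` even if `flag` can never be false in practice, while the dynamic-style run misses it unless some input happens to drive execution down that branch.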
To keep things to the point, in this post we will only focus on easy static taint analysis. The upside of this approach is first and foremost that it's easy to implement and relatively accurate; it just suffers from a couple of drawbacks that are sometimes manageable for real-world binaries. In future research I will hopefully provide some hacks to get Angr running a bit smoother on complex binaries.
Claripy Annotations for Taint Analysis
We're doing taint analysis using claripy's annotations. These are basically classes that you can use to tag symbolic bitvectors or AST elements. It turns out there's a special parameter in the constructors of claripy.BVS objects that accepts an annotation class. For now we're just going to use blank instances of the base Annotation class in claripy.
Here's how you set up a symbolic execution run in Angr with an annotated ARGV input:
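A minimal sketch of that kind of setup, not SporeCrawler's actual code. The function name `build_tainted_entry_state`, the `arg_len` parameter and the symbol name "argv1" are placeholders of mine, and I attach the annotation with the AST's `annotate()` method rather than through a constructor parameter. Whether a blank base-class annotation survives every simplification may depend on your claripy version; if the tag gets dropped, one thing to try is a subclass that overrides the `eliminatable` and `relocatable` properties.

```python
def build_tainted_entry_state(binary_path, arg_len=32):
    """Return (project, state, taint) with argv[1] symbolic and annotated."""
    # Third-party packages (pip install angr); imported here so the sketch
    # can be loaded even on a machine without them.
    import angr
    import claripy

    # Skip shared libraries to keep the run lighter.
    project = angr.Project(binary_path, auto_load_libs=False)

    # arg_len symbolic bytes standing in for argv[1].
    sym_arg = claripy.BVS("argv1", arg_len * 8)

    # annotate() returns a new AST carrying the annotation.
    taint = claripy.Annotation()
    tainted_arg = sym_arg.annotate(taint)

    # argv[0] is the program path, argv[1] is our tainted symbolic input.
    state = project.factory.entry_state(args=[binary_path, tainted_arg])
    return project, state, taint
```

From there, `project.factory.simulation_manager(state).run()` would start exploring the binary from its entry point with the tagged input in place.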
And then in the hooks we simply check whether there's an annotated register. Bear in mind that under certain calling conventions rsi, rdi and other registers often hold pointers to parameters, so checking them for annotations first makes sense:
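A rough sketch of that check, again not the tool's real code. The helper `tainted_arg_registers` and the `ARG_REGS` tuple are names I made up; the helper only relies on `state.regs.<name>` returning a value with an `annotations` tuple, which is how Angr states and claripy ASTs expose them. The demo uses stand-in objects so it runs without Angr installed.

```python
from types import SimpleNamespace

# x86-64 System V integer argument registers, in calling-convention order.
ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def tainted_arg_registers(state, marker, reg_names=ARG_REGS):
    """Return the names of registers whose value carries the marker annotation."""
    hits = []
    for name in reg_names:
        value = getattr(state.regs, name)
        if marker in value.annotations:
            hits.append(name)
    return hits


# Stand-in state for demonstration; inside a hook you would pass the real state.
marker = object()
regs = SimpleNamespace(**{name: SimpleNamespace(annotations=()) for name in ARG_REGS})
regs.rsi = SimpleNamespace(annotations=(marker,))
fake_state = SimpleNamespace(regs=regs)

print(tainted_arg_registers(fake_state, marker))  # ['rsi']
```

Inside a SimProcedure hook the state is available as `self.state`. When a register only holds a pointer, the bytes it points to can be fetched with `state.memory.load(addr, size)` and their `annotations` checked the same way.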
Now why would we want to use annotations? Well, when AST binary operations (and other operations) involve annotated operands, the annotation is transmitted to the destination operand. This means we can track the data flow of our input if we place an initial taint on a value we know we control. Angr will handle symbolic execution of the binary for us. The rest of the work is simply developing hooks for the functions we would like to intercept or report on, and making sure the hooks can inspect their symbolic states for annotations.
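If the propagation idea feels abstract, here is a toy model of it in plain Python. The `Tagged` class is purely illustrative and is not how claripy implements annotations; it just shows tags riding along from operands to results through ordinary operators.

```python
import operator


class Tagged:
    """An integer plus a set of tags, loosely mimicking an annotated AST node."""

    def __init__(self, value, tags=()):
        self.value = value
        self.tags = frozenset(tags)

    def _combine(self, other, op):
        # Plain integers are treated as untagged operands.
        if not isinstance(other, Tagged):
            other = Tagged(other)
        # The result carries the union of both operands' tags.
        return Tagged(op(self.value, other.value), self.tags | other.tags)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __xor__(self, other):
        return self._combine(other, operator.xor)


user_input = Tagged(0x41, tags={"argv1"})  # the value we control
key = Tagged(0x10)                         # unrelated constant

derived = (user_input ^ key) + 1
unrelated = key + 2

print(sorted(derived.tags))    # ['argv1']
print(sorted(unrelated.tags))  # []
```

Anything computed from `user_input` keeps the "argv1" tag, so a hook only needs to look at the tags on whatever value reaches it.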
I've tested SporeCrawler on real-world binaries from my host machine as well as some simple litmus tests to make sure I'm not going crazy. Here's what a nice run of SporeCrawler looks like, with gnuplot as the target binary:
SporeCrawler has a couple of options, but it mostly serves as a good example of using angr to do taint analysis. Check out more about it here: https://gitlab.com/k3170makan/SporeCrawler.git
Thanks for sharing!
I wonder whether there is any way to implement byte-level taint analysis during symbolic execution with the technique you shared.
To be more specific, the goal is to track the data flow and connect it to specific bytes of the input. As far as I know, claripy.Annotation() will propagate along with the value, which means that we can't identify specific bytes of the value from the annotation.