[Symbolic Execution 0x0] Solving easy CTFs with Angr and Symbolic Execution

Hi folks, I just learned a couple nifty tricks with angr, a popular symbolic execution framework with a very slick python front end. Turns out this tool makes solving the odd crack me CTF extremely easy, I've been porting the same script around for a number of CTF challenges and it's knocking em down like nobody's business. So in the following post I'm going to give you folks a quick crash course in using the tool and show you how easy it is to solve a sample crack me.

What is Symbolic Execution

Without going into full academic detail symbolic execution is essentially hard proof that all the algebra you learned in high school is kinda nifty. Symbolic Execution engines model a program's behavior based on the inputs they are given. These engines (which may vary in their approach) operate by building a database of algebraic statements about a program, sweeping up all the possible assignments and comparisons that may result in a concrete state, after which you are then allowed to query these statements to see if a certain assignment of variables makes a given state reachable. To quote a really helpful paper on this:

More specifically, a symbolic execution engine replaces input
with “symbolic input”—analogous to an algebraic variable—and walks through code paths, “constraining” the symbolic input at each branch such that an input to the
program that satisfies all constraints will cause the program to reach that particular path. The engine can then explore many possible execution paths until it identifies a
specific path or program state of interest, at which point
it can determine the input which would trigger it.
- "Teaching with angr: A Symbolic Execution Curriculum and CTF∗" @ https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf 

Another good explination:

Symbolic execution is a technique that explores feasible paths by setting an input value to
a symbol rather than a real value. The symbolic execution was first published in King’s paper in
1975 [12]. This test technique was developed to verify that a particular area of software may be
violated by the input values. The symbolic execution is largely divided into the offline symbolic
execution and the online symbolic execution. The offline symbol execution solves by choosing only
one path to create a new input value by resolving the path predicate [13]. -  An Automated Vulnerability Detection and Remediation Method for Software Security https://www.mdpi.com/2071-1050/10/5/1652

The analogy you should reach here is that its very similar to solving for "x" in an algebraic equation. We're going to look at a couple different scenarios with Angr and see how exactly you solve for x, but to start we will use a simple approach and let Angr do most of the thinking for us, taking advantage of the clear signals indicating the desired state in a program i.e we're going to make Angr try everything until the program literally reports "Good job" or "correct" or whatever gets printed to the screen to indicate that.

Having read the above I don't want you to assume this will take all of the reversing fun away, you will need to tell Angr where to start and stop and what to look for, and this insight will come from reading the code yourself, there maybe more contrived "win-states" or more complex phases to getting the correct input into a target function so don't completely abandon you RE skills just yet hehe.

Simple Example with Angr

The following example shows a CTF challenge I got form a random site, to spare the contestants of the site I won't mention where its from in case folks are still trying to solve this one, but its a pretty easy challenge so I doubt too many will be super upset by the solution being shown here. Anyway here's what the binary looks like:

To give you a quick summary of whats going on:

  • @0x830 we can see the binary grab the argv pointer from the rsi and store it at rbp-0x20 (called var_28)
  • @0x858 argv[1] is passed to a sub routine called "checkPassword()" via rdi 
  • @0x867 the checkPassword return code (via al register) is checked against 0, if it is 0 the win state is assumed and "Jackpot" is printed to the screen

We aren't even going to cover what checkPassword does or whether there's some cool xor crib or double pad to exploit, we don't need to know! Angr will sort out all the work for us, via the Al-Kawarithmic magic of algebra! 

So now we know where to aim, we want to know how to get to the code block @0x869 by giving it a string through argv[1], lets see if we can configure Angr to solve this.

Controlling your Angr

It be a little complex just jumping straight in so lets talk about the general process of breaking down some symbolic execution with Angr. Here's how the process usually goes:
  1. Define a win condition - for the first couple times you use Angr, and for most simple CTFs this is as difficult as telling which address to look for as a reachable state or telling it to report on input that results in something being printed to the screen. 
  2. *(optional) Define a fail condition - you may not need to do this, but you can also tell Angr to avoid certain code blocks in the solutions it posts, again just a simple criteria for the constraints it will search through.
  3. Load up a binary - Not complex at all, this consits of a single API call to tell Angr which binary to analyze.
  4. Define Variables -,this tells Angr which values you are ear-marking as criteria for a win state basically warning it to keep an eye out for how these values influence execution. Also a simple set of API calls and may vary with complexity depending on if your values are in registers, coming form the command line, or are just a collection of memory addresses. 
  5. Set an initial state - Very important, this is us telling Angr's simulation manager where in the binary we want to get the party started. We can point it at any place in thee binary given it makes sense to execute from there (paying attention to stack conservation and arguments!)
  6. Solve! - this the part where you tell Angr to make the magic happen. 
So to summarize, you basically setup a start execution state, and then tell Angr to run all the goodness until it matches a given criteria.

Scripting up a solution

The following script details the solution. 

I've commented the crap outta this but I will line by line this as previous posts on the subject have.

The first line we see doing anything interesting is at 8:

project = angr.Project(elf_binary) #load up binary

This essentially tells angr which binary we want to tagret. The next line declares a Bit Vector String using Claripy so we can ear mark the argv[1] argument for constraint solving.

arg = claripy.BVS('arg',8*0x20) #set a bit vector for argv[1]

Why a bit vector? Well given that we want to represent all the possibilities for the input, should it not actually require a simple character (guessing only at character level) but some obscure value of bits, telling Claripy we want a vector of individual bit values makes sure we don't miss any details. Modeling the input this way is much more fine grained an realistic, imagine we are modeling some linux driver input here, you definitely want all the possible bit values since some will represent over/under-flowed integers.

Claripy is the constraint solver for Angr, again we need not delve too deep into how it works; but for claripy's sake you can imagine that this builds Abstract Syntax Trees based on the disassemble'd source in order to structure the arguments stuffed into sub-routines and math procedures, the arguments can be modeled as a couple of different things in order encompass different aspect possible of the input. Better coverage of this can be found at Angr's documentation page (https://docs.angr.io/advanced-topics/claripy ).

I've made it way larger than it need be, and if you'd like to model yours a bit tighter you may use some information in the disassembly to declare a smaller BVS for your script.

We now need to tell Angr to pay attention to this value, just grabbing an instance of a bit vector is not enough, we need to stuff it into a call that associates it to our project, that happens in line 11:

initial_state = project.factory.entry_state(args=[elf_binary,arg])

Nothing to hard here, we then take this and use it to setup our simulation manager, the thing that steps through our program and checks for the goodness, from line 12-13:

simulation = project.factory.simgr(initial_state)

Angr's simulation manager is pretty nifty, i recommend taking a look at the other amazing options and api calls fleshed out for it here (https://docs.angr.io/core-concepts/pathgroups). For our purposes here we will stick to just telling it where we want to start and when to consider its exploration a success.  The last part we need to make sure is defined is the win state, in line 13 we gave the explore function a parameter named "is_successfull" this is a method name that Angr will call on each state it calculates matches our solution, we need to tell it when to return True or False so it knows what we want. For us this means checking the standard output for a certain string "Jackpot":

def is_successful(state):
output = state.posix.dumps(sys.stdout.fileno())
if b'Jackpot' in output:
return True
return False

This is a typical function call handler style API, you stuff this method name in somwhere as a parameter and in the internal magic of Angr it gets called over and over until a true is reached. What you should pay attention to here is the parameter passed to it, later on you may want to be a bit more creative than just string matching the standard output, so check out what other properites a "state" has in terms of angr's docs and it will give you other ideas you can configure the simulation manager to halt on. Obvious examples being, register values (eip, eax, etc etc) or values in a given memory region or a function of how many times a given code block is hit, the possibilities are endless (in exception of whatever the halting problem prevents you from using as a criteria lol).

Checking this for a solution we see it quickly finds the right string:

Okay so that's pretty solid proof we have the correct answer, I think if you've just started out with Angr you may want to grab this script as a starting point and add small modifications to it to see if you can pump out solutions for other CTF challenges.

happy hacking!

Reading and References

  1. Introduction to Angr (part 1) https://blog.notso.pro/2019-03-25-angr-introduction-part1/
  2. Angr offical documentation https://docs.angr.io/ 
  3. Teaching with angr : a Symbolic Execution Curriculum and CTF* - https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf