[Symbolic Execution 0x1] Modeling registers and setting constraints

Hi folks, in the previous post I covered a simple example showing how Angr can speed up solving keygen / crackme type challenge. In this one I'm covering an explanation of how symbolic modeling of registers works with Angr and throwing in a weird little problem that required argv constraints to solve.

If you're joining us at this post and find yourself a little lost, then please check out the previous one in the series available here:

In the series I'm covering some tricks you can pull of with Angr to model execution states and get some quick solutions to a few novel CTF challenges. As for this post; we move on to modeling register values as part of our initial state and setting constraints on argv or any parameter as part of a solution or initial state.

I'm finding that the key to getting really good at Angr is learning the different parts of its vocabulary for describing execution states, it provides ton's of ways to setup an execution state at an arbitrary place in a binary and then crunch away until exhausts all the potential value for the variables you mark of in the "equation". Obviously the more obscure an execution state you can describe using Angr the better you'll be able to apply to Malware, Rootkits or other contrived binaries.

Setting Constraints with Angr

As we know by now the aim of the game is to describe an execution state that models our targeted place in the binary, part of the power you have over this initial state is placing constraints on the values we want to search through for our solution. The following example shows just that for the argument passed to a binary via argv.

To start, here's what our binary looks like:

Ensuring we are on the same page with the analysis, what you should note is:

  • First couple instructions @0x6d2 and @0x6d5 pull the argv pointer and the value for argc. It then checks that we have at most (jg instruction 0x6dd) 1 argument.
  • The next code block of interest @0x6f5 ensures that the argv[1] value is a string of length 5
  • Then the most crucial check for the purpose of our example happens: 0x718 to 0x72c we can see that the binary ensures that: the first character value of the argv[1] string is less than 0x40 - 1; which implies then that its looking for 0x3F as a character value.
Then it does some weird crap with the argv[1] string from lines 0x738 to 0x77b, finally making decision (based on eax) if we win or not. Again, because of the magic of Angr we can ignore the entire code block and focus solely on the fact that:

  1. It's a "function" of rbp-0x10 or argv[1] value.
  2. It requires some attributes to be met for argv[1] if we are to reach the "weird crap".

We now know enough to model an execution state that will solve for the argv[1] value.Here's how you do it:

If you've been working through some Angr examples yourself you shouldn't need a detailed breakdown of every line, I'm not going to bloat this post by reworking through them either, if you need the catch up work, please check out the previous post. The one addition you can see here though is in line 14:

initial_state.add_constraints(argv.get_byte(0) == 0x3F)

To spare you the mystery this ensures that whatever value's it trudges through for our ear-marked argv[1] variable; the first byte will be hard set to 0x3F.

Constraints like this can really speed up your search and avoid Angr running through tons of options that will never produce a solution; so if you're in the business of solving problems quickly and you'd like to show of your skills with Angr (especially because it runs in python); look out for obvious checks like this, every single character your can squeeze out as an optimization will save you a butt load of time!

Okay so you probably don't believe me that this will find the solution, I need provide hard proof! Here's the run:

I know super obscure value lol and I'm not even sure if the designer of the problem meant for this very strange value to be the actual solution - but that might not matter because the binary clearly likes it and it definitely gets us the flag!

Quite a random place to start but I though it eased us into things well, a slightly more complex example follows.

Modeling Registers

Registers are an import part of calling conventions. What this means is depending on how you model, and to what fine a grain you'd like to model a binary, registers will frequently play a huge role in which  function gets a certain argument.

To start off, lets look at our example binary:

and to expand on the get_user_input function:

This one comes straight from the examples Angr provides and as boring as it is to use this an example; I hafta admit the examples they provide are good, they are even provided in some form of a grammar; so you can work out how to grow more examples from the one's the Angr folks provide (those of you dedicated mastering this will appreciate that). It does mean you will be able to get these explanations from other posts but the upside is: it also gives you different working and explanations for the same stuff, and often that redundancy really helps speed up cracking the strange enigmas involved in understanding symbolic execution (or anything for that matter!). 

Analyzing our binary we see the following important things:
  • After getting the values via scanf, in lines 0x8048934 to 0x8048944 it transfers them via the pointer arguments given to scanf, to the registers eax,ebx and edx.
  • In the main, after it calls get_user_input it transfers the register values eax,ebx and edx to memory pointers, from line 0x8048980 to 0x8048986
  • We can also see that at line 0x804898c the binary passes the eax value to complex_function_1, in the same suite the ebx and edx are passed to complex_function_2 and 3 respectively (at line 0x804899f and 0x80489b2).
Skipping some unsurprising analysis, complex_function_1,2 and 3 are crucial in determining our success and they are obviously functions of our eax,ebx and edx registers. We also probably don't need to model the memory pointer's themselves since they caught nice and neatly by 3 separate registers.

We want to therefore use Angr to model these registers while avoid involving ourselves with the internals of scanf or any unnecessarily stack pointer/memory politics (thanks to other blog posts on this subject, see "reading and references"). 

Here's our solution script:

We can see some new hotness in the script in lines 16-18:

init_state.regs.eax = arg0
init_state.regs.ebx = arg1
init_state.regs.edx = arg2

Pretty straight forward here, it gives the initial state a heads up and ties some claripy bit vector values to the registers mentioned. This is a very simple example and doesn't involve much drama except for the analysis needed to spot the registers we need to target.

And as for the proof:

Gets to the right solution! Anyway that's it for this one look out for more Angr goodness in future posts.

Reading and References:

  1. "Introduction to angr Part 1" - https://blog.notso.pro/2019-03-25-angr-introduction-part1/