pyvex — Binary Translator

PyVEX provides an interface that translates binary code into the VEX intermediate representation (IR). For an introduction to VEX, take a look here: https://docs.angr.io/advanced-topics/ir

class pyvex.IRSB(data, mem_addr, arch, max_inst=None, max_bytes=None, bytes_offset=0, traceflags=0, opt_level=1, num_inst=None, num_bytes=None, strict_block_end=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True)[source]

Bases: VEXObject

The IRSB is the primary interface to pyvex. Constructing one of these will make a call into LibVEX to perform a translation.

IRSB stands for Intermediate Representation Super-Block. An IRSB in VEX is a single-entry, multiple-exit code block.

Variables:
  • arch – The architecture this block is lifted under. Must duck-type as archinfo.arch.Arch

  • statements (list of IRStmt) – The statements in this block

  • next (IRExpr) – The expression for the default exit target of this block

  • offsIP (int) – The offset of the instruction pointer in the VEX guest state

  • stmts_used (int) – The number of statements in this IRSB

  • jumpkind (str) – The type of this block’s default jump (call, boring, syscall, etc) as a VEX enum string

  • direct_next (bool) – Whether this block ends with a direct (not indirect) jump or branch

  • size (int) – The size of this block in bytes

  • addr (int) – The address of this basic block, i.e. the address in the first IMark

Parameters:

arch (Arch)

MAX_EXITS = 400
MAX_DATA_REFS = 2000
MAX_CONST_VALS = 1000
__init__(data, mem_addr, arch, max_inst=None, max_bytes=None, bytes_offset=0, traceflags=0, opt_level=1, num_inst=None, num_bytes=None, strict_block_end=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True)[source]
Parameters:
  • data (str or bytes or cffi.FFI.CData or None) – The bytes to lift. Can be either a string of bytes or a cffi buffer object. You may also pass None to initialize an empty IRSB.

  • mem_addr (int) – The address to lift the data at.

  • arch (Arch) – The architecture to lift the data as.

  • max_inst – The maximum number of instructions to lift. (See note below)

  • max_bytes – The maximum number of bytes to use.

  • num_inst – Replaces max_inst if max_inst is None. If set to None as well, no instruction limit is used.

  • num_bytes – Replaces max_bytes if max_bytes is None. If set to None as well, no byte limit is used.

  • bytes_offset – The offset into data to start lifting at. Note that for ARM THUMB mode, both mem_addr and bytes_offset must be odd (typically bytes_offset is set to 1).

  • traceflags – The libVEX traceflags, controlling VEX debug prints.

  • opt_level – The level of optimization to apply to the IR, -1 through 2. -1 is the strictest unoptimized level, 0 is unoptimized but will perform some lookahead/lookbehind optimizations, 1 performs constant propagation, and 2 performs loop unrolling, which honestly doesn’t make much sense in the context of pyvex. The default is 1.

  • strict_block_end – Whether LibVEX should split ARM Thumb blocks at certain instructions, for example CB{N}Z.

Note

Explicitly specifying the number of instructions to lift (max_inst) may not always work exactly as expected. For example, on MIPS, it is meaningless to lift a branch or jump instruction without its delay slot. VEX attempts to Do The Right Thing by possibly decoding fewer instructions than requested. Specifically, this means that lifting a branch or jump on MIPS as a single instruction (max_inst=1) will result in an empty IRSB, and subsequent attempts to run this block will raise SimIRSBError(‘Empty IRSB passed to SimIRSB.’).

Note

If neither an instruction limit nor a byte limit is set, pyvex will continue lifting the block until the block ends properly or until it runs out of data to lift.
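
The fallback between the paired limit parameters (max_inst/num_inst and max_bytes/num_bytes) can be pictured with a small stand-alone sketch. resolve_limit is a hypothetical helper written for illustration, not a pyvex function; it only mirrors the documented rule that the num_* value is consulted when the max_* value is None.

```python
def resolve_limit(primary, fallback):
    """Return the effective limit, or None when no limit applies.

    Hypothetical helper: `primary` plays the role of max_inst/max_bytes,
    `fallback` the role of num_inst/num_bytes.
    """
    return fallback if primary is None else primary


# The max_* value wins when given; the num_* value is only a fallback.
print(resolve_limit(10, 99))      # 10
print(resolve_limit(None, 99))    # 99
print(resolve_limit(None, None))  # None: no limit is applied
```

A result of None corresponds to the unlimited case described in the note above.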

addr
arch: Arch
statements: list[IRStmt]
next: IRExpr | None
jumpkind: str | None
default_exit_target
data_refs
const_vals
static empty_block(arch, addr, statements=None, nxt=None, tyenv=None, jumpkind=None, direct_next=None, size=None)[source]
property tyenv: IRTypeEnv
property has_statements: bool
property exit_statements: tuple[tuple[int, int, IRStmt], ...]
copy()[source]
Return type:

IRSB

extend(extendwith)[source]

Appends an IRSB to the current IRSB. The IRSB that is appended is invalidated. The appended IRSB’s jumpkind and default exit are used.

Parameters:

extendwith (IRSB) – The IRSB to append to this IRSB

Return type:

None

invalidate_direct_next()[source]
Return type:

None

pp()[source]

Pretty-print the IRSB to stdout.

Return type:

None

typecheck()[source]
Return type:

bool

static from_c(c_irsb, mem_addr, arch)[source]
Return type:

IRSB

static from_py(tyenv, stmts, next_expr, jumpkind, mem_addr, arch)[source]
Return type:

IRSB

property stmts_used: int
property offsIP: int
property direct_next
property expressions

Return an iterator of all expressions contained in the IRSB.

property instructions

The number of instructions in this block

property instruction_addresses: tuple[int, ...]

Addresses of instructions in this block.

property size

The size of this block, in bytes

property operations

A list of all operations done by the IRSB, as libVEX enum names

property all_constants

Returns all constants in the block (including incrementing of the program counter) as pyvex.const.IRConst.

property constants

The constants (excluding updates of the program counter) in the IRSB as pyvex.const.IRConst.

property constant_jump_targets

A set of the static jump targets of the basic block.

property constant_jump_targets_and_jumpkinds

A dict of the static jump targets of the basic block to their jumpkind.

class pyvex.IRTypeEnv(arch, types=None)[source]

Bases: VEXObject

An IR type environment.

Variables:

types (list of str) – A list of the types of all the temporaries in this block as VEX enum strings. types[3] is the type of t3.

__init__(arch, types=None)[source]
types
wordty
lookup(tmp)[source]

Return the type of temporary variable tmp as an enum string

sizeof(tmp)[source]
add(ty)[source]

Add a new tmp of type ty to the environment. Returns the number of the new tmp.

property types_used
typecheck()[source]
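
The indexing convention of the type environment (types[n] is the type of temporary tn, and add returns the number of the new temporary) can be modelled in a few lines. ToyTypeEnv below is an illustrative stand-in, not pyvex's IRTypeEnv; it takes no arch argument and performs no validation.

```python
class ToyTypeEnv:
    """Illustrative stand-in for a type environment; not pyvex's class."""

    def __init__(self, types=None):
        self.types = list(types) if types is not None else []

    def lookup(self, tmp):
        # types[n] holds the type of temporary tn
        return self.types[tmp]

    def add(self, ty):
        # Append a new temporary and return its number
        self.types.append(ty)
        return len(self.types) - 1


env = ToyTypeEnv(["Ity_I64", "Ity_I64", "Ity_I1"])
t3 = env.add("Ity_I32")
print(t3, env.lookup(t3))  # 3 Ity_I32
```
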
pyvex.get_type_size(ty)[source]

Returns the size, in bits, of a VEX type specifier, e.g. Ity_I16 -> 16.

Parameters:

ty – The VEX type specifier, as an enum string

Returns:

The size of the type, in bits

pyvex.get_type_spec_size(ty)[source]

Get the width of a “type specifier” like I16U or F16 or just 16. (Yes, this really just takes the int out. If we must special-case, do it here.)
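
"Taking the int out" of a specifier can be sketched with a regular expression. width_from_spec is a hypothetical name, and this is an independent illustration of the idea rather than pyvex's actual implementation.

```python
import re


def width_from_spec(spec):
    """Pull the first run of digits out of a specifier such as 'I16U', 'F16' or 16."""
    match = re.search(r"\d+", str(spec))
    if match is None:
        raise ValueError(f"no width found in {spec!r}")
    return int(match.group())


print(width_from_spec("I16U"))  # 16
print(width_from_spec("F16"))   # 16
print(width_from_spec(16))      # 16
```
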

pyvex.tag_to_const_class(tag)[source]
class pyvex.IRCallee(regparms, name, mcx_mask)[source]

Bases: VEXObject

Describes a helper function to call.

__init__(regparms, name, mcx_mask)[source]
regparms
name
mcx_mask
class pyvex.IRRegArray(base, elemTy, nElems)[source]

Bases: VEXObject

A section of the guest state that we want to be able to index at run time, so as to be able to describe indexed or rotating register files on the guest.

Variables:
  • base (int) – The offset into the state at which this array starts

  • elemTy (str) – The types of the elements in this array, as VEX enum strings

  • nElems (int) – The number of elements in this array

__init__(base, elemTy, nElems)[source]
base
elemTy
nElems
class pyvex.VEXObject[source]

Bases: object

The base class for Vex types.

pyvex.default_vex_archinfo()[source]
Return type:

dict[str, Any]

pyvex.get_enum_from_int(i)[source]
pyvex.get_int_from_enum(e)[source]
pyvex.vex_endness_from_string(endness_str)[source]
exception pyvex.PyVEXError[source]

Bases: Exception

pyvex.get_op_retty(op)[source]
pyvex.lift(data, addr, arch, max_bytes=None, max_inst=None, bytes_offset=0, opt_level=1, traceflags=0, strict_block_end=True, inner=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True, load_from_ro_regions=False, const_prop=False)[source]

Recursively lifts blocks using the registered lifters and postprocessors. Tries each lifter on the data in the order in which the lifters were registered.

If a lifter raises a LiftingException on the data, it is skipped. If it succeeds and returns a block with a jumpkind of Ijk_NoDecode, all of the lifters are tried on the rest of the data and if they work, their output is appended to the first block.

Parameters:
  • arch – The arch to lift the data as.

  • addr – The starting address of the block. Affects the IMarks.

  • data (Union[bytes, bytearray, memoryview, None]) – The bytes to lift as either a python string of bytes or a cffi buffer object.

  • max_bytes – The maximum number of bytes to lift. If set to None, no byte limit is used.

  • max_inst – The maximum number of instructions to lift. If set to None, no instruction limit is used.

  • bytes_offset – The offset into data to start lifting at.

  • opt_level – The level of optimization to apply to the IR, -1 through 2. -1 is the strictest unoptimized level, 0 is unoptimized but will perform some lookahead/lookbehind optimizations, 1 performs constant propagation, and 2 performs loop unrolling, which honestly doesn’t make much sense in the context of pyvex. The default is 1.

  • traceflags – The libVEX traceflags, controlling VEX debug prints.

Note

Explicitly specifying the number of instructions to lift (max_inst) may not always work exactly as expected. For example, on MIPS, it is meaningless to lift a branch or jump instruction without its delay slot. VEX attempts to Do The Right Thing by possibly decoding fewer instructions than requested. Specifically, this means that lifting a branch or jump on MIPS as a single instruction (max_inst=1) will result in an empty IRSB, and subsequent attempts to run this block will raise SimIRSBError(‘Empty IRSB passed to SimIRSB.’).

Note

If neither an instruction limit nor a byte limit is set, pyvex will continue lifting the block until the block ends properly or until it runs out of data to lift.
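
The lifter fallback described for pyvex.lift can be pictured with a toy model. Everything below is hypothetical and simplified: blocks are plain dicts, lifters are plain functions, and the local LiftingException merely stands in for the exception named above. It is a sketch of the control flow under those assumptions, not pyvex's implementation.

```python
class LiftingException(Exception):
    """Stand-in for the exception a lifter raises when it cannot handle the data."""


def lift_with_fallback(data, lifters):
    """Toy model: blocks are dicts with 'size', 'jumpkind' and 'stmts' keys."""
    for lifter in lifters:
        try:
            block = lifter(data)
        except LiftingException:
            continue  # this lifter gave up; try the next registered one
        break
    else:
        raise LiftingException("no lifter could handle the data")

    rest = data[block["size"]:]
    if block["jumpkind"] == "Ijk_NoDecode" and block["size"] > 0 and rest:
        try:
            more = lift_with_fallback(rest, lifters)
        except LiftingException:
            return block  # nothing could continue; keep the partial block
        # The appended block's jumpkind is used for the combined block.
        block = {
            "size": block["size"] + more["size"],
            "jumpkind": more["jumpkind"],
            "stmts": block["stmts"] + more["stmts"],
        }
    return block


def nop_lifter(data):
    # Toy lifter that understands only 0x90 bytes.
    n = 0
    while n < len(data) and data[n] == 0x90:
        n += 1
    if n == 0:
        raise LiftingException("not a nop")
    jumpkind = "Ijk_Boring" if n == len(data) else "Ijk_NoDecode"
    return {"size": n, "jumpkind": jumpkind, "stmts": ["nop"] * n}


def ret_lifter(data):
    # Toy lifter that understands only a single 0xc3 byte.
    if data[:1] != b"\xc3":
        raise LiftingException("not a ret")
    return {"size": 1, "jumpkind": "Ijk_Ret", "stmts": ["ret"]}


result = lift_with_fallback(b"\x90\x90\xc3", [nop_lifter, ret_lifter])
print(result["size"], result["jumpkind"], result["stmts"])
```

Here the first toy lifter stops with Ijk_NoDecode after two bytes, so the remaining byte is offered to every lifter again and the second one completes the block.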

Translation Interface

class pyvex.block.IRSB(data, mem_addr, arch, max_inst=None, max_bytes=None, bytes_offset=0, traceflags=0, opt_level=1, num_inst=None, num_bytes=None, strict_block_end=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True)[source]

Bases: VEXObject

The IRSB is the primary interface to pyvex. Constructing one of these will make a call into LibVEX to perform a translation.

IRSB stands for Intermediate Representation Super-Block. An IRSB in VEX is a single-entry, multiple-exit code block.

Variables:
  • arch – The architecture this block is lifted under. Must duck-type as archinfo.arch.Arch

  • statements (list of IRStmt) – The statements in this block

  • next (IRExpr) – The expression for the default exit target of this block

  • offsIP (int) – The offset of the instruction pointer in the VEX guest state

  • stmts_used (int) – The number of statements in this IRSB

  • jumpkind (str) – The type of this block’s default jump (call, boring, syscall, etc) as a VEX enum string

  • direct_next (bool) – Whether this block ends with a direct (not indirect) jump or branch

  • size (int) – The size of this block in bytes

  • addr (int) – The address of this basic block, i.e. the address in the first IMark

Parameters:

arch (Arch)

MAX_EXITS = 400
MAX_DATA_REFS = 2000
MAX_CONST_VALS = 1000
__init__(data, mem_addr, arch, max_inst=None, max_bytes=None, bytes_offset=0, traceflags=0, opt_level=1, num_inst=None, num_bytes=None, strict_block_end=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True)[source]
Parameters:
  • data (str or bytes or cffi.FFI.CData or None) – The bytes to lift. Can be either a string of bytes or a cffi buffer object. You may also pass None to initialize an empty IRSB.

  • mem_addr (int) – The address to lift the data at.

  • arch (Arch) – The architecture to lift the data as.

  • max_inst – The maximum number of instructions to lift. (See note below)

  • max_bytes – The maximum number of bytes to use.

  • num_inst – Replaces max_inst if max_inst is None. If set to None as well, no instruction limit is used.

  • num_bytes – Replaces max_bytes if max_bytes is None. If set to None as well, no byte limit is used.

  • bytes_offset – The offset into data to start lifting at. Note that for ARM THUMB mode, both mem_addr and bytes_offset must be odd (typically bytes_offset is set to 1).

  • traceflags – The libVEX traceflags, controlling VEX debug prints.

  • opt_level – The level of optimization to apply to the IR, -1 through 2. -1 is the strictest unoptimized level, 0 is unoptimized but will perform some lookahead/lookbehind optimizations, 1 performs constant propagation, and 2 performs loop unrolling, which honestly doesn’t make much sense in the context of pyvex. The default is 1.

  • strict_block_end – Whether LibVEX should split ARM Thumb blocks at certain instructions, for example CB{N}Z.

Note

Explicitly specifying the number of instructions to lift (max_inst) may not always work exactly as expected. For example, on MIPS, it is meaningless to lift a branch or jump instruction without its delay slot. VEX attempts to Do The Right Thing by possibly decoding fewer instructions than requested. Specifically, this means that lifting a branch or jump on MIPS as a single instruction (max_inst=1) will result in an empty IRSB, and subsequent attempts to run this block will raise SimIRSBError(‘Empty IRSB passed to SimIRSB.’).

Note

If neither an instruction limit nor a byte limit is set, pyvex will continue lifting the block until the block ends properly or until it runs out of data to lift.

addr
arch: Arch
statements: list[IRStmt]
next: IRExpr | None
jumpkind: str | None
default_exit_target
data_refs
const_vals
static empty_block(arch, addr, statements=None, nxt=None, tyenv=None, jumpkind=None, direct_next=None, size=None)[source]
property tyenv: IRTypeEnv
property has_statements: bool
property exit_statements: tuple[tuple[int, int, IRStmt], ...]
copy()[source]
Return type:

IRSB

extend(extendwith)[source]

Appends an IRSB to the current IRSB. The IRSB that is appended is invalidated. The appended IRSB’s jumpkind and default exit are used.

Parameters:

extendwith (IRSB) – The IRSB to append to this IRSB

Return type:

None

invalidate_direct_next()[source]
Return type:

None

pp()[source]

Pretty-print the IRSB to stdout.

Return type:

None

typecheck()[source]
Return type:

bool

static from_c(c_irsb, mem_addr, arch)[source]
Return type:

IRSB

static from_py(tyenv, stmts, next_expr, jumpkind, mem_addr, arch)[source]
Return type:

IRSB

property stmts_used: int
property offsIP: int
property direct_next
property expressions

Return an iterator of all expressions contained in the IRSB.

property instructions

The number of instructions in this block

property instruction_addresses: tuple[int, ...]

Addresses of instructions in this block.

property size

The size of this block, in bytes

property operations

A list of all operations done by the IRSB, as libVEX enum names

property all_constants

Returns all constants in the block (including incrementing of the program counter) as pyvex.const.IRConst.

property constants

The constants (excluding updates of the program counter) in the IRSB as pyvex.const.IRConst.

property constant_jump_targets

A set of the static jump targets of the basic block.

property constant_jump_targets_and_jumpkinds

A dict of the static jump targets of the basic block to their jumpkind.

class pyvex.block.IRTypeEnv(arch, types=None)[source]

Bases: VEXObject

An IR type environment.

Variables:

types (list of str) – A list of the types of all the temporaries in this block as VEX enum strings. types[3] is the type of t3.

__init__(arch, types=None)[source]
types
wordty
lookup(tmp)[source]

Return the type of temporary variable tmp as an enum string

sizeof(tmp)[source]
add(ty)[source]

Add a new tmp of type ty to the environment. Returns the number of the new tmp.

property types_used
typecheck()[source]

IR Components

class pyvex.stmt.IRStmt[source]

Bases: VEXObject

IR statements in VEX represent operations with side effects.

tag: str | None = None
tag_int = 0
pp()[source]
property child_expressions: Iterator[IRExpr]
property expressions
property constants
typecheck(tyenv)[source]
replace_expression(replacements)[source]

Replace child expressions in-place.

Parameters:

replacements (Dict[IRExpr, IRExpr]) – A mapping from expression-to-find to expression-to-replace-with

Returns:

None

pp_str(reg_name=None, arch=None, tyenv=None)[source]
Return type:

str

class pyvex.stmt.NoOp[source]

Bases: IRStmt

A no-operation statement. It is usually the result of an IR optimization.

tag: str | None = 'Ist_NoOp'
pp_str(reg_name=None, arch=None, tyenv=None)[source]
tag_int = 0
class pyvex.stmt.IMark(addr, length, delta)[source]

Bases: IRStmt

An instruction mark. It marks the start of the statements that represent a single machine instruction (the end of those statements is marked by the next IMark or the end of the IRSB). Contains the address and length of the instruction.

Parameters:
  • addr (int)

  • length (int)

  • delta (int)

tag: str | None = 'Ist_IMark'
__init__(addr, length, delta)[source]
Parameters:
  • addr (int)

  • length (int)

  • delta (int)

addr
len
delta
pp_str(reg_name=None, arch=None, tyenv=None)[source]
tag_int = 1
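
Because each IMark opens the statements of one machine instruction, a statement list can be split into per-instruction groups by watching for the 'Ist_IMark' tag. The sketch below uses stand-in namedtuples that carry only the tag, addr, len and delta attributes listed above; group_by_instruction is a hypothetical helper, and applying it to the statements of a real IRSB is an assumption rather than something this reference states.

```python
from collections import namedtuple

# Stand-ins for pyvex statements; only the attributes used below are modelled.
FakeIMark = namedtuple("FakeIMark", "tag addr len delta")
FakeStmt = namedtuple("FakeStmt", "tag text")


def group_by_instruction(statements):
    """Return a list of (imark, [statements belonging to that instruction])."""
    groups = []
    for stmt in statements:
        if stmt.tag == "Ist_IMark":
            groups.append((stmt, []))    # a new instruction starts here
        elif groups:
            groups[-1][1].append(stmt)   # belongs to the most recent IMark
        # statements before the first IMark are ignored in this sketch
    return groups


stmts = [
    FakeIMark("Ist_IMark", 0x1000, 1, 0),
    FakeStmt("Ist_Put", "PUT(rip) = 0x1001"),
    FakeIMark("Ist_IMark", 0x1001, 2, 0),
    FakeStmt("Ist_WrTmp", "t0 = GET:I64(rax)"),
    FakeStmt("Ist_Put", "PUT(rbx) = t0"),
]
groups = group_by_instruction(stmts)
total_len = sum(mark.len for mark, _ in groups)
for mark, body in groups:
    print(hex(mark.addr), mark.len, [s.tag for s in body])
print("bytes covered:", total_len)
```
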
class pyvex.stmt.AbiHint(base, length, nia)[source]

Bases: IRStmt

An ABI hint, provides specific information about this platform’s ABI.

tag: str | None = 'Ist_AbiHint'
__init__(base, length, nia)[source]
base
len
nia
pp_str(reg_name=None, arch=None, tyenv=None)[source]
tag_int = 2
class pyvex.stmt.Put(data, offset)[source]

Bases: IRStmt

Write to a guest register, at a fixed offset in the guest state.

Parameters:

data (IRExpr)

tag: str | None = 'Ist_Put'
__init__(data, offset)[source]
Parameters:

data (IRExpr)

data
offset
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 3
class pyvex.stmt.PutI(descr, ix, data, bias)[source]

Bases: IRStmt

Write to a guest register, at a non-fixed offset in the guest state.

tag: str | None = 'Ist_PutI'
__init__(descr, ix, data, bias)[source]
descr
ix
data
bias
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 4
class pyvex.stmt.WrTmp(tmp, data)[source]

Bases: IRStmt

Assign a value to a temporary. Note that SSA rules require that each tmp is assigned only once. IR sanity checking will reject any block containing a temporary which is not assigned to exactly once.

Parameters:

data (IRExpr)

tag: str | None = 'Ist_WrTmp'
__init__(tmp, data)[source]
Parameters:

data (IRExpr)

tmp
data
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 5
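
The single-assignment rule for temporaries can be checked with a simple count. The helper below is a hypothetical illustration that looks only at statements tagged 'Ist_WrTmp' and reads their tmp attribute; other statement kinds that carry a destination temporary elsewhere in this reference (for example Dirty or LoadG) are deliberately ignored, so it is weaker than real IR sanity checking.

```python
from collections import Counter, namedtuple

# Stand-ins carrying only the attributes the check reads.
FakeWrTmp = namedtuple("FakeWrTmp", "tag tmp")
FakeOther = namedtuple("FakeOther", "tag")


def multiply_assigned_tmps(statements):
    """Return the sorted temporaries written by more than one WrTmp statement."""
    counts = Counter(s.tmp for s in statements if s.tag == "Ist_WrTmp")
    return sorted(tmp for tmp, n in counts.items() if n > 1)


good = [FakeWrTmp("Ist_WrTmp", 0), FakeOther("Ist_Put"), FakeWrTmp("Ist_WrTmp", 1)]
bad = good + [FakeWrTmp("Ist_WrTmp", 0)]
print(multiply_assigned_tmps(good))  # []
print(multiply_assigned_tmps(bad))   # [0]
```
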
class pyvex.stmt.Store(addr, data, end)[source]

Bases: IRStmt

Write a value to memory.

tag: str | None = 'Ist_Store'
__init__(addr, data, end)[source]
addr
data
end
property endness
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 6
class pyvex.stmt.CAS(addr, dataLo, dataHi, expdLo, expdHi, oldLo, oldHi, end)[source]

Bases: IRStmt

An atomic compare-and-swap operation.

tag: str | None = 'Ist_CAS'
__init__(addr, dataLo, dataHi, expdLo, expdHi, oldLo, oldHi, end)[source]
addr
dataLo
dataHi
expdLo
expdHi
oldLo
oldHi
end
property endness
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 7
class pyvex.stmt.LLSC(addr, storedata, result, end)[source]

Bases: IRStmt

Either Load-Linked or Store-Conditional, depending on STOREDATA. If STOREDATA is NULL then this is a Load-Linked, else it is a Store-Conditional.

tag: str | None = 'Ist_LLSC'
__init__(addr, storedata, result, end)[source]
addr
storedata
result
end
property endness
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 8
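
The two roles of an LLSC statement can be told apart by inspecting storedata. The sketch below assumes that a NULL STOREDATA surfaces in Python as None; that mapping is an assumption, not something this reference states. llsc_kind is a hypothetical helper, and the objects are stand-ins rather than real pyvex statements.

```python
from types import SimpleNamespace


def llsc_kind(stmt):
    """Classify an LLSC-like statement by whether it carries data to store."""
    # Assumption: a NULL STOREDATA is represented as None on the Python side.
    return "load-linked" if stmt.storedata is None else "store-conditional"


# Stand-ins with only the attributes the helper reads.
ll = SimpleNamespace(tag="Ist_LLSC", storedata=None)
sc = SimpleNamespace(tag="Ist_LLSC", storedata="t5")
print(llsc_kind(ll))  # load-linked
print(llsc_kind(sc))  # store-conditional
```
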
class pyvex.stmt.MBE(event)[source]

Bases: IRStmt

tag: str | None = 'Ist_MBE'
__init__(event)[source]
event
pp_str(reg_name=None, arch=None, tyenv=None)[source]
tag_int = 9
class pyvex.stmt.Dirty(cee, guard, args, tmp, mFx, mAddr, mSize, nFxState)[source]

Bases: IRStmt

tag: str | None = 'Ist_Dirty'
__init__(cee, guard, args, tmp, mFx, mAddr, mSize, nFxState)[source]
cee
guard
args
tmp
mFx
mAddr
mSize
nFxState
pp_str(reg_name=None, arch=None, tyenv=None)[source]
property child_expressions
tag_int = 10
class pyvex.stmt.Exit(guard, dst, jk, offsIP)[source]

Bases: IRStmt

A conditional exit from the middle of an IRSB.

tag: str | None = 'Ist_Exit'
__init__(guard, dst, jk, offsIP)[source]
guard
dst
offsIP
jk
property jumpkind
pp_str(reg_name=None, arch=None, tyenv=None)[source]
property child_expressions
typecheck(tyenv)[source]
tag_int = 11
class pyvex.stmt.LoadG(end, cvt, dst, addr, alt, guard)[source]

Bases: IRStmt

A guarded load.

tag: str | None = 'Ist_LoadG'
__init__(end, cvt, dst, addr, alt, guard)[source]
addr
alt
guard
dst
cvt
end
cvt_types
property endness
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 12
class pyvex.stmt.StoreG(end, addr, data, guard)[source]

Bases: IRStmt

A guarded store.

tag: str | None = 'Ist_StoreG'
__init__(end, addr, data, guard)[source]
addr
data
guard
end
property endness
pp_str(reg_name=None, arch=None, tyenv=None)[source]
typecheck(tyenv)[source]
tag_int = 13
pyvex.stmt.tag_to_stmt_class(tag)[source]
pyvex.stmt.enum_to_stmt_class(tag_enum)[source]
class pyvex.expr.IRExpr[source]

Bases: VEXObject

IR expressions in VEX represent operations without side effects.

tag: str | None = None
tag_int = 0
pp()[source]
property child_expressions: list[IRExpr]

A list of all of the expressions that this expression ends up evaluating.

property constants

A list of all of the constants that this expression ends up using.

result_size(tyenv)[source]
result_type(tyenv)[source]
replace_expression(replacements)[source]

Replace child expressions in-place.

Parameters:

replacements (Dict[IRExpr, IRExpr]) – A mapping from expression-to-find to expression-to-replace-with

Returns:

None

typecheck(tyenv)[source]
class pyvex.expr.Binder(binder)[source]

Bases: IRExpr

Used only in pattern matching within Vex. Should not be seen outside of Vex.

tag: str | None = 'Iex_Binder'
__init__(binder)[source]
binder
result_type(tyenv)[source]
tag_int = 0
class pyvex.expr.VECRET[source]

Bases: IRExpr

tag: str | None = 'Iex_VECRET'
result_type(tyenv)[source]
tag_int = 1
class pyvex.expr.GSPTR[source]

Bases: IRExpr

tag: str | None = 'Iex_GSPTR'
result_type(tyenv)[source]
tag_int = 2
class pyvex.expr.GetI(descr, ix, bias)[source]

Bases: IRExpr

Read a guest register at a non-fixed offset in the guest state.

tag: str | None = 'Iex_GetI'
__init__(descr, ix, bias)[source]
descr
ix
bias
property description
property index
result_type(tyenv)[source]
tag_int = 3
class pyvex.expr.RdTmp(tmp)[source]

Bases: IRExpr

Read the value held by a temporary.

tag: str | None = 'Iex_RdTmp'
__init__(tmp)[source]
property tmp
static get_instance(tmp)[source]
replace_expression(replacements)[source]

Replace child expressions in-place.

Parameters:

replacements (Dict[IRExpr, IRExpr]) – A mapping from expression-to-find to expression-to-replace-with

Returns:

None

result_type(tyenv)[source]
tag_int = 4
class pyvex.expr.Get(offset, ty, ty_int=None)[source]

Bases: IRExpr

Read a guest register, at a fixed offset in the guest state.

Parameters:
  • ty (str)

  • ty_int (int | None)

tag: str | None = 'Iex_Get'
__init__(offset, ty, ty_int=None)[source]
Parameters:
  • ty (str)

  • ty_int (int | None)

offset
ty_int
property ty
property type
pp_str_with_name(reg_name)[source]

pp_str_with_name is used to print the expression with the name of the register instead of the offset

Parameters:

reg_name (str)

result_type(tyenv)[source]
tag_int = 5
class pyvex.expr.Qop(op, args)[source]

Bases: IRExpr

A quaternary operation (4 arguments).

tag: str | None = 'Iex_Qop'
__init__(op, args)[source]
op
args
property child_expressions

A list of all of the expressions that this expression ends up evaluating.

result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 6
class pyvex.expr.Triop(op, args)[source]

Bases: IRExpr

A ternary operation (3 arguments)

tag: str | None = 'Iex_Triop'
__init__(op, args)[source]
op
args
property child_expressions

A list of all of the expressions that this expression ends up evaluating.

result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 7
class pyvex.expr.Binop(op, args, op_int=None)[source]

Bases: IRExpr

A binary operation (2 arguments).

tag: str | None = 'Iex_Binop'
__init__(op, args, op_int=None)[source]
op_int
args
property op
property child_expressions

A list of all of the expressions that this expression ends up evaluating.

result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 8
class pyvex.expr.Unop(op, args)[source]

Bases: IRExpr

A unary operation (1 argument).

tag: str | None = 'Iex_Unop'
__init__(op, args)[source]
op
args
property child_expressions

A list of all of the expressions that this expression ends up evaluating.

result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 9
class pyvex.expr.Load(end, ty, addr)[source]

Bases: IRExpr

A load from memory.

tag: str | None = 'Iex_Load'
__init__(end, ty, addr)[source]
end
ty
addr
property endness
property type
result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 10
class pyvex.expr.Const(con)[source]

Bases: IRExpr

A constant expression.

Parameters:

con (IRConst)

tag: str | None = 'Iex_Const'
__init__(con)[source]
Parameters:

con (IRConst)

property con: IRConst
static get_instance(con)[source]
result_type(tyenv)[source]
tag_int = 11
class pyvex.expr.ITE(cond, iffalse, iftrue)[source]

Bases: IRExpr

An if-then-else expression.

tag: str | None = 'Iex_ITE'
__init__(cond, iffalse, iftrue)[source]
cond
iffalse
iftrue
result_type(tyenv)[source]
typecheck(tyenv)[source]
tag_int = 12
class pyvex.expr.CCall(retty, cee, args)[source]

Bases: IRExpr

A call to a pure (no side-effects) helper C function.

tag: str | None = 'Iex_CCall'
__init__(retty, cee, args)[source]
retty
cee
args
property ret_type
property callee
property child_expressions

A list of all of the expressions that this expression ends up evaluating.

result_type(tyenv)[source]
tag_int = 13
pyvex.expr.get_op_retty(op)[source]
exception pyvex.expr.PyvexOpMatchException[source]

Bases: Exception

exception pyvex.expr.PyvexTypeErrorException[source]

Bases: Exception

pyvex.expr.int_type_for_size(size)[source]
pyvex.expr.unop_signature(op)[source]
pyvex.expr.binop_signature(op)[source]
pyvex.expr.shift_signature(op)[source]
pyvex.expr.cmp_signature(op)[source]
pyvex.expr.mull_signature(op)[source]
pyvex.expr.half_signature(op)[source]
pyvex.expr.cast_signature(op)[source]
pyvex.expr.op_arg_types(op)[source]
pyvex.expr.tag_to_expr_class(tag)[source]

Convert a tag string to the corresponding IRExpr class type.

Parameters:

tag (str) – The tag string.

Returns:

A class.

Return type:

type

pyvex.expr.enum_to_expr_class(tag_enum)[source]

Convert a tag enum to the corresponding IRExpr class.

Parameters:

tag_enum (int) – The tag enum.

Returns:

A class.

Return type:

type

class pyvex.const.IRConst[source]

Bases: VEXObject

type: str
size: int | None = None
tag: str
c_constructor = None
pp()[source]
property value: int
class pyvex.const.U1(value)[source]

Bases: IRConst

type: str = 'Ity_I1'
size: int | None = 1
tag: str = 'Ico_U1'
op_format = '1'
c_constructor = <cdata 'IRConst *(*)(unsigned char)' 0x7f3b8a02f0cb>
__init__(value)[source]
class pyvex.const.U8(value)[source]

Bases: IRConst

type: str = 'Ity_I8'
size: int | None = 8
tag: str = 'Ico_U8'
op_format = '8'
c_constructor = <cdata 'IRConst *(*)(unsigned char)' 0x7f3b8a02f14f>
__init__(value)[source]
class pyvex.const.U16(value)[source]

Bases: IRConst

type: str = 'Ity_I16'
size: int | None = 16
tag: str = 'Ico_U16'
op_format = '16'
c_constructor = <cdata 'IRConst *(*)(unsigned short)' 0x7f3b8a02f189>
__init__(value)[source]
class pyvex.const.U32(value)[source]

Bases: IRConst

Parameters:

value (int)

type: str = 'Ity_I32'
size: int | None = 32
tag: str = 'Ico_U32'
op_format = '32'
c_constructor = <cdata 'IRConst *(*)(unsigned int)' 0x7f3b8a02f1c5>
__init__(value)[source]
Parameters:

value (int)

class pyvex.const.U64(value)[source]

Bases: IRConst

type: str = 'Ity_I64'
size: int | None = 64
tag: str = 'Ico_U64'
op_format = '64'
c_constructor = <cdata 'IRConst *(*)(unsigned long long)' 0x7f3b8a02f1fc>
__init__(value)[source]
pyvex.const.vex_int_class(size)[source]
class pyvex.const.F32(value)[source]

Bases: IRConst

type: str = 'Ity_F32'
tag: str = 'Ico_F32'
op_format = 'F32'
c_constructor = <cdata 'IRConst *(*)(float)' 0x7f3b8a02f236>
__init__(value)[source]
class pyvex.const.F32i(value)[source]

Bases: IRConst

type: str = 'Ity_F32'
tag: str = 'Ico_F32i'
op_format = 'F32'
c_constructor = <cdata 'IRConst *(*)(unsigned int)' 0x7f3b8a02f273>
__init__(value)[source]
class pyvex.const.F64(value)[source]

Bases: IRConst

type: str = 'Ity_F64'
tag: str = 'Ico_F64'
op_format = 'F64'
c_constructor = <cdata 'IRConst *(*)(double)' 0x7f3b8a02f2aa>
__init__(value)[source]
class pyvex.const.F64i(value)[source]

Bases: IRConst

type: str = 'Ity_F64'
tag: str = 'Ico_F64i'
op_format = 'F64'
c_constructor = <cdata 'IRConst *(*)(unsigned long long)' 0x7f3b8a02f2e7>
__init__(value)[source]
class pyvex.const.V128(value)[source]

Bases: IRConst

type: str = 'Ity_V128'
tag: str = 'Ico_V128'
op_format = 'V128'
c_constructor = <cdata 'IRConst *(*)(unsigned short)' 0x7f3b8a02f321>
__init__(value)[source]
class pyvex.const.V256(value)[source]

Bases: IRConst

type: str = 'Ity_V256'
tag: str = 'Ico_V256'
op_format = 'V256'
c_constructor = <cdata 'IRConst *(*)(unsigned int)' 0x7f3b8a02f35d>
__init__(value)[source]
pyvex.const.is_int_ty(ty)[source]
pyvex.const.is_int_tag(tag)[source]
pyvex.const.get_tag_size(tag)[source]
pyvex.const.get_type_size(ty)[source]

Returns the size, in bits, of a VEX type specifier, e.g. Ity_I16 -> 16.

Parameters:

ty – The VEX type specifier, as an enum string

Returns:

The size of the type, in bits

pyvex.const.get_type_spec_size(ty)[source]

Get the width of a “type specifier” like I16U or F16 or just 16. (Yes, this really just takes the int out. If we must special-case, do it here.)

pyvex.const.ty_to_const_class(ty)[source]
pyvex.const.tag_to_const_class(tag)[source]
class pyvex.enums.VEXObject[source]

Bases: object

The base class for Vex types.

class pyvex.enums.IRCallee(regparms, name, mcx_mask)[source]

Bases: VEXObject

Describes a helper function to call.

__init__(regparms, name, mcx_mask)[source]
regparms
name
mcx_mask
class pyvex.enums.IRRegArray(base, elemTy, nElems)[source]

Bases: VEXObject

A section of the guest state that we want to be able to index at run time, so as to be able to describe indexed or rotating register files on the guest.

Variables:
  • base (int) – The offset into the state at which this array starts

  • elemTy (str) – The types of the elements in this array, as VEX enum strings

  • nElems (int) – The number of elements in this array

__init__(base, elemTy, nElems)[source]
base
elemTy
nElems
pyvex.enums.get_enum_from_int(i)[source]
pyvex.enums.get_int_from_enum(e)[source]
pyvex.enums.vex_endness_from_string(endness_str)[source]
pyvex.enums.default_vex_archinfo()[source]
Return type:

dict[str, Any]

Lifting System

pyvex.data_ref.data_ref_type_str(dref_enum)[source]

Translate an enum DataRefTypes value into a string representation.

class pyvex.data_ref.DataRef(data_addr, data_size, data_type, stmt_idx, ins_addr)[source]

Bases: object

A data reference object. Indicates a data access in an IRSB.

Variables:
  • data_addr – The address of the data being accessed

  • data_size – The size of the data being accessed, in bytes

  • data_type – The type of the data, a DataRefTypes enum.

  • stmt_idx – The IRSB statement index containing the data access

  • ins_addr – The address of the instruction performing the data access

__init__(data_addr, data_size, data_type, stmt_idx, ins_addr)[source]
data_addr
data_size
data_type
stmt_idx
ins_addr
property data_type_str

The data ref type as a string: “unknown”, “integer”, “fp”, or “INVALID”.

classmethod from_c(r)[source]
class pyvex.lifting.Lifter(arch, addr)[source]

Bases: object

Parameters:
  • arch (Arch)

  • addr (int)

REQUIRE_DATA_C = False
REQUIRE_DATA_PY = False
__init__(arch, addr)[source]
Parameters:
  • arch (Arch)

  • addr (int)

arch: Arch
addr: int
lift(data, bytes_offset=None, max_bytes=None, max_inst=None, opt_level=1, traceflags=None, allow_arch_optimizations=None, strict_block_end=None, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True, load_from_ro_regions=False, const_prop=False, disasm=False, dump_irsb=False)[source]

Wrapper around the _lift method on Lifters. Should not be overridden in child classes.

Parameters:
  • data (Union[bytes, bytearray, memoryview, None]) – The bytes to lift as either a python string of bytes or a cffi buffer object.

  • bytes_offset (Optional[int]) – The offset into data to start lifting at.

  • max_bytes (Optional[int]) – The maximum number of bytes to lift. If set to None, no byte limit is used.

  • max_inst (Optional[int]) – The maximum number of instructions to lift. If set to None, no instruction limit is used.

  • opt_level (int | float) – The level of optimization to apply to the IR, 0-2. Most likely will be ignored in any lifter other than LibVEX.

  • traceflags (Optional[int]) – The libVEX traceflags, controlling VEX debug prints. Most likely will be ignored in any lifter other than LibVEX.

  • allow_arch_optimizations (Optional[bool]) – Should the LibVEX lifter be allowed to perform lift-time preprocessing optimizations (e.g., lookback ITSTATE optimization on THUMB). Most likely will be ignored in any lifter other than LibVEX.

  • strict_block_end (Optional[bool]) – Should the LibVEX arm-thumb lifter split blocks at some instructions, for example CB{N}Z.

  • skip_stmts (bool) – Should the lifter skip transferring IRStmts from C to Python.

  • collect_data_refs (bool) – Should the LibVEX lifter collect data references in C.

  • cross_insn_opt (bool) – If cross-instruction-boundary optimizations are allowed or not.

  • disasm (bool) – Should the GymratLifter generate disassembly during lifting.

  • dump_irsb (bool) – Should the GymratLifter log the lifted IRSB.

  • load_from_ro_regions (bool)

  • const_prop (bool)

data
bytes_offset
opt_level
traceflags
allow_arch_optimizations
strict_block_end
collect_data_refs
max_inst
max_bytes
skip_stmts
irsb
cross_insn_opt
load_from_ro_regions
const_prop
disasm
dump_irsb
class pyvex.lifting.Postprocessor(irsb)[source]

Bases: object

__init__(irsb)[source]
postprocess()[source]

Modify the irsb

All of the postprocessors will be used in the order that they are registered

pyvex.lifting.lift(data, addr, arch, max_bytes=None, max_inst=None, bytes_offset=0, opt_level=1, traceflags=0, strict_block_end=True, inner=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True, load_from_ro_regions=False, const_prop=False)[source]

Recursively lifts blocks using the registered lifters and postprocessors. Tries each lifter in the order in which they are registered on the data to lift.

If a lifter raises a LiftingException on the data, it is skipped. If it succeeds and returns a block with a jumpkind of Ijk_NoDecode, all of the lifters are tried on the rest of the data and if they work, their output is appended to the first block.

Parameters:
  • arch – The arch to lift the data as.

  • addr – The starting address of the block. Affects the IMarks.

  • data (Union[bytes, bytearray, memoryview, None]) – The bytes to lift as either a python string of bytes or a cffi buffer object.

  • max_bytes – The maximum number of bytes to lift. If set to None, no byte limit is used.

  • max_inst – The maximum number of instructions to lift. If set to None, no instruction limit is used.

  • bytes_offset – The offset into data to start lifting at.

  • opt_level – The level of optimization to apply to the IR, -1 through 2. -1 is the strictest unoptimized level, 0 is unoptimized but will perform some lookahead/lookbehind optimizations, 1 performs constant propagation, and 2 performs loop unrolling, which honestly doesn’t make much sense in the context of pyvex. The default is 1.

  • traceflags – The libVEX traceflags, controlling VEX debug prints.

Note

Explicitly specifying the number of instructions to lift (max_inst) may not always work exactly as expected. For example, on MIPS, it is meaningless to lift a branch or jump instruction without its delay slot. VEX attempts to Do The Right Thing by possibly decoding fewer instructions than requested. Specifically, this means that lifting a branch or jump on MIPS as a single instruction (max_inst=1) will result in an empty IRSB, and subsequent attempts to run this block will raise SimIRSBError(‘Empty IRSB passed to SimIRSB.’).

Note

If neither an instruction limit nor a byte limit is used, pyvex will continue lifting the block until the block ends properly or until it runs out of data to lift.
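The fallback behaviour described for lift (try lifters in registration order, skip those that raise, continue after an Ijk_NoDecode block) can be sketched with plain Python stand-ins. Everything here (Block, make_toy_lifter, lift_with_fallback, and the rule that a zero-size NoDecode block counts as a failure) is an assumption made for illustration and does not reproduce pyvex's code.

```python
class LiftingException(Exception):
    """Stand-in for the exception a lifter raises when it cannot handle the data."""


class Block:
    """Minimal stand-in for an IRSB: lifted items, bytes consumed, and a jumpkind."""

    def __init__(self, items, size, jumpkind):
        self.items, self.size, self.jumpkind = items, size, jumpkind


def make_toy_lifter(byte):
    """Build a toy lifter that only understands a run of one byte value."""
    def lifter(data):
        run = len(data) - len(data.lstrip(byte))
        if run == 0:
            raise LiftingException(f"cannot decode {data[:1]!r}")
        kind = "Ijk_Boring" if run == len(data) else "Ijk_NoDecode"
        return Block([(byte.decode(), run)], run, kind)
    return lifter


def lift_with_fallback(data, lifters):
    """Try lifters in order; after a NoDecode block, lift the rest and append it."""
    for lifter in lifters:
        try:
            block = lifter(data)
        except LiftingException:
            continue                   # this lifter failed, try the next one
        if block.jumpkind == "Ijk_NoDecode":
            if block.size == 0:
                continue               # nothing decoded (simplifying assumption)
            try:
                rest = lift_with_fallback(data[block.size:], lifters)
            except LiftingException:
                return block           # nobody could extend the block
            block.items += rest.items
            block.size += rest.size
            block.jumpkind = rest.jumpkind
        return block
    raise LiftingException("no lifter succeeded")


lifters = [make_toy_lifter(b"A"), make_toy_lifter(b"B")]
block = lift_with_fallback(b"AABBBA", lifters)
print(block.items, block.size, block.jumpkind)
```

Each recursive call works on strictly fewer bytes, so the sketch always terminates. The real function also applies postprocessors and honours max_bytes and max_inst, which are left out here.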

pyvex.lifting.register(lifter, arch_name)[source]

Registers a Lifter or Postprocessor to be used by pyvex. Lifters are given priority based on the order in which they are registered. Postprocessors will be run in registration order.

Parameters:

lifter – The Lifter or Postprocessor to register

class pyvex.lifting.ZeroDivisionPostProcessor(irsb)[source]

Bases: Postprocessor

A postprocessor for adding zero-division checks to VEX.

For “div rcx”, will turn:

    00 | ------ IMark(0x8000, 3, 0) ------
    01 | t0 = GET:I64(rcx)
    02 | t1 = GET:I64(rax)
    03 | t2 = GET:I64(rdx)
    04 | t3 = 64HLto128(t2,t1)
    05 | t4 = DivModU128to64(t3,t0)
    06 | t5 = 128to64(t4)
    07 | PUT(rax) = t5
    08 | t6 = 128HIto64(t4)
    09 | PUT(rdx) = t6
    NEXT: PUT(rip) = 0x0000000000008003; Ijk_Boring

into:

    00 | ------ IMark(0x8000, 3, 0) ------
    01 | t0 = GET:I64(rcx)
    02 | t4 = GET:I64(rax)
    03 | t5 = GET:I64(rdx)
    04 | t3 = 64HLto128(t5,t4)
    05 | t9 = CmpEQ(t0,0x0000000000000000)
    06 | if (t9) { PUT(pc) = 0x8000; Ijk_SigFPE_IntDiv }
    07 | t2 = DivModU128to64(t3,t0)
    08 | t6 = 128to64(t2)
    09 | PUT(rax) = t6
    10 | t7 = 128HIto64(t2)
    11 | PUT(rdx) = t7
    NEXT: PUT(rip) = 0x0000000000008003; Ijk_Boring
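The shape of this transformation (a compare against zero plus a conditional exit placed immediately before each division) can be sketched on a toy statement list. The tuple encoding and the function name add_zero_division_checks are hypothetical. The real postprocessor works on pyvex statement objects.

```python
def add_zero_division_checks(stmts, insn_addr):
    """Insert a divisor == 0 guard before every division in a toy statement list."""
    out = []
    for stmt in stmts:
        if stmt[0] == "div":
            _, dst, _dividend, divisor = stmt
            guard = f"{dst}_is_zero"   # hypothetical name for the new temporary
            out.append(("cmp_eq_zero", guard, divisor))
            out.append(("exit_if", guard, insn_addr, "Ijk_SigFPE_IntDiv"))
        out.append(stmt)               # original statements keep their order
    return out


toy_block = [
    ("get", "t0", "rcx"),
    ("get", "t1", "rax"),
    ("div", "t2", "t1", "t0"),
    ("put", "rax", "t2"),
]
checked = add_zero_division_checks(toy_block, 0x8000)
for stmt in checked:
    print(stmt)
```

The exit targets the address of the dividing instruction itself, as in the listing above, so the fault is reported at the instruction that caused it.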

postprocess()[source]

Modify the irsb

All of the postprocessors will be used in the order that they are registered

pyvex.lifting.lift_function.lift(data, addr, arch, max_bytes=None, max_inst=None, bytes_offset=0, opt_level=1, traceflags=0, strict_block_end=True, inner=False, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True, load_from_ro_regions=False, const_prop=False)[source]

Recursively lifts blocks using the registered lifters and postprocessors. Tries each lifter in the order in which they are registered on the data to lift.

If a lifter raises a LiftingException on the data, it is skipped. If it succeeds and returns a block with a jumpkind of Ijk_NoDecode, all of the lifters are tried on the rest of the data and if they work, their output is appended to the first block.

Parameters:
  • arch – The arch to lift the data as.

  • addr – The starting address of the block. Affects the IMarks.

  • data (Union[bytes, bytearray, memoryview, None]) – The bytes to lift as either a python string of bytes or a cffi buffer object.

  • max_bytes – The maximum number of bytes to lift. If set to None, no byte limit is used.

  • max_inst – The maximum number of instructions to lift. If set to None, no instruction limit is used.

  • bytes_offset – The offset into data to start lifting at.

  • opt_level – The level of optimization to apply to the IR, -1 through 2. -1 is the strictest unoptimized level, 0 is unoptimized but will perform some lookahead/lookbehind optimizations, 1 performs constant propagation, and 2 performs loop unrolling, which honestly doesn’t make much sense in the context of pyvex. The default is 1.

  • traceflags – The libVEX traceflags, controlling VEX debug prints.

Note

Explicitly specifying the number of instructions to lift (max_inst) may not always work exactly as expected. For example, on MIPS, it is meaningless to lift a branch or jump instruction without its delay slot. VEX attempts to Do The Right Thing by possibly decoding fewer instructions than requested. Specifically, this means that lifting a branch or jump on MIPS as a single instruction (max_inst=1) will result in an empty IRSB, and subsequent attempts to run this block will raise SimIRSBError(‘Empty IRSB passed to SimIRSB.’).

Note

If neither an instruction limit nor a byte limit is used, pyvex will continue lifting the block until the block ends properly or until it runs out of data to lift.

pyvex.lifting.lift_function.register(lifter, arch_name)[source]

Registers a Lifter or Postprocessor to be used by pyvex. Lifters are given priority based on the order in which they are registered. Postprocessors will be run in registration order.

Parameters:

lifter – The Lifter or Postprocessor to register

class pyvex.lifting.libvex.VexRegisterUpdates[source]

Bases: object

VexRegUpd_INVALID = 1792
VexRegUpdSpAtMemAccess = 1793
VexRegUpdUnwindregsAtMemAccess = 1794
VexRegUpdAllregsAtMemAccess = 1795
VexRegUpdAllregsAtEachInsn = 1796
VexRegUpdLdAllregsAtEachInsn = 1797
class pyvex.lifting.libvex.LibVEXLifter(arch, addr)[source]

Bases: Lifter

Parameters:
  • arch (Arch)

  • addr (int)

REQUIRE_DATA_C = True
static get_vex_log()[source]
class pyvex.lifting.lifter.Lifter(arch, addr)[source]

Bases: object

Parameters:
  • arch (Arch)

  • addr (int)

REQUIRE_DATA_C = False
REQUIRE_DATA_PY = False
__init__(arch, addr)[source]
Parameters:
  • arch (Arch)

  • addr (int)

arch: Arch
addr: int
lift(data, bytes_offset=None, max_bytes=None, max_inst=None, opt_level=1, traceflags=None, allow_arch_optimizations=None, strict_block_end=None, skip_stmts=False, collect_data_refs=False, cross_insn_opt=True, load_from_ro_regions=False, const_prop=False, disasm=False, dump_irsb=False)[source]

Wrapper around the _lift method on Lifters. Should not be overridden in child classes.

Parameters:
  • data (Union[bytes, bytearray, memoryview, None]) – The bytes to lift as either a python string of bytes or a cffi buffer object.

  • bytes_offset (Optional[int]) – The offset into data to start lifting at.

  • max_bytes (Optional[int]) – The maximum number of bytes to lift. If set to None, no byte limit is used.

  • max_inst (Optional[int]) – The maximum number of instructions to lift. If set to None, no instruction limit is used.

  • opt_level (int | float) – The level of optimization to apply to the IR, 0-2. Most likely will be ignored in any lifter other than LibVEX.

  • traceflags (Optional[int]) – The libVEX traceflags, controlling VEX debug prints. Most likely will be ignored in any lifter other than LibVEX.

  • allow_arch_optimizations (Optional[bool]) – Should the LibVEX lifter be allowed to perform lift-time preprocessing optimizations (e.g., lookback ITSTATE optimization on THUMB). Most likely will be ignored in any lifter other than LibVEX.

  • strict_block_end (Optional[bool]) – Should the LibVEX arm-thumb lifter split blocks at some instructions, for example CB{N}Z.

  • skip_stmts (bool) – Should the lifter skip transferring IRStmts from C to Python.

  • collect_data_refs (bool) – Should the LibVEX lifter collect data references in C.

  • cross_insn_opt (bool) – If cross-instruction-boundary optimizations are allowed or not.

  • disasm (bool) – Should the GymratLifter generate disassembly during lifting.

  • dump_irsb (bool) – Should the GymratLifter log the lifted IRSB.

  • load_from_ro_regions (bool)

  • const_prop (bool)

data
bytes_offset
opt_level
traceflags
allow_arch_optimizations
strict_block_end
collect_data_refs
max_inst
max_bytes
skip_stmts
irsb
cross_insn_opt
load_from_ro_regions
const_prop
disasm
dump_irsb
class pyvex.lifting.post_processor.Postprocessor(irsb)[source]

Bases: object

__init__(irsb)[source]
postprocess()[source]

Modify the irsb

All of the postprocessors will be used in the order that they are registered

class pyvex.lifting.util.Type[source]

Bases: object

ieee_float_16 = 'Ity_F16'
ieee_float_32 = 'Ity_F32'
ieee_float_64 = 'Ity_F64'
ieee_float_128 = 'Ity_F128'
decimal_float_32 = 'Ity_D32'
decimal_float_64 = 'Ity_D64'
decimal_float_128 = 'Ity_D128'
simd_vector_128 = 'Ity_V128'
simd_vector_256 = 'Ity_V256'
class pyvex.lifting.util.JumpKind[source]

Bases: object

Boring = 'Ijk_Boring'
Call = 'Ijk_Call'
Ret = 'Ijk_Ret'
Segfault = 'Ijk_SigSEGV'
Exit = 'Ijk_Exit'
Syscall = 'Ijk_Sys_syscall'
Sysenter = 'Ijk_Sys_sysenter'
Invalid = 'Ijk_INVALID'
NoDecode = 'Ijk_NoDecode'
class pyvex.lifting.util.VexValue(irsb_c, rdt, signed=False)[source]

Bases: object

__init__(irsb_c, rdt, signed=False)[source]
property value
property signed
widen_unsigned(ty)[source]
cast_to(ty, signed=False, high=False)[source]
widen_signed(ty)[source]
narrow_high(ty)[source]
narrow_low(ty)[source]
set_bit(idx, bval)[source]
set_bits(idxsandvals)[source]
ite(iftrue, iffalse)[source]
sar(right)[source]

v.sar(r) should do arithmetic shift right of v by r

Parameters:

right – VexValue value to shift by

Returns:

VexValue - result of a shift
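For readers unsure how an arithmetic shift differs from a logical one, here is a small sketch on plain Python integers of a fixed bit width. The function sar below is a hypothetical helper that works on ints, not on VexValue objects.

```python
def sar(value, shift, width):
    """Arithmetic shift right of a width-bit value, replicating the sign bit."""
    mask = (1 << width) - 1
    value &= mask
    if value >> (width - 1):           # sign bit set: reinterpret as negative
        value -= 1 << width
    return (value >> shift) & mask     # Python's >> on ints keeps the sign


arithmetic = sar(0x80, 1, 8)           # sign bit is copied into the vacated bit
logical = 0x80 >> 1                    # a logical shift fills with zero instead
print(hex(arithmetic), hex(logical))
```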

classmethod Constant(irsb_c, val, ty)[source]

Creates a constant as a VexValue

Parameters:
  • irsb_c – The IRSBCustomizer to use

  • val – The value, as an integer

  • ty – The type of the resulting VexValue

Returns:

a VexValue

exception pyvex.lifting.util.ParseError[source]

Bases: Exception

class pyvex.lifting.util.Instruction(bitstrm, arch, addr)[source]

Bases: object

Base class for an Instruction.

You should make a subclass of this for each instruction you want to lift. These classes will contain the “semantics” of the instruction, that is, what it _does_, in terms of the VEX IR.

You may want to subclass this for your architecture, and add arch-specific handling for parsing, argument resolution, etc., and have instructions subclass that instead.

The core parsing functionality is done via bin_format. Each instruction should be a subclass of Instruction and will be parsed by comparing bits in the provided bitstream to symbols in the bin_format member of the class. “Bin formats” are strings of symbols, like those you’d find in an ISA document, such as “0010rrrrddddffmm”. 0 or 1 specify hard-coded bits that must match for an instruction to match. Any letters specify arguments, grouped by letter, which will be parsed and provided as bitstrings in the data member of the class as a dictionary. So, in our example, the bits 0010110101101001, applied to format string 0010rrrrddddffmm, will result in the following in self.data:

    {'r': '1101',
     'd': '0110',
     'f': '10',
     'm': '01'}
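The matching rule just described can be sketched in a few lines of plain Python. The function parse_bin_format is a hypothetical helper written for this example. The actual parsing in Instruction works on a bitstream object and may differ in detail.

```python
def parse_bin_format(bin_format, bits):
    """Match a bit string against a format; return the lettered fields or None."""
    if len(bits) != len(bin_format):
        return None
    fields = {}
    for symbol, bit in zip(bin_format, bits):
        if symbol in "01":
            if symbol != bit:
                return None            # a hard-coded bit did not match
        else:
            # Letters collect their bits, grouped by letter, in order.
            fields[symbol] = fields.get(symbol, "") + bit
    return fields


fields = parse_bin_format("0010rrrrddddffmm", "0010110101101001")
print(fields)
```

A bit pattern whose leading bits differ from 0010 yields None, which is how a decoder would know to try the next instruction class.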

Implement compute_result to provide the “meat” of what your instruction does. You can also implement it in your arch-specific subclass of Instruction, to handle things common to all instructions, and provide instruction implementations elsewhere.

We provide the VexValue syntax wrapper to make expressing instruction semantics easy. You first convert the bitstring arguments into VexValues using the provided convenience methods (self.get/put/load/store/etc.). This loads the register from the actual registers into a temporary value we can work with. You can then write it back to a register when you’re done. For example, if you have the register in r, as above, you can make a VexValue like this:

    r = int(self.data['r'], 2)  # convert the bits captured for r to an int
    r_vv = self.get(r, Type.int_32)

If you then had an instruction to increment r, you could simply write:

    r_vv += 1

You could then write it back to the register like this:

self.put(r_vv, r)

Note that most architectures have special flags that get set differently for each instruction; make sure to implement those as well (override compute_flags()).

Override parse() to extend parsing. For example, in MSP430, this allows us to grab extra words from the bitstream when extra immediate words are present.

All architectures are different enough that there’s no magic recipe for how to write a lifter. See the examples provided by gymrat for ideas of how to use this to build your own lifters quickly and easily.

irsb_c: IRSBCustomizer
__init__(bitstrm, arch, addr)[source]

Create an instance of the instruction

Parameters:
  • irsb_c – The IRSBCustomizer to put VEX instructions into

  • bitstrm – The bitstream to decode instructions from

  • addr – The address of the instruction to be lifted, used only for jumps and branches

data: dict[str, str]
abstract property bin_format: str

Read the documentation of the class to understand what a bin format string is

Returns:

str bin format string

abstract property name: str

Name of the instruction

Can be useful to name the instruction when there’s an error related to it

mark_instruction_start()[source]
fetch_operands()[source]

Get the operands out of memory or registers. Return a tuple of operands for the instruction.

lift(irsb_c, past_instructions, future_instructions)[source]

This is the main body of the “lifting” for the instruction. This can/should be overridden to provide the general flow of how instructions in your arch work. For example, in MSP430, this is:

  • Figure out what your operands are by parsing the addressing, and load them into temporary registers

  • Do the actual operation, and commit the result, if needed.

  • Compute the flags
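The three steps above amount to a template method that calls the overridable hooks in a fixed order. The sketch below uses a stand-in class, not pyvex's Instruction, and records the call order. Passing the operands plus the result to compute_flags is an assumption of this example.

```python
class ToyInstruction:
    """Stand-in showing the order of the hooks; not pyvex's Instruction class."""

    def __init__(self):
        self.trace = []

    def fetch_operands(self):
        self.trace.append("fetch_operands")
        return (2, 3)

    def compute_result(self, *args):
        self.trace.append("compute_result")
        return sum(args)

    def commit_result(self, res):
        self.trace.append(f"commit_result({res})")

    def compute_flags(self, *args):
        self.trace.append("compute_flags")

    def lift(self):
        operands = self.fetch_operands()
        result = self.compute_result(*operands)
        if result is not None:         # commit only when there is a result
            self.commit_result(result)
        self.compute_flags(*operands, result)
        return self.trace


class ToyNop(ToyInstruction):
    """An instruction without a result skips the commit step."""

    def compute_result(self, *args):
        self.trace.append("compute_result")
        return None


print(ToyInstruction().lift())
print(ToyNop().lift())
```

An architecture-specific base class would typically override lift once and let the individual instruction classes override only the hooks.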

Parameters:

irsb_c (IRSBCustomizer)

commit_result(res)[source]

This is where the result of the operation is written to a destination. This happens only if compute_result does not return None, and happens before compute_flags is called. Override this to specify how to write out the result. The results of fetch_operands can be used to resolve various addressing modes for the write outward. A common pattern is to return a function from fetch_operands which will be called here to perform the write.

Parameters:

args – A tuple of the results of fetch_operands and compute_result

compute_result(*args)[source]

This is where the actual operation performed by your instruction, excluding the calculation of flags, should be performed. Return the VexValue of the “result” of the instruction, which may be used to calculate the flags later. For example, for a simple add, with arguments src and dst, you can simply write:

    return src + dst

Parameters:

args

Returns:

A VexValue containing the “result” of the operation.

compute_flags(*args)[source]

Most CPU architectures have “flags” that should be computed for many instructions. Override this to specify how that happens. One common pattern is to define this method to call specific methods to update each flag, which can then be overridden in the actual classes for each instruction.

match_instruction(data, bitstrm)[source]

Override this to extend the parsing functionality. This is useful if your arch has instruction “formats” with an opcode that has to match.

Parameters:
  • data

  • bitstrm

Returns:

data

parse(bitstrm)[source]
property bytewidth
disassemble()[source]

Return the disassembly of this instruction, as a string. Override this in subclasses.

Returns:

The address (self.addr), the instruction’s name, and a list of its operands, as strings

load(addr, ty)[source]

Load a value from memory into a VEX temporary register.

Parameters:
  • addr – The VexValue containing the addr to load from.

  • ty – The Type of the resulting data

Returns:

a VexValue

constant(val, ty)[source]

Creates a constant as a VexValue

Parameters:
  • val – The value, as an integer

  • ty – The type of the resulting VexValue

Returns:

a VexValue

get(reg, ty)[source]

Load a value from a machine register into a VEX temporary register. All values must be loaded out of registers before they can be used with operations, etc., and stored back into them when the instruction is over. See put().

Parameters:
  • reg – Register number as an integer, or register string name

  • ty – The Type to use.

Returns:

A VexValue of the gotten value.

put(val, reg)[source]

Puts a value from a VEX temporary register into a machine register. This is how the results of operations done to registers get committed to the machine’s state.

Parameters:
  • val – The VexValue to store (Want to store a constant? See Constant() first)

  • reg – The integer register number to store into, or register name

Returns:

None

put_conditional(cond, valiftrue, valiffalse, reg)[source]

Like put, except it checks a condition to decide what to put in the destination register.

Parameters:
  • cond – The VexValue representing the logical expression for the condition (if your expression only has constants, don’t use this method!)

  • valiftrue – the VexValue to put in reg if cond evals as true

  • valiffalse – the VexValue to put in reg if cond evals as false

  • reg – The integer register number to store into, or register name

Returns:

None

store(val, addr)[source]

Store a VexValue in memory at the specified location.

Parameters:
  • val – The VexValue of the value to store

  • addr – The VexValue of the address to store into

Returns:

None

jump(condition, to_addr, jumpkind='Ijk_Boring', ip_offset=None)[source]

Jump to a specified destination, under the specified condition. Used for branches, jumps, calls, returns, etc.

Parameters:
  • condition – The VexValue representing the expression for the guard, or None for an unconditional jump

  • to_addr – The address to jump to.

  • jumpkind – The JumpKind to use. See the VEX docs for what these are; you only need them for things that aren’t normal jumps (e.g., calls, interrupts, program exits, etc.)

Returns:

None

ite(cond, t, f)[source]
ccall(ret_type, func_name, args)[source]

Creates a CCall operation. A CCall is a procedure that calculates a value at runtime, not at lift-time. You can use these for flags, unresolvable jump targets, etc. We caution you to avoid using them when at all possible though.

Parameters:
  • ret_type – The return type of the CCall

  • func_name – The name of the helper function to call. If you’re using angr, this should be added (or monkeypatched) into angr.engines.vex.claripy.ccall.

  • args – List of arguments to the function

Returns:

A VexValue of the result.

dirty(ret_type, func_name, args)[source]

Creates a dirty call operation.

These are like ccalls (clean calls) but their implementations are theoretically allowed to read or write to or from any part of the state, making them a nightmare for static analysis to reason about. Avoid their use at all costs.

Parameters:
  • ret_type – The return type of the dirty call, or None if the dirty call doesn’t return anything.

  • func_name – The name of the helper function to call. If you’re using angr, this should be added (or monkeypatched) into angr.engines.vex.heavy.dirty.

  • args – List of arguments to the function

Return type:

VexValue

Returns:

A VexValue of the result.

class pyvex.lifting.util.GymratLifter(arch, addr)[source]

Bases: Lifter

This is a base class for lifters that use Gymrat. For most architectures, all you need to do is subclass this, and set the attribute “instrs” to be a list of classes that define each instruction. By default, a lifter will decode instructions by attempting to instantiate every class until one works. This will use an IRSBCustomizer, which will, if it succeeds, add the appropriate VEX instructions to a pyvex IRSB. pyvex, when lifting a block of code for this architecture, will call the method “lift”, which will produce the IRSB of the lifted code.
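The "instantiate every class until one works" strategy can be sketched as follows. All names here (ToyInstr, Nop, Inc, decode, and the opcode and width attributes) are hypothetical stand-ins operating on strings of '0' and '1'. The real lifter works on a bitstream and on Instruction subclasses.

```python
class ParseError(Exception):
    """Stand-in for the error an instruction class raises when bits do not match."""


class ToyInstr:
    opcode = ""                        # hypothetical: leading bits that must match
    width = 8                          # hypothetical: instruction length in bits

    def __init__(self, bits):
        if len(bits) < self.width or not bits.startswith(self.opcode):
            raise ParseError(type(self).__name__)
        self.bits = bits[:self.width]


class Nop(ToyInstr):
    opcode = "0000"


class Inc(ToyInstr):
    opcode = "0001"


def decode(bits, instrs):
    """Decode a bit string by trying each instruction class until one accepts."""
    decoded = []
    while bits:
        for cls in instrs:
            try:
                insn = cls(bits)
            except ParseError:
                continue               # this class does not match, try the next
            decoded.append(insn)
            bits = bits[insn.width:]
            break
        else:
            raise ParseError(f"no instruction matches {bits[:8]}")
    return decoded


names = [type(insn).__name__ for insn in decode("0001101000000000", [Nop, Inc])]
print(names)
```

Because classes are tried in list order, more specific encodings should come before more general ones in such a list.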

Parameters:
  • arch (Arch)

  • addr (int)

REQUIRE_DATA_PY = True
instrs: list[type[Instruction]]
__init__(arch, addr)[source]
bitstrm
errors
thedata
disassembly
create_bitstrm()[source]
decode()[source]
pp_disas()[source]
error()[source]
disassemble()[source]
pyvex.lifting.util.syntax_wrapper.checkparams(rhstype=None)[source]
pyvex.lifting.util.syntax_wrapper.vvifyresults(f)[source]
class pyvex.lifting.util.syntax_wrapper.VexValue(irsb_c, rdt, signed=False)[source]

Bases: object

__init__(irsb_c, rdt, signed=False)[source]
property value
property signed
widen_unsigned(ty)[source]
cast_to(ty, signed=False, high=False)[source]
widen_signed(ty)[source]
narrow_high(ty)[source]
narrow_low(ty)[source]
set_bit(idx, bval)[source]
set_bits(idxsandvals)[source]
ite(iftrue, iffalse)[source]
sar(right)[source]

v.sar(r) should do arithmetic shift right of v by r

Parameters:

right – VexValue value to shift by

Returns:

VexValue - result of a shift

classmethod Constant(irsb_c, val, ty)[source]

Creates a constant as a VexValue

Parameters:
  • irsb_c – The IRSBCustomizer to use

  • val – The value, as an integer

  • ty – The type of the resulting VexValue

Returns:

a VexValue

class pyvex.lifting.util.vex_helper.JumpKind[source]

Bases: object

Boring = 'Ijk_Boring'
Call = 'Ijk_Call'
Ret = 'Ijk_Ret'
Segfault = 'Ijk_SigSEGV'
Exit = 'Ijk_Exit'
Syscall = 'Ijk_Sys_syscall'
Sysenter = 'Ijk_Sys_sysenter'
Invalid = 'Ijk_INVALID'
NoDecode = 'Ijk_NoDecode'
class pyvex.lifting.util.vex_helper.TypeMeta[source]

Bases: type

typemeta_re = re.compile('int_(?P<size>\\d+)$')
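This pattern suggests how names such as Type.int_32, which do not appear in the attribute list of Type, can still resolve: a metaclass can intercept unknown attribute names and build the type string on demand. The sketch below is a stand-in written under that assumption. ToyTypeMeta, ToyType and the Ity_I<N> result format are illustrative, and the real class may validate widths differently.

```python
import re


class ToyTypeMeta(type):
    """Stand-in metaclass: resolve int_<N> attribute names on the fly."""

    typemeta_re = re.compile(r"int_(?P<size>\d+)$")

    def __getattr__(cls, name):
        # Only called when normal class attribute lookup has failed.
        match = cls.typemeta_re.match(name)
        if match is None:
            raise AttributeError(name)
        # Assumption: integer types follow the Ity_I<N> naming of VEX enum strings.
        return f"Ity_I{match.group('size')}"


class ToyType(metaclass=ToyTypeMeta):
    ieee_float_32 = "Ity_F32"          # regular attributes are looked up as usual


print(ToyType.int_32, ToyType.ieee_float_32)
```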
class pyvex.lifting.util.vex_helper.Type[source]

Bases: object

ieee_float_16 = 'Ity_F16'
ieee_float_32 = 'Ity_F32'
ieee_float_64 = 'Ity_F64'
ieee_float_128 = 'Ity_F128'
decimal_float_32 = 'Ity_D32'
decimal_float_64 = 'Ity_D64'
decimal_float_128 = 'Ity_D128'
simd_vector_128 = 'Ity_V128'
simd_vector_256 = 'Ity_V256'
pyvex.lifting.util.vex_helper.get_op_format_from_const_ty(ty)[source]
pyvex.lifting.util.vex_helper.make_format_op_generator(fmt_string)[source]

Return a function which generates an op format (just a string of the vex instruction)

Functions by formatting the fmt_string with the types of the arguments
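As a sketch of what "formatting the fmt_string with the types of the arguments" could look like, the closure below substitutes the width of each argument type into a format string. The function name, the arg_t placeholder and the way widths are extracted are assumptions made for this example.

```python
def toy_format_op_generator(fmt_string):
    """Return a function that builds an op name from its argument types."""
    def gen(arg_types):
        # Assumption: each type is a VEX enum string such as "Ity_I32", and only
        # its trailing width is substituted into the format string.
        widths = [ty.rsplit("_", 1)[-1].lstrip("IFDV") for ty in arg_types]
        return fmt_string.format(arg_t=widths)
    return gen


add_op = toy_format_op_generator("Iop_Add{arg_t[0]}")
print(add_op(["Ity_I32", "Ity_I32"]))
```

One generator can then serve every operand width, which is presumably why the helpers mkbinop, mkunop and mkcmpop take format strings rather than fixed op names.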

pyvex.lifting.util.vex_helper.mkbinop(fstring)[source]
pyvex.lifting.util.vex_helper.mkunop(fstring)[source]
pyvex.lifting.util.vex_helper.mkcmpop(fstring_fragment, signedness='')[source]
class pyvex.lifting.util.vex_helper.IRSBCustomizer(irsb)[source]

Bases: object

op_add(expr_a, expr_b)
op_sub(expr_a, expr_b)
op_umul(expr_a, expr_b)
op_smul(expr_a, expr_b)
op_sdiv(expr_a, expr_b)
op_udiv(expr_a, expr_b)
op_mod(expr_a, expr_b)
op_or(expr_a, expr_b)
op_and(expr_a, expr_b)
op_xor(expr_a, expr_b)
op_shr(expr_a, expr_b)
op_shl(expr_a, expr_b)
op_sar(expr_a, expr_b)
op_not(expr_a)
op_cmp_eq(expr_a, expr_b)
op_cmp_ne(expr_a, expr_b)
op_cmp_slt(expr_a, expr_b)
op_cmp_sle(expr_a, expr_b)
op_cmp_ult(expr_a, expr_b)
op_cmp_ule(expr_a, expr_b)
op_cmp_sge(expr_a, expr_b)
op_cmp_uge(expr_a, expr_b)
op_cmp_sgt(expr_a, expr_b)
op_cmp_ugt(expr_a, expr_b)
__init__(irsb)[source]
get_type(rdt)[source]
imark(int_addr, int_length, int_delta=0)[source]
get_reg(regname)[source]
put(expr_val, tuple_reg)[source]
store(addr, expr)[source]
noop()[source]
add_exit(guard, dst, jk, ip)[source]

Add an exit out of the middle of an IRSB (e.g., a conditional jump).

Parameters:
  • guard – An expression, the exit is taken if true

  • dst – the destination of the exit (a Const)

  • jk – the JumpKind of this exit (probably Ijk_Boring)

  • ip – The address of this exit’s source

goto(addr)[source]
ret(addr)[source]
call(addr)[source]
rdreg(reg, ty)[source]
load(addr, ty)[source]
op_ccall(retty, funcstr, args)[source]
dirty(retty, funcstr, args)[source]
ite(condrdt, iftruerdt, iffalserdt)[source]
mkconst(val, ty)[source]
op_generic(Operation, op_generator)[source]
op_binary(op_format_str)[source]
op_unary(op_format_str)[source]
cast_to(rdt, tydest, signed=False, high=False)[source]
op_to_one_bit(rdt)[source]
op_narrow_int(rdt, tydest, high_half=False)[source]
op_widen_int(rdt, tydest, signed=False)[source]
op_widen_int_signed(rdt, tydest)[source]
op_widen_int_unsigned(rdt, tydest)[source]
get_msb(tmp, ty)[source]
get_bit(rdt, idx)[source]
op_extract_lsb(rdt)[source]
set_bit(rdt, idx, bval)[source]
set_bits(rdt, idxsandvals)[source]
get_rdt_width(rdt)[source]
pyvex.lifting.util.lifter_helper.is_empty(bitstrm)[source]
exception pyvex.lifting.util.lifter_helper.ParseError[source]

Bases: Exception

class pyvex.lifting.util.lifter_helper.GymratLifter(arch, addr)[source]

Bases: Lifter

This is a base class for lifters that use Gymrat. For most architectures, all you need to do is subclass this, and set the attribute “instrs” to be a list of classes that define each instruction. By default, a lifter will decode instructions by attempting to instantiate every class until one works. This will use an IRSBCustomizer, which will, if it succeeds, add the appropriate VEX instructions to a pyvex IRSB. pyvex, when lifting a block of code for this architecture, will call the method “lift”, which will produce the IRSB of the lifted code.

REQUIRE_DATA_PY = True
instrs: list[type[Instruction]]
__init__(arch, addr)[source]
bitstrm
errors
thedata
disassembly
create_bitstrm()[source]
decode()[source]
pp_disas()[source]
error()[source]
disassemble()[source]
class pyvex.lifting.util.instr_helper.Instruction(bitstrm, arch, addr)[source]

Bases: object

Base class for an Instruction.

You should make a subclass of this for each instruction you want to lift. These classes will contain the “semantics” of the instruction, that is, what it _does_, in terms of the VEX IR.

You may want to subclass this for your architecture, and add arch-specific handling for parsing, argument resolution, etc., and have instructions subclass that instead.

The core parsing functionality is done via bin_format. Each instruction should be a subclass of Instruction and will be parsed by comparing bits in the provided bitstream to symbols in the bin_format member of the class. “Bin formats” are strings of symbols, like those you’d find in an ISA document, such as “0010rrrrddddffmm”. 0 or 1 specify hard-coded bits that must match for an instruction to match. Any letters specify arguments, grouped by letter, which will be parsed and provided as bitstrings in the data member of the class as a dictionary. So, in our example, the bits 0010110101101001, applied to format string 0010rrrrddddffmm, will result in the following in self.data:

    {'r': '1101',
     'd': '0110',
     'f': '10',
     'm': '01'}

Implement compute_result to provide the “meat” of what your instruction does. You can also implement it in your arch-specific subclass of Instruction, to handle things common to all instructions, and provide instruction implementations elsewhere.

We provide the VexValue syntax wrapper to make expressing instruction semantics easy. You first convert the bitstring arguments into VexValues using the provided convenience methods (self.get/put/load/store/etc.). This loads the register from the actual registers into a temporary value we can work with. You can then write it back to a register when you’re done. For example, if you have the register in r, as above, you can make a VexValue like this:

    r = int(self.data['r'], 2)  # convert the bits captured for r to an int
    r_vv = self.get(r, Type.int_32)

If you then had an instruction to increment r, you could simply write:

    r_vv += 1

You could then write it back to the register like this:

self.put(r_vv, r)

Note that most architectures have special flags that get set differently for each instruction; make sure to implement those as well (override compute_flags()).

Override parse() to extend parsing. For example, in MSP430, this allows us to grab extra words from the bitstream when extra immediate words are present.

All architectures are different enough that there’s no magic recipe for how to write a lifter. See the examples provided by gymrat for ideas of how to use this to build your own lifters quickly and easily.

irsb_c: IRSBCustomizer
__init__(bitstrm, arch, addr)[source]

Create an instance of the instruction

Parameters:
  • irsb_c – The IRSBCustomizer to put VEX instructions into

  • bitstrm – The bitstream to decode instructions from

  • addr – The address of the instruction to be lifted, used only for jumps and branches

data: dict[str, str]
abstract property bin_format: str

Read the documentation of the class to understand what a bin format string is.

Returns:

The bin format string, as a str

abstract property name: str

Name of the instruction

Can be useful to name the instruction when there’s an error related to it

mark_instruction_start()[source]
fetch_operands()[source]

Get the operands out of memory or registers. Return a tuple of operands for the instruction.

lift(irsb_c, past_instructions, future_instructions)[source]

This is the main body of the “lifting” for the instruction. This can/should be overridden to provide the general flow of how instructions in your arch work. For example, in MSP430, this is:

  • Figure out what your operands are by parsing the addressing, and load them into temporary registers

  • Do the actual operation, and commit the result, if needed.

  • Compute the flags

Parameters:

irsb_c (IRSBCustomizer)
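The flow described for lift() (fetch operands, compute, commit if there is a result, then compute flags) can be sketched with a toy base class. The classes below operate on plain integers and a dict-based register file; they are not pyvex classes, and the exact arguments passed to each hook here are an assumption made for the sketch.

```python
class ToyInstruction:
    """Stand-in base class; only the order of the hooks mirrors the text."""

    def __init__(self, regs):
        self.regs = regs   # toy register file: name -> int
        self.flags = {}
        self.trace = []    # records which hooks ran, in order

    def lift(self):
        self.trace.append("fetch_operands")
        operands = self.fetch_operands()
        self.trace.append("compute_result")
        result = self.compute_result(*operands)
        if result is not None:
            # Commit only when there is a result to write.
            self.trace.append("commit_result")
            self.commit_result(result, *operands)
        self.trace.append("compute_flags")
        self.compute_flags(*operands, result)


class ToyAdd(ToyInstruction):
    def fetch_operands(self):
        return self.regs["src"], self.regs["dst"]

    def compute_result(self, src, dst):
        return (src + dst) & 0xFFFF  # 16-bit wrap-around

    def commit_result(self, res, src, dst):
        self.regs["dst"] = res

    def compute_flags(self, src, dst, res):
        self.flags["zero"] = res == 0
        self.flags["carry"] = src + dst > 0xFFFF


insn = ToyAdd({"src": 0xFFFF, "dst": 0x0001})
insn.lift()
print(insn.regs, insn.flags, insn.trace)
```

Keeping the flow in the base class means each concrete instruction only supplies the pieces that differ.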

commit_result(res)[source]

This is where the result of the operation is written to a destination. This happens only if compute_result does not return None, and happens before compute_flags is called. Override this to specify how to write out the result. The results of fetch_operands can be used to resolve various addressing modes for the write outward. A common pattern is to return a function from fetch_operands which will be called here to perform the write.

Parameters:

args – A tuple of the results of fetch_operands and compute_result
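The "return a function from fetch_operands" pattern mentioned for commit_result might look like the following toy. ToyMove, its addressing-mode names, and the dict-based registers and memory are all invented for illustration; only the shape of the pattern comes from the text.

```python
class ToyMove:
    """Stand-in instruction with two destination addressing modes (not pyvex)."""

    def __init__(self, regs, mem, dst_mode, dst):
        self.regs, self.mem = regs, mem
        self.dst_mode, self.dst = dst_mode, dst

    def fetch_operands(self):
        src = self.regs["src"]
        if self.dst_mode == "register":
            def writeout(value):
                self.regs[self.dst] = value
        else:
            # Indirect mode: the destination register holds a memory address.
            address = self.regs[self.dst]

            def writeout(value):
                self.mem[address] = value
        return src, writeout

    def compute_result(self, src, writeout):
        return src

    def commit_result(self, res, src, writeout):
        # The addressing mode was already resolved; just perform the write.
        writeout(res)

    def lift(self):
        operands = self.fetch_operands()
        res = self.compute_result(*operands)
        if res is not None:
            self.commit_result(res, *operands)


regs = {"src": 7, "r4": 0x200}
mem = {}
ToyMove(regs, mem, "indirect", "r4").lift()   # writes 7 to memory at 0x200
ToyMove(regs, mem, "register", "r4").lift()   # overwrites r4 itself with 7
print(regs, mem)
```

With this arrangement the addressing mode is decoded once, in fetch_operands, and commit_result stays the same for every mode.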

compute_result(*args)[source]

This is where the actual operation performed by your instruction, excluding the calculation of flags, should be performed. Return the VexValue of the “result” of the instruction, which may be used to calculate the flags later. For example, for a simple add, with arguments src and dst, you can simply write:

return src + dst

Parameters:

args

Returns:

A VexValue containing the “result” of the operation.

compute_flags(*args)[source]

Most CPU architectures have “flags” that should be computed for many instructions. Override this to specify how that happens. One common pattern is to define this method to call specific methods to update each flag, which can then be overridden in the actual classes for each instruction.
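A minimal sketch of the per-flag pattern described for compute_flags, using plain integers on an imaginary 8-bit machine. The class and method names (ToyArithmetic, zero, negative, carry) are hypothetical and not part of pyvex.

```python
class ToyArithmetic:
    """Stand-in base class: compute_flags fans out to one method per flag."""

    def compute_flags(self, src, dst, res):
        return {
            "zero": self.zero(src, dst, res),
            "negative": self.negative(src, dst, res),
            "carry": self.carry(src, dst, res),
        }

    def zero(self, src, dst, res):
        return (res & 0xFF) == 0

    def negative(self, src, dst, res):
        return bool(res & 0x80)

    def carry(self, src, dst, res):
        return False  # default: the instruction does not produce a carry


class ToyAdd8(ToyArithmetic):
    def carry(self, src, dst, res):
        # Only the carry rule differs for addition.
        return src + dst > 0xFF


class ToyAnd8(ToyArithmetic):
    pass  # inherits every default flag rule


add_flags = ToyAdd8().compute_flags(0xF0, 0x20, (0xF0 + 0x20) & 0xFF)
and_flags = ToyAnd8().compute_flags(0xF0, 0x0F, 0xF0 & 0x0F)
print(add_flags, and_flags)
```

Each instruction class overrides only the flag methods whose rule differs from the architecture-wide default.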

match_instruction(data, bitstrm)[source]

Override this to extend the parsing functionality. This is useful if your arch has instruction “formats” with an opcode that has to match.

Parameters:
  • data

  • bitstrm

Returns:

data
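As an illustration of the opcode check described for match_instruction, the toy below shares one bin format between two instructions and rejects a candidate when the parsed opcode field is not its own. ToyParseError, the 'o' field, and the decode helper are assumptions made for this sketch, not pyvex names.

```python
class ToyParseError(Exception):
    """Local stand-in for a 'this class does not match' signal."""


class ToyFormatOne:
    """Shared format: 4 opcode bits, then two 4-bit register fields."""

    bin_format = "oooossssdddd"
    opcode = None  # set by each concrete instruction

    def match_instruction(self, data, bitstrm=None):
        # The letters matched, but the opcode must also be ours.
        if data["o"] != self.opcode:
            raise ToyParseError(f"expected opcode {self.opcode}, got {data['o']}")
        return data


class ToyMov(ToyFormatOne):
    opcode = "0100"


class ToyAdd(ToyFormatOne):
    opcode = "0101"


def decode(data, candidates):
    """Try each candidate class in turn; return the name of the first match."""
    for cls in candidates:
        try:
            cls().match_instruction(data)
            return cls.__name__
        except ToyParseError:
            continue
    return None


parsed = {"o": "0101", "s": "0011", "d": "0100"}
print(decode(parsed, [ToyMov, ToyAdd]))
```

Putting the check in a shared format class keeps the bin format in one place while each instruction declares only its opcode.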

parse(bitstrm)[source]
property bytewidth
disassemble()[source]

Return the disassembly of this instruction, as a string. Override this in subclasses.

Returns:

The address (self.addr), the instruction’s name, and a list of its operands, as strings

load(addr, ty)[source]

Load a value from memory into a VEX temporary register.

Parameters:
  • addr – The VexValue containing the addr to load from.

  • ty – The Type of the resulting data

Returns:

a VexValue

constant(val, ty)[source]

Creates a constant as a VexValue

Parameters:
  • val – The value, as an integer

  • ty – The type of the resulting VexValue

Returns:

a VexValue

get(reg, ty)[source]

Load a value from a machine register into a VEX temporary register. All values must be loaded out of registers before they can be used with operations, etc., and stored back into them when the instruction is over. See put().

Parameters:
  • reg – Register number as an integer, or register string name

  • ty – The Type to use.

Returns:

A VexValue of the gotten value.

put(val, reg)[source]

Puts a value from a VEX temporary register into a machine register. This is how the results of operations done to registers get committed to the machine’s state.

Parameters:
  • val – The VexValue to store (Want to store a constant? See constant() first)

  • reg – The integer register number to store into, or register name

Returns:

None

put_conditional(cond, valiftrue, valiffalse, reg)[source]

Like put, except it checks a condition to decide what to put in the destination register.

Parameters:
  • cond – The VexValue representing the logical expression for the condition (if your expression only has constants, don’t use this method!)

  • valiftrue – the VexValue to put in reg if cond evals as true

  • valiffalse – the VexValue to put in reg if cond evals as false

  • reg – The integer register number to store into, or register name

Returns:

None

store(val, addr)[source]

Store a VexValue in memory at the specified location.

Parameters:
  • val – The VexValue of the value to store

  • addr – The VexValue of the address to store into

Returns:

None

jump(condition, to_addr, jumpkind='Ijk_Boring', ip_offset=None)[source]

Jump to a specified destination, under the specified condition. Used for branches, jumps, calls, returns, etc.

Parameters:
  • condition – The VexValue representing the expression for the guard, or None for an unconditional jump

  • to_addr – The address to jump to.

  • jumpkind – The JumpKind to use. See the VEX docs for what these are; you only need them for things that aren’t normal jumps (e.g., calls, interrupts, program exits, etc.)

Returns:

None

ite(cond, t, f)[source]
ccall(ret_type, func_name, args)[source]

Creates a CCall operation. A CCall is a procedure that calculates a value at runtime, not at lift-time. You can use these for flags, unresolvable jump targets, etc. We caution you to avoid using them when at all possible though.

Parameters:
  • ret_type – The return type of the CCall

  • func_name – The name of the helper function to call. If you’re using angr, this should be added (or monkeypatched) into angr.engines.vex.claripy.ccall.

  • args – List of arguments to the function

Returns:

A VexValue of the result.

dirty(ret_type, func_name, args)[source]

Creates a dirty call operation.

These are like ccalls (clean calls) but their implementations are theoretically allowed to read or write to or from any part of the state, making them a nightmare for static analysis to reason about. Avoid their use at all costs.

Parameters:
  • ret_type – The return type of the dirty call, or None if the dirty call doesn’t return anything.

  • func_name – The name of the helper function to call. If you’re using angr, this should be added (or monkeypatched) into angr.engines.vex.heavy.dirty.

  • args – List of arguments to the function

Return type:

VexValue

Returns:

A VexValue of the result.

Builtin IR Processors

class pyvex.lifting.zerodivision.ZeroDivisionPostProcessor(irsb)[source]

Bases: Postprocessor

A postprocessor for adding zero-division checks to VEX.

For “div rcx”, will turn:

00 | ------ IMark(0x8000, 3, 0) ------
01 | t0 = GET:I64(rcx)
02 | t1 = GET:I64(rax)
03 | t2 = GET:I64(rdx)
04 | t3 = 64HLto128(t2,t1)
05 | t4 = DivModU128to64(t3,t0)
06 | t5 = 128to64(t4)
07 | PUT(rax) = t5
08 | t6 = 128HIto64(t4)
09 | PUT(rdx) = t6
NEXT: PUT(rip) = 0x0000000000008003; Ijk_Boring

into:

00 | ------ IMark(0x8000, 3, 0) ------
01 | t0 = GET:I64(rcx)
02 | t4 = GET:I64(rax)
03 | t5 = GET:I64(rdx)
04 | t3 = 64HLto128(t5,t4)
05 | t9 = CmpEQ(t0,0x0000000000000000)
06 | if (t9) { PUT(pc) = 0x8000; Ijk_SigFPE_IntDiv }
07 | t2 = DivModU128to64(t3,t0)
08 | t6 = 128to64(t2)
09 | PUT(rax) = t6
10 | t7 = 128HIto64(t2)
11 | PUT(rdx) = t7
NEXT: PUT(rip) = 0x0000000000008003; Ijk_Boring

postprocess()[source]

Modify the irsb

All of the postprocessors will be used in the order that they are registered

Errors

exception pyvex.errors.PyVEXError[source]

Bases: Exception

exception pyvex.errors.SkipStatementsError[source]

Bases: PyVEXError

exception pyvex.errors.LiftingException[source]

Bases: Exception

exception pyvex.errors.NeedStatementsNotification[source]

Bases: LiftingException

A post-processor may raise a NeedStatementsNotification if it needs to work with statements, but the current IRSB is generated without any statement available (skip_stmts=True). The lifter will re-lift the current block with skip_stmts=False upon catching a NeedStatementsNotification, and re-run the post-processors.

It’s worth noting that if a post-processor always raises this notification for every basic block without statements, it will essentially disable the statement-skipping optimization, which is bad for performance (especially for CFGFast, which heavily relies on this optimization). Post-processor authors are encouraged to at least filter the IRSBs based on available properties (jumpkind, next, etc.). If a post-processor must work with statements for the majority of IRSBs, the author should implement it in PyVEX in C for the sake of better performance.
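The filtering advice above can be sketched as follows. Everything here is a self-contained stand-in: the exception is a local class with the same name as the pyvex one, ToyIRSB models a block lifted with skip_stmts=True by setting statements to None (an assumption about representation), and run() imitates the re-lift behaviour described in the text.

```python
class NeedStatementsNotification(Exception):
    """Local stand-in for pyvex.errors.NeedStatementsNotification."""


class ToyIRSB:
    def __init__(self, jumpkind, statements=None):
        self.jumpkind = jumpkind
        self.statements = statements  # None models a block lifted without statements


class CallOnlyPostprocessor:
    """Hypothetical post-processor that only cares about call blocks."""

    def __init__(self, irsb):
        self.irsb = irsb

    def postprocess(self):
        # Cheap filter first: most blocks never need their statements.
        if self.irsb.jumpkind != "Ijk_Call":
            return "skipped"
        if self.irsb.statements is None:
            raise NeedStatementsNotification()
        return "processed"


def run(postprocessor_cls, jumpkind):
    """Imitate the lifter: re-lift with statements only when asked to."""
    irsb = ToyIRSB(jumpkind)  # first lift: no statements
    try:
        return postprocessor_cls(irsb).postprocess(), "fast path"
    except NeedStatementsNotification:
        irsb = ToyIRSB(jumpkind, statements=["t0 = GET:I64(rax)"])
        return postprocessor_cls(irsb).postprocess(), "re-lifted"


boring = run(CallOnlyPostprocessor, "Ijk_Boring")
call = run(CallOnlyPostprocessor, "Ijk_Call")
print(boring, call)
```

In this sketch only the call block pays for a second lift; every other block stays on the fast path.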

Utilities

pyvex.utils.stable_hash(t)[source]
Return type:

int

Parameters:

t (tuple)