Types profiles

Type matching algorithms need compiled types profiles to work properly. Types profiles are important because they hold information about both the data types and the functions of imported libraries. At the time of writing, tcc does not parse C files into sdb format correctly, so all parsing has to be done manually. This document describes how to create sdbs for types profiles, where to place them, and the naming conventions for integrating them with the r2 source.

Available Constructs

At the current time the following C constructs are supported:

  • Primitive types
  • Structs
  • Unions
  • Function prototypes

Primitive types

Defining primitive types requires an understanding of the basic pf formats. The full list of format specifiers is available via pf??:

-----------------------------------------------------------------
|  format specifier  | explanation                              |
|---------------------------------------------------------------|
|         b          |  byte (unsigned)                         |
|         c          |  char (signed byte)                      |
|         d          |  0x%08x hexadecimal value (4 bytes)      |
|         f          |  float value (4 bytes)                   |
|         i          |  %i integer value (4 bytes)              |
|         o          |  0x%08o octal value (4 bytes)            |
|         p          |  pointer reference (2, 4 or 8 bytes)     |
|         q          |  quadword (8 bytes)                      |
|         s          |  32bit pointer to string (4 bytes)       |
|         S          |  64bit pointer to string (8 bytes)       |
|         t          |  UNIX timestamp (4 bytes)                |
|         T          |  show Ten first bytes of buffer          |
|         u          |  uleb128 (variable length)               |
|         w          |  word (2 bytes unsigned short in hex)    |
|         x          |  0x%08x hex value and flag (fd @ addr)   |
|         X          |  show formatted hexpairs                 |
|         z          |  \0 terminated string                    |
|         Z          |  \0 terminated wide string               |
-----------------------------------------------------------------

There are three mandatory keys for defining primitive data types:

X=type
type.X=format_specifier
type.X.size=size_in_bits

For example, let's define UINT. According to the Microsoft documentation, UINT is equivalent to the standard C unsigned int. It is defined as:

UINT=type
type.UINT=d
type.UINT.size=32
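
Since profiles have to be written by hand, a small script can help avoid typos in the keys. The following is a minimal sketch in Python and is not part of radare2; the helper name primitive_entries is hypothetical, and it only produces the three mandatory keys described above.

```python
def primitive_entries(name, fmt, size_bits):
    """Return the three mandatory sdb lines for a primitive type.

    name      -- type name, e.g. "UINT"
    fmt       -- pf format specifier, e.g. "d"
    size_bits -- size of the type in bits
    """
    return [
        f"{name}=type",
        f"type.{name}={fmt}",
        f"type.{name}.size={size_bits}",
    ]


# Reproduces the UINT definition from the text.
print("\n".join(primitive_entries("UINT", "d", 32)))
```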

There is also a fourth entry, which is optional:

type.X.pointto=Y

It may only be used for pointer types (type.X=p). A good example is the LPFILETIME definition: it is a pointer to _FILETIME, which happens to be a struct. Assuming that we are targeting only 32-bit Windows machines, it is defined as follows:

LPFILETIME=type
type.LPFILETIME=p
type.LPFILETIME.size=32
type.LPFILETIME.pointto=_FILETIME

The last field is not mandatory because sometimes the internals of the data structure are proprietary, and no clean representation of them is available.
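
The rule that pointto belongs only to pointer types can be checked mechanically. The sketch below is an illustration under stated assumptions, not radare2 code: parse_sdb and check_pointto are hypothetical helpers, and the input is assumed to be plain key=value lines as shown in this document.

```python
def parse_sdb(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def check_pointto(entries):
    """Return the names of types that set pointto but do not use format 'p'."""
    bad = []
    for key in entries:
        if key.startswith("type.") and key.endswith(".pointto"):
            name = key[len("type."):-len(".pointto")]
            if entries.get(f"type.{name}") != "p":
                bad.append(name)
    return bad


profile = """
LPFILETIME=type
type.LPFILETIME=p
type.LPFILETIME.size=32
type.LPFILETIME.pointto=_FILETIME
"""
# An empty list means no misuse of pointto was found.
print(check_pointto(parse_sdb(profile)))
```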

Structures

These are the basic keys for structs (shown with just two elements):

X=struct
struct.X=a,b
struct.X.a=a_type,a_offset,a_number_of_elements
struct.X.b=b_type,b_offset,b_number_of_elements

The first line defines a structure called X, and the second line lists the elements of X as comma-separated values. After that, each element is described on its own line.

For example, consider a struct like this one:

struct _FILETIME {
	DWORD dwLowDateTime;
	DWORD dwHighDateTime;
};

Assuming DWORD is already defined, the struct looks like this:

_FILETIME=struct
struct._FILETIME=dwLowDateTime,dwHighDateTime
struct._FILETIME.dwLowDateTime=DWORD,0,0
struct._FILETIME.dwHighDateTime=DWORD,4,0

Note that the number-of-elements field is used only for arrays, to record how many elements the array holds; otherwise it is zero by default.
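
Computing member offsets by hand is error-prone, so here is a minimal Python sketch of how the struct keys could be generated. It is not part of radare2: struct_entries is a hypothetical helper, member sizes are supplied by the caller, offsets are assumed to be in bytes (as the _FILETIME example suggests), and members are assumed to be laid out back to back with no padding.

```python
def struct_entries(name, members):
    """Build sdb lines for a struct.

    members is a list of (field, type, size_in_bytes, count) tuples,
    where count is 0 for non-array members.
    """
    lines = [
        f"{name}=struct",
        f"struct.{name}=" + ",".join(field for field, _, _, _ in members),
    ]
    offset = 0
    for field, ctype, size, count in members:
        lines.append(f"struct.{name}.{field}={ctype},{offset},{count}")
        # Assumption: no padding; an array occupies size * count bytes.
        offset += size * max(count, 1)
    return lines


# Reproduces the _FILETIME definition, assuming DWORD is 4 bytes.
filetime = [("dwLowDateTime", "DWORD", 4, 0), ("dwHighDateTime", "DWORD", 4, 0)]
print("\n".join(struct_entries("_FILETIME", filetime)))
```

For real structures the compiler may insert padding, so offsets taken from the target ABI should override the simple running sum used here.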

Unions

Unions are defined exactly like structs; the only difference is that the word struct is replaced with the word union.
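
As an illustration, the sketch below emits union keys in the same shape as the struct keys. It is a hypothetical Python helper, not radare2 code. The union name _SAMPLE and its members are invented for the example, and placing every member at offset 0 is an assumption based on C union semantics rather than something this document states.

```python
def union_entries(name, members):
    """Build sdb lines for a union; members is a list of (field, type, count)."""
    lines = [
        f"{name}=union",
        f"union.{name}=" + ",".join(field for field, _, _ in members),
    ]
    for field, ctype, count in members:
        # Assumption: all union members start at offset 0, as in C.
        lines.append(f"union.{name}.{field}={ctype},0,{count}")
    return lines


# Hypothetical union; DWORD and WORD are assumed to be defined elsewhere.
print("\n".join(union_entries("_SAMPLE", [("asDword", "DWORD", 0), ("asWord", "WORD", 0)])))
```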

Function prototypes

The function prototype representation is the most detail-oriented and the most important of them all. It is the one used directly for type matching:

X=func
func.X.args=NumberOfArgs
func.X.arg0=Arg_type,arg_name
.
.
.
func.X.ret=Return_type
func.X.cc=calling_convention

It should be self-explanatory. Let's take strncasecmp as an example, for the x86 architecture on Linux machines. According to the man pages, strncasecmp is declared as follows:

int strncasecmp(const char *s1, const char *s2, size_t n);

When converted into its sdb representation, it looks like this:

strncasecmp=func
func.strncasecmp.args=3
func.strncasecmp.arg0=char *,s1
func.strncasecmp.arg1=char *,s2
func.strncasecmp.arg2=size_t,n
func.strncasecmp.ret=int
func.strncasecmp.cc=cdecl
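
Keeping the args count in sync with the numbered argN keys is the easiest part to get wrong, so it helps to derive both from one list. The following Python sketch is an illustration only; func_entries is a hypothetical helper and not part of radare2.

```python
def func_entries(name, ret, args, cc=None):
    """Build sdb lines for a function prototype.

    args is a list of (type, name) pairs; cc is the optional
    calling convention and is omitted from the output when None.
    """
    lines = [f"{name}=func", f"func.{name}.args={len(args)}"]
    for index, (arg_type, arg_name) in enumerate(args):
        lines.append(f"func.{name}.arg{index}={arg_type},{arg_name}")
    lines.append(f"func.{name}.ret={ret}")
    if cc is not None:
        lines.append(f"func.{name}.cc={cc}")
    return lines


strncasecmp_args = [("char *", "s1"), ("char *", "s2"), ("size_t", "n")]
print("\n".join(func_entries("strncasecmp", "int", strncasecmp_args, cc="cdecl")))
```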

Note that the .cc part is optional; if it is missing, the default calling convention for the target architecture is used instead. There is one extra optional key:

func.X.noreturn=true/false

This key marks functions that do not return once called, such as exit and _exit.
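
To show how these keys fit together when read back, here is a minimal Python sketch that rebuilds a C-like prototype from a set of entries and reports the noreturn flag. It is not how radare2 reads profiles; describe_func is a hypothetical helper, the exit entries are written by hand following the format above, and treating a missing noreturn key as false is an assumption.

```python
def describe_func(entries, name):
    """Return (prototype_string, noreturn_flag) for a function in entries."""
    if entries.get(name) != "func":
        raise KeyError(f"{name} is not defined as a function")
    nargs = int(entries[f"func.{name}.args"])
    args = []
    for index in range(nargs):
        # Split on the last comma: everything before it is the type.
        arg_type, arg_name = entries[f"func.{name}.arg{index}"].rsplit(",", 1)
        sep = "" if arg_type.endswith("*") else " "
        args.append(f"{arg_type}{sep}{arg_name}")
    ret = entries[f"func.{name}.ret"]
    proto = f"{ret} {name}({', '.join(args)})"
    # Assumption: a missing noreturn key means the function returns.
    noreturn = entries.get(f"func.{name}.noreturn", "false") == "true"
    return proto, noreturn


exit_entries = {
    "exit": "func",
    "func.exit.args": "1",
    "func.exit.arg0": "int,status",
    "func.exit.ret": "void",
    "func.exit.noreturn": "true",
}
print(describe_func(exit_entries, "exit"))
```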

Integrating with r2 source

To add definitions to the r2 source, there is a very flexible naming convention. First, the file should be located in path/to/r2/libr/anal/d. Then add an entry for it in the Makefile that exists in the same directory. Make sure that the name follows this convention:

types[-arch][-OS][-bits]

All parts in square brackets are optional, but the order is important; they help you create fine-grained types profiles. One extra note: the key/value pairs for a given data type do not all have to live in the same file. For example, general Windows data types are in types-windows, while only the pointer sizes are in types-x86-windows-32 and types-x86-windows-64.
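
The naming convention can be expressed as a short function. This is a sketch in Python with a hypothetical helper name, profile_name; it only joins whichever optional parts are given, in the fixed arch, OS, bits order.

```python
def profile_name(arch=None, os_name=None, bits=None):
    """Build a profile file name following types[-arch][-OS][-bits]."""
    parts = ["types"]
    # The order is fixed by the convention; missing parts are skipped.
    for part in (arch, os_name, bits):
        if part is not None:
            parts.append(str(part))
    return "-".join(parts)


print(profile_name(os_name="windows"))
print(profile_name("x86", "windows", 32))
```

The two calls produce the names mentioned in the text, types-windows and types-x86-windows-32.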