Introduction to the ELF Format (Part V) : Understanding C start up and the .init_array and .fini_array sections



This post is part of a series on the ELF format. If you haven't checked out the other parts of the series, here they are:

  1. (Part I) : ELF Header https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html
  2. (Part II) : Program Headers https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html
  3. (Part III) : Section Header Table https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html
  4. (Part IV) : Section Types and Special Sections https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html
  5. (Part V) : this post

In this post I'm going to cover some aspects of C start up, and mess around with the .init_array and .fini_array sections to show how they work.

C Start Up


So something must happen to get the code in your main function running. This process is called the C start up, and it essentially involves running all the initialization code, setting up pointers to some important arrays and then branching over to main.

What the _start function essentially needs to do is call __libc_start_main, which is the function that will actually call main().

Now if you haven't guessed, this means we need a pointer to the main function as an argument to __libc_start_main. It has a couple of other parameters too; here they are:

LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec,
#endif
__typeof (main) init,
void (*fini) (void),
void (*rtld_fini) (void), void *stack_end)

Update: I realized that the original version of the post had the wrong function header for __libc_start_main, so I grabbed this one straight from glibc (https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L129).
For an alternative explanation of this, check out http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html (sorry, no https :( . . . those are my tears for your unborn TLS packets *sniff sniff* lol).

So what we have here is:
  • int (*main) - no guessing here, this is a pointer to the main function in the binary.
  • int argc - the number of arguments passed to the binary from the command line, including the binary's name (we will show this later).
  • char **argv - the array holding the actual argument strings. It's important to remember that argv reaches the _start function via the stack, essentially.
  • __typeof (main) init - this is a pointer to the function (__libc_csu_init) that handles calling the initializer or constructor functions. I'm going to call this a constructor function "call handler" [see footnote 1].
  • void (*fini) (void) - the analogous pointer to the function that handles calling destructor functions.
  • void (*rtld_fini) (void) - the destructor function call handler for the dynamic linker. This value is passed to _start via rdx from the loader (we will see this being used soon). I won't get into how this destructor function call handler works too much, it's really a little off track for this discussion, but when I cover dynamic linking I'll expand on it more ;)
  • void *stack_end - end of stack marker.
Just to reiterate: all of these wonderful things must be prepared by _start for the call to __libc_start_main, and we also know that rtld_fini is passed to _start via rdx.

Beyond that, _start is loaded with a very helpful stack layout that makes argc and argv easy to locate. Let's see how this is done in a real world example.

Reverse Engineering glibc _start

Here's what start looks like for one of my binaries during execution:

To clarify what is happening in the figure above: I am setting a break point on the _start function, and I'm highlighting the instruction that is about to be executed (note the arrow pointing at 0x400455 <+5>, this means gdb is currently sitting on that instruction).

Digging into the assembler here, the first instruction essentially clears out ebp. After this it copies the pointer to rtld_fini from rdx to r9; this is actually prepping it already for its cozy position in the important __libc_start_main call. It also saves the value from being destroyed when rdx is used later on.

What the screenshot above also confirms is that the rdx register does indeed contain a pointer to the _dl_fini function. This is shown by the x/64ib $rdx command, which says: "disassemble 64 instructions starting at the address stored in rdx" (if you're not super clued up on how gdb's memory examining command x/ works, feel free to git guuuuuud by reading through this documentation). You can of course do the same on r9, which at this point in execution will show the same value - I'm just picking rdx coz I'm used to dealing with it more. Before we dig into this _dl_fini function [see footnote 1], let's look at the rest of the instructions in the _start code.


The next instruction at 0x400455 <+5> is a pop into rsi, which leaves rsi holding argc. How do we know this? Well, we know that the top of the stack contains argc because when the program is entered for the first time and _start gets called (under the ABI I am running - yours might differ) the stack essentially contains argc, argv and envp. We can see this in the following screen dump:

 

So from this figure we can see the arguments being passed to the binary are "1 2 3 4 5". We can also see that the first entry in argv is the name of the binary itself, which means the length of argv should be 6, as is shown at the first address on the stack at 0x7fffffffddb0. Next on the stack is the start of the actual argv array, and after that we have a null terminator and the start of the envp array.

Back to the _start function. After the first pop off of the stack, the stack pointer itself points at the start of the argv array, and at instruction <_start+6> we save that address to rdx. After this, at the <_start+9> instruction, we use a bit mask to clear the low bits of the stack pointer to ensure it's aligned properly, and then proceed to prep the stack for the call to __libc_start_main (the ABI requires the stack to be 16-byte aligned when a function is called - it also makes all the tools dump nice groupings of stack information).

Once the stack is aligned it pushes rax onto the stack. According to some stuff I've read on this, this is purely to preserve memory alignment boundaries as well; the value in rax isn't used and doesn't mean anything.

I've dumped the register values when the call to __libc_start_main happens just to check out what is actually being passed to it:




We are clearly using the System V ABI calling convention for x86_64 here. Instead of pushing all parameters onto the stack in a given order, we do the following:

Parameters to functions are passed in the registers rdi, rsi, rdx, rcx, r8, r9, and further values are passed on the stack in reverse order.


And as we see the registers contain the following:
  • rdi - pointer to the first instruction in the int (*main) function
  • rsi - argc value
  • rdx - argv pointer
  • rcx - pointer to the first instruction in __libc_csu_init - the program's constructor call handler again.
  • r8 - pointer to __libc_csu_fini
  • r9 - pointer to rtld_fini, the mysterious dynamic linker destructor call handler.
And in case you don't believe me, check out this dank documentation in the glibc library confirming that we reverse engineered this correctly (or that it actually works as the code intends) - coming through strong with the documentation once again [see footnote 1]. The following extract is from


There are some other interesting details to what __libc_start_main does after this, some of which involve deep Elf sorcery like reading past the end of argv to find envp. There are wonderful articles on this on the internet and the code for __libc_start_main is also available. I take it you folks would enjoy the exercise of confirming it works as described.

To summarize __libc_start_main and bring .init_array and .fini_array into context, essentially what __libc_start_main does is stuff like:


It does some other cool stuff and of course eventually makes the call to (*init), which in the context of __libc_start_main means __libc_csu_init. This is the function that, as we see in the footnotes, actually makes the call to the init functions we define. Here's confirmation of that call chain from gdb:



foo_constructor is obviously our constructor, and we can see it does indeed get called first from __libc_csu_init. These constructors are saved in the section marked .init_array, and the analogous array for destructors is called .fini_array. The next section covers how they work.

.init_array and .fini_array Sections and hex sorcery

I'd like to get straight into deconstructing how the .init_array and .fini_array sections work. Let's see what they look like in the section header table and annotate all their fields in an honest hexdump:




What we can see here is that the .init_array section points into the ELF file at 0x0e00, which holds two addresses:
  • 0x0e00 (.init_array)
    • 0x400540 (frame_dummy) - not going to dig into this too much, but what I glean about this for now is that this sets up things to be able to do exception handling and reconstructing stack frames to aid debugging and stack forensics. More on this here and here.
    • 0x400440 (foo_constructor) - our constructor!
We also have the .fini_array section at 0x0e10, which holds these entries:
  • 0x0e10 (.fini_array)
    • 0x400520 (__do_global_dtors_aux) - handles destructors when .fini_array is not defined according to this.
    •  0x400430 (foo_destructor) - our destructor!
So we know where to find the pointers to our destructor and constructor functions and we know when they will be called. Let's see if we can force the binary to call another function instead.

If I were to make the .init_array entry point to the function never_call, which as in the previous example is never called under normal execution, here's what the hexdump would look like:




Win! We can control the flow of execution by redirecting the entries in the .init_array section! This of course works the same way for .fini_array; I'm going to leave that for you folks to figure out if you'd like to.

Thanks for reading this one, more posts on deep Elf sorcery and other wonderful linuxy things coming soon!

References and Recommended Reading

  1. How C Programs get run https://lwn.net/Articles/631631/ 
  2. System V intel https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf 
  3. System V ABI https://wiki.osdev.org/System_V_ABI 
  4. Examining Memory with GDB https://sourceware.org/gdb/onlinedocs/gdb/Memory.html 
  5. https://stackoverflow.com/questions/34966097/what-functions-does-gcc-add-to-the-linux-elf 
  6. http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html 

Footnotes

1 - why _dl_fini should be referred to as the destructor "function call handler" in my opinion


This is because, though some folks refer to this as THE [de/con]-structor function, in reality it is only the standardized function that finds the pointers TO the user-defined [de/con]-structor functions. Here's why I say so, in an extract from https://github.com/lattera/glibc/blob/master/elf/dl-fini.c#L137


What can I say folks, that glibc comment game is solid though. Code speaks for its Elf around here ;)
So I take it this makes it obvious that the pointer to the _dl_fini function can actually be referred to as more of a destructor "call handler", no? To close my point, let's look at dl-init.c for the definition of _dl_init as well:


Pretty much the same thing: it uses some link map type object (  ElfW(Dyn) *preinit_array = main_map->l_info[DT_PREINIT_ARRAY];  ) loaded with the offsets and all the ELF format goodies. It uses this to calculate an offset to the init function array, and then just runs through the entries, calling them with argc, argv and envp.

Anyway, while making that heavily egotistical point we actually traversed some pretty important code in the Elf world; this is the very definition of the _dl_fini function that handles your binary. If you wanna unlock the s e c r e t s you should spend some time digging through that /elf/ directory.

