Introduction to the ELF Format (Part VII): Dynamic Linking / Loading and the .dynamic section

This post is part of a series on the ELF format. If you haven't checked out the other parts of the series, here they are:
  1. (Part I): ELF Header
  2. (Part II): Program Headers
  3. (Part III): Section Header Table
and many more!

So in this one I'm going to talk a little bit about how dynamic linking works. I'll unpack some useful things to know about how functions are executed when dynamic linking/loading is in effect.

Overview of dynamic linking

As you would imagine, there are a few ingredients to the dynamic linking magic, namely the procedure linkage table, the global offset table and the .dynamic section. I'm going to lay out some basic GOT and PLT theory, and then later on in the post I'll back up all this wonderful theory with some disassembled code and gdb screen dumps! So anyway, getting back into it...

The Procedure Linkage Table (PLT) (it's actually more like a list of code stubs) is a rough landing area that function calls hit as the first stop in their dynamic linking journey. A PLT entry either branches directly to the function definition it needs (by jumping through the relevant entry in the Global Offset Table) or sets up a call to the runtime to sort it out (along with some other parameters we will see later on!). A better name would be something like a "procedure linkage function chain", because it's actually just a contiguous region of code with a little runtime-invoking stub at its "head".

The Global Offset Table (GOT) holds values that are meant to point directly to the intended definition - it's essentially the "final destination" of a function call. As mentioned above, this table is used as a kind of decoupled reference table for the PLT. This is amazing for exploit deve-uh I mean compiler extension development, because it means that if you can achieve simple address-wide overwrites you can do a lot to take over execution flow by targeting the GOT.

The runtime's end goal is replacing the GOT entry for the called function with its correct value. The PLT entry that triggers when a function is called preps some arguments the runtime needs to resolve that particular GOT entry. These arguments include the link_map for the given object and the index of the function's relocation entry.

So we're going to look at how each of these data structures works and show where you can replace values to subvert execution flow (depending on how you achieve the write, of course).

ELF Link Maps 

As much as I wish this was literally a map of elves named Link (Breath of the Wild reaccs only), link_maps are essentially small data structures that hold a couple of pointers to some metadata needed for completing a dynamic linking action. They get shuffled around the internals of the runtime, the dynamic linker and other shared-object-handling things. A link_map struct is passed directly to the function that carries out the dynamic linking action, _dl_runtime_resolve_* (there are some caveats to this depending on OS and arch, I believe). So they are actually more like little maps that link in the ELF symbol gods. Anyway here's what they look like:
extract from elf/link.h:

The fields are pretty well documented, and as far as I can see they really do behave as described.
We can, though, confirm some of these details through some light data collection and debugging. Here's a demonstration of how the l_next and l_prev fields work:

So essentially the public part of each link_map record ends with these values; they contain the addresses for finding the next and previous elements in the list. I don't see anything just yet, but I'm looking out for things that make use of the l_next and l_prev elements in a Turing-complete-y way ;)
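To make the list structure concrete, here's a minimal sketch that walks a chain of link_map records by following l_next. It assumes a 64-bit little-endian target and only the five public fields (l_addr, l_name, l_ld, l_next, l_prev); the memory image, the addresses in it and the helper names are all made up for illustration, and real records carry many more private fields after l_prev.

```python
import struct

# Public head of struct link_map on a 64-bit little-endian target:
# l_addr, l_name, l_ld, l_next, l_prev -> five 8-byte words.
LINK_MAP = struct.Struct("<5Q")


def read_link_map(memory, base, addr):
    """Decode the record at virtual address `addr` from a memory image loaded at `base`."""
    l_addr, l_name, l_ld, l_next, l_prev = LINK_MAP.unpack_from(memory, addr - base)
    return {"addr": addr, "l_addr": l_addr, "l_name": l_name,
            "l_ld": l_ld, "l_next": l_next, "l_prev": l_prev}


def walk_link_maps(memory, base, head):
    """Follow l_next from `head` until the NULL pointer that ends the list."""
    chain = []
    addr = head
    while addr != 0:
        entry = read_link_map(memory, base, addr)
        chain.append(entry)
        addr = entry["l_next"]
    return chain


# Hypothetical image: three records packed back to back at BASE.
# The (l_addr, l_ld) pairs are invented values, not taken from a real process.
BASE = 0x7000
images = [(0x0, 0x600E28),
          (0x7FFFF7A0D000, 0x7FFFF7DD0B40),
          (0x7FFFF7DD7000, 0x7FFFF7FFCE70)]
addrs = [BASE + i * LINK_MAP.size for i in range(len(images))]
memory = b"".join(
    LINK_MAP.pack(l_addr, 0, l_ld,
                  addrs[i + 1] if i + 1 < len(addrs) else 0,   # l_next
                  addrs[i - 1] if i > 0 else 0)                # l_prev
    for i, (l_addr, l_ld) in enumerate(images)
)

chain = walk_link_maps(memory, BASE, addrs[0])
for entry in chain:
    print(hex(entry["addr"]), "l_ld =", hex(entry["l_ld"]),
          "next =", hex(entry["l_next"]), "prev =", hex(entry["l_prev"]))
```

The first record has a NULL l_prev and the last one a NULL l_next, which is what stops the walk.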

There is one other field I would like to expand on here, namely l_ld. This is the reference to the .dynamic section of the object the link_map describes. And as you guessed, it means we will probably need to talk about how the .dynamic section works.

The .dynamic section

The dynamic section essentially holds a number of arguments that inform and influence parts of the dynamic linker's behavior. This is because, as a component of the runtime, the dynamic linker does many other things besides just relocating functions; it also executes other housekeeping functions like INIT and FINI. Here's what the entries of the dynamic section look like according to glibc:
extract from elf/elf.h:

Each entry is simply a pair of address-sized values: one indicating the type of dynamic section entry (d_tag) and one for the actual value of the entry (d_un). d_un is a union so that, depending on the tag, it can hold a plain integer value (d_val) instead of just an address (d_ptr). Take a look at this hexdump example to see how the values can vary for the d_un field:
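As a rough sketch of how those (d_tag, d_un) pairs can be read, the snippet below decodes a few 64-bit entries from raw bytes. The tag numbers are the standard ones from elf.h, but the sample entries, the addresses in them and the `parse_dynamic` helper are invented for illustration, and only a handful of tags are listed.

```python
import struct

# Elf64_Dyn: an 8-byte signed d_tag followed by the 8-byte d_un union
# (d_val and d_ptr share the same storage).
ELF64_DYN = struct.Struct("<qQ")

# A few tag values from elf.h; the real list is much longer.
DT_NAMES = {0: "DT_NULL", 1: "DT_NEEDED", 3: "DT_PLTGOT", 5: "DT_STRTAB",
            6: "DT_SYMTAB", 12: "DT_INIT", 13: "DT_FINI", 23: "DT_JMPREL"}

# Tags whose d_un is read as an address (d_ptr) rather than a plain value (d_val).
POINTER_TAGS = {"DT_PLTGOT", "DT_STRTAB", "DT_SYMTAB",
                "DT_INIT", "DT_FINI", "DT_JMPREL"}


def parse_dynamic(data):
    """Decode Elf64_Dyn entries up to and including the DT_NULL terminator."""
    entries = []
    for d_tag, d_un in ELF64_DYN.iter_unpack(data):
        name = DT_NAMES.get(d_tag, hex(d_tag))
        kind = "d_ptr" if name in POINTER_TAGS else "d_val"
        entries.append((name, kind, d_un))
        if d_tag == 0:  # DT_NULL marks the end of the section
            break
    return entries


# Hypothetical section contents (values are made up):
sample = b"".join([
    ELF64_DYN.pack(1, 1),           # DT_NEEDED: offset into the string table
    ELF64_DYN.pack(12, 0x400478),   # DT_INIT: address of the init function
    ELF64_DYN.pack(3, 0x601000),    # DT_PLTGOT: address of the GOT
    ELF64_DYN.pack(0, 0),           # DT_NULL
])

entries = parse_dynamic(sample)
for name, kind, value in entries:
    print(f"{name:<10} {kind} = {value:#x}")
```

The same eight bytes are an offset for DT_NEEDED and an address for DT_INIT or DT_PLTGOT, which is the whole reason d_un is a union.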

Okay so that's the link_map and .dynamic section done. We can move on to looking at what happens when a function is resolved and how this affects the GOT.

Runtime lazy loading up close

To get functions resolved without preparing all the relocations up front, the ELF format and dynamic linker use a mechanism called lazy loading (more commonly known as lazy binding). Lazy loading essentially means resolving and patching up the GOT entry for a function the first time it is called. This is so that subsequent calls to that function do not need to involve the dynamic linker / runtime (in a previous post I showed explicitly how the dynamic linker kicks in again if you mess with some other metadata).
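Here's a toy model of that idea, just to pin down the control flow. It is not how ld.so is implemented: the `ToyProcess` class, its method names and the fake "library" are all invented for this sketch. The GOT is a list of callables, every slot starts out pointing at a slow path, and the resolver patches the slot on first use.

```python
class ToyProcess:
    """Toy model of lazy binding: GOT slots start at a slow path and get patched."""

    def __init__(self, library):
        self.library = library           # symbol name -> callable (the "shared object")
        self.relocs = list(library)      # relocation index -> symbol name
        self.resolver_calls = 0
        # Before the first call, every GOT slot points back at the slow path.
        self.got = [self._make_stub(i) for i in range(len(self.relocs))]

    def _make_stub(self, index):
        def slow_path(*args):
            # Roughly "push index; jmp PLT head": hand the index to the resolver,
            # then continue into the real function.
            target = self._resolve(index)
            return target(*args)
        return slow_path

    def _resolve(self, index):
        self.resolver_calls += 1
        target = self.library[self.relocs[index]]
        self.got[index] = target         # patch the GOT slot
        return target

    def call_plt(self, index, *args):
        # Roughly "jmp *GOT[index]": always an indirect jump through the GOT.
        return self.got[index](*args)


proc = ToyProcess({"puts": lambda s: len(s) + 1, "exit": lambda code: code})
first = proc.call_plt(0, "hello")    # goes through the resolver
second = proc.call_plt(0, "hello")   # hits the patched GOT slot directly
print(first, second, proc.resolver_calls)
```

The second call never touches the resolver, and anything that overwrites `proc.got[0]` in between would be called instead, which is the property that makes the GOT such an interesting target.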

Okay so let's see if all this cool theory is true in practice. How are we going to see what the runtime does with the GOT? Well, to lay out a simple methodology:
  1. Find a pointer to the top of the PLT (I will also cover some structuring of the PLT to show you where the "top" is)
  2. Once we have the PLT we can then find two things: 1) the GOT entry for the function being called and 2) a place to set a breakpoint before the GOT is edited (namely the entry point of the runtime)
  3. Set a breakpoint on a function
  4. Compare the GOT values before and after.

The first step is to find a pointer to the top of the PLT. Let's take a look at an annotated dump of a binary's _start and PLT sections (I disassembled _start because in order to call __libc_start_main it needs to involve the PLT as well):

So we can see from the picture that at instruction 0x400534 a call to the PLT entry of __libc_start_main is made. This then ends up doing a couple of things:
  1. 0x4004e0 jumps through 0x601030, the GOT entry for __libc_start_main. With lazy loading, this first instruction lands directly in the function if the GOT has been patched, but on the first call the GOT entry always holds the address of the instruction right after the jump - so it's effectively a jump to the next position in the PLT.
  2. 0x4004e6 pushes a number onto the stack - this is the index of the relocation entry that applies to this action; the dynamic linker needs this to do its job.
  3. 0x4004eb jumps to the head of the PLT, which invokes the dynamic linker directly.
Okay let's see what the PLT looks like in its full glory:

And so we can see a format for the PLT forming, namely every entry has these base elements:
  • jump to the GOT
  • push reloc index
  • jump to PLT head (_dl_runtime_resolve*)
The head contains some interesting code. We can see at instruction 0x4004a0 some value gets pushed onto the stack before it jumps off to dl_runtime_resolve at instruction 0x4004a6. What's happening here is that the link_map for the object (libsecurity etc. etc.) that holds the symbol reference involved in the lazy loading, i.e. the object whose PLT we are in, is being passed to the dl_runtime_resolve function as an argument.
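The three-part entry format can also be checked by decoding the bytes of a PLT entry by hand. This is a minimal sketch assuming the classic 16-byte x86-64 lazy-binding entry (`jmp *disp(%rip)`, `push imm32`, `jmp rel32`). The raw bytes and the relocation index 3 are my reconstruction from the addresses quoted in this post, not a copy of the actual dump, and `decode_plt_entry` is a made-up helper name.

```python
import struct


def decode_plt_entry(code, entry_addr):
    """Decode one 16-byte x86-64 PLT entry: jmp *disp(%rip); push imm32; jmp rel32."""
    if code[0:2] != b"\xff\x25" or code[6] != 0x68 or code[11] != 0xE9:
        raise ValueError("not a classic lazy-binding PLT entry")
    (got_disp,) = struct.unpack_from("<i", code, 2)      # operand of the first jmp
    (reloc_index,) = struct.unpack_from("<i", code, 7)   # operand of the push
    (head_rel,) = struct.unpack_from("<i", code, 12)     # operand of the final jmp
    return {
        # RIP-relative operands are measured from the end of their instruction.
        "got_slot": entry_addr + 6 + got_disp,
        "reloc_index": reloc_index,
        "plt_head": entry_addr + 16 + head_rel,
    }


# Reconstructed bytes for the entry at 0x4004e0 (assumed, see above):
#   ff 25 4a 0b 20 00   jmp  *0x200b4a(%rip)
#   68 03 00 00 00      push $0x3
#   e9 b0 ff ff ff      jmp  <PLT head>
entry = bytes.fromhex("ff254a0b2000" "6803000000" "e9b0ffffff")
info = decode_plt_entry(entry, 0x4004E0)
print({name: hex(value) for name, value in info.items()})
```

With those bytes the first jump goes through 0x601030 and the last one lands on 0x4004a0, matching the GOT entry and PLT head discussed above.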

We can dissect this link_map through different calls to dl_runtime_resolve to see that it is actually always a link_map object. We know that a link_map must contain a pointer into the dynamic section, so if we see dynamic-section-looking values in the area around the pointer being passed to dl_runtime_resolve, it is most likely a link_map object. Or I should rather say: whatever appears there, dl_runtime_resolve will treat it like a link_map object.

So let's see what these values look like as they are flying into the resolve call:

I can also show that the GOT in fact does get patched with new values as the runtime gets called. Here's a screenshot showing this for the puts resolution:

After the second breakpoint at 0x4004a0 hits (which is the setup code for the call to dl_runtime_resolve) we can clearly see a new entry in the GOT at address 0x601020; the update adds the address 0x7ffff7a7c690, which we can see from symbol information in the debugger is the _IO_puts function! GOT entry correctly updated.
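The before/after comparison can be automated with a few lines if you dump the GOT as raw bytes at each breakpoint. This is only a sketch: `diff_got` is a hypothetical helper, the base address 0x601018 for the first function slot is an assumption, and the "before" values (each slot pointing back at the push instruction of its own PLT entry) are made up to match the layout described in this post. Only 0x601020 and 0x7ffff7a7c690 come from the debugging session above.

```python
import struct


def diff_got(before, after, got_base):
    """Return {slot_address: (old, new)} for every 8-byte slot that changed."""
    changes = {}
    slots = zip(struct.iter_unpack("<Q", before), struct.iter_unpack("<Q", after))
    for i, ((old,), (new,)) in enumerate(slots):
        if old != new:
            changes[got_base + 8 * i] = (old, new)
    return changes


GOT_BASE = 0x601018  # assumed address of the first function slot

# Assumed snapshot before the call: every slot still points into the PLT.
before = struct.pack("<3Q", 0x4004B6, 0x4004C6, 0x4004D6)
# Snapshot after puts has been resolved: the middle slot has been patched.
after = struct.pack("<3Q", 0x4004B6, 0x7FFFF7A7C690, 0x4004D6)

changes = diff_got(before, after, GOT_BASE)
for slot, (old, new) in changes.items():
    print(f"{slot:#x}: {old:#x} -> {new:#x}")
```

Only the slot for the function that was actually called changes; the neighbours keep pointing into the PLT until their own first call.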

Okay that's pretty much it for this post. In later posts I may talk a little about how to abuse this lazy loading mechanism to achieve execution of other functions - some cool tricks. For now I thought I'd keep it short and only explain some main concepts here and leave the advanced sorcery and ELF black magic for future posts. Stay tuned folks!

References and Reading

Some stuff I read and relied on to make this post. Very useful information here!