Introduction to the ELF Format Part II : Understanding Program Headers


Welcome back, folks! In the previous post I covered the most trivial parts of the ELF file format. In this post we are going to work with one of the most interesting mechanisms in the file: the program headers! I skipped some parts of the ELF header in the previous post and decided to cover them here, because they describe the program headers anyway. Let's get started!

Introduction : What are Program Headers?


I mentioned in part 1 that the ELF format performs two tasks: it is a recipe for how to sublimate dead files into living processes, and it adds the bells and whistles needed to make the file look pretty to gdb, the dynamic loader and a bunch of other tools. Program headers mostly serve the first task: they tell the loader where to put stuff in memory. They also have some housekeeping functions.

We'll get into how these memory-loading powers and formats work a little later. For now, it's just important to keep in mind what these fields are for.

ELF Header continued

The ELF header covered in the previous post holds some fields specific to the program headers. These are:
  • e_phoff - the offset in the file where the program header table starts (technically speaking this "needs" to always point to a PHDR program header, but that's not entirely true - stay tuned!)
  • e_phentsize - the size in bytes of each program header entry
  • e_phnum - the number of program header entries
One can imagine that these fields are used to bound traversal of the headers: start at e_phoff and step e_phentsize bytes at a time, e_phnum times.
Let's take a look at what program headers look like in some raw hex:
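
To make these three fields concrete, here is a minimal Python sketch that pulls them out of the ELF header. It assumes a 64-bit little-endian ELF (like the x86-64 sample used in this post); the function name `read_phdr_table_info` is made up for illustration.

```python
import struct

# ELF64 header layout (little-endian), 64 bytes in total:
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def read_phdr_table_info(data: bytes) -> tuple[int, int, int]:
    """Return (e_phoff, e_phentsize, e_phnum) for a 64-bit little-endian ELF."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        # Assumption of this sketch: ELFCLASS64 (2) and ELFDATA2LSB (1) only.
        raise ValueError("sketch only handles 64-bit little-endian ELF files")
    fields = ELF64_EHDR.unpack_from(data, 0)
    return fields[5], fields[9], fields[10]
```

Feeding it the bytes of the sample binary (for example `open("./compile.elf", "rb").read()`) should give the same offset, entry size and entry count that readelf reports.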



I had to block out part of my terminal when I made this because I sometimes run a .bashrc that displays some network stuff in my terminal prompt.

If you want to check out the program headers for an ELF file, this is the magic command you need:

readelf -l ./compile.elf 

As a fun experiment we can play with the e_phoff field to make the program skip some of the program headers. Right now the program headers are shown to start at 0x40, which is 64 bytes into the file. Usually they start there, right after the ELF header, but there's no strict reason they need to! Let's see what happens if we shift e_phoff down by one program header.

So the first program header appears at 0x40 and the next one (the INTERP segment) at 0x78, which is exactly 0x38 = 56 bytes down from the start of the program headers, as indicated by the e_phentsize field in the ELF header.
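
The arithmetic generalises to every entry in the table: entry i lives at e_phoff + i * e_phentsize. A tiny sketch, with a helper name (`phdr_entry_offsets`) invented for illustration:

```python
def phdr_entry_offsets(e_phoff: int, e_phentsize: int, e_phnum: int) -> list[int]:
    """File offset of each program header entry."""
    return [e_phoff + i * e_phentsize for i in range(e_phnum)]


# First three entries for the values seen above (0x40 start, 0x38 per entry).
print([hex(o) for o in phdr_entry_offsets(0x40, 0x38, 3)])
# prints ['0x40', '0x78', '0xb0']
```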

Editing the raw binary so that e_phoff points to 0x78 results in this readelf output:



You might wonder whether this ELF still runs without its PHDR program header. YES! No one cares about your PHDR program header!
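
If you would rather script the edit than poke at bytes in a hex editor, something like the following sketch could do it. It assumes a 64-bit little-endian ELF, where e_phoff is an 8-byte field at offset 0x20 of the ELF header; `patch_e_phoff` is a hypothetical helper name. It only touches e_phoff and leaves e_phnum alone.

```python
import struct

E_PHOFF_OFFSET = 0x20  # position of e_phoff inside an ELF64 header


def patch_e_phoff(data: bytes, new_phoff: int) -> bytes:
    """Return a copy of the ELF image with e_phoff overwritten."""
    patched = bytearray(data)
    struct.pack_into("<Q", patched, E_PHOFF_OFFSET, new_phoff)
    return bytes(patched)
```

Calling `patch_e_phoff(data, 0x78)` on the sample's bytes and writing the result to a new file reproduces the shift from 0x40 to 0x78 without touching the original.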

There are a number of types of program headers, each with a different purpose:
  1. 0x00000006 PT_PHDR - describes the location and size of the program header table itself. According to the documentation, if this entry is present it must precede any loadable segment entry. After our edit the table starts at PT_INTERP instead and the ELF still runs, so PT_PHDR is not even needed (at least for the sample I'm using here! Of course you may be running on a system or architecture that actually takes this field seriously).
  2. 0x00000003 PT_INTERP - this segment holds the path name of the program that will be invoked as the interpreter of the ELF, should it be an executable. It will of course be ignored if the ELF is not executable. You can try pointing this to other programs to see what happens :)
  3. 0x00000001 PT_LOAD - the most important program header type. It defines a portion of the file that must be placed in memory. It leverages the other attributes of the program header and changes their meaning slightly because they appear in this context (see below how p_vaddr, p_paddr etc. are explained in the context of PT_LOAD).
PT_INTERP is a little strange in that it points to an offset in the file that exists just to hold a string: the file path of the program meant to interpret the file (this is why ours points to ld-linux, the "loader dynamic").
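
As a rough illustration of how that string is found, the sketch below walks the program header table, stops at the first PT_INTERP entry and reads the NUL-terminated path at its p_offset. It assumes a 64-bit little-endian ELF (field offsets differ for 32-bit files), and `read_interp` is a name made up for this example.

```python
import struct
from typing import Optional

PT_INTERP = 3


def read_interp(data: bytes) -> Optional[str]:
    """Return the interpreter path of a 64-bit little-endian ELF, or None."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_INTERP:
            (p_offset,) = struct.unpack_from("<Q", data, base + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            # The path is stored as a NUL-terminated string.
            return raw.split(b"\x00", 1)[0].decode()
    return None
```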

Here's what this actually looks like in the raw hex:



There are a number of other program header types; I've only expanded on a couple of the most important ones for this post. It's best to check out the documentation if you want to grasp the full range of p_type values.
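
For orientation, here is a small lookup sketch covering the generic p_type values from the ELF specification, plus the ranges reserved for OS-specific and processor-specific types. It is deliberately partial, and the helper name `p_type_name` is an invention for this example rather than anything readelf exposes.

```python
# Generic p_type values from the ELF specification (not an exhaustive list).
PT_NAMES = {
    0: "PT_NULL",
    1: "PT_LOAD",
    2: "PT_DYNAMIC",
    3: "PT_INTERP",
    4: "PT_NOTE",
    5: "PT_SHLIB",
    6: "PT_PHDR",
    7: "PT_TLS",
}


def p_type_name(p_type: int) -> str:
    """Best-effort name for a p_type value."""
    if p_type in PT_NAMES:
        return PT_NAMES[p_type]
    if 0x60000000 <= p_type <= 0x6FFFFFFF:  # PT_LOOS..PT_HIOS
        return f"OS-specific (0x{p_type:08x})"
    if 0x70000000 <= p_type <= 0x7FFFFFFF:  # PT_LOPROC..PT_HIPROC
        return f"processor-specific (0x{p_type:08x})"
    return f"unknown (0x{p_type:08x})"
```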

Other than this, the program header format has a few more attributes. These are important to understand if you're going to pull off the PT_LOAD wizardry later on in the post.

  • p_offset - the offset into the ELF file where this segment's content begins. Later on we will point this value to different places.
  • p_vaddr - the virtual address that this segment will be mapped to, should it be mapped into memory (again, this only really applies to PT_LOAD type headers)
  • p_paddr - the physical address the segment will be mapped to, should the OS running this use a loader that wants straight-up physical address targeting.
  • p_filesz - the size of the segment in the file; basically tells the loader how many bytes to suck out of the ELF.
  • p_memsz - the size of the segment in memory. This can be larger than p_filesz, for example to make room for uninitialized data that takes up no space in the file; the extra bytes are zero-filled.
  • p_flags - the permissions under which this segment will be mapped (should it be mapped into memory)
  • p_align - makes sure segments are properly aligned when they are mapped into memory. For a proper explanation please see the documentation.
So just to recap, each program header has these p_* fields, but whether the p_type is PT_LOAD or not decides whether the content described by the program header will actually end up as part of the memory image. Note the wording: content that no PT_LOAD header describes can still land in memory, because sometimes (due to the chunk-based loading style of the kernel) the entire header table ends up in memory.
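
Here is how one entry could be modelled in Python, as a sketch that assumes the 64-bit little-endian layout. One thing to watch: in 64-bit ELF files p_flags sits right after p_type on disk, whereas the 32-bit layout keeps it after p_memsz (the order of the list above). The `Phdr` class and helper names are made up for illustration.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1

# On-disk order of an ELF64 program header entry (56 bytes, little-endian).
PHDR64 = struct.Struct("<IIQQQQQQ")


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def parse_phdr(data: bytes, offset: int) -> Phdr:
    """Decode the program header entry that starts at `offset`."""
    return Phdr(*PHDR64.unpack_from(data, offset))


def is_loadable(ph: Phdr) -> bool:
    """Only PT_LOAD entries describe content the loader maps into memory."""
    return ph.p_type == PT_LOAD
```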

Anyway, moving on: for interest's sake we should fiddle with some p_type values and see what happens.

If we throw some crazy bytes at the program header type field, readelf spits out some interesting stuff:





There are a couple more types to explore, some of which can be neat places to stuff things you need during an exploit. Either way, it's great to get to know the full set of behaviors the file is capable of - this way we can learn to describe more epic exploits with it!

Okay, so PT_LOAD commands must be pretty interesting to mess around with, so let's get that going next.

PT_LOAD commands


PT_LOAD commands, as covered above, tell the loader where to stick what, and with which permissions. Let's try something simple that will not immediately affect execution but will let us see the effect of our influence on the file. A good candidate is flipping some bits in the segment's p_flags field.
They are pretty easy to spot in raw hex. Here's me flipping the permissions on a PT_LOAD segment to full exec, write and read. The permission bits are 0x1 for exec, 0x2 for write and 0x4 for read (please see the documentation for the full spec), so we are going to give it the value 0x07:
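
To double-check what a given p_flags value means, a small decoder helps. This sketch uses the three standard permission bits (PF_X = 0x1, PF_W = 0x2, PF_R = 0x4) and prints them in the same r/w/x order the memory map uses; `flags_to_str` is a hypothetical helper name.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_to_str(p_flags: int) -> str:
    """Render p_flags in the r/w/x style used by /proc/PID/maps."""
    pairs = ((PF_R, "r"), (PF_W, "w"), (PF_X, "x"))
    return "".join(ch if p_flags & bit else "-" for bit, ch in pairs)


print(flags_to_str(0x07))  # prints rwx
print(flags_to_str(0x05))  # prints r-x
```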


If we're going to understand how things end up in memory from the interpretation of the ELF file we need to confirm our projections by looking at actual memory.

This is pretty easy to do in Linux: the /proc/[PID]/maps file spits out the current memory map (a good summary of where things are, what permissions they have, and so on). In addition, we can fiddle with some PT_LOAD commands and then scratch around in the process's memory using gdb. Here's the general methodology for testing PT_LOAD options and confirming them:

  1. Mangle the headers as above
  2. Open the file in gdb using `gdb ./compile_me.elf`
  3. Set a breakpoint on _start. It should still execute _start, since all this involves is pointing the rip there once the program is loaded and, uhm, well, letting it RIP!
  4. Once the breakpoint triggers, ask gdb what the process ID is
  5. Using the process ID from step 4, look up the memory map using the /proc/PID/maps file
The following screenshot shows how this is done:



And there you have it: the memory is actually mapped with this crazy full perm setting!
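
The same check can be scripted. Each line of /proc/PID/maps has the form "start-end perms offset dev inode pathname", so a small parser is enough to pull out the address range and permissions. This is only a sketch; `Mapping`, `parse_maps_line` and `read_maps` are names invented here.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/PID/maps."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    # Anonymous mappings have no pathname field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


def read_maps(pid: int) -> list[Mapping]:
    """Read the memory map of a live process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

With the process stopped at the breakpoint, `read_maps(pid)` using the process ID from step 4 lists every mapping, and a `perms` value starting with "rwx" on the binary's own mapping confirms the flag edit took effect.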

Redirecting PT_LOADs


Okay, so we can definitely change permissions, but can we, say, change the address of a segment in the actual memory image? Sure! Here's me doing that:

  1. hexedit the p_vaddr of the first PT_LOAD segment in the ELF file
  2. open the binary in gdb
  3. set a breakpoint on _start
  4. pop open the memory map
You should be able to see something like this:




Of course this doesn't really execute; it kind of dies just after _start gets executed:



We can also inject an extra PT_LOAD command. An easy way to do this is to just rewrite the type of another program header. Try using the PT_NOTE segment; it is pretty much ignored for our purposes. So here's me retyping the PT_NOTE to be an injected PT_LOAD:
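
A scripted equivalent of that hex edit could look like the sketch below. It assumes a 64-bit little-endian ELF and simply overwrites the p_type of the first PT_NOTE entry (value 4) with PT_LOAD (value 1), leaving every other field of the entry as it was; `retype_first_note` is a made-up name.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4


def retype_first_note(data: bytes) -> bytes:
    """Return a copy of the ELF with its first PT_NOTE entry retyped to PT_LOAD."""
    patched = bytearray(data)
    (e_phoff,) = struct.unpack_from("<Q", patched, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", patched, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", patched, base)
        if p_type == PT_NOTE:
            struct.pack_into("<I", patched, base, PT_LOAD)
            return bytes(patched)
    raise ValueError("no PT_NOTE program header found")
```

Because only the type changes, the new PT_LOAD inherits whatever p_offset, p_vaddr, p_filesz and p_flags the note entry had, so it is worth inspecting those with readelf before running the result.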



This runs perfectly! Here's me confirming this in gdb; I've also included the live memory map:




And that's it for this one! I'm sure you folks can figure out more interesting games to play with the program headers. In the next post I'm going to start covering the section headers. Stay tuned!


References and Reading: