Introduction to the ELF Format : The ELF Header (Part I)

ELF files are charged with using their magic to perform two holy tasks in the Linux universe. The first is to tell the kernel where to place things from the ELF file on disk into memory, along with providing ways to invoke the dynamic loader's functions and maybe even help out with some debugging information. The second is to tell the plethora of tools that interpret the file where to find the data structures that hold useful information for making sense of it. Anyway, that's as far as I've figured it out; the actual breakdown is a little less simple.

I'll demonstrate why this is so here and over the next series of posts in the classic "Learn things by breaking them" style.

ELF Header and Identification fields

The first thing that appears in an ELF file is of course the header, which, like most things in file formats, is mostly a list of fields and offsets into the file. Its purpose is to indicate what kind of ELF this is and where the various interpreters of the file can find the good stuff.

Here's what the header looks like (I've included a sample here, but you can grab any ELF file on your system):

If you're not super used to the Linux world, please don't pay too much attention to the .elf extension on my file. Normally ELF files do not have extensions in their file names.

The first field is called the ELF Identification. The ELF format is pretty flexible in that the same format can be used on a ton of different architectures, with support for multiple data encodings and Application Binary Interfaces. Here's the breakdown of how the e_ident field works:

  • Offset 0x00 - 0x03 EI_MAG0 ... EI_MAG3: the first four bytes of every ELF file are 0x7F followed by the ASCII codes for 'E', 'L', 'F'.
  • Offset 0x04 EI_CLASS tells us whether the file is 32 or 64 bit. The standard says 0x01 means 32 bit and 0x02 means 64 bit.
  • Offset 0x05 EI_DATA defines the endianness of the file: 0x01 means little endian and 0x02 means big endian.
  • Offset 0x06 EI_VERSION shows the version of the ELF format; it should be set to 0x01 for version 1, the current version.
  • Offset 0x07 EI_OSABI identifies the OS Application Binary Interface (ABI) extensions the file uses. Please bear in mind the documentation is a bit flaky here, and the meaning may depend heavily on how the particular OSABI is interpreted.
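
If you'd rather poke at these bytes from a script than squint at a hex dump, here is a minimal Python sketch of the layout in the list above. The helper name parse_ident and the lookup tables are made up for illustration (they are not part of any library), and the sample bytes are synthetic rather than taken from a real binary.

```python
# Field offsets inside e_ident, as listed in the bullets above.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI = 4, 5, 6, 7

CLASS_NAMES = {1: "ELF32", 2: "ELF64"}
DATA_NAMES = {1: "little endian", 2: "big endian"}


def parse_ident(header: bytes) -> dict:
    """Decode the identification bytes of an ELF file (hypothetical helper)."""
    if len(header) < 8 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    return {
        "class": CLASS_NAMES.get(header[EI_CLASS], "unknown"),
        "data": DATA_NAMES.get(header[EI_DATA], "unknown"),
        "version": header[EI_VERSION],
        "osabi": header[EI_OSABI],
    }


# Synthetic e_ident: magic, 64 bit, little endian, version 1, OSABI 0, padding.
sample = b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8
print(parse_ident(sample))
```

To try it on a real file, read the first 16 bytes in binary mode and pass them to parse_ident in place of the synthetic sample.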

You can see what the e_ident field says by looking at the output of readelf -h.

Pretty interesting stuff!

Let's see what happens when we change the value of the ELF version number. Pop open hexedit, change offset 0x06 in the file to whatever you want, then run readelf -h on it. Here's what happens when I do this:

ELF Type, Machine and Version Fields

The next field after e_ident is e_type. In the example above the type is EXEC (since it reads 0x02 0x00, which is the value 2 in little endian), which according to the ELF standard means the file is meant to be executed (checking the standard will confirm this).
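
As a rough sketch of how that two-byte field can be decoded, the Python below reads e_type at offset 0x10 (right after the 16 bytes of e_ident) and uses the EI_DATA byte to pick the byte order. The function name read_e_type is hypothetical, and the sample header is a synthetic prefix, not the file from this post.

```python
import struct

# Names from the ELF specification for the common e_type values.
E_TYPE_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def read_e_type(header: bytes) -> str:
    """Return the name of the e_type field stored at offset 0x10."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    # EI_DATA (offset 0x05): 1 = little endian, 2 = big endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 0x10)
    return E_TYPE_NAMES.get(e_type, hex(e_type))


# Synthetic 18-byte prefix: 16 bytes of e_ident followed by e_type = 02 00.
sample = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + b"\x02\x00"
print(read_e_type(sample))  # EXEC
```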

Let's dump the header of what is probably a shared object and compare its e_type field. Here's the header for libvlc:

Yup, looks like the byte offsets agree!

This one has e_type set to the bytes 0x03 0x00 at offset 0x10 in the file header, which means an ELF type of DYN. That is the type used for shared objects (position-independent executables are also built as DYN, but this one is a library). And here's readelf confirming this information:

After the type field we find the e_machine field, which can take a number of values, each indicating the architecture this file is meant for. Again, ELF supports many architectures, so there's a wide range of values this can take. Might be a good idea to fiddle with this field and see what happens.
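
For reference while fiddling, here is a small Python sketch that reads e_machine, the two bytes at offset 0x12 directly after e_type. The table only lists a handful of well-known values from the specification, read_e_machine is a made-up helper name, and the sample bytes are synthetic.

```python
import struct

# A few well-known e_machine values; the full list is much longer.
E_MACHINE_NAMES = {
    3: "EM_386",
    40: "EM_ARM",
    62: "EM_X86_64",
    183: "EM_AARCH64",
    243: "EM_RISCV",
}


def read_e_machine(header: bytes) -> str:
    """Return the name of the e_machine field stored at offset 0x12."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    # EI_DATA (offset 0x05): 1 = little endian, 2 = big endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 0x12)
    return E_MACHINE_NAMES.get(e_machine, f"unknown ({e_machine:#x})")


# Synthetic prefix: e_ident, e_type = 02 00, e_machine = 3e 00.
sample = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + b"\x02\x00" + b"\x3e\x00"
print(read_e_machine(sample))  # EM_X86_64
```

Anything not in the table comes back as "unknown" with the raw value, which is handy when you've deliberately scribbled over the field.
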
Here are some examples I found that don't appear in normal documentation:

Always good to throw a couple of bytes at the format and see what it really does! Moving on, the next field is e_version, which also indicates the ELF version number and should match the EI_VERSION byte in e_ident. You can pretty much set this to anything and it should still run:

The next field is one of the most important, so I thought I would pop it in its own section and show you how to fiddle with it in a way that confirms its behavior.

The e_entry field

The e_entry field holds the virtual address where the program should start executing. Normally it points to your _start routine (assuming you compiled with the usual toolchain defaults). You can point e_entry anywhere you like; as an example, I'm going to show that you can call a function that would otherwise never run under normal execution. To start, here's the C program and the Makefile I'm using:

As you can see, the never_call function never gets called from main. When you run it, the following happens:

Now let's see if we can make e_entry point to the never_call function. To do that we need to get the following done:

  1. Look up the virtual address of the never_call function with objdump
  2. Stick the virtual address in the e_entry field
  3. Run the binary and confirm the output

Here's how you look up the address of the never_call function. Run objdump -D compile_me.elf and look for the never_call function. Alternatively, you could try objdump -D compile_me.elf | grep never_call.

In my example never_call is at address 0x400526.
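
If you don't want to type the address into a hex editor by hand, step 2 can be scripted. The Python below is a minimal sketch, assuming the standard header layout where e_entry sits at offset 0x18 (4 bytes for ELF32, 8 bytes for ELF64). The helper names are hypothetical, and the sample header is synthetic: its starting entry point of 0x400430 is a made-up value, not the one from my binary.

```python
import struct

# e_entry follows e_ident (16), e_type (2), e_machine (2) and e_version (4).
E_ENTRY_OFFSET = 0x18


def entry_format(header: bytes) -> str:
    """Build the struct format for e_entry from EI_CLASS and EI_DATA."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    byte_order = "<" if header[5] == 1 else ">"
    width = "Q" if header[4] == 2 else "I"  # 8 bytes for ELF64, 4 for ELF32
    return byte_order + width


def read_entry(image: bytes) -> int:
    """Return the current e_entry value."""
    (entry,) = struct.unpack_from(entry_format(image), image, E_ENTRY_OFFSET)
    return entry


def set_entry(image: bytes, new_entry: int) -> bytes:
    """Return a copy of the ELF image with e_entry replaced."""
    patched = bytearray(image)
    struct.pack_into(entry_format(image), patched, E_ENTRY_OFFSET, new_entry)
    return bytes(patched)


# Synthetic ELF64 little-endian header prefix (type 2, machine 62, version 1).
sample = (b"\x7fELF\x02\x01\x01" + b"\x00" * 9
          + struct.pack("<HHIQ", 2, 62, 1, 0x400430))
patched = set_entry(sample, 0x400526)
print(hex(read_entry(sample)), "->", hex(read_entry(patched)))
# 0x400430 -> 0x400526
```

For a real binary you would read the whole file in binary mode, call set_entry with the address objdump gave you, and write the result to a copy so the original stays intact.
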
If you've injected the address correctly, readelf -h ./compile_me.elf should show the following:

and when you run it you should see...

That's it for this post, folks. In Part II I'll cover the rest of the ELF header and do some weird stuff with PT_LOAD segments. Stay tuned!

References and Reading