Introduction to the ELF File Format (Part III) : The Section Headers


Hi folks! This post is part of a series I'm writing on the ELF format. In this one I'm going to discuss the section headers and unpack how they work.

So far we have:
  1. Introduction to the ELF File Format : The ELF Header (Part I) 
  2. Introduction to the ELF File Format Part II : Program Headers  (I know the naming is confusing, I totally didn't plan this out that well, but I'll keep it consistent from here on out ;)
  3. This
I know it's a super long list, right? But it's going to get a bunch more entries very soon. In this one I'm going to cover the rest of the ELF header fields I skipped in the first part, unpack how section headers work, and I thought I'd drop in a nice illustration of the format as well. Enjoy!

e_flags field and the rest

This header field can contain a number of architecture-specific values and sometimes indicates things about the ABI as well. Each architecture defines its own weird set of values for these, and they basically mark the ELF with certain attributes, mostly involving whether it makes use of extensions or special code formats. Here's the example for MIPS:

from https://dmz-portal.mips.com/wiki/MIPS_ELF_header_definitions 




As you can see, pretty boring stuff. There are also special flag values for ARM and SPARC, and there should be for all the other architectures ELFs can run on (they just aren't as easy to find examples for as those two lol).
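To make the flag idea a bit more concrete, here's a minimal sketch of pulling e_flags out of the ELF header and testing a few MIPS bits. The constants are the EF_MIPS_* values as I know them from glibc's elf.h, and the helper names (read_e_flags, describe_mips_flags) are made up for this example, so treat it as an illustration rather than a complete decoder.

```python
import struct

# A handful of MIPS e_flags bits (values as defined in glibc's <elf.h>).
EF_MIPS_NOREORDER = 0x00000001
EF_MIPS_PIC = 0x00000002
EF_MIPS_CPIC = 0x00000004
EF_MIPS_ARCH = 0xF0000000  # mask covering the architecture level bits


def read_e_flags(header):
    """Return e_flags from raw ELF header bytes (handles 32- and 64-bit)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    offset = 48 if is_64 else 36              # e_flags sits right after e_shoff
    return struct.unpack_from(endian + "I", header, offset)[0]


def describe_mips_flags(e_flags):
    """Name the few MIPS flag bits this sketch knows about."""
    names = []
    if e_flags & EF_MIPS_NOREORDER:
        names.append("noreorder")
    if e_flags & EF_MIPS_PIC:
        names.append("pic")
    if e_flags & EF_MIPS_CPIC:
        names.append("cpic")
    return names
```

You'd call it on the first 64 bytes of a file, e.g. read_e_flags(data[:64]), and only pass the result to describe_mips_flags when e_machine says the file is actually MIPS.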

e_shstrndx

This field holds the index of .shstrtab in the section header table. That section is just a string table of section names (used by readelf as well), which gives tools something human-readable to show. Each name in it is a null-terminated string.

To make sure we know how it works, here's a quick diagram showing how this section is found:



As you can see, in the header value dump from readelf, the index number is listed as 28. The next image shows a dump of the section header table, also from readelf -S. We're focused in on entry 28, which is called .shstrtab. The last frame shows an honest hexdump of the file confirming all this: offset 0x18f4 contains the start of the ASCII data that programs like ld and readelf dereference as the names of the sections.
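The same lookup can be sketched in a few lines of Python. This assumes a 64-bit little-endian ELF (the field offsets below are the Elf64_Ehdr and Elf64_Shdr ones), and shstrtab_bytes is just a name I picked for the example; a 32-bit file like the one in the screenshots would need different offsets.

```python
import struct


def shstrtab_bytes(elf):
    """Return the raw section name string table of an ELF64 little-endian file."""
    # Elf64_Ehdr: e_shoff is at byte 40; e_shentsize, e_shnum, e_shstrndx start at 58.
    (e_shoff,) = struct.unpack_from("<Q", elf, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", elf, 58)
    if e_shstrndx >= e_shnum:
        raise ValueError("e_shstrndx points outside the section header table")

    # Step to the e_shstrndx-th entry of the section header table.
    entry = e_shoff + e_shstrndx * e_shentsize

    # Elf64_Shdr: sh_offset is at byte 24 of the entry, sh_size right after it.
    sh_offset, sh_size = struct.unpack_from("<QQ", elf, entry + 24)
    return elf[sh_offset:sh_offset + sh_size]
```

Splitting the result on b"\0" gives you the individual names, which is a quick way to sanity check that you landed on the right section.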

Okay, that's the ELF header finally done and dusted. Let's check out how section headers work.

Section Headers

Finally time to explain the section headers. They serve almost purely to tag areas of the file with semantic information so other programs can find symbols, debug information, meta-data about the sections themselves and much much more. Here are the ELF header fields that hold information about the section header table:
  • e_shoff - file offset where the section headers start
  • e_shnum - number of entries in the section header table
  • e_shentsize - the size of entries in the section header table
These are pretty straightforward: as you can see, they just let programs that read ELFs aim at the start of the table and know how many entries it has and how big each one is. Each section header table entry itself has a couple of properties to it. Sections have types, related sections that hold meta-data, and names! Here's what the ELF standard defines as section attributes:
  • sh_name - the offset into .shstrtab where this section's name string starts
  • sh_type - the section type (SHT_NULL, SHT_DYNAMIC,...)
  • sh_flags - the memory attributes of this section during execution (SHF_WRITE, SHF_ALLOC,...)
  • sh_addr - the virtual address where this section starts in memory (0 if it isn't loaded)
  • sh_offset - the offset in the file where this section starts
  • sh_size - the size in bytes this section occupies
  • sh_link - associates a section to this one, field value can depend on sh_type
  • sh_info - extra information about the section, meaning depends on sh_type
  • sh_addralign - memory alignment value for this section
  • sh_entsize - the size in bytes of each entry, for sections that hold a table of fixed-size entries (0 otherwise)
These fields have a number of sub-fields, so I've sketched some of them out to give you a kind of cheat sheet overview:
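If you'd rather see the layout as code, here's a rough sketch that walks the section header table using e_shoff, e_shnum and e_shentsize and unpacks every entry. It assumes a 64-bit little-endian file (the Elf64_Shdr layout), ignores the extended numbering case where e_shnum is 0, and the names Shdr, read_section_headers and flag_letters are mine, not from any library.

```python
import struct
from collections import namedtuple

# Field order follows Elf64_Shdr; "<" assumes a little-endian file.
SHDR64 = struct.Struct("<IIQQQQIIQQ")
Shdr = namedtuple(
    "Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def read_section_headers(elf):
    """Parse every section header table entry from raw ELF64 bytes."""
    (e_shoff,) = struct.unpack_from("<Q", elf, 40)             # where the table starts
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 58)  # entry size, entry count
    return [
        Shdr(*SHDR64.unpack_from(elf, e_shoff + i * e_shentsize))
        for i in range(e_shnum)
    ]


def flag_letters(sh_flags):
    """Render three common sh_flags bits as the W/A/X letters readelf -S prints."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters
```

Feeding it the bytes of any 64-bit little-endian ELF you have lying around should give you one Shdr per row of readelf -S, with sh_name still a raw offset rather than a string.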




The sh_link field associates this section with another one that provides important meta-data for its function. So for instance, if a section needs a table of strings to make sense, this field will contain the index of the section that holds those strings.

A good analogy: if a section is, let's say, a list of Pokémon cards, you might need another section to define the card types or hold the names of the cards. In this case sh_link would point to the section that contains this data. So it allows sections to support one another in function.

We can see examples of this in sections like .rela.plt or .dynsym (the list of dynamic symbols and their properties). .dynsym needs to know where the dynamic symbol names are, so its sh_link value holds the index of the string table with those names (.dynstr).
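Here's a tiny sketch of following sh_link. The table below is a made-up excerpt (the indices and the selection of sections are assumptions for illustration, not taken from the file in the screenshots), and linked_name is a hypothetical helper.

```python
# Hypothetical excerpt of a section header table: (name, type, sh_link).
# The position in the list is the section index.
SECTIONS = [
    ("", "NULL", 0),
    (".interp", "PROGBITS", 0),
    (".dynsym", "DYNSYM", 3),    # symbol names live in section 3
    (".dynstr", "STRTAB", 0),
    (".rela.plt", "RELA", 2),    # relocations refer to symbols in section 2
]


def linked_name(sections, index):
    """Return the name of the section that sections[index] links to, if any."""
    _name, _type, sh_link = sections[index]
    if sh_link == 0:             # 0 (SHN_UNDEF) means there is no linked section
        return None
    return sections[sh_link][0]
```

So linked_name(SECTIONS, 2) follows .dynsym to .dynstr, and linked_name(SECTIONS, 4) follows .rela.plt to .dynsym.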

Here's how it looks when readelf interprets this - with some helpful annotation of course:



I hope that makes it clear what that field is for. It just provides a pointer to another section header with some important associated information. It's pretty much the same story for the sh_info field, here's what the section header table looks like when it's labelled to reflect the sh_info field references:


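One thing to be careful about: it looks like .dynsym points to the .interp section, but sh_info isn't always a section index. For symbol tables like .dynsym the ELF spec says sh_info holds one greater than the symbol table index of the last local symbol, so a small value that happens to match .interp's index is a coincidence. For relocation sections like .rela.plt, sh_info really is a section index: it names the section the relocations apply to. As for .interp itself, it holds the path name of the interpreter, which is the program in charge of making sense of the symbol table and function relocation at load time.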
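Since the meaning of sh_info depends on sh_type, a small sketch of how a tool might interpret it can help. The rules below are the ones the generic ELF specification gives for symbol tables and relocation sections; the SHT_* numbers are the standard values, while describe_sh_info and its wording are made up for this example.

```python
# Standard section type numbers from the ELF specification.
SHT_SYMTAB, SHT_RELA, SHT_REL, SHT_DYNSYM = 2, 4, 9, 11


def describe_sh_info(sh_type, sh_info):
    """Explain what sh_info means for a few section types."""
    if sh_type in (SHT_SYMTAB, SHT_DYNSYM):
        # One greater than the index of the last local symbol,
        # i.e. the index of the first non-local symbol.
        return "first non-local symbol is at symbol index %d" % sh_info
    if sh_type in (SHT_REL, SHT_RELA):
        # Index of the section the relocations apply to.
        return "relocations apply to section index %d" % sh_info
    return "no generic meaning for this type (raw value %d)" % sh_info
```

For example, describe_sh_info(SHT_DYNSYM, 1) talks about symbol index 1, not about whatever section happens to sit at index 1.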

You might be interested in knowing how this looks in a hexdump, so here you go (with nice labels too!):




As you can see, the .shstrtab really is used to dereference the names of the sections. In the raw format, 0x1b is the byte offset into .shstrtab where the name of .interp is stored. We can now see that readelf actually fetches this for us and prints out the nice fancy name.
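What readelf does with that 0x1b can be sketched like this. The SHSTRTAB bytes below are an assumed sample (a string table that starts with .symtab, .strtab and .shstrtab, which is a common layout), not a copy of the table from the hexdump above, and name_at is a hypothetical helper.

```python
def name_at(strtab, offset):
    """Read the null-terminated name that starts at `offset` in a string table."""
    end = strtab.index(b"\0", offset)   # names run up to the next null byte
    return strtab[offset:end].decode("ascii")


# Assumed sample table: a leading null byte, then null-terminated names.
SHSTRTAB = b"\0.symtab\0.strtab\0.shstrtab\0.interp\0"

print(name_at(SHSTRTAB, 0x1b))  # prints ".interp"
```

With this sample, the leading null byte plus the three names before it add up to 27 bytes, which is why .interp starts at offset 0x1b.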

We can move on to unpacking how the symbol and library resolution works. Stay Tuned!



