Introduction to The ELF Format (Part IV): Exploring Section Types and Special Sections


Hi folks, this post is part of a series about the ELF format. So far in this series we have:

  1. ELF Header https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html 
  2. ELF Header and Program Headers https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html 
  3. ELF Header and Section Header Table https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html 
In this post I'm going to go over how some of the sections in the format work in a bit more detail. Previous posts didn't really expand on all the weirdness that each individual section type and format can harbor, especially in how it can break interpretation of the file under normal debugging and reverse engineering efforts. We're going to run through a couple of sections here, talk about different section types and see what ELFs can make some of the binutils do if we mess around with the bytes. Hope you folks enjoy!

Section Types

In other posts I've already expanded on the section header table, and in each entry of that table we have a field called sh_type, which indicates the section type. Each section type is like a model or layout for a given kind of section, and it imposes certain rules on how the bits and bytes in that section are grouped together to mean things. For instance, they might be simple lists or complex nested hash lookup tables.

To make this clearer, let's imagine how this aids problem solving in the ELF format. Let's say a compiler, malware or exploit developer needs a section to host a simple list of strings; in this case a section type of SHT_STRTAB would be appropriate. And as we see, the .shstrtab and .strtab sections are exactly that type:



Here's a list of what some of the others are meant to be used for:

  • SHT_NULL - marks a section header as inactive. The documentation describes this type as marking an entry that has no associated section, so it will most probably be skipped over by most semantically driven ELF utilities. The first entry in the section header table (index 0) is always of this type. 
  • SHT_PROGBITS - This is just a marking for a section that says it could contain anything; the format is dictated by the program itself. PROGBITS is pretty much for program specific behavior - which could be anything - literally anything, even Turing complete anything! This type typically marks the sections that contain the actual code for execution, the data section, initialization / finalization procedures (or perhaps even wilder concepts specific to the ABI or compiler producing the executable code sections and accompaniments). Again, this section type doesn't impose much format control really.
  • SHT_SYMTAB - This marks a section that holds a symbol table - I will of course flesh out how this works later on because it needs its own space :) 
  • SHT_STRTAB - A section that holds a list of null-terminated strings.
  • SHT_HASH - This section holds a symbol hash table, usually to speed up looking for symbols. In fact the documentation says that if an executable participates in dynamic linking it MUST have one of these sections. I will put that bold brave beautiful claim to the test later on (if not in its own post, depending on how exciting this potential lie becomes).
There are tons more section types; I think it's best to refer to the documentation for the full list instead of re-creating it here. Let's take a closer look at how some of these work though.
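
To make the sh_type field a bit more concrete, here is a minimal Python sketch that unpacks one raw section header and names its type. It assumes a 64-bit little-endian ELF (the Elf64_Shdr layout); the helper name section_type_name and the example header values are made up for illustration.

```python
import struct

# Common sh_type values from the ELF specification.
SHT_NAMES = {
    0: "SHT_NULL", 1: "SHT_PROGBITS", 2: "SHT_SYMTAB", 3: "SHT_STRTAB",
    4: "SHT_RELA", 5: "SHT_HASH", 6: "SHT_DYNAMIC", 7: "SHT_NOTE",
    8: "SHT_NOBITS", 9: "SHT_REL", 10: "SHT_SHLIB", 11: "SHT_DYNSYM",
}

# Elf64_Shdr, little-endian: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
# sh_size, sh_link, sh_info, sh_addralign, sh_entsize (64 bytes in total).
SHDR64 = struct.Struct("<IIQQQQIIQQ")


def section_type_name(shdr_bytes):
    """Return a readable name for the sh_type of one raw 64-bit section header."""
    sh_type = SHDR64.unpack(shdr_bytes[:SHDR64.size])[1]
    return SHT_NAMES.get(sh_type, "unknown (0x%x)" % sh_type)


# Hypothetical header for a string table; the field values are illustrative only.
example = SHDR64.pack(17, 3, 0, 0, 0x18F4, 0x10C, 0, 0, 1, 0)
print(section_type_name(example))  # SHT_STRTAB
```

Feeding it each 64 byte entry of a real section header table should reproduce the Type column that readelf -S prints, at least for these common types.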

SHT_STRTAB section types (.shstrtab and friends)


Looking at what a typical SHT_STRTAB is like in a hexdump:



As you can see the strings are nice and neatly delimited by null bytes, super easy to not mess this up when reading in strings in C :))).

In previous posts I mentioned that the .shstrtab holds section names, which means it provides a good starting point for mangling the section attributes in a way that skews their interpretation by debug tools or other ELF interpreters  - a key skill in understanding how they work!*

In that spirit, for the first experiment I decided to move the start of the .shstrtab down 8 bytes to see what happens to readelf's output about the sections. I get the following results:



Just to make the diagram clearer: the top frame shows the raw hexdump of the start of the .shstrtab. It originally started at 0x18F4 and we shifted it down to start at 0x18FC.

What you should see in this perhaps bloated diagram is that by moving the start of the .shstrtab section, every section name now starts 8 bytes further down. The names are still strings, so readelf reads bytes from the new starting point until it hits a null byte. For instance, the first section name, instead of .interp (which is at 0x1910 originally), now points to 0x1917. So .interp, usually the first valid section, is now called .note.ABI-tag. The following section name (which also starts 8 bytes down) is then I-tag, since it starts at 0x191F and reads until it hits the null byte at 0x1924. The rest of the sections follow the same pattern - a good exercise would be to confirm this on your own.
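
The same effect can be reproduced without touching a real binary. The sketch below is only a model: STRTAB is an assumed layout for the first few names of a .shstrtab, and read_name is a hypothetical helper that mimics reading from an sh_name offset up to the next null byte, with an optional shift of the table start.

```python
def read_name(strtab, sh_name, shift=0):
    """Read a null-terminated name at sh_name, optionally shifting the table start."""
    start = sh_name + shift
    end = strtab.index(b"\x00", start)
    return strtab[start:end].decode("ascii")


# Assumed layout of the start of a .shstrtab; a real one continues with more names.
STRTAB = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.interp\x00.note.ABI-tag\x00"

# sh_name offsets of the two names within this table.
INTERP = STRTAB.index(b".interp")
ABI_TAG = STRTAB.index(b".note.ABI-tag")

print(read_name(STRTAB, INTERP))            # .interp
print(read_name(STRTAB, INTERP, shift=8))   # .note.ABI-tag
print(read_name(STRTAB, ABI_TAG, shift=8))  # I-tag
```

The sh_name values in the section headers never change; only the base they are added to moves, which is why every name slides by the same 8 bytes.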

Okay, so what happens when we mangle the section types? Let's say we NULL them out or swap section types on some of them, and see if the program still runs - and if it doesn't, why not and how close it gets to running.

Here are the results from NULLing out the section types (recall that marking a section as having a null type in the section header table means it will be "skipped"):




The large white column here marks the column in this ELF that contains the sh_type bytes. I'm really just being lazy with labeling here and leaving identification of the individual section types up to the reader if need be. But once you get in the swing of identifying the section header table layout by hand, you'll quickly realize that if this column is null, a whole bunch of section types are nulled out. The smaller boxes next to this column show virtual addresses for some of the sections; I highlight them here so you can quickly see that we have indeed written over the records for the sections shown on the right. We can also see in the hexdump that the section header table starts at 0x1a00, which is the usual value for the example binary I'm using, so we can guess that I probably didn't change that and the faults here are caused by the sh_type mangling alone. To confirm this another way, we can see in the readelf output on the right that all the section types are reported as NULL.
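
If you'd rather not do this in a hex editor, the patch is easy to script. This is a rough sketch, assuming a 64-bit little-endian ELF with a sane header; null_section_types is a hypothetical helper and the file names in the usage comment are placeholders. Only run it against a copy of a binary.

```python
import struct


def null_section_types(elf):
    """Return a copy of a 64-bit little-endian ELF image with every sh_type set to 0."""
    patched = bytearray(elf)
    # In the ELF64 header, e_shoff lives at 0x28; e_shentsize and e_shnum
    # are the 16-bit fields at 0x3A and 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    for i in range(e_shnum):
        # sh_type is the second 32-bit word of each section header entry.
        struct.pack_into("<I", patched, e_shoff + i * e_shentsize + 4, 0)
    return bytes(patched)


# Usage (placeholder file names):
#   with open("example.elf", "rb") as f:
#       mangled = null_section_types(f.read())
#   with open("example.mangled.elf", "wb") as f:
#       f.write(mangled)
```

Everything outside the sh_type words is left untouched, so any change in tool behavior comes from the type mangling alone.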

We can also see this does strange things to gdb when it's trying to load some information from those sections, and can even break its ability to interpret the file as an executable:


Some rudimentary anti-debugging right there. Of course the immediate complement of this as a reverse engineering effort would be to reconstitute the section headers of a stripped binary (this would work essentially by understanding common layouts of the file and identifying the most probable offsets for the sh_* fields). It might be worth exploring what happens when you mangle other section attributes and pass the result to other utilities like strace and ltrace. Moving on!

SHT_NOTE sections (.note.ABI-tag and friends)


The SHT_NOTE type sections are simple lists of records that provide versioning and typing information for vendors. The GNU folks tend to mark ELFs liberally with these sections on GNU/Linux systems. In fact these sections are meant to indicate that the file was built by tools from these systems and carry versioning information about them. So a note might list your kernel version or GNU tool version, let's say (of course if you're doing forensics this might be helpful, or if you're avoiding it, it might be worth stripping or forging this field hehe).

The .note.ABI-tag section holds some semantic versioning information about the ABI being used and the operating system this file is for. The format of a note is basically three 32-bit words followed by two variable-length fields, each padded to a 4 byte boundary. For a note with a 4 byte name, the layout works as follows:
  • 0x00 (4 bytes) namesz - size of the name field in bytes. 
  • 0x04 (4 bytes) descsz - size of the desc field in bytes.
  • 0x08 (4 bytes) type - the note type, whose meaning depends on the name.
  • 0x0C (namesz bytes, 4 here) name - the name field containing a null terminated list of characters.
  • 0x10 (descsz bytes) desc - the description field holding some numbers that indicate the OS version (more on this below).
Documentation describes that you can potentially have a note section that has no descriptor; in that case we just set the descsz to 0, and don't have the field at 0x10.
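
The layout above can be walked with a few lines of Python. This is a minimal sketch, assuming little-endian 32-bit words and name/desc fields padded to 4 byte boundaries; parse_note is a hypothetical helper, and the desc words in the sample are made up for illustration rather than taken from a real binary.

```python
import struct


def parse_note(data, offset=0):
    """Parse one little-endian ELF note and return (name, type, desc)."""
    namesz, descsz, n_type = struct.unpack_from("<III", data, offset)
    name_start = offset + 12
    # The name field is padded up to a 4 byte boundary before desc begins.
    desc_start = name_start + (namesz + 3) // 4 * 4
    name = data[name_start:name_start + namesz].rstrip(b"\x00").decode("ascii")
    desc = data[desc_start:desc_start + descsz]
    return name, n_type, desc


# Sample note: namesz=4, descsz=16, type=1, name "GNU\0", then four desc words.
# The desc values (0, 3, 2, 0) are illustrative assumptions.
sample = struct.pack("<III4s4I", 4, 16, 1, b"GNU\x00", 0, 3, 2, 0)
name, n_type, desc = parse_note(sample)
print(name, n_type, struct.unpack("<4I", desc))  # GNU 1 (0, 3, 2, 0)
```

A note with no descriptor simply has descsz set to 0, in which case desc comes back as an empty byte string.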

Here's what a note section looks like in a hexdump:



Here we can see the following settings for the field values:

  • namesz is set to 0x04 00 00 00 which means the name field is 4 bytes in size
  • descsz is set to 0x10 00 00 00 which means the description field is 16 bytes in size
  • type is set to 0x01 00 00 00, which for a GNU note marks it as the ABI tag; the GNU/Linux part is recorded in the desc field (because my machines are FREE machines!)
  • name field reads 0x47 0x4e 0x55 0x00 which we can clearly see reads 'G' 'N' 'U'
  • desc field holds an array of values starting at 0x268 -> 0x27C
The desc field needs a little explaining and the documentation on it is slim, but here are a couple of places that may expand on it better than I do (I've included them in the reading and references section). To see how it's handled, check out this extract from glibc-2.28/elf/dl-load.c:



Essentially it indicates the OS version and this is clearly compared to a standardized value in the library when dl-load handles it. How exactly this OS version field works is going to take a little more research on my part before I get much more mouthy about it.
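
As a rough illustration of that comparison, here is a sketch that decodes an ABI-tag desc. It assumes the desc is four little-endian 32-bit words, an OS identifier followed by a major, minor and subminor version, and that the version triple is folded into one integer with one byte per component. Both helper names are hypothetical, and the sample values are made up; treat this as a model of the idea rather than a copy of the glibc code.

```python
import struct


def decode_abi_tag_desc(desc):
    """Split a 16 byte ABI-tag desc into (os_id, major, minor, subminor)."""
    return struct.unpack("<4I", desc)


def pack_os_version(major, minor, subminor):
    """Fold a version triple into one integer, one byte per component."""
    return (major & 0xFF) * 65536 + (minor & 0xFF) * 256 + (subminor & 0xFF)


# Made-up desc: OS identifier 0, version 3.2.0.
desc = struct.pack("<4I", 0, 3, 2, 0)
os_id, major, minor, subminor = decode_abi_tag_desc(desc)
print(os_id, hex(pack_os_version(major, minor, subminor)))  # 0 0x30200
```

With the version packed into a single integer, checking whether the running system is new enough becomes a plain numeric comparison.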

Conclusion


That's going to be it for this post. I don't like to bloat posts with too much text because, as we know, things are easier to understand when they are broken into smaller parts and carefully studied* (see the side rant for more hehe). In further posts in the series I will expand on the rest of the sections. For now I hope that by cracking open these few I've started you on your way to detailing how the others work too; understanding their types, and therefore their layout, gives us power to control how they are interpreted. There are a lot more tricks that can be pulled off by messing with these fields. So happy hacking!

And stay tuned for the follow up posts on the GNU_HASH and other weird archaic section types.

References and Recommended Reading:



  1. https://en.wikipedia.org/wiki/Executable_and_Linkable_Format 
  2. https://refspecs.linuxfoundation.org/LSB_2.1.0/LSB-Embedded/LSB-Embedded/elftypes.html 
  3. https://blogs.oracle.com/solaris/inside-elf-symbol-tables-v2
  4. https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-79797/index.html 
  5. https://sourceware.org/ml/binutils/2006-10/msg00377.html
  6. https://r00tk1ts.github.io/2017/08/24/GNU%20Hash%20ELF%20Sections/ 
  7. https://en.wikipedia.org/wiki/Weird_machine 
  8. https://www.cs.dartmouth.edu/~sergey/wm/ 
  9. https://en.wikipedia.org/wiki/Category_theory 
  10. http://langsec.org/papers/Bratus.pdf 


*<side-rant>
Why is this? Why do we need to break things to learn them? Especially in computers? As we know, in many sciences we learn how things are built by breaking them down, tearing them apart, boiling away their non-essential parts and deciding what they mean from the perspective of their super-structures - we study how the "super" works by breaking open its "minor" parts, i.e. we learn how large complex curves behave in calculus by breaking them down into small straight lines; or learn what particles are made of by smashing them into one another so we can see the smaller parts; or learn how philosophy texts work by deconstructing them in some contexts and reconstructing them in others - it seems to be a common theme in fields held to traditions of rigorous logical thinking.

More directly perhaps in the science of computer hacking, because we often work in the realms governed by (or are inevitably always governed by) the capability of computer languages (which themselves are governed by the relations between sets, their labels and sizes); some have realized that our  greatest pains and harshest challenges come often straight from underestimating the way languages work when they are allowed to be spoken with their broken, inconsistent and superstructure referencing parts (every language is an expression of a "base" or "host" language that usually has different and more powerful capabilities than its "guest" - in computer science we discern the power of these languages by their computational capabilities).

Just to cleanly connect my points here - one language is the "bigger" one, around or hosting another language, by the size of its computational power and because of the references possible from its hosted, computationally smaller languages, i.e. what it can possibly compute under certain proofs when using those small languages in these contexts. Sometimes they lend "subsets" of this power to isolated subsets of their literal symbols: for instance we have a "language" "within" JavaScript for setting variable values and another "within" JavaScript for controlling execution flow. Could a variable setting be allowed to become an if statement, or equivalently a control of execution flow? Of course! It's JavaScript! Just stick the variable value in an eval call ;) 

So through these languages we can directly speak (strings and other input data) we make reference to outer, more powerful structures that appear within the languages themselves (or more generally are "equivalently" in the languages themselves - I leave space for category theory and input fuzzing to argue what is the "Set" and therefore what is "in" it as well), that also impose or allow power over their ordering, labeling and effective interpretation. We say that these spirits called "weird machines" arise from learning what we can summon in apparent or seeming "non-weird machines" by giving execution and interpretation to the aspects of a language that are built in the "intersections" between other languages. A quick example relevant here: if you can make string input to a program also impose meaning (ordering or labeling properties) on the stack layout (regardless of how), namely the string is both character data and stack address data, it exposes an intersection of two languages which gives life to the string data in an unusual but powerful way - it is not just displayable but also executable!

Anyway sorry for the philosophical rant - on with the section meta-data mangling! </side-rant>


