Hey, this is Maxwell Dulin. Today, I wanted to discuss a fascinating spectacle! The binary exploiters of the world are trying to run a car with a half broken engine, 3 wheels and a destroyed windshield. What does this actually mean? We are trying to keep the program running just long enough for the ability to run arbitrary code.
Alright, enough of the metaphors... In general, heap exploitation is about corrupting the metadata of the heap (dynamic memory management) data structures in a particular way that allows for overwriting values that we should not be able to control. This article expects a basic understanding of heap exploitation. All of the basic data structures that re needed for this exploit will be explained in this article; but, for more information, I recommend reading this ebook.
This article is about the newly discovered
House of Husk. This is a heap exploitation technique that can be leveraged in order to gain arbitrary code execution in GLibC. This is another one of these primitives, which eventually leads to code execution. I wanted to explain the inner workings of this technique, with a step-by-step guide on how to use it.
First, we will discuss how this attack works at a high level. Then, dive into the technical details of how it actually works. In GLibC, it is possible to create custom format format string operations. These operators are then ran inside of GLibC. What would happen if we could somehow overwrite one of our function pointers with some method? This is the basis of the House of Husk.
In order to make this possible, there are four main steps:
- Unsorted Bin Attack
- Freeing Spree
- Printf Execution
In a nutshell, here is what the following does... The unsorted bin attack is used in order to allow for arbitrary writing of heap pointers into LibC (beyond the fastbin memory location). Then, by freeing chunks of specially chosen sizes, we can overwrite pointers within GLibC that are used for custom format string operations. Finally, we execute the format string in order to gain control of the program.
For this to make the most sense, this will go a little out of order (in terms of the list above). So, in the exploitation code, just keep that in mind. First, we will discuss how the custom format string specification works. Then, we will go into the attacks used in order to exploit this.
Custom Printf Format Strings
In order to create a custom format string, the __register_printf_specifier needs to be called. This instantiates two tables that we care about: __printf_function_table and __printf_arginfo_table. These tables are what is used in order to execute the proper functions for a custom format string.
The calling of the functions is much more interesting though:
All this if statement looks for is a non-null value in the one of three tables. If so, it continues into the custom format string specification by jumping to the do_positional label.
The actual execution takes a function pointer from the __printf_arginfo_table table and executes this. The index is calculated based upon the format string specified. The code for this is seen below.
What did we learn from that? The function table must be a non-null value in order to hit the custom condition. Additionally, the __printf_arginfo_table must contain a pointer, where each index is a pointer to the function to be executed for the format string. Let's see if we can work with that!
Unsorted Bin Attack
GLibC's free chunks are set into different bins. These bins are used in order to store the memory that has been allocated and then freed. A bin is just a linked list of chunks of memory. There are two bins that we care about for this attack: fastbins and the unsorted bin. The fastbin is used for chunks of sizes 0x80 or less. The unsorted bin is used for all chunks just before they get put into the small/large bins. The unsorted bin can hold chunks of any size. For more information on chunks and bins, refer to Azeria Labs.
Free chunks generally have the following data structure:
The fd pointer is a pointer that points to the next
chunk in the bin. The bk pointer is a pointer that points to the previous
chunk in the bin.
Bk pointer Overwrite
Why is this important? The perk of having the chunk metdata inline with the chunks is that it is very space efficient. However, vulnerabilities, such as buffer overflows and use after frees, make this memory management have security issues. The following code snippet demonstrates how a chunk from the unsorted bin gets removed:
What's interesting, about this piece of code, is that
bck->fd = bin_ptr. What happens if we control the bck pointer (which is just current_pointer->bk)? We can write to an arbitrary place in memory! Although this is super interesting, we have a limitation: what can control where we write but not what.
Write to Where?
The next question is: "What would be beneficial to overwrite with a very large value?" Although there are several places (program specific even) ways to corrupt GLibC, one option is to overwrite the global_max_fast value. This value is the maximum size that a fastbin can be. Overwriting this value allows essentially all values to be treated like fastbins.
Instead of limiting the written values to the edge of the array of pointers, this now writes arbitrarily deep into GLibC. Now, we can write heap pointers into any location based upon the size of a chunk!
Negative Effects of Unsorted Bin Attack
Some of the effects are good; others are very destructive. The main negative effect is that the unsorted bin is now completely unusable. Anything put into the unsorted bin (after the attack) fwill crash the program instantly. All the dynamic memory, to be used, must come from either the previous fastbins, or memory that has already been allocated.
The unsorted bin attack turns everything into a fastbin chunk. So, we can overwrite any value in GLibC (after the malloc_state.fastbinsY address) with a heap pointer, simply by freeing a chunk with a specific size. The new question is the following: where should we put these heap pointers at?
With the new ability to write a heap pointer to any location beyond the fastbins, we should abuse this!
As discussed before, there are two main variables in GLibC that matter for the custom string formats: __printf_function_table and __printf_arginfo_table. We will start with __printf_function_table first.
The __printf_function_table (as shown in code above) only needs for the value to be non-null for the custom format string path to be evaluated. So, this will be the first value that we will overwrite. Overwriting this value, on the next printf, will trigger the custom format string operation to be used.
The __printf_arginfo_table is just a pointer to an array of function pointers. If we just overwrite this variable with a heap pointer, we now control the table of function pointers! Whooo! We just have to make sure that we actually control the format string trying to be accessed.
By freeing the values into these specific variables, we can control the flow of the program.
Now, we need to control the allocations for these chunks, in order to place the pointers into the proper GLibC variables. We will next discuss how these offsets are calculated.
At this point, it should be noted that my exploits were written for GLibC 2.23. The original POC was written for GLibC 2.27. I will point of where this makes a difference later on.
The formula for creating chunks of the proper size is
delta * 2 - offset, where delta is the amount of bytes from the main_arena to the variable to overwrite. The reason for the
* 2 is the pointers are in intervals of 8 bytes (assuming 64-bit) and the fastbins are arranged in 16 byte intervals. So, the * 2 comes from
16/8 = 2. Finally, the offset is dependent on the version of GLibC; 2.27 the offset is 0x10, while 2.23 does not have any offset. This value is dependent on the malloc_state struct. However, this is the only thing that I am not 100% confident on in this article. Reach out to me if you understand where this offset comes from :)
The two offsets are __printf_function_table- MAIN_ARENA and __printf_arginfo_table- MAIN_ARENA. If you multiple both of these by 2, that will be the proper chunk size to use.
The final step is triggering the code execution. Let's light a fire here!
The table itself is calculated using the following:
(ASCII_val - 2) * 8. For example, a value of 'X' would be a value of 88 on the ASCII chart. Furthermore, multiple this value by 8 in order to be at the index (pointers are 8 bytes long). So, for 'X' the final location would be
PRINTF_ARGINFO_TABLE + 688 or index 86 in the table. If we insert a function pointer in our chunk at this exact address, we have setup the custom format string properly.
Finally, for code execution! All we have to do is come across a format string, with the format that we just specified. Then, we have a fire started; code execution has been changed :) A proof of concept (POC) of this in GLibC 2.23 can be found at here.
Having the in-depth explanation above is nice for debugging. But, if you want a nice and easier to follow with a CTF situation, here it is...
- UAF on an unsorted bin chunk.
- LibC Address Leak.
- Very good control over chunk sizes being allocated and ordering.
Any version where the unsorted bin attack can be performed on. This works in all less than 2.28 quite easily. Newer versions of GLibC have some migrations in place for this attack. But, these mitigation's can be bypassed.
List of Steps
- Chunk1: Allocate a value that will be put into the unsorted bin (used for UAF later)
- Chunk2: Allocate a properly sized chunk to overwrite __printf_function_table.
- Chunk3: Allocate a properly sized chunk to overwrite __printf_arginfo_table.
- Free Chunk1.
- Overwrite the bk pointer of Chunk1 with a UAF vulnerability to be the value of global_max_fast - 0x10.
- Allocate a chunk of the same size as Chunk1 to trigger the overwrite of global_max_fast.
- Free Chunk2 to overwrite the __printf_function_table.
- Free Chunk3 to overwrite the __printf_arginfo_table. This chunk should have a pointer in the proper slot for the format string for the function pointer.
- Run a printf-ish function, with the proper format string, to get code execution :)
A couple of notes on the above:
- Steps 1-3 can happen in any order.
- Steps 7-8 can happen in either order.
- The reason that the value is global_max_fast - 0x10 is because the bins fd pointer is at offset 0x10 in the struct. Hence, we have to subtract 0x10 from this value in order for the write to happen in the proper location.
The steps above assume none of the chunks are consolidated with the top_chunk. This is too complicated to fully discuss and will be left as an exercise for the reader.
The unsorted bin is completely destroyed and unusable. Keep that in mind after running the unsorted bin attack.
- Other format strings can be used prior to executing the newly defined format string. So, do not panic in this situation.
- All format string functions can utilize this method, including printf and snprintf.
This is a really cool way to take control over a program! Overall, this takes quite a bit of control in over the allocation sizes and ordering. But, other than that, this exploitation method has a huge upside, in that a single UAF leads to RCE :) I hope you learned something reading this article; reach out to me (contact information is in the footer) if you have any questions or comments. Cheers from Maxwell "ꓘ" Dulin .