Buffer overflow

From Free net encyclopedia

Revision as of 13:49, 20 April 2006

In computer security and programming, a buffer overflow, or buffer overrun, is an anomalous condition where a process attempts to store data beyond the boundaries of a buffer. The result is that the extra data overwrites adjacent memory locations. The overwritten data may include other buffers, variables and program flow data.

Buffer overflows may cause a process to crash or produce incorrect results. They can be triggered by inputs specifically designed to execute malicious code or to make the program operate in an unintended way. As such, buffer overflows cause many software vulnerabilities and form the basis of many exploits. Sufficient bounds checking by either the programmer or the compiler can prevent buffer overflows.

Technical description

A buffer overflow occurs when data written to a buffer, due to insufficient bounds checking, corrupts data values in memory addresses adjacent to the allocated buffer. Most commonly this occurs when copying strings of characters from one buffer to another.

Basic example

In the following example, a program has defined two data items which are adjacent in memory: an 8-byte-long string buffer, A, and a two-byte integer, B. Initially, A contains nothing but zero bytes, and B contains the number 3. Characters are one byte wide.

A A A A A A A A B B
0 0 0 0 0 0 0 0 0 3

Now, the program attempts to store the character string "excessive" in the A buffer, followed by a zero byte to mark the end of the string. By not checking the length of the string, it overwrites the value of B:

A A A A A A A A B B
'e' 'x' 'c' 'e' 's' 's' 'i' 'v' 'e' 0

Although the programmer did not intend to change B at all, B's value has now been replaced by a number formed from part of the character string. (In this example, on a big-endian system that uses ASCII, 'e' followed by a zero byte becomes the number 25856.)
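
The arithmetic in the parenthetical can be checked with a short C sketch. It models A and B as one 10-byte array rather than performing a real overflow, which would be undefined behaviour in C. The names A_SIZE, B_SIZE, MEM_SIZE, store_unchecked and read_b_big_endian are invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated memory: 8 bytes for A followed directly by 2 bytes for B. */
enum { A_SIZE = 8, B_SIZE = 2, MEM_SIZE = A_SIZE + B_SIZE };

/* Store a string and its terminating zero byte starting at A without
   comparing its length to A_SIZE. The cap at MEM_SIZE exists only so
   that the simulation itself stays inside the array. */
static void store_unchecked(unsigned char mem[MEM_SIZE], const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > MEM_SIZE)
        n = MEM_SIZE;
    memcpy(mem, s, n);
}

/* Interpret the two bytes of B as a big-endian 16-bit integer. */
static uint16_t read_b_big_endian(const unsigned char mem[MEM_SIZE])
{
    return (uint16_t)((mem[A_SIZE] << 8) | mem[A_SIZE + 1]);
}
```

Starting from ten bytes holding 0 0 0 0 0 0 0 0 0 3, storing "excessive" leaves 'e' (0x65 in ASCII) and a zero byte in B, and 0x6500 is 25856.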

If B was the only other variable data item defined by the program, writing an even longer string that went past the end of B could cause an error such as a segmentation fault, terminating the process.

Buffer overflows on the stack

Besides changing the values of unrelated variables, buffer overflows can often be exploited by attackers to make the running program execute arbitrary, attacker-supplied code. The techniques available to an attacker seeking control over a process depend on the memory region in which the buffer resides. One such region is the stack, where data can be temporarily "pushed" onto the "top" of the stack and later "popped" to read the value back. Typically, when a function begins executing, temporary data items (local variables) are pushed, and these remain accessible only during the execution of that function. Overflows can occur not only on the stack but also on the heap.

In the following example, "X" is data that was on the stack when the program began executing; the program then called a function "Y", which required a small amount of storage of its own; and "Y" then called "Z", which required a large buffer:

Z Z Z Z Z Z Y X X X
             : / / /

If the function Z caused a buffer overflow, it could overwrite data that belonged to function Y or to the main program:

Z Z Z Z Z Z Y X X X
. . . . . . . . / /

This is particularly serious because on most systems, the stack also holds the return address, that is, the location of the part of the program that was executing before the current function was called. When the function ends, the temporary storage is removed from the stack, and execution is transferred back to the return address. If, however, the return address has been overwritten by a buffer overflow, it will now point to some other location. In the case of an accidental buffer overflow as in the first example, this will almost certainly be an invalid location, not containing any program instructions, and the process will crash.

Example source code

The following is C source code exhibiting a common programming mistake. Once compiled, the program will generate a buffer overflow error if run with a command-line argument string that is too long, because this argument is used to fill a buffer without checking its length.

/* overflow.c - demonstrates a buffer overflow */

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
  char buffer[10];
  if(argc < 2)
  {
    fprintf(stderr, "USAGE: %s string\n", argv[0]);
    return 1;
  }
  strcpy(buffer, argv[1]);
  return 0;
}

Strings of 9 or fewer characters will not cause a buffer overflow. Strings of 10 or more characters will cause an overflow: this is always incorrect but may not always result in a program error or segmentation fault.

This program could be safely rewritten using strncpy as follows:

/* better.c - demonstrates one method of fixing the problem */

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
  char buffer[10];
  if(argc < 2)
  {
    fprintf(stderr, "USAGE: %s string\n", argv[0]);
    return 1;
  }
  strncpy(buffer, argv[1], sizeof(buffer));
  buffer[sizeof(buffer) - 1] = '\0';
  return 0;
}
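
The strncpy version above silently truncates an argument that is too long. Where truncation is not acceptable, an alternative is to check the length first and refuse the input. The following is a minimal sketch of that idea; the helper name copy_if_fits is hypothetical and not part of any standard library.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, terminating zero byte included.
   Returns 0 on success, or -1 if src is too long, in which case dst
   is left untouched. */
static int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

In the program above, main could call copy_if_fits(buffer, sizeof(buffer), argv[1]) and report an error to the user when it returns -1.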

Exploitation

The techniques used to exploit a buffer overflow vulnerability vary with the architecture, the operating system and the memory region. For example, exploitation on the heap (used for dynamically allocated variables) is very different from exploitation of stack-based variables.

Stack-based exploitation

A technically inclined and malicious user may exploit stack-based buffer overflows to manipulate the program in one of several ways:

  • By overwriting a local variable that is near the buffer in memory on the stack, changing the behaviour of the program in a way that may benefit the attacker.
  • By overwriting the return address in a stack frame. Once the function returns, execution will resume at the return address specified by the attacker, usually the address of a buffer filled with user input.
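
The first of these can be illustrated with a simulated stack frame. The sketch below does not touch the real stack; it models a frame as a byte array with an assumed layout (an 8-byte name buffer followed by a one-byte privilege flag). The layout and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simulated frame: bytes 0..7 hold a name buffer, byte 8 holds a flag. */
enum { NAME_SIZE = 8, FLAG_OFFSET = NAME_SIZE, FRAME_SIZE = NAME_SIZE + 1 };

/* The bug being modelled: len is never compared with NAME_SIZE.
   The cap at FRAME_SIZE only keeps the simulation itself in bounds. */
static void unchecked_fill(unsigned char frame[FRAME_SIZE],
                           const unsigned char *input, size_t len)
{
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    memcpy(frame, input, len);
}

/* The program's later decision depends on the neighbouring flag. */
static int is_admin(const unsigned char frame[FRAME_SIZE])
{
    return frame[FLAG_OFFSET] != 0;
}
```

With a zeroed frame, a short input such as "bob" leaves the flag clear, while nine non-zero input bytes spill into the flag and make is_admin report true.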

If the address of the user-supplied data is unknown, but its location is stored in a register, then the return address can be overwritten with the address of an opcode which will cause execution to jump to the user-supplied data. If the location is stored in a register R, then a jump to a location containing the opcode for a jump R, call R or similar instruction will cause execution of the user-supplied data. Suitable opcodes, or bytes in memory, can be found in DLLs or in the executable itself. However, the address of the opcode typically cannot contain any null characters, and the locations of these opcodes can vary between applications and versions of the operating system. The Metasploit Project maintains a database of suitable opcodes, though only those found in the Windows operating system are listed.
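
The restriction on null characters arises when the overflowing data is delivered through a string copy such as strcpy, which stops at the first zero byte. A candidate address can be screened with a check like the one sketched below, which assumes 32-bit addresses; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if any of the four bytes of a 32-bit address is zero,
   meaning a string copy would stop before writing the whole address. */
static int has_zero_byte(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        if (((addr >> (8 * i)) & 0xFFu) == 0)
            return 1;
    }
    return 0;
}
```

For instance, an address such as 0x00401000 (an arbitrary illustrative value) would be rejected because its top byte is zero.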

Heap-based exploitation

A buffer overflow occurring in the heap data area is referred to as a heap overflow. It is exploitable in a similar manner to stack-based overflows, since memory on the heap is dynamically allocated by the application at run-time and typically contains program data.

Barriers to exploitation

Manipulation of the buffer which occurs before it is read or executed may cause an exploitation attempt to fail. These manipulations can mitigate the threat of exploitation, but may not make it impossible. They could include conversion to upper or lower case, removal of metacharacters and filtering out of non-alphanumeric strings. However, techniques exist to bypass these filters and manipulations: alphanumeric code, polymorphic code, self-modifying code and return-to-libc attacks. The same methods can be used to avoid detection by intrusion detection systems.

Protection against buffer overflows

Various techniques have been used to detect or prevent buffer overflows, with various tradeoffs. The most reliable way to avoid or prevent buffer overflows is to use automatic protection at the language level. This sort of protection, however, cannot be applied to legacy code, and often technical, business, or cultural constraints call for a vulnerable language. The following sections describe the choices and implementations available.

Choice of programming language

The choice of programming language can have a profound effect on the occurrence of buffer overflows. As of 2006, among the most popular languages are C and its derivative, C++, with an enormous body of software having been written in these languages. C and C++ provide no protection against accessing or overwriting data in any part of memory through invalid pointers; more specifically, they do not check that data written to an array (the implementation of a buffer) is within the assumed boundaries of that array.

Variations on C, such as Cyclone, help to prevent more buffer overflows by, for example, attaching size information to arrays. The D programming language uses a variety of techniques to avoid most uses of pointers and user-specified bounds checking.

Many other programming languages provide runtime checking which might send a warning or raise an exception when C or C++ would overwrite data. Examples of such languages range broadly from Python to Ada, from Lisp to Modula-2, and from Smalltalk to OCaml. The Java and .NET bytecode environments also require bounds checking on all arrays. Nearly every interpreted language will protect against buffer overflows, signalling a well-defined error condition. Often, where a language provides enough type information to do bounds checking, an option is provided to enable or disable it. Static analysis can remove many dynamic bounds and type checks, but poor implementations and awkward cases can significantly decrease performance. Software engineers must carefully consider the tradeoffs of safety vs. performance costs when deciding which language and compiler setting to use.
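
The kind of check that such languages perform on every array access can be sketched in C by keeping the length alongside the data. This is only an illustration of the principle, not a description of how any particular language runtime is implemented; the type and function names are hypothetical.

```c
#include <stddef.h>

/* An array that carries its own length, as a bounds-checked
   language would do internally. */
typedef struct {
    int   *data;
    size_t len;
} checked_array;

/* Write value at index. Returns 0 on success, or -1 if the index is
   out of range; a checked language would raise an exception here. */
static int checked_set(checked_array *a, size_t index, int value)
{
    if (index >= a->len)
        return -1;
    a->data[index] = value;
    return 0;
}
```

The comparison on every access is the performance cost mentioned above, which static analysis tries to remove where it can prove the index is always in range.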

Use of safe libraries

The problem of buffer overflows is common in the C and C++ languages because they expose low-level representational details of buffers as containers for data types. Buffer overflows are thus avoided by maintaining a high degree of correctness in code which performs buffer management. Well-written and tested abstract data type libraries which centralize and automatically perform buffer management and include bounds checking can reduce the occurrence of buffer overflows. The two main building-block data types in these languages in which buffer overflows commonly manifest are strings and arrays; libraries preventing buffer overflows in these data types provide the vast majority of the necessary coverage. Still, failure to use these safe libraries correctly can result in buffer overflows and other vulnerabilities; naturally, any bug in a library itself is a potential vulnerability. Safe library implementations include The Better String Library, Arri Buffer API and Vstr. The OpenBSD operating system's C library provides some API changes, the strlcpy and strlcat functions, but these are much more limited than full safe library implementations.
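
To show what centralizing buffer management means in practice, here is a minimal sketch of a fixed-capacity string type whose only append operation checks the remaining space. It is not the API of any of the libraries named above; the sbuf type, its functions and the capacity of 16 are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { SBUF_CAPACITY = 16 };

/* Fixed-capacity string: the length is tracked explicitly and the
   data is always zero-terminated, so len is at most SBUF_CAPACITY - 1. */
typedef struct {
    size_t len;
    char   data[SBUF_CAPACITY];
} sbuf;

static void sbuf_init(sbuf *s)
{
    s->len = 0;
    s->data[0] = '\0';
}

/* Append text. Returns 0 on success, or -1 if the text and its
   terminator would not fit, in which case s is left unchanged. */
static int sbuf_append(sbuf *s, const char *text)
{
    size_t n = strlen(text);
    if (n >= SBUF_CAPACITY - s->len)
        return -1;
    memcpy(s->data + s->len, text, n + 1);
    s->len += n;
    return 0;
}
```

Because callers never write to data directly, the bounds check lives in one place and cannot be forgotten at individual call sites.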

Stack-smashing protection

Stack-smashing protection is used to detect the most common buffer overflows by checking that the stack has not been altered when a function returns. If it has been altered, the program is terminated instead of being allowed to return. Three such systems are Libsafe and the StackGuard and ProPolice gcc patches.
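
One way to make this check, used by StackGuard, is to place a known value (a "canary") between the local buffers and the return address and to verify it before returning. The sketch below simulates the idea on a byte array instead of the real stack. The frame layout, the names and the fixed canary value are all assumptions made for illustration; real implementations typically choose the value when the program starts.

```c
#include <stddef.h>
#include <string.h>

/* Simulated frame: an 8-byte buffer followed by a 4-byte canary. */
enum { BUF_SIZE = 8, CANARY_OFFSET = BUF_SIZE, CANARY_SIZE = 4,
       GFRAME_SIZE = BUF_SIZE + CANARY_SIZE };

/* Hypothetical fixed canary value, for illustration only. */
static const unsigned char CANARY[CANARY_SIZE] = { 0xDE, 0xC0, 0xAD, 0x0B };

/* Function entry: clear the buffer and plant the canary after it. */
static void frame_enter(unsigned char frame[GFRAME_SIZE])
{
    memset(frame, 0, BUF_SIZE);
    memcpy(frame + CANARY_OFFSET, CANARY, CANARY_SIZE);
}

/* Buggy copy that ignores BUF_SIZE. The cap at GFRAME_SIZE only
   keeps the simulation itself in bounds. */
static void frame_fill(unsigned char frame[GFRAME_SIZE],
                       const unsigned char *input, size_t len)
{
    if (len > GFRAME_SIZE)
        len = GFRAME_SIZE;
    memcpy(frame, input, len);
}

/* Function exit: returns 1 if the canary is intact, 0 if an overflow
   has altered it and the program should be terminated. */
static int frame_canary_intact(const unsigned char frame[GFRAME_SIZE])
{
    return memcmp(frame + CANARY_OFFSET, CANARY, CANARY_SIZE) == 0;
}
```

An overflow that reaches the return address must first pass over the canary, so a changed canary signals that the return address can no longer be trusted.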

Stronger stack protection is possible by splitting the stack in two: one for data and one for function returns. This split is present in the Forth programming language, though it was not a security-based design decision. Regardless, this is not a complete solution to buffer overflows, as sensitive data other than the return address may still be overwritten.

Executable space protection

Some operating systems now include features to prevent execution of code on the stack. These include Windows' Data Execution Prevention, OpenBSD's W^X and the PaX and Exec Shield patches for Linux.

Address space layout randomization

Randomization of the virtual memory addresses at which functions and variables can be found can make exploitation of a buffer overflow more difficult, but not impossible. It also forces the attacker to tailor the exploitation attempt to the individual system, which foils the attempts of internet worms. A similar but less effective method is to rebase processes and libraries in the virtual address space.

Deep Packet Inspection

The use of Deep Packet Inspection (DPI) can detect, at the network perimeter, remote attempts to exploit buffer overflows by use of attack signatures and heuristics. These can block packets which have the signature of a known attack, or which contain a long series of No-Operation (NOP) instructions (known as a nop-sled). Nop-sleds are often used when the location of the exploit's payload is slightly variable.
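
A very simple form of the NOP heuristic can be sketched as a scan for long runs of a single NOP byte. The sketch assumes the x86 one-byte NOP opcode, 0x90, and a caller-chosen threshold; the function names are hypothetical, and a real inspection engine would need to recognise many other encodings.

```c
#include <stddef.h>

/* Length of the longest run of consecutive 0x90 bytes in data. */
static size_t longest_nop_run(const unsigned char *data, size_t len)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == 0x90) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Flag a packet whose longest NOP run reaches the given threshold. */
static int looks_like_nop_sled(const unsigned char *data, size_t len,
                               size_t threshold)
{
    return longest_nop_run(data, len) >= threshold;
}
```

The threshold trades false positives against missed attacks: ordinary binary data can contain a few 0x90 bytes in a row, so a low threshold would block legitimate traffic.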

Packet scanning is not a fully effective method, since it can only prevent known attacks and there are many ways in which a nop-sled can be encoded. Attackers have also begun to use alphanumeric, metamorphic and self-modifying shellcode to avoid detection by heuristic packet scans.

History

The earliest known exploitation of a buffer overflow was in 1988. It was one of several exploits used by the Morris worm to propagate itself over the internet. The program exploited was a Unix service called fingerd.

Later, in 1995, Thomas Lopatic independently rediscovered the buffer overflow and published his findings on the Bugtraq security mailing list [1]. A year later, in 1996, Elias Levy (aka Aleph One) published in Phrack magazine the paper "Smashing the Stack for Fun and Profit"[2], a step-by-step introduction to exploiting stack-based buffer overflow vulnerabilities.

Since then, at least two major internet worms have exploited buffer overflows to compromise a large number of systems. In 2001, the Code Red worm exploited a buffer overflow in Microsoft's Internet Information Services (IIS) 5.0[3], and in 2003 the SQLSlammer worm compromised machines running Microsoft SQL Server 2000 [4].
