Pointer

From Free net encyclopedia

In computer science, a pointer is a programming language datatype whose value refers directly to ("points to") another value stored elsewhere in the computer memory using its address. Obtaining the value that a pointer refers to is called dereferencing the pointer. A pointer is a simple implementation of the general reference datatype (although it is quite different from the facility referred to as a [[reference (C++)|reference]] in C++).

Pointers are so commonly used as references that sometimes people use the word "pointer" to refer to references in general; however, more properly it only applies to data structures whose interface explicitly allows it to be manipulated as a memory address. If you are seeking general information on a small piece of data used to find an object, see reference (computer science).

Contents

Architectural roots

Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word, effectively transforming all of memory into a very large array. Then, if we have an address, the system provides an operation to retrieve the value stored in the memory unit at that address. Pointers are datatypes which hold addresses. See reference (computer science).

In the usual case, a pointer is large enough to hold more different addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, called a segmentation fault. On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times.

In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.

Uses

Pointers are directly supported without restrictions in C, [[C++]], Pascal and most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as in passing data between different parts of a program.

When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.

Pointers are used to pass parameters by reference. This is useful if we want a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.

Examples of use

Below is an example of the definition of a linked list in C; this is not possible in C without pointers.

/* the empty linked list is represented by NULL or some other signal value */
#define EMPTY_LIST NULL

struct link {
    void *data; /* the data of this link */
    struct link *next; /* the next link; EMPTY_LIST if this is the last link */
};

Note that this pointer-recursive definition is essentially the same as the reference-recursive definition from the Haskell programming language:

data Link a = Nil             {- the empty list -}
            | Cons a (Link a) {- a cons cell of a value of type a and another link -}

The definition with references, however, is type-checked and doesn't use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.

Arrays in C are just pointers to consecutive areas of memory. Thus:

#include <stdio.h>

int main() {
    int array[5] = { 2, 4, 3, 1, 5 };
    printf("%p\n", array);      /* print the address of the array */
    printf("%d\n", array[0]);   /* print the first item of the array, 2 */
    printf("%d\n", *array);     /* print the first integer at the address
                                 * pointed to by array; this is the first
                                 * item, 2 */
    printf("%d\n", array[3]);   /* print the fourth item of the array, 1 */
    printf("%p\n", array+3);    /* print the third address past array */
    printf("%d\n", *(array+3)); /* print the value at the address just
                                 * printed this is the fourth item, 1 */
    return 0;
}

This activity is called pointer arithmetic: direct arithmetic operations on pointers are used to index arrays. See below for more detail.

Lastly, pointers can be used to pass variables by reference, allowing their value to be changed. For example:

#include <stdio.h>

void alter(int *n) {
    *n = 120;
}

int main() {
    int x = 24;
    int *address = &x;       /* the '&' operator (read "reference") retrieves
                              * the address of a variable */

    printf("%d\n", x);       /* show x */
    printf("%p\n", address); /* show x's address */
    alter(&x);               /* pass x's address to alter, x is passed "by reference" */
    printf("%d\n", x);       /* show x's new value */
    printf("%p %p\n", address, &x); /* notice that x's address is not altered */

    return 0;
}

Typed pointers and casting

In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.

However, few languages strictly enforce pointer types, because programmers often run into situations where they want to treat an object of one type as though it were of another type. For these cases, it is possible to typecast, or cast, the pointer. Some casts are always safe, while other casts are dangerous, possibly resulting in incorrect behavior. Although it's impossible in general to determine at compile-time which of these casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.

Making pointers safer

Because pointers allow a program to access objects that are not explicitly declared beforehand, they enable a variety of programming errors. However, the power they provide is so great that it can be difficult to do some programming tasks without them. To help deal with their problems, many languages have created objects that have some of the useful features of pointers, while avoiding some of their pitfalls.

One major problem with pointers is that, as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.

A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behaviour, either because the initial value is not a valid address, or because using it may damage the runtime system and other unrelated parts of the program.

In systems with explicit memory allocation, it's possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle, because a deallocated memory region may contain the same data as it did before it was deallocated, but may be then reallocated and overwritten by unrelated code, unbeknownst to the earlier code. Languages with garbage collection prevent this type of error.

Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks.

The null pointer

A null pointer has a reserved value, often but not necessarily the value zero, indicating that it refers to no object. Null pointers are used routinely, particularly in C and C++, to represent exceptional conditions such as the lack of a successor to the last element of a linked list, while maintaining a consistent structure for the list nodes. This use of null pointers can be compared to the use of null values in relational databases and to the "Nothing" value in the "Maybe" monad. In C, each pointer type has its own null value, and sometimes they have different representations.

Because it refers to nothing, an attempt to dereference a null pointer causes a run-time error that usually terminates the program immediately (in the case of C, often with a segmentation fault, since the address literally corresponding to the null pointer will likely not be allocated to the running program). In Java, access to a null reference triggers a Java.lang.NullPointerException, which can be caught (but a common practice is to attempt to ensure such exceptions never occur). In safe languages a possibly null pointer can be replaced with a tagged union which enforces explicit handling of the exceptional case; in fact, a possibly-null pointer can be seen as a tagged union with a computed tag.

A null pointer should not be confused with an uninitialized pointer: a null pointer is guaranteed to compare unequal to a pointer to any object or function, whereas an uninitialized pointer might have any value. Two separate null pointers are guaranteed to compare equal.

Wild pointers

Wild pointers are pointers that have not been initialized (that is, set to point to a valid address) and may make a program crash or behave oddly . In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.

Examples

The following example code shows a wild pointer:

#include <stdio.h>
#include <stdlib.h>  

int main(void)
{
    char *p1 = (char *)malloc(sizeof(char)); // allocate memory and initialized pointer
    printf("p1 points to: %p\n",  p1);       // points to some place in the heap
    printf("Value of  *p1: %c\n", *p1);      // (undefined) value of some place in the heap
 
    char *p2;                                // wild (uninitialized) pointer
    printf("Address of p2: %p\n",  p2);      // undefined value, may not be a valid address
 
    // if you are LUCKY, this will cause an addressing exception
    printf("Value of  *p2: %c\n", *p2);      // random value at random address
 
    return 0;
}

The problems with invalid pointers include more than simply uninitialized values.

For instance, pointers can be used after the object or variable they point to no longer exists, or has gone out of scope, as in the example below. If such an invalid pointer is used, the program will probably not immediately crash, but the result will still probably be incorrect, and the failure will probably be hard to track down.

#include <stdio.h>
#include <stdlib.h>
  
int badIdea(int **p)  // p is a pointer to a pointer to an int
{
    int x = 1;           // allocate an int on the stack
    **p = x;             // assign value of x to int that pointer p points to
    *p = &x;             // make the pointer that p points to point to x
    return x;            // after return x is out of scope and undefined
}
 
int main(void)
{
    int y = 0;
    int *p1 = &y;                             // initialized pointer to y
    int *p2 = NULL;                           // a good habit to form
    printf("Address of p1: %p\n",  p1);       // prints address of y
    printf("Value of  *p1: %d\n", *p1);       // prints value of y
    y = badIdea(&p1);                         // changes y and changes p1          
  
    // p1 now points to where x was
    // The place where x was will get clobbered, 
    // for instance, on the next interrupt, or on
    // the next subroutine call, as below....
 
    // some other code that also uses the stack
    p2 = (int *)malloc(5*sizeof(int));          
 
    // this probably will NOT crash, but value printed is unpredictable
    printf("Value of  *p1: %p\n", *p1);     // prints value of where x was 
 
    return 0;
}

A very common problem is using a pointer to the heap after that memory has been deallocated, as in this example. The invalid copies of the pointer are usually much harder to find than here.

#include <stdio.h>
#include <stdlib.h>
 
int main(void)
{
    int *p1 = (int *)malloc(sizeof(int));     // initialized pointer to heap
    int *p2 = p1;                             // make a copy
    *p1 = 0;                                  // initialize the heap
    printf("Address of p2: %p\n",  p2);       // points into the heap
    printf("Value of  *p2: %d\n", *p2);       // should print zero
    free(p1);                                 // deallocate the memory
    ....     // other code, possibly using the heap 
 
    // p2 still points to the original allocation, but who knows what is there
    printf("Value of  *p2: %d\n", *p2);       // invalid use of p2
 
    return 0;
}

A third way that pointers can be misused is to access outside the data structure they point to. Here is a simple example.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int y = 5;                                // create a variable 
    int *p1 = &y;                             // initialized pointer to y
    printf("Address of p1: %p\n",  p1);       // address of y
    printf("Value of  *p1: %d\n", *p1);       // value of y
    p1 = p1 + y;                              // allowed pointer arithmetic
    printf("Value of  *p1: %p\n", *p1);       // p1 no longer points to y
    return 0;
}

If a pointer is used to write beyond the end of a local buffer, the stack can be destroyed. In the case below, the problem will probably manifest when the main program returns.

#include <stdio.h>
#include <stdlib.h>

// copy source to destination, no check on sizes
void strcopy(char *d, char *s)
{
    while (*d++ = *s++) // copy until '\0' encountered
      ;
}  

int main(void)
{
    char y[3];                                  // create a local buffer
    char *p1 = (char *)malloc(10*sizeof(char)); // another buffer on heap
    p1[9] = '\0';                               // terminate the larger buffer
    strcopy(y, p1);                             // overflow the local buffer
    free(p1);
    return 0;                                   // now bad stuff happens
}

Support in various programming languages

A number of languages support some type of pointer, although some are more restricted than others. If a pointer is significantly abstracted, such that it can no longer be manipulated as an address, the resulting data structure is no longer a pointer; see the more general reference article for more discussion of these.

Ada

Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports "safe" arithmetic on access types via the package System.Storage_Elements.

C/C++

In C and [[C++]], pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types. A special pointer type called the "void pointer" points to an object of unknown type and cannot be dereferenced. The address can be directly manipulated by casting a pointer to and from an integer.

[[C++]] fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. The C++ standard library also provides auto ptr, a sort of smart pointer which can be used in some situations as a safe alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a [[reference (C++)|reference]] or reference type.

Pointer arithmetic

Pointer arithmetic is unrestricted; adding or subtracting from a pointer moves it by a multiple of the size of the datatype it points to. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers -- which is often the intended result. Pointer arithmetic cannot be performed on void pointers.

Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. In particular, the C definition explicitly declares that the syntax a[n], which is the n-th element of the array pointed by a, is equivalent to *(a+n), which is the content of the element pointed by a+n.

While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language#Pointers for more criticism.

C#

In the C# programming language, pointers are supported only under certain conditions: any block of code including pointers must be marked with the unsafe keyword. Such blocks usually require higher security permissions than pointerless code to be allowed to run. The syntax is essentially the same as in C++, and the address pointed can be either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.

The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void*) to allow communication with unmanaged code.

D

The D programming language is a derivative of C and C++ which fully supports C pointers and C typecasting. But D also offers numerous constructs such as foreach loops, out function parameters, reference types, and advanced array handling which replace pointers for most routine programming tasks.

Modula-2

Pointers are implemented very much as in Pascal, as are VAR parameters in procedure calls. Modula 2 is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula 2 (such as Modula-3) include garbage collection.

Oberon

Much as with Modula-2, pointers are available. There are still fewer ways to evade the type system and so Oberon and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3, garbage collection is a part of the language specification.

Pascal

Pascal implements pointers in a straightforward, limited, and relatively safe way. It helps catch mistakes made by people who are new to programming, like dereferencing a pointer into the wrong datatype; however, a pointer can be cast from one pointer type to another. Pointer arithmetic is unrestricted; adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the Inc or Dec standard procedures on it moves it by the size of the datatype it is declared to point to. Trying to dereference a null pointer, named nil in Pascal, or a pointer referencing unallocated memory, raises an exception in protected mode. Parameters may be passed using pointers (as VAR parameters) but are automatically handled by the runtime system.


See also

External links

  • Pointer Fun With Binky Introduction to pointers in a 3 minute educational video - Stanford Computer Science Education Library (this link has crashed some browsers -- use with caution)
  • 0pointer.de A terse list of minimum length source codes that dereference a null pointer in several different programming languagesca:Punter

cs:Ukazatel (informatika) de:Zeiger (Informatik) es:Puntero (programación) fr:Pointeur (programmation) ko:포인터 (프로그래밍) it:Puntatore (programmazione) he:מצביע nl:Pointer (programmeerconcept) ja:ポインタ pl:Zmienna wskaźnikowa pt:Ponteiro (programação) ru:Указатель (тип данных) sk:Ukazovateľ (informatika) tr:İşaretçiler