Type system
In computer science, a type system defines how a programming language classifies values and variables into types, how it can manipulate those types and how they interact. A type indicates a set of values that have the same sort of generic meaning or intended purpose (although some types, such as abstract types and function types, might not get represented as values in the running computer program). Type systems vary a lot between languages, with perhaps the biggest variations in their compile-time syntactic and run-time operational implementations.
A compiler may use the static type of a value in order to optimize the storage it needs and the choice of algorithms for the operations on the value. For example, in the C programming language, a value of the "float" data type represents a 32-bit quantity that represents a single-precision floating-point number as defined by the IEEE floating-point standard. Thus C uses the respective floating-point operations to add, multiply etc. those values.
The depth of type constraints and the manner of their evaluation affects the typing of the language. Further, a programming language may associate an operation with varying concrete algorithms on each type in the case of type polymorphism. Type theory studies type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation and language design.
Basis
Assigning datatypes ("typing") gives meaning to collections of bits. Types usually have associations either with values in memory or with objects such as variables. Because any value simply consists of a set of bits in a computer, hardware makes no distinction even between memory addresses, instruction code, characters, integers and floating-point numbers. Types inform programs and programmers how they should treat those bits.
Major functions that type systems provide include:
- Safety - Use of types may allow a compiler to detect meaningless or provably invalid code. For example, we can identify the expression

 "Hello, World" / 3

as invalid because one cannot divide (in the usual sense) a string literal by an integer. As discussed below, strong typing offers more safety, but it does not necessarily guarantee complete safety (see type-safety for more information).
- Optimization - Static type-checking may provide useful information to a compiler. For example, if a type requires that a value be aligned on a multiple of four bytes, the compiler may be able to use more efficient machine instructions.
- Documentation - In more expressive type systems, types can serve as a form of documentation, since they can illustrate the intent of the programmer. For instance, timestamps may be a subtype of integers -- but if a programmer declares a function as returning a timestamp rather than merely an integer, this documents part of the meaning of the function.
- Abstraction (or modularity) - Types allow programmers to think about programs at a higher level, not bothering with low-level implementation. For example, programmers can think of strings as values instead of as a mere array of bytes. Or types can allow programmers to express the interface between two subsystems. This localizes the definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.
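The safety point above can be sketched in Python, a dynamically checked language: the invalid expression is still detected, but only at run time, whereas a static type-checker would reject it before the program ever runs.

```python
# A sketch of the kind of error a type system catches.  Python detects
# this at run time; a statically typed language would reject the same
# expression at compile time.
try:
    result = "Hello, World" / 3   # dividing a string by an integer
except TypeError as err:
    print("type error caught:", err)
```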
Typically a program associates each value with one particular type (although a type may have more than one subtype). Other entities, such as objects, modules, communication channels, dependencies, or even types themselves, can also be associated with a type.
A type system, specified in each programming language, stipulates the ways typed programs may behave and makes behavior outside these rules illegal. An effect system typically provides more fine-grained control than a type system.
More formally, type theory studies type systems.
Type checking
The process of verifying and enforcing the constraints of types - type checking - may occur either at compile time (a static check) or at run time (a dynamic check). Static type-checking is a primary task of the semantic analysis carried out by a compiler. If a language enforces type rules strongly (that is, generally allowing only those automatic type conversions which do not lose information), one refers to the language as strongly typed; if not, as weakly typed.
Static and dynamic typing
In dynamic typing, type checking often takes place at run time because variables can acquire different types depending on the execution path. A static type system for such programs would need to represent the concept of an execution path explicitly and allow types to depend on it, which tends to make the type system either too restrictive or too cumbersome to work well.
Dynamic typing often occurs in "scripting languages" and other rapid application development languages. Dynamic types appear more often in interpreted languages, whereas compiled languages favour static types. See typed and untyped languages for a fuller list of typed and untyped languages.
The term duck typing refers to a form of dynamic typing in which a value's suitability for an operation is determined by the presence of the methods and properties the operation actually uses, rather than by the value's declared type.
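Duck typing can be illustrated with a short Python sketch: any object that happens to provide the required method is accepted, regardless of its class.

```python
# Duck typing: an object's suitability is determined by the methods it
# actually supports, not by its declared type.  (Illustrative sketch.)
class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):              # not a Duck, but it quacks
        return "I'm quacking!"

def make_it_quack(thing):
    return thing.quack()          # works for anything with a quack() method

print(make_it_quack(Duck()))      # Quack!
print(make_it_quack(Person()))    # I'm quacking!
```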
To see how type checking works, consider the following pseudocode example:
 var x;       // (1)
 x := 5;      // (2)
 x := "hi";   // (3)
In this example, (1) declares the name x; (2) associates the integer value 5 to the name x; and (3) associates the string value "hi" to the name x. In most statically typed systems, this code fragment would be illegal, because (2) and (3) bind x to values of inconsistent type.
By contrast, a purely dynamically typed system would permit the above program to execute, since types are attached to values, not names. The implementation of a dynamically typed language will catch errors related to the misuse of values - "type errors" - at the time of the computation of the erroneous statement or expression. In other words, dynamic typing catches errors during program execution. A typical implementation of dynamic typing will keep all program values "tagged" with a type, and check the type tag before using any value in an operation. For example:
 var x = 5;      // (1)
 var y = "hi";   // (2)
 var z = x + y;  // (3)
In this code fragment, (1) binds the value 5 to x; (2) binds the value "hi" to y; and (3) attempts to add x to y. In a dynamically typed language, the value bound to x might be a pair (integer, 5), and the value bound to y might be a pair (string, "hi"). When the program attempts to execute line 3, the language implementation checks the type tags integer and string, discovers that the operation + (addition) is not defined over these two types, and signals an error.
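The tagging scheme described above can be sketched in Python. The `add` function below is a hypothetical fragment of a dynamically typed language implementation, not part of any real interpreter: each value carries an explicit type tag that is checked before the operation proceeds.

```python
# A sketch of how a dynamically typed implementation might tag values.
# Each value is a (tag, payload) pair, and the tag is checked before
# every operation.  'add' is a hypothetical interpreter helper.
def add(a, b):
    tag_a, val_a = a
    tag_b, val_b = b
    if tag_a == "integer" and tag_b == "integer":
        return ("integer", val_a + val_b)
    raise TypeError(f"+ is not defined for {tag_a} and {tag_b}")

x = ("integer", 5)
y = ("string", "hi")
print(add(x, ("integer", 2)))   # ("integer", 7)
# add(x, y) would signal a type error, exactly as in the example above
```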
Some statically typed languages have a "back door" in the language that enables programmers to write code that does not statically type check. For example, Java and C-style languages have "casts".
The presence of static typing in a programming language does not necessarily imply the absence of dynamic typing mechanisms. For example, Java uses static typing, but certain operations require the support of runtime type tests, which are a form of dynamic typing. See programming language for more discussion of the interactions between static and dynamic typing.
Static and dynamic type checking in practice
The choice between static and dynamic typing requires some trade-offs. Many programmers strongly favor one over the other; some to the point of considering languages following the disfavored system to be unusable or crippled.
Static typing finds type errors reliably and at compile time. This should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus what proportion of those bugs which are written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased. Advocates of strongly typed languages such as ML and Haskell have suggested that almost all bugs can be considered type errors, if the types used in a program are sufficiently well declared by the programmer or inferred by the compiler. [1]
Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce machine code that just does the right thing. Further, compilers in statically typed languages can find shortcuts more easily. Some dynamically-typed languages such as Common Lisp allow optional type declarations for optimization for this very reason. Static typing makes this pervasive. See optimization.
Statically-typed languages which lack type inference – such as Java – require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or drift out of synchronization. However, a language can be statically typed without requiring type declarations, so this is not a consequence of static typing.
Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be used as an additional mechanism for communicating the intentions of the library developer.
Dynamic typing allows constructs that some static type systems would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible (however, the typing within that evaluated code might remain static). Furthermore, dynamic typing accommodates transitional code and prototyping, such as allowing a string to be used in place of a data structure.
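Python's built-in `eval` is a concrete example of executing data as code: the string below is ordinary data until the program chooses to evaluate it.

```python
# Dynamic languages can execute arbitrary data as code.  Here a string
# holding an arithmetic expression is evaluated at run time.
source = "2 + 3 * 4"
print(eval(source))   # 14
```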
Dynamic typing allows debuggers greater functionality; in particular, the debugger can modify the code arbitrarily and let the program continue to run. Programmers in dynamic languages sometimes "program in the debugger" and thus have a shorter edit-compile-test-debug cycle. However, some computer scientists consider the need to use debuggers as a sign of problems in the processes of design or of development.
Dynamic typing typically makes metaprogramming more powerful and easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code. More advanced constructs such as metaclasses and introspection are often more difficult to use in statically-typed languages. This has led some writers, such as Paul Graham, to speculate that many design patterns observed in statically-typed languages are simply evidence of "the human compiler" repeatedly writing out metaprograms.[2]
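As a small illustration of the introspection mentioned above, a dynamically typed language like Python can examine and even extend objects at run time:

```python
# Introspection sketch: objects can be inspected and modified at run
# time, which makes many metaprogramming idioms lightweight.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
print(type(p).__name__)    # Point
print(vars(p))             # {'x': 1, 'y': 2}
setattr(p, "z", 3)         # add an attribute that was never declared
print(p.z)                 # 3
```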
Dynamic typing may allow compilers and interpreters to run more quickly, since changes to source code in dynamically-typed languages may result in less checking to perform and less code to revisit. This too may reduce the edit-compile-test-debug cycle.
Strong and weak typing
For a more complete discussion of the different meanings of the phrase strongly typed, see strongly-typed programming language.
One definition of strongly typed involves not allowing an operation to succeed on arguments which have the wrong type. A C cast gone wrong exemplifies the absence of strong typing; if a programmer casts a value in C, not only must the compiler allow the code, but the runtime should allow it as well. This allows compact and fast C code, but it can make debugging more difficult.
Some pundits use the term memory-safe language (or just safe language) to describe languages that do not allow undefined operations to occur. For example, a memory-safe language will also check array bounds.
Weak typing means that a language will implicitly convert (or cast) types when used. Revisiting the previous example:
 var x = 5;    // (1)
 var y = "hi"; // (2)
 x + y;        // (3)
Writing the code above in a weakly-typed language, such as older versions of Visual Basic (pre-.Net), would produce runnable code which would yield the result "5hi". The system would convert the number 5 into the string "5" to make sense of the operation (the language overloads the plus-sign operator '+' to express both addition and concatenation). However, problems can ensue with such conversions and with operators overloaded in this way. For example, would the following code produce a result of 9 or "54"?
 var x = 5;
 var y = "4";
 x + y;
In contrast, the REXX language (weakly typed in the sense that it has only one type) does not overload the '+' operator, so '+' always denotes addition. The equivalent of the first example would fail (with one operand not a number), and the second would yield 9, unambiguously. Careful language design (through type inference and other techniques) can also make a language feel as permissive as a weakly-typed one while preserving the type checking and protection offered by languages such as VB.Net, C# and Java.
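Python itself refuses to add an integer to a string, but the implicit coercion of a weakly typed language can be sketched explicitly. The `weak_add` helper below is hypothetical and adopts one possible coercion rule (coerce to string whenever either operand is a string); as the text notes, other weakly typed languages resolve the ambiguity differently.

```python
# A sketch of weak-typing coercion.  'weak_add' is a hypothetical
# helper, not part of any real language; it mimics the string-favouring
# behaviour of old Visual Basic's '+'.
def weak_add(a, b):
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)   # coerce to string and concatenate
    return a + b

print(weak_add(5, "hi"))   # '5hi'
print(weak_add(5, "4"))    # '54' -- the ambiguous case discussed above
print(weak_add(5, 4))      # 9
```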
Safely and unsafely typed systems
A third way of categorising the type system of a programming language uses the safety of typed operations and conversions. Computer scientists consider a language "type-safe" if it does not allow operations or conversions which lead to erroneous conditions.
Let us again have a look at the Visual Basic examples:
 var x = 5;      // (1)
 var y = "hi";   // (2)
 var z = x + y;  // (3)
Variable z in the example will acquire the value "5hi". While the programmer may not have intended this, nevertheless the language defines the result specifically and the program does not crash or assign an undefined value to z. In this respect the Visual Basic example appears "type-safe".
Now let us look at the same example in C:
 int x = 5;
 char y[] = "hi";
 char* z = x + y;
In this example z will point to a memory address five characters beyond y, equivalent to two characters after the terminating zero character of the string pointed to by y. The content of that location is undefined, and might lie outside addressable memory, and so dereferencing z at this point could cause termination of the program. We have a well-typed, but not memory-safe program — a condition that cannot occur in a type-safe language.
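For contrast, a memory-safe language catches the analogous out-of-range access rather than reading undefined memory. A Python sketch:

```python
# In a memory-safe language, indexing past the end of a string raises
# a checked error instead of dereferencing an undefined address.
y = "hi"
try:
    ch = y[5]                  # five characters beyond the start of y
except IndexError as err:
    print("caught:", err)      # string index out of range
```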
Polymorphism and types
The term polymorphism refers to the ability of code (in particular, functions or classes) to act on values of multiple types, or to the ability of different instances of the same data-structure to contain elements of different types. Type systems that allow polymorphism generally do so in order to improve the potential for code re-use: in a language with polymorphism, programmers need only implement a data structure such as a list or a dictionary once, rather than once for each type of element with which they plan to use it. For this reason computer scientists sometimes call the use of certain forms of polymorphism generic programming. The type-theoretic foundations of polymorphism are closely related to those of abstraction, modularity and (in some cases) subtyping.
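The code-reuse benefit described above can be sketched in Python, where one function over lists works for elements of any type and need not be rewritten per element type:

```python
# A polymorphic (generic) operation: one implementation serves lists of
# any element type.
def length(xs):
    n = 0
    for _ in xs:               # never inspects the element type
        n += 1
    return n

print(length([1, 2, 3]))       # 3
print(length(["a", "b"]))      # 2
```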
Explicit or implicit declaration and inference
Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate each variable with a particular type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on how programmers use those variables. For example, given a function f(x,y) which adds x and y together, the compiler can infer that x and y must be numbers -- since addition is only defined for numbers. Therefore, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would signal an error.
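The f(x, y) from the text can be sketched in Python, with one caveat: Python performs no compile-time inference, so the constraint "x and y must support +" is only discovered when f actually runs. A type-inferring compiler would derive the same constraint statically and reject the bad call before execution.

```python
# The function f from the text.  Python checks the '+' constraint only
# at run time; an inferring compiler would check it at compile time.
def f(x, y):
    return x + y

print(f(2, 3))                 # 5
try:
    f(2, [1, 2])               # int + list: the kind of call inference rejects
except TypeError as err:
    print("caught:", err)
```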
Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, the expression 3.14 might imply a floating-point type, while [1, 2, 3] might imply a list of integers, typically implemented as an array.
Types of Types
A type of types is a kind. In the type systems of most programming languages, there are only simple types in the sense that they all belong to the same kind. Kinds appear explicitly in typeful programming, such as a type constructor in the Haskell programming language, which returns a simple type after being applied to enough simple types. For example, the type constructor Either has the kind * -> * -> * and its application Either String Integer is a simple type (kind *).
Types fall into several broad categories:
- primitive types — the simplest kind of type, e.g. integer and floating-point number
- integral types — types of whole numbers, e.g. integers and natural numbers
- floating point types — types of numbers in floating-point representation
- composite types — types composed of basic types, e.g. records. Abstract data types have attributes of both composite types and interfaces, depending on whom you talk to.
- subtype
- derived type
- object types, e.g. type variable
- partial type
- recursive type
- function types, e.g. binary functions
- universally quantified types, such as parameterized types
- existentially quantified types, such as modules
- refinement types, types which identify subsets of other types
- dependent types, types which depend on run-time values
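As one sketch of the composite category above, a record type built from primitive fields can be written in Python with the standard `dataclasses` module:

```python
# A composite (record) type composed of two floating-point fields.
from dataclasses import dataclass

@dataclass
class Coordinate:
    x: float
    y: float

c = Coordinate(1.0, 2.5)
print(c.x, c.y)   # 1.0 2.5
```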
Compatibility: equivalence and subtyping
A type-checker for a statically typed language must verify that the type of any expression is consistent with the type expected by the context in which that expression appears. For instance, in an assignment statement of the form x := e, the inferred type of the expression e must be consistent with the declared or inferred type of the variable x. This notion of consistency, called compatibility, is specific to each programming language.
Clearly, if the type of e and the type of x are the same, then the assignment should be allowed. In the simplest type systems, therefore, the question of whether two types are compatible reduces to that of whether they are equal (or equivalent). Different languages, however, have different criteria for when two type expressions are understood to denote the same type. These different equational theories of types vary widely, two extreme cases being structural type systems, in which any two types describing values with the same structure are equivalent, and nominative type systems, in which no two syntactically distinct type expressions denote the same type (i.e., types must have the same "name" in order to be equal).
In languages with subtyping, the compatibility relation is more complex. In particular, if A is a subtype of B, then a value of type A can be used in a context where one of type B is expected, even if the reverse is not true. Like equivalence, the subtype relation is defined differently for each programming language, with many variations possible. The presence of parametric or ad hoc polymorphism in a language may also have implications for type compatibility.
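The subtype rule above can be sketched in Python (where the `Animal` annotation is documentation rather than an enforced check): a value of the subtype is usable wherever the supertype is expected.

```python
# Subtype compatibility sketch: a Dog (subtype) is accepted in a
# context expecting an Animal (supertype), but not vice versa in a
# language that enforces the annotation.
class Animal:
    def name(self):
        return "animal"

class Dog(Animal):             # Dog is a subtype of Animal
    def name(self):
        return "dog"

def describe(a: Animal):       # context expecting an Animal
    return a.name()

print(describe(Dog()))         # a Dog works where an Animal is expected
print(describe(Animal()))
```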
See also
- Programming language
- Operator overloading
- Polymorphism in object-oriented programming
- Type signature
- Signedness