Computational complexity theory


In computer science, computational complexity theory is the branch of the theory of computation that studies the resources, or cost, of the computation required to solve a given computational problem. This cost is usually measured in terms of abstract parameters such as time and space, called computational resources. Time is the number of steps it takes to solve a problem, and space is the amount of information storage (memory) required. There are often tradeoffs between time and space: an alternative algorithm may require less time but more space, or vice versa, to solve the same problem. Time requirements must sometimes be amortized to determine the time cost for a well-defined average case. Space requirements can also be profiled over time, especially on a multi-user computer system.

Other resources can also be considered, such as the number of processors needed to solve a problem in parallel. In this case, "parallelizable time" and "non-parallelizable time" are distinguished. The latter is important in real-time applications, because it limits how far the computation can be parallelized: some steps must be done sequentially because they depend on the results of previous steps.

Complexity theory differs from computability theory, which deals with whether a problem can be solved at all, regardless of the resources required.

1 Overview
2 Decision problems
3 Computational resources
4 Complexity classes
5 The P = NP question
6 Intractability
7 Notable researchers
8 See also
9 References
10 External links

Overview

Once computability theory had established which problems can be solved and which cannot, it was natural to ask about the relative computational difficulty of computable functions. This is the subject matter of computational complexity.

A single "problem" is an entire set of related questions, where each question is a finite-length string. For example, the problem FACTORIZE is: given an integer written in binary, return all of the prime factors of that number. A particular question is called an instance. For example, "give the factors of the number 15" is one instance of the FACTORIZE problem.

The time complexity of a problem is the number of steps that it takes to solve an instance of the problem, as a function of the size of the input (usually measured in bits), using the most efficient algorithm. For example, if every instance that is n bits long can be solved in n² steps, we say the problem has a time complexity of n². The exact number of steps depends on the machine and language being used. To avoid that dependence, we generally use Big O notation. If a problem has time complexity O(n²) on one typical computer, then it will also have complexity O(n²p(n)) on most other computers for some polynomial p(n), so this notation allows us to generalize away from the details of a particular computer.

Example: Mowing grass has linear complexity because mowing double the area takes double the time. Looking something up in a dictionary, however, has only logarithmic complexity: a dictionary of double the size has to be opened only one more time (for example, exactly in the middle, which cuts the remaining problem in half).
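The contrast between linear and logarithmic growth can be made concrete by counting steps. The following is a minimal sketch, not a definitive implementation: the function names are illustrative, and a sorted list of integers stands in for the dictionary's sorted words.

```python
def linear_lookup_steps(entries, target):
    """Scan from the front; return how many entries were examined."""
    steps = 0
    for entry in entries:
        steps += 1
        if entry == target:
            break
    return steps


def binary_lookup_steps(entries, target):
    """Open the sorted list in the middle repeatedly; return the number of openings."""
    low, high = 0, len(entries) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if entries[mid] == target:
            break
        if entries[mid] < target:
            low = mid + 1      # target lies in the upper half
        else:
            high = mid - 1     # target lies in the lower half
    return steps


for size in (1_000, 2_000):
    dictionary = list(range(size))   # assumption: integers stand in for sorted words
    last = size - 1                  # looking up the final entry
    print(size, linear_lookup_steps(dictionary, last), binary_lookup_steps(dictionary, last))
# prints:
# 1000 1000 10
# 2000 2000 11
```

Doubling the size doubles the linear count but adds only one step to the binary search.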

Decision problems

Much of complexity theory deals with decision problems. A decision problem is a problem where the answer is always YES or NO. For example, the problem IS-PRIME is: given an integer written in binary, return whether it is a prime number. A decision problem is equivalent to a language, which is a set of finite-length strings. For a given decision problem, the equivalent language is the set of all strings for which the answer is YES.
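The correspondence between a decision problem and a language can be sketched as a membership test on strings. This is an illustration only: the names `is_prime` and `in_language` are hypothetical, and trial division is used for simplicity even though it is not efficient in the bit length of the input.

```python
def is_prime(n: int) -> bool:
    """Decide IS-PRIME by trial division (simple, but slow for large inputs)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def in_language(binary_string: str) -> bool:
    """Membership test for the language of binary strings that encode primes."""
    return is_prime(int(binary_string, 2))


print(in_language("1011"))  # 11 is prime: True
print(in_language("1111"))  # 15 = 3 * 5: False
```

Answering YES for an instance is the same as saying that its encoding belongs to the language.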

Decision problems are often considered because many other problems can be reduced to a decision problem. For example, the problem HAS-FACTOR is: given integers n and k written in binary, return whether n has any prime factors less than k. If we can solve HAS-FACTOR with a certain amount of resources, then we can use that solution to solve FACTORIZE without many more resources: do a binary search on k until the smallest factor of n is found, divide out that factor, and repeat until all the factors have been found.
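The reduction from FACTORIZE to HAS-FACTOR can be sketched as follows. This is a hedged illustration: `has_factor` is a brute-force stand-in for whatever procedure actually decides HAS-FACTOR, and the helper names are assumptions rather than established terminology. The reduction itself only looks at the YES/NO answers.

```python
def has_factor(n: int, k: int) -> bool:
    """Stand-in oracle for HAS-FACTOR: does n have a prime factor less than k?

    Any divisor d of n with 2 <= d < k implies a prime factor below k,
    so checking all such divisors gives the right YES/NO answer.
    """
    return any(n % d == 0 for d in range(2, min(k, n + 1)))


def smallest_factor(n: int) -> int:
    """Binary search on k for the smallest prime factor of n (requires n >= 2)."""
    low, high = 2, n + 1
    # Invariant: has_factor(n, low) is False and has_factor(n, high) is True.
    while high - low > 1:
        mid = (low + high) // 2
        if has_factor(n, mid):
            high = mid
        else:
            low = mid
    return low  # the largest k that still answers NO is the smallest prime factor


def factorize(n: int) -> list:
    """Solve FACTORIZE using only answers from the HAS-FACTOR oracle."""
    factors = []
    while n > 1:
        p = smallest_factor(n)
        factors.append(p)
        n //= p  # divide out the factor and repeat
    return factors


print(factorize(15))   # [3, 5]
print(factorize(360))  # [2, 2, 2, 3, 3, 5]
```

Each call to `smallest_factor` asks the oracle a number of questions proportional to the bit length of n, and n has at most log₂ n prime factors, so the total number of oracle calls is polynomial in the input size.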

Complexity theory often makes a distinction between YES answers and NO answers. For example, the set NP is defined as the set of problems where the YES instances can be checked "quickly" (i.e. in polynomial time). The set Co-NP is the set of problems where the NO instances can be checked quickly. The "Co" in the name stands for "complement". The complement of a problem is one where all the YES and NO answers are swapped, such as IS-COMPOSITE for IS-PRIME.

An important result in complexity theory is that no matter how hard a problem is (i.e. how much time and space it requires), there are always even harder problems. For time complexity, this is established by the time hierarchy theorem. A similar space hierarchy theorem can also be derived.

Computational resources

Complexity theory analyzes the difficulty of computational problems in terms of many different computational resources. The same problem can be described by the amounts it requires of each of these resources, including time, space, randomness, alternation, and other less intuitive measures. A complexity class is the set of all computational problems that can be solved using a certain amount of a certain computational resource.

Perhaps the most well-studied computational resources are deterministic time (DTIME) and deterministic space (DSPACE). These resources represent the amount of computation time and memory space needed on a deterministic computer, like the computers that actually exist, and are therefore of great practical interest.

Some computational problems are easier to analyze in terms of more unusual resources. For example, a nondeterministic Turing machine is a computational model that is allowed to branch out to check many different possibilities at once. The nondeterministic Turing machine has very little to do with how we physically carry out computations, but its branching exactly captures many of the mathematical models we want to analyze, so nondeterministic time is a very important resource in analyzing computational problems.

Many more unusual computational resources have been used in complexity theory. Technically, any complexity measure can be viewed as a computational resource, and complexity measures are very broadly defined by the Blum complexity axioms.

Complexity classes

A complexity class is the set of all of the computational problems which can be solved using a certain amount of a certain computational resource.

The complexity class P is the set of decision problems that can be solved by a deterministic machine in polynomial time. This class corresponds to the intuitive idea of problems that can be solved efficiently even in the worst case.

The complexity class NP is the set of decision problems that can be solved by a non-deterministic machine in polynomial time. This class contains many problems that people would like to be able to solve efficiently, including the Boolean satisfiability problem, the Hamiltonian path problem and the vertex cover problem. All the problems in this class have the property that a proposed solution can be checked efficiently, that is, in polynomial time.
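Checking a proposed solution is typically much simpler than finding one. As a minimal sketch for the vertex cover problem, assuming a graph given as a list of edges and a hypothetical function name `is_vertex_cover`, a certificate can be verified in time roughly proportional to the number of edges:

```python
def is_vertex_cover(edges, cover, k):
    """Check a certificate: is `cover` a set of at most k vertices touching every edge?"""
    cover = set(cover)
    if len(cover) > k:
        return False
    # Every edge must have at least one endpoint in the cover.
    return all(u in cover or v in cover for u, v in edges)


# Illustrative graph: the 4-cycle a-b-c-d-a.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_vertex_cover(edges, {"a", "c"}, 2))  # True
print(is_vertex_cover(edges, {"a", "b"}, 2))  # False: edge (c, d) is not covered
```

The check says nothing about how to find a small cover; it only confirms or rejects a candidate that is handed to it.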

Many complexity classes can be characterized in terms of the mathematical logic needed to express them; see descriptive complexity.

The P = NP question

The question of whether P is the same set as NP is widely regarded as the most important open question in theoretical computer science. The Clay Mathematics Institute offers a $1,000,000 prize for solving it. (See complexity classes P and NP and oracles.)

Questions like this motivate the concepts of hard and complete. A problem X is hard for a set of problems Y if every problem in Y can be transformed easily into X, so that each instance is mapped to an instance of X with the same answer. The definition of "easily" is different in different contexts. The most important set of hard problems is NP-hard, the problems that are hard for NP. A problem X is complete for Y if it is hard for Y and also belongs to Y. The most important set of complete problems is NP-complete. See the articles on those two sets for more detail on the definition of "hard" and "complete".

Intractability

Problems that are solvable in theory, but cannot be solved in practice, are called intractable. What can be solved "in practice" is open to debate, but in general only problems that have polynomial-time solutions are solvable for more than the smallest inputs. Problems that are known to be intractable include those that are EXPTIME-complete. If NP is not the same as P, then the NP-complete problems are also intractable.

To see why exponential-time solutions are not usable in practice, consider a problem that requires 2ⁿ operations to solve, where n is the size of the input. For a relatively small input size of n = 100, and assuming a computer that can perform 10¹⁰ (10 giga) operations per second, a solution would take about 4 × 10¹² years, much longer than the current age of the universe.
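The estimate can be reproduced with a few lines of arithmetic. The year length (a 365.25-day year) and the age of the universe (roughly 1.38 × 10¹⁰ years) are assumptions introduced here for the comparison; they are not figures from the text above.

```python
operations = 2 ** 100                    # 2^n operations for n = 100
ops_per_second = 10 ** 10                # 10 giga operations per second
seconds_per_year = 365.25 * 24 * 3600    # assumption: 365.25-day year

years = operations / ops_per_second / seconds_per_year
print(f"{years:.2e} years")              # 4.02e+12 years

age_of_universe_years = 1.38e10          # assumption: approximate current estimate
print(years > age_of_universe_years)     # True
```

Under these assumptions the running time exceeds the age of the universe by a factor of a few hundred, and each additional input bit doubles it again.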

Notable researchers

Manindra Agrawal
Laszlo Babai
Manuel Blum, who developed an axiomatic complexity theory based on his Blum axioms
Allan Borodin
Stephen Cook
Uriel Feige
Oded Goldreich
Juris Hartmanis
Russell Impagliazzo
Richard Karp
Marek Karpinski
Leonid Levin
Christos H. Papadimitriou
Alexander Razborov
Walter Savitch
Michael Sipser
Richard Stearns
Madhu Sudan
Leslie Valiant
Andrew Yao

References

L. Fortnow, Steve Homer (2002/2003). A Short History of Computational Complexity. In D. van Dalen, J. Dawson, and A. Kanamori, editors, The History of Mathematical Logic. North-Holland, Amsterdam.
