Data-flow analysis

From Free net encyclopedia

(Difference between revisions)
Revision as of 21:13, 11 April 2006
Bluebot (Talk | contribs)
Bringing "External links" and "See also" sections in line with the [[WP:MOS|Manual of Style]] using [[Wikipedia:AutoWikiBrowser|AWB]]
Next diff →

Current revision

A simple way to perform dataflow analysis (DFA) of programs is to set up dataflow equations for each node of the control flow graph (CFG) and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixpoint. To guarantee termination it is required that the data flow equations form a fixed-point data-flow analysis, i.e., the local updates by the equations are monotone.

A canonical example of a data-flow analysis is reaching definitions.

Contents

An iterative algorithm

The fixpoint of the dataflow equations can be calculated using the round-robin iterative algorithm:

<math>\texttt{for}\ i\ \leftarrow\ 1\ \texttt{to}\ N</math>
    <math>\mathit{initialize\ node}\ i</math>
<math>\texttt{while}\ (\mathit{sets\ are\ still\ changing})</math> <math>\texttt{for}\ i\ \leftarrow\ 1\ \texttt{to}\ N</math> <math>\mathit{recompute\ sets\ at\ node}\ i</math>

The order matters

The efficiency of iteratively solving data-flow equations is influenced by the order at which local nodes are visited. For the above round-robin iterative algorithm it depends in which order nodes are selected from the set of nodes. Furthermore, it depends, whether the data-flow equations are used for forward or backward data-flow analysis over the CFG. In the following, a few iteration orders for solving data-flow equations are discussed (a related concept to iteration order of a CFG is tree traversal of a tree).

  • random order This iteration order is not aware whether the DFA solves a forward or backward data-flow problem. Therefore, the performance is relatively poor compared to specialized iteration orders.
  • postorder This is a typical iteration order for backward data-flow problems. In postorder iteration a node is visited after all its successor nodes have been visited. Typically, the postorder iteration is implemented with the depth-first strategy.
  • reverse-postorder This is a typical iteration order for forward data-flow problems. In reverse-postorder iteration a node is visited before all its successor nodes have been visited. A more natural name for reverse-postorder iteration would be "preorder iteration", but this would be confusing with the mathematical definition of preorder.

Examples

The following are examples of properties of computer programs that can be calculated by data-flow analysis. Note that the properties calculated by data-flow analysis are typically only approximations of the real properties. This is because data-flow analysis operates on the syntactical structure of the CFG without simulating the exact control flow of the program. However, to be still useful in practice, a data-flow analysis algorithm is typically designed to calculate an upper respectively lower approximation of the real program properties.

Forward Analysis

  • reaching definitions

The reaching definition analysis calculates for each program point the set of variable values that may potentially reach this program point.

1: if b==4 then
2:    a = 5;
3: else 
4:    a = 3;
5: endif
6:
7: if  a < 4 then
8:    ...

The reaching definition of variable "a" at line 7 is the set {3,5}.

Backward Analysis

  • live variables

The live variable analysis calculates for each program point the variables that may be potentially read afterwards before their next write update.

1: a = 3;
2: c = 5;
3: a = b + c;

The set of live variables at line 2 is {b,c}, but the set of live variables at line 1 is only {b} since variable "c" is updated in line 2.

Sensitivities

Modern Research on Data Flow Analysis frequently talk about different sensitivities. It is not quite clear whether all authors intend the same when using these terms. The information here might be utterly wrong, so please correct it and then remove this sentence, because I have looked everywhere for proper definitions of these terms.

  • Flow Sensitivity generally means to respect the flow of information and not merge it prematurely. In the intraprocedural case it is to merge over all paths (such as by performing an indexed depth first search over the CFG) as opposed to joining information at each merge-point (as when calculating the fixpoint). For example, a flow-insensitive alias analysis might conclude that "x may alias y in procedure q". A flow-sensitive analysis could conclude "x may alias y after statement 20 of procedure q", and conclude otherwise for statement 10.
  • context sensitive analyses take into account the context of the call. When this is not done, an interprocedural analysis can consider impossible paths, such as the path leading to one call, through the procedure, and exiting from a different call.
  • path sensitive analyses attempt to only take into account valid paths. If a certain operation is guarded by a condition and much later in the program another operation is guarded by the same condition, you have only two valid paths: either do both operations or none.

Further reading

Aho, Alfred V. Sethi, Ravi. Ullman, Jeffrey D. Compilers: Principles, Techniques and Tools. Addison Wesley. 1986.

Appel, Andrew W. Modern Compiler Implementation in ML. Cambridge University Press. 1999.

Cooper, Keith D. and Torczon, Linda. Engineering a Compiler. Morgan Kaufmann. 2005.

Muchnick, Steven S. Advanced Compiler Design and Implementation. Morgan Kaufmann. 1997.

Hecht, Matthew S. Flow Analysis of Computer Programs. Elsevier North-Holland Inc. 1977.