Preprocessor
In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by subsequent programs such as compilers.
A common example from computer programming is the processing performed on source code before the next step of compilation. Preprocessors are typical examples of Domain-Specific Programming Languages.
Lexical pre-processors
Lexical pre-processors are the lowest-level preprocessors, insofar as they only require lexical analysis. They work by simple substitution of tokenized character sequences for other tokenized character sequences, according to user-defined rules. They typically perform macro substitution, inclusion of other files (as opposed to higher-order features such as inclusion of modules/packages/units/components), and conditional compilation or conditional inclusion.
Pre-processing in C/C++
The most widely used lexical preprocessor is CPP, the C preprocessor, used pervasively in the C/C++ world.
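To illustrate what substitution of tokens, rather than of raw characters, means in practice, here is a minimal sketch in C. The names WIDTH, WIDTHS, line_width, width_label and widths_count are hypothetical and chosen only for this illustration.

```c
/* WIDTH is a hypothetical object-like macro used only for illustration. */
#define WIDTH 80

/* The identifier token WIDTH is replaced by 80 before compilation. */
int line_width(void) { return WIDTH; }

/* Inside a string literal, the letters WIDTH belong to a single string
   token, so the preprocessor leaves them untouched. */
const char *width_label(void) { return "WIDTH"; }

/* WIDTHS is a different identifier token, so no substitution occurs. */
int WIDTHS = 3;
int widths_count(void) { return WIDTHS; }
```

A purely character-based search-and-replace would have rewritten both the string literal and the name WIDTHS; a lexical preprocessor does not.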
Inclusion
The most common use of the C preprocessor is the
#include "..."
or
#include <...>
directive, which copies the full content of a file into the current file. The most commonly included files are library headers, such as <math.h> for math functions and <stdio.h> for the standard I/O functions.
While this use of a preprocessor for code reuse is simple, it is also inefficient, since an included file is processed again in every source file that includes it, and it requires the additional use of conditional compilation to avoid multiple inclusions of a given header file.
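The use of conditional compilation to avoid multiple inclusions is commonly written as an "include guard". The sketch below simulates, within a single file, what happens when a hypothetical header point.h is included twice; the names POINT_H, struct point and point_sum are assumptions made for this illustration.

```c
/* ---- first copy of the hypothetical header point.h ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* ---- second copy, as if point.h had been included again ---- */
#ifndef POINT_H   /* POINT_H is already defined, so this copy is skipped */
#define POINT_H
struct point { int x; int y; };   /* would otherwise be a redefinition error */
#endif /* POINT_H */

/* Code that uses the type declared by the header. */
int point_sum(struct point p) { return p.x + p.y; }
```

Without the guard, the second definition of struct point would be rejected by the compiler. With it, the second copy is still read by the preprocessor but contributes nothing to the compiled code.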
Since the 1970s, faster, safer and more efficient alternatives to reuse by file inclusion have been known and used by the programming language community, and implemented in most programming languages: Java has packages, Pascal has units, Modula, OCaml, Haskell and Python have modules, and D, designed as a replacement for C and C++, has imports.
Macros
Macros are commonly used in C to define small snippets of code. During the preprocessing phase, each macro call is replaced, in-line, by the corresponding macro definition.
For instance,
#define max(a,b) a>b?a:b
defines the macro max. This macro may be called like any C function. Therefore, after preprocessing,
z = max(x,y);
becomes
z = x>y?x:y;
While this use of macros is very important for C, for instance to define generic data-types or debugging tools, the expansion is purely textual, which may lead to a number of pitfalls.
For instance, if f and g are two functions, calling
z = max(f(), g());
will not evaluate f() once and g() once and place the larger value in z, as one might believe. Rather, one of the functions will be evaluated twice. If that function has side effects, this is usually not the expected behavior.
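The double evaluation can be observed by counting calls. The following sketch uses the max macro exactly as defined above; the counters f_calls and g_calls, the return values 1 and 2, and the two demo functions are assumptions added for illustration. It also shows a second pitfall of the same textual expansion: because the macro body has no parentheses, it interacts with the operators that surround the call.

```c
/* The macro exactly as defined in the text (no protective parentheses). */
#define max(a,b) a>b?a:b

int f_calls = 0, g_calls = 0;

int f(void) { f_calls++; return 1; }
int g(void) { g_calls++; return 2; }

int demo_double_evaluation(void)
{
    int z;
    f_calls = g_calls = 0;
    z = max(f(), g());   /* expands to: z = f()>g()?f():g(); */
    return z;            /* g() ran twice: once to compare, once for the result */
}

int demo_precedence(void)
{
    /* expands to: 2 * 1>2?1:2, which parses as (2*1 > 2) ? 1 : 2 */
    return 2 * max(1, 2);
}
```

With these definitions, demo_double_evaluation() leaves f_calls at 1 and g_calls at 2, and demo_precedence() yields 2 rather than 4. Wrapping the macro body and each parameter in parentheses addresses the second problem but not the first.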
More modern languages typically do not use this form of meta-programming by macro-expansion of character strings, relying instead on (automatic or manual) inlining of functions/methods and on templates/parametric polymorphism/generic functions/classes/data structures.
Lisp languages, on the other hand, are designed around the use of macros of a similar, yet much more powerful, style.
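The inlining alternative mentioned above can be sketched in C itself (C99 or later). The name max_int and the helper functions left and right are hypothetical. Because arguments to a function are evaluated before the call, each one is evaluated exactly once.

```c
/* max_int is a hypothetical name; it plays the role of the max macro,
   but only for arguments of type int. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

int left_calls = 0, right_calls = 0;

int left(void)  { left_calls++;  return 1; }
int right(void) { right_calls++; return 2; }

int demo_single_evaluation(void)
{
    left_calls = right_calls = 0;
    /* Both arguments are evaluated once, then passed by value. */
    return max_int(left(), right());
}
```

The price of this approach in C is genericity: the function is tied to int, whereas the macro works on any operands that can be compared with >. Languages with templates or parametric polymorphism avoid that trade-off.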
Conditional compilation
The C preprocessor also offers conditional compilation. This permits having different versions of the same code in the same source file. Typically, this is used to customize the program with respect to the compilation platform or the status (work-in-progress/production code), as well as to ensure that header files are only included once.
#ifdef x
...
#else
...
#endif
or
#if x
...
#else
...
#endif
Once again, most modern programming languages discard this feature, relying instead on traditional if...then...else... flow control operators and leaving to the compiler the task of removing useless code from the executable.
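As a minimal sketch of the two directive forms and of the plain-if alternative, consider the following C fragment. DEBUG_BUILD, TRACE_BUILD, debug_build and the three functions are hypothetical names introduced for this illustration; in a real project such macros would more likely be set from the compiler command line.

```c
/* DEBUG_BUILD and TRACE_BUILD are hypothetical configuration macros.
   TRACE_BUILD is deliberately left undefined. */
#define DEBUG_BUILD 1

/* #if tests the value of a constant expression. */
#if DEBUG_BUILD
int checks_enabled(void) { return 1; }
#else
int checks_enabled(void) { return 0; }
#endif

/* #ifdef only tests whether the name is defined at all. */
#ifdef TRACE_BUILD
int tracing_enabled(void) { return 1; }
#else
int tracing_enabled(void) { return 0; }
#endif

/* The alternative described in the text: an ordinary if on a constant.
   Both branches are parsed and type-checked; an optimizing compiler is
   free to drop the branch that can never run. */
enum { debug_build = 1 };

int checks_enabled_plain_if(void)
{
    if (debug_build)
        return 1;
    else
        return 0;
}
```

One practical difference is that code in a rejected #if or #ifdef branch is never seen by the compiler proper, so it can silently stop compiling as the program evolves, whereas the plain if keeps both branches checked.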
Other lexical pre-processors
Other lexical preprocessors include
- the general-purpose m4, most commonly used in cross-platform build systems,
- the general-purpose PHP, most commonly used in web design.
Syntactic pre-processors
Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case of Lisp and OCaml. Some other languages rely on a fully external language to define the transformations. This is the case of the XSLT pre-processor for XML, or its statically typed counterpart CDuce.
Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or turn a general-purpose programming language into a Domain-Specific Programming Language.
Customizing syntax
A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language. Programs may be written interchangeably using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.
Similarly, a number of programs written in OCaml customize the syntax of the language by the addition of new operators.
Extending a language
The best example of language extension through macros is the Lisp family of languages. While the languages, by themselves, are simple dynamically-typed functional cores, the standard distributions of Scheme and Common Lisp permit imperative or object-oriented programming, as well as static typing. All these features are implemented by syntactic pre-processing.
Similarly, statically-checked, type-safe regular expressions or code generation may be added to the syntax and semantics of OCaml through macros, as well as micro-threads/coroutines/fibers, monads or transparent XML manipulation.
Specializing a language
One of the unusual features of the Lisp family of languages is the possibility of using macros to "morph" such a language into an internal Domain-Specific Programming Language. Typically, in a large Lisp-based project, one module may be written using a SQL-based dialect of Lisp, another in a dialect specialized for GUIs or pretty-printing, etc.
The MetaOCaml preprocessor/language provides similar features for external Domain-Specific Programming Languages. This preprocessor takes the description of the semantics of a language (i.e. an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language, and from that language either to bytecode or to native code.
See also
- The C preprocessor
- The OCaml pre-processor-pretty-printer
- The MetaOCaml metaprogramming language
- The Epigram metaprogramming language
- The Generic PreProcessor
- Oracle Pro*C
- Programming from the bottom up
- DSL Design in Lisp