Forth

Forth is a programming language and programming environment, initially developed by Charles H. Moore at the US National Radio Astronomy Observatory in the early 1970s. It was formalized in 1977, and standardized by ANSI in 1994. Forth is sometimes spelled in all capital letters following the customary usage during its earlier years, although the name is not an acronym.

A procedural, stack-oriented, reflective, and typeless programming language, Forth features both interactive execution of commands (making it suitable as a shell for systems that lack a more formal operating system) and the ability to compile sequences of commands for later execution. Some Forth versions (especially early ones) compile threaded code, but many implementations today generate optimized machine code like other language compilers.

Forth is so named because "[t]he file holding the interpreter was labeled FORTH, for 4th (next) generation software - but the operating system restricted file names to 5 characters."[1] Moore's use of the phrase "4th (next) generation software" appears to predate the definition of fourth-generation programming languages; he saw Forth as a successor to compile-link-go third-generation programming languages, or software for "4th generation" hardware, not a 4GL as we understand the term today.

Overview

Forth offers a standalone programming environment consisting of a stack-oriented, interactive, incremental interpreter and compiler. Programming is done by extending the language with words (the term used for Forth subroutines), which become part of the language once defined. A typical Forth package consists of a pre-compiled kernel of the core words, which the programmer uses to define new words for the application. The application, once complete, can be saved as an image, with the programmer-specified words already compiled. Generally, programmers extend the initial core with words that are useful to the types of applications that they write, and save this as their working foundation.

The logical structure of Forth resembles a virtual machine. Forth systems, especially early versions, implement an inner interpreter that traces indirect-threaded code, giving compact and fast high-level code that can be compiled rapidly. More modern implementations generate optimized machine code like other language compilers.

Forth became very popular in the 1980s because it was well suited to the small microcomputers of that time: it uses memory very efficiently and is easily implemented on a new machine. At least one home computer, the British Jupiter ACE, had Forth in its ROM-resident OS. The language is still used today in many embedded systems (small computerized devices) for several reasons: ease of porting, efficient memory use, shortened development time, and fast execution speed. It has been implemented efficiently on modern RISC processors, and processors that use Forth as machine language have been produced. Other uses of Forth include the Open Firmware boot ROMs used by Apple, IBM, and Sun (on SPARC computers), and the boot loader of the FreeBSD operating system.

Forth is one of the simplest extensible languages; the modular and extensible nature of Forth permits the writing of high-level applications such as CAD systems. Unfortunately, extensibility also helps poor programmers to write incomprehensible code, which has caused Forth to acquire a reputation as a "write-only" language. However, Forth has been used successfully in large and complex projects, and applications developed by competent and disciplined professionals have been shown to be easily maintained over decades of use on evolving hardware platforms.

Forth from a programmer's perspective

Forth relies heavily on explicit use of a data stack and reverse Polish notation (RPN or postfix notation), commonly used in calculators from Hewlett-Packard. In RPN, the operator is placed after its operands, as opposed to the more common infix notation where the operator is placed between its operands. The rationale for postfix notation is that it is closer to the machine language the computer will eventually use, and should therefore be faster to execute. For example, one could get the result of the mathematical expression (25 * 10 + 50) this way:

25 10 * 50 + .
300 ok

This command line first puts the numbers 25 and 10 on the implied stack; the word * multiplies the two numbers on the top of the stack and replaces them with their product; then the number 50 is placed on the stack, and the word + adds it to the previous product; finally, the . command prints the result to the user's terminal. Even the language's structural features are stack-based. For example:

: FLOOR5 ( n -- n' )   DUP 6 < IF DROP 5 ELSE 1 - THEN ;

This code defines a new word (again, 'word' is the term used for a subroutine) called FLOOR5 using the following commands: DUP duplicates the number on the stack; 6 places the number 6 on top of the stack; < compares the two numbers on top of the stack (the duplicate and 6) and replaces them with a true-or-false value; IF takes a true-or-false value and chooses either to execute the commands immediately after it or to skip to the ELSE; DROP discards the value on the stack, and 5 puts 5 in its place; 1 - subtracts one from the original number; and THEN ends the conditional. The text in parentheses is a comment, advising that this word expects a number on the stack and will return a possibly changed number. The net result is a function that performs similarly to this function written in the Python programming language:

def floor5(v):
  if v < 6:
    return 5
  else:
    return v - 1

and similarly to this function written in the C programming language:

int floor5(int v) { return v < 6 ? 5 : v - 1; }

An even terser Forth definition of FLOOR5 that gives the same result:

: FLOOR5 ( n -- n' )  1- 5 MAX ;
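The stack behaviour described in this section can be mimicked in a few lines of Python. The following is only a minimal sketch, not how any real Forth works internally: the function name eval_rpn and the small set of recognised words are assumptions made for illustration.

```python
def eval_rpn(line):
    """Evaluate whitespace-separated tokens on a data stack (illustrative only)."""
    stack = []    # the data stack; the end of the list is the top
    output = []   # values "printed" by the word .
    for token in line.split():
        if token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif token == "1-":
            stack.append(stack.pop() - 1)
        elif token == "MAX":
            b, a = stack.pop(), stack.pop()
            stack.append(max(a, b))
        elif token == ".":
            output.append(stack.pop())
        else:
            stack.append(int(token))   # anything else is taken as a number
    return stack, output
```

With this sketch, eval_rpn("25 10 * 50 + .") leaves the stack empty and records 300 as output, and eval_rpn("3 1- 5 MAX") leaves 5 on the stack, matching the terser FLOOR5.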

Facilities of a Forth system

Interpreter

Forth has no formal grammar, so parsing is very simple. The interpreter reads a line of input from the user input device, which is then parsed for a word using spaces as a delimiter; some systems recognise additional whitespace characters. When the interpreter finds a word, it tries to look the word up in the dictionary. If the word is found, the interpreter executes the code associated with the word, and then returns to parse what is left of the input stream. If the word isn't found, the word is assumed to be a number, and an attempt is made to convert it into a number and push it on the stack; if successful, the interpreter continues parsing the input stream. Otherwise, if both the lookup and the number conversion fail, the interpreter prints the word followed by an error message indicating the word is not recognised, flushes the input stream, and waits for new user input.
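The loop just described (look the word up, otherwise try a number, otherwise report an error and discard the rest of the line) might be sketched in Python as follows. The names interpret and words, and the error format of the word followed by a question mark, are assumptions for illustration; a Python dict stands in for the Forth dictionary.

```python
def interpret(line, dictionary, stack):
    """Process one input line; return None on success or an error message."""
    for token in line.split():            # spaces delimit words
        if token in dictionary:           # 1. dictionary lookup
            dictionary[token](stack)      #    found: execute its code
            continue
        try:
            stack.append(int(token))      # 2. otherwise try number conversion
        except ValueError:
            return token + " ?"           # 3. both failed: report, drop rest of line
    return None

# A tiny illustrative dictionary of primitive words.
words = {
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "*":   lambda s: s.append(s.pop() * s.pop()),
    "DUP": lambda s: s.append(s[-1]),
}
```

For example, interpreting "2 3 + DUP *" with an empty stack leaves 25 on it, while "1 FOO 2" pushes 1, returns "FOO ?", and never reaches the 2.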

Compiler

The compiler extends the interpreter. In compile state, if the word is found, the interpreter compiles the code associated with the word instead of executing it. (The exception to this rule is words marked IMMEDIATE; they are always executed, not compiled, in either state.) Compilation is started with : (colon), which takes a name as a parameter and creates a dictionary entry. Forth returns to interpreter mode with ; (semicolon). The simplicity of the interpreter is therefore extended to the compiler; for instance

10 DUP 1+ . .

will interpret the line and print 11 10 on the output device.

: X DUP 1+ . . ;

will compile the word X. When executed by typing 10 X at the console this will print 11 10.
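The switch between interpreting and compiling can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the class name Forth, its attributes, and the use of Python closures as "compiled code" are all hypothetical, and only the handful of words needed for the example above are defined.

```python
class Forth:
    def __init__(self):
        self.stack, self.out = [], []
        self.compiling = False
        self.immediate = {";"}            # words executed even while compiling
        self.words = {
            "DUP": lambda: self.stack.append(self.stack[-1]),
            "1+":  lambda: self.stack.append(self.stack.pop() + 1),
            ".":   lambda: self.out.append(self.stack.pop()),
            ":":   self.colon,
            ";":   self.semicolon,
        }

    def colon(self):
        self.name = next(self.tokens)     # : takes the following name as its parameter
        self.body = []
        self.compiling = True

    def semicolon(self):
        body = self.body                  # capture this definition's compiled list
        self.words[self.name] = lambda: [step() for step in body]
        self.compiling = False

    def run(self, line):
        self.tokens = iter(line.split())
        for tok in self.tokens:
            if tok in self.words:
                code = self.words[tok]
            else:
                n = int(tok)
                code = lambda n=n: self.stack.append(n)   # a literal pushes itself
            if self.compiling and tok not in self.immediate:
                self.body.append(code)    # compile state: record for later
            else:
                code()                    # interpret state: execute now
```

Running "10 DUP 1+ . ." records 11 and then 10 as output. Running ": X DUP 1+ . . ;" produces no output at all, and a later "10 X" again records 11 and then 10.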

Assembler

Most Forth systems include a specialized assembler that produces executable words. Forth assemblers often use a reverse Polish syntax in which the parameters of an instruction precede the instruction. The usual design of a Forth assembler is to construct the instruction on the stack, then copy it into memory as the last step. Registers may be referenced by the name used by the manufacturer, numbered (0..n, as used in the actual operation code) or named for their purpose in the Forth system: e.g. "S" for the register used as a stack pointer.

Operating System, Files and Multitasking

Classic Forth systems traditionally use neither an operating system nor a file system. Instead of storing code in files, they store source code in disk blocks written to physical disk addresses. The word BLOCK is employed to translate the number of a 1K-sized block of disk space into the address of a buffer containing the data, which is managed automatically by the Forth system. Some systems implement contiguous disk files using the system's disk access, where the files are located at fixed disk block ranges. Usually these are implemented as fixed-length binary records, with an integer number of records per disk block. Quick searching is achieved by hashed access on key data.
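As a rough illustration of how a block number can be turned into a managed buffer, consider the Python sketch below. It is an assumption-laden model, not a description of any particular system: the class BlockStore, the two-buffer pool, the oldest-first eviction policy, and an update method that takes a block number are all simplifications chosen for the example, and a bytearray stands in for the raw disk.

```python
BLOCK_SIZE = 1024   # the classic 1K block

class BlockStore:
    def __init__(self, disk, n_buffers=2):
        self.disk = disk            # bytearray standing in for the raw disk
        self.n_buffers = n_buffers
        self.buffers = {}           # block number -> [buffer, dirty flag]

    def block(self, n):
        """Return the buffer holding block n, reading it from disk if needed."""
        if n not in self.buffers:
            if len(self.buffers) >= self.n_buffers:
                victim = next(iter(self.buffers))   # oldest buffer (insertion order)
                self._write_back(victim)
                del self.buffers[victim]
            start = n * BLOCK_SIZE
            self.buffers[n] = [bytearray(self.disk[start:start + BLOCK_SIZE]), False]
        return self.buffers[n][0]

    def update(self, n):
        """Mark block n as modified so it is written back before reuse."""
        self.buffers[n][1] = True

    def _write_back(self, n):
        data, dirty = self.buffers[n]
        if dirty:
            start = n * BLOCK_SIZE
            self.disk[start:start + BLOCK_SIZE] = data
            self.buffers[n][1] = False

    def flush(self):
        """Write every modified buffer back to disk."""
        for n in list(self.buffers):
            self._write_back(n)
```

The caller never reads or writes the disk directly; it asks for a block, edits the returned buffer, and marks it as updated, leaving the store to decide when the data reaches the disk.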

Multitasking, most commonly cooperative round-robin scheduling, is normally available (although multitasking words and support are not covered by the ANSI Forth Standard). The word PAUSE is used to save the current task's execution context, to locate the next task, and restore its execution context. Each task has its own stacks, private copies of some control variables and a scratch area. Swapping tasks is simple and efficient; as a result, Forth multitaskers are available even on very simple microcontrollers such as the 8051, AVR, and MSP430.
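Cooperative round-robin scheduling of this kind can be imitated with Python generators, where each yield plays the role of PAUSE and the generator's local variables stand in for a task's private stacks. This is a sketch by analogy only; the names task and run_round_robin are hypothetical.

```python
from collections import deque

def task(name, count, log):
    """A task with private state; each yield acts like PAUSE."""
    for i in range(count):
        log.append((name, i))   # do a little work
        yield                   # PAUSE: give up the processor voluntarily

def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # resume the task until its next PAUSE
            ready.append(current)   # still alive: go to the back of the queue
        except StopIteration:
            pass                    # task finished; drop it
```

Given task A with two steps and task B with three, the log shows the steps interleaved as A, B, A, B, B: each task runs only until it pauses, and the scheduler never interrupts it.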

By contrast, some Forth systems run under a host operating system such as Microsoft Windows, Linux or a version of Unix and use the host operating system's file system for source and data files; the ANSI Forth Standard describes the words used for I/O. Other non-standard facilities include a mechanism for issuing calls to the host OS or windowing systems, and many provide extensions that employ the scheduling provided by the operating system. Typically they have a larger and different set of words from the stand-alone Forth's PAUSE word for task creation, suspension, destruction and modification of priority.

Self (meta) and cross compilation

A full-featured Forth system with all source code will compile itself, a technique commonly called meta-compilation by Forth programmers (although the term doesn't exactly match meta-compilation as it is normally defined). The usual method is to redefine the handful of words that place compiled bits into memory. The compiler's words therefore use specially-named versions of fetch and store that can be redirected to fetch and store to a buffer area in memory. The buffer area simulates or accesses a memory area beginning at a different address than the code buffer. Such compilers define words to access both the target computer's memory, and the host (compiling) computer's memory.

After the fetch and store operations are redefined for the code space, the compiler, assembler, etc. are recompiled using the new definitions of fetch and store. This effectively reuses all the code of the compiler and interpreter. Then, the Forth system's code is compiled, but this version is stored in the buffer. The buffer in memory is written to disk, and ways are provided to load it temporarily into memory for testing. When the new version appears to work, it is written over the previous version.
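The idea of redirecting fetch and store can be shown with a small Python sketch. Everything in it is an illustrative assumption: the class CodeSpace, the function compile_bytes (a stand-in for a compiler that does nothing but lay down bytes), and the base addresses. The point is only that the same compiling code can target either memory area, because it touches memory solely through the redirected operations.

```python
class CodeSpace:
    """A memory area the compiler stores into, based at a chosen address."""
    def __init__(self, base, size):
        self.base = base
        self.mem = bytearray(size)
        self.here = base                     # next free address in this space

    def store(self, addr, byte):
        self.mem[addr - self.base] = byte    # redirected store

    def fetch(self, addr):
        return self.mem[addr - self.base]    # redirected fetch

def compile_bytes(space, data):
    """Stand-in for the compiler: it reaches memory only through `space`."""
    start = space.here
    for b in data:
        space.store(space.here, b)
        space.here += 1
    return start

host = CodeSpace(base=0x0000, size=64)       # the compiling computer's code area
target = CodeSpace(base=0x8000, size=64)     # buffer simulating the target's memory
```

Calling compile_bytes(target, ...) places code at target addresses starting from 0x8000 while leaving the host area untouched; the buffer target.mem could then be written to disk or sent to another machine.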

There are numerous variations of such compilers for different environments. For embedded systems, the code may instead be written to another computer over a serial port or even a single TTL bit, while keeping the word names and other non-executing parts of the dictionary in the original compiling computer. The minimum definitions to "remote" a Forth compiler are the words that fetch and store a byte, and the word that commands a Forth word to be executed. Often the most time-consuming part of a remote port is to construct the initial program to implement fetch, store and execute. Many modern microprocessors have integrated debugging features (such as the Motorola CPU32) that eliminate even this task.

Structure of the language

The basic data structure of Forth is the "dictionary" which maps "words" to executable code or named data structures. The dictionary is laid out in memory as a linked list with the links proceeding from the latest (most recently) defined word to oldest, until a sentinel, usually a NULL pointer, is found.

A defined word generally consists of head and body with the head consisting of the name field (NF) and the link field (LF) and body consisting of the code field (CF) and the parameter field (PF).

Head and body of a dictionary entry are treated separately because they may not be contiguous. For example, when a Forth program is recompiled for a new platform, the head may remain on the compiling computer, while the body goes to the new platform. In some environments (such as embedded systems) the heads occupy memory unnecessarily. However, some cross-compilers may put heads in the target if the target itself is expected to support an interactive Forth.

Dictionary Entry

The exact format of a dictionary entry is not prescribed, and implementations vary. However, certain components are almost always present though the exact size and order may vary. Described as a C language structure, a dictionary entry might look this way:

 typedef unsigned char byte;
 struct forthword {
   byte _flag; /* 3 flag bits + length of the word's name. */
   char name[32]; /* stored at its actual length at run time; a fixed typical maximum is used here only to keep the C legal. */
   struct forthword *previous; /* backward ptr to the previously defined word. */
   struct forthword *codeword; /* ptr to the code to execute this word. */
   byte parameterfield[]; /* unknown length of data, words, or opcodes. */
 };

The name field starts with a prefix giving the length of the word's name (typically up to 31 characters, the most a 5-bit length can express), and several bits for flags. The character representation of the word's name then follows the prefix. Depending on the particular implementation of Forth, there may be one or more NUL ('\0') bytes for alignment.

The link field contains a pointer to the previously defined word. The pointer may be a relative displacement or an absolute address that points to the next oldest sibling.
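The newest-to-oldest search through link fields can be sketched in Python. The class Entry and the function find are hypothetical names; Python object references stand in for the link pointers, and None plays the role of the NULL sentinel.

```python
class Entry:
    def __init__(self, name, code, previous, immediate=False):
        self.name = name            # name field
        self.previous = previous    # link field: the entry defined just before this one
        self.code = code            # code field (any Python value in this sketch)
        self.immediate = immediate  # one of the flag bits

def find(latest, name):
    """Walk the links from the most recent entry back to the oldest."""
    entry = latest
    while entry is not None:        # None is the sentinel ending the list
        if entry.name == name:
            return entry
        entry = entry.previous
    return None
```

Because the search starts at the latest definition, redefining a name shadows the older entry without removing it: if X is defined, then Y, then X again, a search for X finds the second definition first.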

The code field pointer will be either the address of the word that will execute the code or data in the parameter field, or the beginning of machine code that the processor will execute directly. For colon-defined words, the code field pointer points to the word that will save the current Forth instruction pointer (IP) on the return stack and load the IP with the new address from which to continue execution of words. This is the same as what a processor's call instruction does; the corresponding return restores the saved IP.
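The saving and restoring of the instruction pointer can be modelled in Python. In this sketch, which is an analogy rather than a real threading scheme, a colon definition is a Python list of words, a primitive is a Python function, and the IP is a pair of (list, index). The names execute, SQUARE and FOURTH are illustrative.

```python
def execute(word, stack):
    """Run `word`: a Python callable (primitive) or a list (colon definition)."""
    rstack = []                  # return stack of saved instruction pointers
    thread, ip = [word], 0       # start from a one-word thread
    while True:
        if ip == len(thread):            # end of a definition: "return"
            if not rstack:
                return
            thread, ip = rstack.pop()    # restore the saved IP
            continue
        w = thread[ip]
        ip += 1
        if callable(w):
            w(stack)                     # primitive: run its code directly
        else:
            rstack.append((thread, ip))  # colon word: save the IP ...
            thread, ip = w, 0            # ... and continue inside its body

def dup(s):
    s.append(s[-1])

def mul(s):
    s.append(s.pop() * s.pop())

SQUARE = [dup, mul]          # : SQUARE DUP * ;
FOURTH = [SQUARE, SQUARE]    # : FOURTH SQUARE SQUARE ;
```

Executing FOURTH with 3 on the stack leaves 81, and the return stack is empty again once the outermost definition ends.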

Structure of the Compiler

The compiler itself consists of Forth words. This gives the programmer considerable control of the compiler, and a programmer can change the compiler's words for special purposes.

The "compile time" flag in the name field is set for words with "compile time" behavior. Most simple words execute the same code whether they are typed on a command line, or embedded in code. When compiling these, the compiler simply places code or a threaded pointer to the word.

Compile-time words are actually executed by the compiler. The classic examples of compile-time words are control structures such as IF and WHILE. All of Forth's control structures, and almost all of its compiler, are implemented as compile-time words.
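To show what "executed by the compiler" can mean for a control structure, the Python sketch below lets IF, ELSE and THEN run during compilation, where they lay down branch cells and patch the branch targets. The cell format, the labels "0branch" and "branch", and the function names are assumptions made for this example.

```python
def compile_tokens(tokens):
    """Compile tokens into [op, arg] cells; IF, ELSE and THEN act at compile time."""
    code, fixups = [], []       # fixups: indices of branches awaiting a target
    for tok in tokens:
        if tok == "IF":
            fixups.append(len(code))
            code.append(["0branch", None])      # jump if the flag is false; target unknown
        elif tok == "ELSE":
            if_cell = fixups.pop()
            fixups.append(len(code))
            code.append(["branch", None])       # the true part skips the false part
            code[if_cell][1] = len(code)        # IF's branch lands just after ELSE
        elif tok == "THEN":
            code[fixups.pop()][1] = len(code)   # resolve the pending branch
        else:
            code.append(["word", tok])          # ordinary word: simply compile it
    return code

def less(s):
    b, a = s.pop(), s.pop()
    s.append(-1 if a < b else 0)                # -1 is the usual Forth true flag

def minus(s):
    b, a = s.pop(), s.pop()
    s.append(a - b)

PRIMS = {"DUP": lambda s: s.append(s[-1]), "DROP": lambda s: s.pop(),
         "<": less, "-": minus}

def run(code, stack):
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "0branch":
            if stack.pop() == 0:
                pc = arg
        elif op == "branch":
            pc = arg
        elif arg in PRIMS:
            PRIMS[arg](stack)
        else:
            stack.append(int(arg))              # anything else is a literal number
    return stack
```

Compiling the body of FLOOR5, "DUP 6 < IF DROP 5 ELSE 1 - THEN", produces cells in which the words IF, ELSE and THEN no longer appear; only the two patched branches remain. Running the result with 3 on the stack leaves 5, and with 9 leaves 8.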

The assembler (see above) is a special dialect of the compiler.

Structure of Code

In most Forth systems, the body of a code definition consists of either machine language or some form of threaded code. Traditionally, indirect-threaded code was used, but direct-threaded and subroutine-threaded Forths have also been popular. The fastest modern Forths use subroutine threading, insert simple words as macros, and perform peephole optimization or other optimizing strategies to make the code smaller and faster.

Data Objects

When a word is a variable or other data object, the CF points to the runtime code associated with the defining word that created it. A defining word has a characteristic "defining behavior" (creating a dictionary entry plus possibly allocating and initializing data space) and also specifies the behavior of an instance of the class of words constructed by this defining word. Examples include:

  • VARIABLE -- Names an uninitialized, one-cell memory location. Instance behavior of a VARIABLE returns its address on the stack.
  • CONSTANT -- Names a value (specified as an argument to CONSTANT). Instance behavior returns the value.
  • CREATE -- Names a location; space may be allocated at this location, or it can be set to contain a string or other initialized value. Instance behavior returns the address of the beginning of this space.

Forth also provides a facility by which a programmer can define new application-specific defining words, specifying both a custom defining behavior and instance behavior. Some examples include circular buffers, named bits on an I/O port, and automatically-indexed arrays.

Data objects defined by these and similar words are global in scope. The function provided by local variables in other languages is provided by the data stack in Forth. Forth programming style uses very few named data objects compared with other languages; typically such data objects are used to contain data which is used by a number of words or tasks (in a multitasked implementation).

Forth does not enforce consistency of data type usage; it is the programmer's responsibility to use appropriate operators to fetch and store values or perform other operations on data.

Computer programs in Forth

Words written in Forth are compiled into an executable form. The classical "indirect threaded" implementations compile lists of addresses of words to be executed in turn; many modern systems generate actual machine code (including calls to some external words and code for others expanded in place). Some systems feature sophisticated optimizing compilers. Generally speaking, a Forth program is saved as the memory image of the compiled program with a single command (e.g., RUN) that is executed when the compiled version is loaded.

During development, the programmer uses the interpreter to execute and test each little piece as it is developed.

Most Forth programmers therefore advocate a loose top-down design, and bottom-up development with continuous testing and integration.

The top-down design is usually separation of the program into "vocabularies" that are then used as high-level sets of tools to write the final program. A well-designed Forth program reads like natural language, and implements not just a single solution, but also sets of tools to attack related problems.

The tool-box approach is one of the reasons that Forth is so difficult to master. While learning the syntax is easy, mastering the tools delivered with a professional Forth system can take several months, working full-time. The task is actually more difficult than rewriting one's own Forth system from scratch. Unfortunately, a rewrite also loses the experience accumulated in a typical professional Forth toolbox.

Implementation of a Forth System

Forth uses two stacks for each executing task. The stacks are the same width as the index register of the computer, so that they can be used to fetch and store addresses. The parameter or data stack (commonly referred to as the stack) is used to pass data to words. The linkage or return stack (commonly referred to as the rstack) is used to store return addresses when words are nested (the equivalent of a subroutine call), and to store local variables. There are standard words to move data between the stacks, and to load and store variables on the stack.

The Forth interpreter looks up words one at a time in the dictionary, and executes their code. The basic algorithm is to search a line of characters for a non-blank, non-control-character string. If this string is in the dictionary, and it is not a compile-time word (marked in the flag byte), the code is executed. If it is not in the dictionary, it may be a number. If it converts to a number, the number is pushed onto the parameter stack. If it does not convert, then the interpreter prints an error message; for example, the string followed by a question mark. The interpreter then throws away the rest of the input.

A Forth compiler produces dictionary entries. Other than that, it tries to simulate the same effect that would be produced by typing the text into the interpreter.

The great secret to implementing Forth is natively compiling it, so that it compiles itself. The basic scheme is to have the compiler defined in terms of a few words that access a code area. Then, one definition of the words may compile to the normal area of memory while another definition compiles to disk, or to some special memory area. The compiler is adapted by recompiling it with the new definitions. Some systems have defined low-level words to communicate with a debugger on a different computer, building up the Forth system in a different computer.

Hello world

For an explanation of the tradition of programming "Hello World", see Hello world program.

One possible implementation:

: HELLO  ( -- )  CR ." Hello, world!" ;
HELLO

A standard Forth system is also an interpreter, and the same output can be obtained by typing into the Forth console (note parentheses instead of quotes):

CR .( Hello, world!)

Note that the word 'CR' comes before the text to print. By convention, the Forth interpreter does not start output on a new line. Also by convention, the interpreter waits for input at the end of the previous line, after an 'ok ' prompt. There is no implied 'flush-buffer' action in Forth's CR, as there sometimes is in other programming languages.

External links

Freely available Forth implementations

  • amrFORTH -- 8051 Tethered Forth for Windows/OSX/Linux/*BSD
  • PFE -- Portable Forth Environment
  • Gforth -- GNU Forth Language Environment
  • bigFORTH -- x86 native code Forth with MINOS GUI
  • kForth -- Small Forth Interpreter written in C++
  • SPF -- Open-source Forth for Win32 and Linux. This is an ANS94-compliant Forth with an optimising compiler and fast subroutine-threaded code
  • RetroForth -- Public Domain, for DOS, Linux, FreeBSD, Windows or standalone use -- has a wiki
  • pForth -- PD portable Forth in 'C' for embedded systems or desktops.
  • herkforth -- A colored forth for PPC Linux -- has a wiki
  • TWiki's Forth page -- includes a list of implementations for many home computers.
  • Computer Intelligence Forth -- an assembler-based ISO-Forth
  • eForth by C.H.Ting
  • Mops -- an object-oriented Forth dialect for the Macintosh based on the formerly commercial Neon
  • Win32Forth -- Forth for Microsoft Windows 98/2000/XP
  • colorForth -- for the PC, downloader / source reader program.
  • Reva -- a small fast x86 Forth implementation for Linux and Windows by Ron Aaron
  • 4IM -- small, simple, fast 16-bit standalone and DOS Forth system; 32-bit Linux and Windows (portable C version), featuring GUI library bindings.
  • CamelForth -- implementation for embedded microprocessors (8051, 8086, Z80, and 6809)
