Formal language

From Free net encyclopedia

In mathematics, logic, and computer science, a formal language is a set of finite-length words (i.e. character strings) drawn from some finite alphabet; the scientific theory that deals with these entities is known as formal language theory. The term "formal language" is also used in many other contexts (scientific, legal, linguistic and so on) to mean a mode of expression that is more careful and accurate, or more mannered, than everyday speech. This article deals only with the precise sense studied in formal language theory.

An alphabet might be <math>\left \{ a , b \right \}</math>, and a string over that alphabet might be <math>ababba</math>.

A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number of symbols <math>a</math> and <math>b</math>.
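Because such a language is infinite, it cannot be written out as a list, but it can be described by a membership test. The following is a minimal sketch in Python, assuming words are represented as ordinary strings; the function name `has_equal_a_and_b` is made up for this illustration.

```python
def has_equal_a_and_b(word: str) -> bool:
    """Return True if `word` is a string over {a, b} containing
    the same number of symbols a and b."""
    # Reject anything that uses a symbol outside the alphabet {a, b}.
    if not set(word) <= {"a", "b"}:
        return False
    return word.count("a") == word.count("b")


print(has_equal_a_and_b("ababba"))  # three a's and three b's
print(has_equal_a_and_b("aab"))     # two a's but only one b
```

Under this test the empty word also belongs to the language, since it contains zero of each symbol.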

The empty word (that is, length-zero string) is allowed and is often denoted by <math>e</math>, <math>\epsilon</math> or <math>\Lambda</math>. While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words in it may be unbounded).

A question often asked about formal languages is "how difficult is it to decide whether a given word belongs to the language?" This is the domain of computability theory and complexity theory.

Examples

Some examples of formal languages:

  • the set of all words over <math>\left \{ a, b \right \}</math>;
  • the set <math>\left \{ a^{n} \right \}</math>, where <math>n</math> is a prime number and <math>a^{n}</math> means <math>a</math> repeated <math>n</math> times;
  • finite languages, such as <math>\left \{ a, aa, bba \right \}</math>;
  • the set of syntactically correct programs in a given programming language; or
  • the set of inputs upon which a certain Turing machine halts.
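The first two examples can be sketched in Python. This is only an illustration under stated assumptions: words are plain strings, and the helper names `words_up_to`, `is_prime` and `in_prime_power_language` are invented here. Since the set of all words over an alphabet is infinite, the sketch enumerates it only up to a chosen length.

```python
from itertools import product
from math import isqrt


def words_up_to(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len, shortest first."""
    symbols = sorted(alphabet)
    for n in range(max_len + 1):
        for letters in product(symbols, repeat=n):
            yield "".join(letters)


def is_prime(n: int) -> bool:
    """Trial division; adequate for the short words used here."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def in_prime_power_language(word: str) -> bool:
    """Membership test for the language of words a^n with n prime."""
    return set(word) <= {"a"} and is_prime(len(word))


print(list(words_up_to("ab", 2)))
print([w for w in words_up_to("a", 7) if in_prime_power_language(w)])
```

The last example in the list is different in kind: by the undecidability of the halting problem, no such membership test can exist for the set of inputs on which an arbitrary Turing machine halts.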

Specification

A formal language can be specified in a great variety of ways, such as:

Operations

Several operations can be used to produce new languages from given ones. Suppose <math>L_{1}</math> and <math>L_{2}</math> are languages over some common alphabet.

  • The concatenation <math>L_{1}L_{2}</math> consists of all strings of the form <math>vw</math> where <math>v</math> is a string from <math>L_{1}</math> and <math>w</math> is a string from <math>L_{2}</math>.
  • The intersection <math>L_1 \cap L_2</math> of <math>L_{1}</math> and <math>L_{2}</math> consists of all strings which are contained in <math>L_1</math> and also in <math>L_{2}</math>.
  • The union <math>L_1 \cup L_2</math> of <math>L_{1}</math> and <math>L_{2}</math> consists of all strings which are contained in <math>L_{1}</math> or in <math>L_{2}</math>.
  • The complement of the language <math>L_{1}</math> consists of all strings over the alphabet which are not contained in <math>L_{1}</math>.
  • The right quotient <math>L_{1}/L_{2}</math> of <math>L_{1}</math> by <math>L_{2}</math> consists of all strings <math>v</math> for which there exists a string <math>w</math> in <math>L_{2}</math> such that <math>vw</math> is in <math>L_{1}</math>.
  • The Kleene star <math>L_{1}^{*}</math> consists of all strings which can be written in the form <math>w_{1}w_{2}...w_{n}</math> with strings <math>w_{i}</math> in <math>L_{1}</math> and <math>n \ge 0</math>. Note that this includes the empty string <math>\epsilon</math> because <math>n = 0</math> is allowed.
  • The reverse <math>L_{1}^{R}</math> contains the reversed versions of all the strings in <math>L_{1}</math>.
  • The shuffle of <math>L_{1}</math> and <math>L_{2}</math> consists of all strings which can be written in the form <math>v_{1}w_{1}v_{2}w_{2}...v_{n}w_{n}</math> where <math>n \ge 1</math> and <math>v_{1},...,v_{n}</math> are strings such that the concatenation <math>v_{1}...v_{n}</math> is in <math>L_{1}</math> and <math>w_{1},...,w_{n}</math> are strings such that <math>w_{1}...w_{n}</math> is in <math>L_{2}</math>.
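For finite languages, several of the operations above can be tried out directly. The sketch below assumes a language is represented as a Python set of strings, so that union and intersection are simply the built-in `|` and `&`; all function names are hypothetical. The Kleene star is generally infinite, so the sketch computes only its members up to a given length, and the complement is omitted for the same reason.

```python
def concatenation(l1, l2):
    """All strings vw with v in l1 and w in l2."""
    return {v + w for v in l1 for w in l2}


def right_quotient(l1, l2):
    """All strings v such that vw is in l1 for some w in l2."""
    return {u[:len(u) - len(w)] for u in l1 for w in l2 if u.endswith(w)}


def reverse(l1):
    """The reversed versions of all strings in l1."""
    return {w[::-1] for w in l1}


def kleene_star_up_to(l1, max_len):
    """Members of the Kleene star of l1 that have length <= max_len."""
    result = {""}      # n = 0 gives the empty string
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from l1.
        frontier = {v + w for v in frontier for w in l1
                    if len(v + w) <= max_len} - result
        result |= frontier
    return result


def shuffle_words(v, w):
    """All interleavings of v and w that keep each word's symbol order."""
    if not v:
        return {w}
    if not w:
        return {v}
    return ({v[0] + s for s in shuffle_words(v[1:], w)}
            | {w[0] + s for s in shuffle_words(v, w[1:])})


def shuffle(l1, l2):
    """Shuffle of two finite languages."""
    return {s for v in l1 for w in l2 for s in shuffle_words(v, w)}


L1 = {"a", "ab"}
L2 = {"b", ""}
print(sorted(concatenation(L1, L2)))
print(sorted(right_quotient(L1, L2)))
print(sorted(kleene_star_up_to({"ab"}, 4)))
print(sorted(shuffle({"ab"}, {"c"})))
```

The recursive `shuffle_words` interleaves single symbols, which yields the same set as the block decomposition in the definition, because the blocks <math>v_{i}</math> and <math>w_{i}</math> are allowed to be empty.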

Further reading

  • Hopcroft, J. & Ullman, J. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 020102988X
