Tokenizing



Tokenizing is the operation of splitting up a string of characters into a sequence of tokens: meaningful units such as words, numbers, or punctuation marks.
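For example, a simple tokenizer can be sketched with a regular expression. The pattern below is an illustrative choice, not a standard one; it treats identifiers, integers, and individual punctuation characters each as a token and discards whitespace:

```python
import re

def tokenize(text):
    """Split a string into a list of tokens.

    Identifiers, runs of digits, and single punctuation
    characters each become one token; whitespace is dropped.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)

print(tokenize('print "Hello, world!"'))
# ['print', '"', 'Hello', ',', 'world', '!', '"']
```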

The term is also used for the step, during the parsing of source code in some programming languages, in which symbols are converted into a much more compact internal format. Most BASIC interpreters used this to save memory: a keyword such as PRINT would be replaced by a single number that occupies far less room. In fact, most lossless compression systems use a form of tokenizing, although it is typically not referred to as such.
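A minimal sketch of this BASIC-style keyword "crunching" is shown below. The token numbers are invented for illustration; real interpreters each used their own keyword tables:

```python
# Hypothetical keyword table mapping each keyword to a one-byte
# token number (values chosen for illustration only).
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}

def crunch(line):
    """Replace each recognized keyword with its token number,
    leaving everything else as plain text."""
    out = []
    for word in line.split():
        token = KEYWORDS.get(word.upper())
        out.append(token if token is not None else word)
    return out

print(crunch('PRINT "HELLO"'))   # [153, '"HELLO"']
```

Storing the keyword as one byte instead of five characters is where the memory saving comes from; listing the program back to the user simply reverses the mapping.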

In human cognition, tokenization often refers to the process of converting a sensory stimulus into a cognitive "token" suitable for internal processing. A stimulus that is not correctly tokenized may not be processed, or may be incorrectly merged with other stimuli.
