Stemming algorithm

From Free net encyclopedia

A stemming algorithm is a method of reducing words to their stem, base, or root form. Stemming has been a long-standing problem in computer science; the first paper on the subject was published in 1968. The process of stemming, often called conflation, is useful in search engines, natural language processing, and other text processing problems.

For example, a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".


Methods

There are several types of stemming algorithms. Two common techniques are suffix stripping, which removes known endings such as "-ing" or "-ed" by rule, and lookup table replacement, which maps each inflected form directly to its root. Lemmatization, a related approach, first determines a word's part of speech and only then looks for the root, because in some languages the stemming rules depend on the part of speech.
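
As a rough illustration of suffix stripping combined with lookup table replacement, here is a minimal Python sketch. The names LOOKUP, SUFFIXES, MIN_STEM_LEN, and stem are all hypothetical, and the tiny rule set is an assumption chosen to cover the "fish" example; it is not the Porter Stemmer or any other published algorithm.

```python
# Hypothetical exception table: irregular forms that suffix rules cannot handle.
LOOKUP = {"ran": "run", "geese": "goose"}

# Hypothetical suffix list, ordered longest first so "ing" is tried before "s".
SUFFIXES = ("ing", "ed", "er", "es", "s")

# Assumed guard: never strip a suffix if fewer than this many letters remain.
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Return a crude stem: table lookup first, then suffix stripping."""
    w = word.lower()
    # Lookup table replacement takes priority over the rules.
    if w in LOOKUP:
        return LOOKUP[w]
    # Suffix stripping: remove the first matching suffix, if enough remains.
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for example in ("fishing", "fished", "fish", "fisher", "ran"):
        print(example, "->", stem(example))
```

The minimum-length guard keeps short words such as "sing" or "is" from being cut down to fragments. A rule set this small will still over-stem or under-stem many words, which is one reason practical stemmers use much larger rule sets and exception lists.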

While much of the work in this area has focused on the English language (with significant use of the Porter Stemmer algorithm), other languages have also been investigated, including French, Italian, Spanish, Portuguese, German, Dutch, Swedish, Norwegian, Danish, Russian, Finnish, Hebrew, and Arabic. Hebrew and Arabic are reportedly still considered difficult languages for stemming research.

Retrieved from "http://www.netipedia.com/index.php/Stemming_algorithm"

Categories: Natural language processing
