Aho-Corasick algorithm
From Free net encyclopedia
The Aho-Corasick algorithm is a string searching algorithm discovered by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of patterns (the "dictionary") within an input text. It matches all patterns "at once", so the complexity of the algorithm is linear in the size of the patterns plus the size of the search string.
Informally, the algorithm constructs a finite automaton first and then applies that automaton to the input text. When the pattern dictionary is known in advance (e.g. a computer virus database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use.
The Aho-Corasick algorithm forms the basis of the Unix command fgrep.
Sources
- {{cite journal
| first = Alfred V. | last = Aho | authorlink = Alfred Aho | coauthors = Margaret J. Corasick | year = 1975 | month = June | title = Efficient string matching: An aid to bibliographic search | journal = Communications of the ACM | volume = 18 | issue = 6 | pages = 333–340 | id = Template:Doi | url = }} (Access to the full text may be restricted.)
External links
- Animation of the Aho/Corasick Pattern Matching Automaton
- Set Matching and Aho-Corasick Algorithm by Pekka Kilpeläinen
- Aho-Corasick string matching in C# by Tomáš Petříček (mirror)
- Aho-Corasick entry in NIST's Dictionary of Algorithms and Data Structuresde:Aho-Corasick-Algorithmus