Snowball stemming algorithms, for information retrieval

Stemming algorithms

PyStemmer provides access to efficient algorithms for calculating the
"stemmed" form of a word: the word with most of its common
morphological endings removed, ideally leaving a common linguistic
base form. Stemming is most useful in building search engines and
information retrieval software; for example, a search with stemming
enabled should be able to find a document containing "cycling" given
the query "cycles".
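The "cycling"/"cycles" example can be made concrete with a small sketch. The stemmer below, `toy_stem`, is a deliberately crude, hypothetical suffix stripper. It is not the Snowball or Porter algorithm and not part of PyStemmer's API. `tokenize`, `build_index` and `search` are likewise illustrative names. The point is only that document terms and query terms pass through the same stemmer, so they meet on a shared stem.

```python
import re
from collections import defaultdict

# Hypothetical suffix list, for illustration only. Real Snowball stemmers
# apply ordered rules with conditions on the remaining stem.
SUFFIXES = ("ing", "es", "ed", "e", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed term to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[toy_stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every stemmed query term."""
    stems = [toy_stem(token) for token in tokenize(query)]
    if not stems:
        return set()
    results = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        results &= index.get(stem, set())
    return results


docs = {
    "d1": "Cycling to work every day",
    "d2": "Walking in the park",
}
index = build_index(docs)
print(search(index, "cycles"))  # {'d1'}: "cycles" and "cycling" both stem to "cycl"
```

Without the stemming step, the query term "cycles" would never match the indexed term "cycling". In a real system `toy_stem` would be replaced by a proper stemming algorithm, such as those PyStemmer provides. The indexing and query code would keep the same shape.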

PyStemmer provides algorithms for several (mainly European) languages
by wrapping the libstemmer library from the Snowball project in a
Python module.

It also provides access to the classic Porter stemming algorithm for
English. Although this has been superseded by an improved algorithm
(the Snowball "english" stemmer), the original algorithm may be of
interest to information retrieval researchers wishing to reproduce
the results of earlier experiments.