Snowball Stemmer: A stemming algorithm also known as the Porter2 stemming algorithm. It is an improved version of the Porter Stemmer that fixes some of the original's issues.
Stemming: The process of reducing a word to its word stem by stripping affixes such as suffixes and prefixes. In simple words, stemming reduces a word to its base word or stem so that related words fall under a common stem. For example, the words care, cared and caring all fall under the same stem ‘care’. A stem does not have to be a dictionary word (a lemma): ‘university’, for instance, is stemmed to ‘univers’. Stemming is an important step in natural language processing (NLP).
Some common rules of Snowball stemming are:
ILY -----> ILI
LY -----> Nil
SS -----> SS
S -----> Nil
ED -----> E, Nil
Nil means the suffix is simply removed and replaced with nothing.
These rules can vary depending on the word. Take the suffix ‘ed’: the words ‘cared’ and ‘bumped’ are stemmed to ‘care’ and ‘bump’. In ‘cared’, then, the suffix effectively removed is only ‘d’ and not ‘ed’. Similarly, the word ‘stemmed’ is reduced to ‘stem’ and not ‘stemm’, so the doubled consonant is dropped as well. The suffix that is removed therefore depends on the word.
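To make the rule table concrete, here is a minimal toy sketch in plain Python that applies the listed suffix rules, longest suffix first. The names `SUFFIX_RULES` and `apply_first_rule` are made up for illustration, and this is not the real Snowball (Porter2) algorithm, which has further steps and conditions.

```python
# Simplified suffix rules taken from the table above, longest suffix first.
# Illustrative toy only: the real Snowball algorithm applies extra conditions.
SUFFIX_RULES = [
    ("ily", "ili"),
    ("ly", ""),
    ("ss", "ss"),
    ("s", ""),
    ("ed", ""),
]


def apply_first_rule(word):
    """Apply the first matching suffix rule and return the result."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule matched, leave the word unchanged


for w in ["easily", "fairly", "class", "sings", "bumped", "cared"]:
    print(w, "---->", apply_first_rule(w))
```

Note that this toy turns ‘cared’ into ‘car’, because it blindly strips ‘ed’. The actual stemmer returns ‘care’, which is exactly the kind of word-dependent behaviour described above.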
Word          Stem
cared         care
university    univers
fairly        fair
easily        easili
singing       sing
sings         sing
sung          sung
singer        singer
sportingly    sport
from nltk.stem.snowball import SnowballStemmer

# the stemmer requires a language parameter
snow_stemmer = SnowballStemmer(language='english')

# list of tokenized words
words = ['cared', 'university', 'fairly', 'easily', 'singing',
         'sings', 'sung', 'singer', 'sportingly']

# stem of each word
stem_words = []
for w in words:
    x = snow_stemmer.stem(w)
    stem_words.append(x)

# print stemming results
for e1, e2 in zip(words, stem_words):
    print(e1 + ' ----> ' + e2)
Difference Between Porter Stemmer and Snowball Stemmer:
Snowball Stemmer is more aggressive than Porter Stemmer.
Some issues in Porter Stemmer were fixed in Snowball Stemmer.
The two algorithms differ only slightly in how they work.
Words like ‘fairly’ and ‘sportingly’ are stemmed to ‘fair’ and ‘sport’ by the Snowball Stemmer, but the Porter Stemmer stems them to ‘fairli’ and ‘sportingli’.
The difference between the two algorithms is clearly visible in the way each stems the word ‘sportingly’: the Snowball Stemmer produces the more accurate stem.
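The comparison above can be checked with a short side-by-side run. This is a minimal sketch assuming NLTK is installed and using its `PorterStemmer` in its default mode. The variable names are illustrative.

```python
from nltk.stem import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

# One instance of each stemmer; Snowball needs a language, Porter does not.
porter = PorterStemmer()
snowball = SnowballStemmer(language='english')

# Words where the text above says the two stemmers disagree.
words = ['fairly', 'sportingly']

# Map each word to its (Porter stem, Snowball stem) pair.
comparison = {w: (porter.stem(w), snowball.stem(w)) for w in words}

for word, (porter_stem, snowball_stem) in comparison.items():
    print(word + ' ----> porter: ' + porter_stem + ', snowball: ' + snowball_stem)
```

Running both stemmers on your own vocabulary in this way is a quick method to judge which one suits a given task.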
References:
https://www.geeksforgeeks.org/snowball-stemmer-nlp