Sunday, November 6, 2022

Stemming and Lemmatization

Stemming reduces each word in a corpus to its base (stem) form, typically by stripping suffixes. E.g. fix, fixing, and fixed all reduce to fix. Commonly used stemmers are:

1. Porter Stemmer

2. Lancaster Stemmer

3. Snowball Stemmer
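
To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. It is a deliberately simplified toy, not the Porter, Lancaster, or Snowball algorithm. The names `SUFFIXES` and `toy_stem` and the three-character minimum are assumptions made for illustration only.

```python
# Toy suffix-stripping stemmer: a simplified illustration, NOT the Porter,
# Lancaster, or Snowball algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, tried in order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["fix", "fixing", "fixed", "having"]:
    print(w, "->", toy_stem(w))
```

With these toy rules, fix, fixing, and fixed all map to fix, while having is cut to hav, which is not a real word. Real stemmers apply many more rules and conditions, so their output can differ from this sketch.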

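Lemmatization reduces a word to its lemma, that is, its dictionary form. Because it relies on vocabulary and morphological analysis rather than only cutting off suffixes, it produces a more meaningful form than stemming does. The output we get after lemmatization is called a ‘lemma’.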

E.g. an aggressive stemmer may reduce Having to Hav, while lemmatization converts it to Have.
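
The contrast can be sketched with a tiny dictionary-based lemmatizer in plain Python. This is a toy under stated assumptions: `LEMMA_TABLE` is a hypothetical, hand-written lookup standing in for a real lexical resource, and `toy_lemmatize` is an illustrative name, not a library function.

```python
# Toy dictionary-based lemmatizer: the lookup table is a hypothetical,
# hand-written stand-in for a real lexical resource such as WordNet.
LEMMA_TABLE = {
    "having": "have",
    "had": "have",
    "has": "have",
    "fixing": "fix",
    "fixed": "fix",
}


def toy_lemmatize(word: str) -> str:
    """Return the dictionary lemma, or the lowercased word if it is unknown."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


for w in ["Having", "had", "fixed", "corpus"]:
    print(w, "->", toy_lemmatize(w))
```

Here Having maps to the real word have, and an irregular form such as had is also resolved to have, something plain suffix stripping cannot do. Real lemmatizers usually also take the part of speech into account.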

Commonly used lemmatizers include the WordNet lemmatizer, TextBlob, spaCy, TreeTagger, Pattern, Gensim, and Stanford CoreNLP.

references:

https://www.analyticsvidhya.com/blog/2021/05/topic-modelling-in-natural-language-processing/
