Ankitsrihbti Authors Andres Segura-Tinoco
Text Normalization (TN) refers to techniques in the field of #NLP that prepare text, sentences, and words for further processing or analysis.
Two of the most common TN techniques are Stemming and Lemmatization. In this thread🧵 I will briefly tell you about them.
1/5
The aim of both methods (Stemming and Lemmatization) is the same: to reduce the inflectional forms of each word/term to a common base or root.
So what is the difference between them?
2/5
Stemming: a process in which terms are reduced to their root in order to reduce the size of the vocabulary. It is carried out by applying word reduction rules.
Two of the most common stemming algorithms are:
▪️Porter
▪️Snowball
3/5
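To make the idea of "word reduction rules" concrete, here is a minimal sketch of a rule-based stemmer in plain Python. It is NOT the Porter or Snowball algorithm: the suffix list, the minimum stem length, and the name `toy_stem` are all assumptions made up for this illustration.

```python
# Toy suffix-stripping stemmer (illustration only, not Porter/Snowball).
# Rules are tried in order and only the first matching rule is applied.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # ponies  -> poni
    ("ing", ""),     # eclipsing -> eclips
    ("ed", ""),      # eclipsed  -> eclips
    ("es", ""),      # eclipses  -> eclips
    ("ss", "ss"),    # glass stays glass (protects the next rule)
    ("s", ""),       # cats -> cat
    ("e", ""),       # eclipse -> eclips
]

# Hypothetical safeguard: never leave fewer than 3 characters before the suffix.
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Strip one suffix from `word` using the first rule that matches."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LENGTH:
                return stem + replacement
            return word  # too short to strip safely, keep the word as is
    return word


words = ["eclipse", "eclipses", "eclipsed", "eclipsing"]
stems = {toy_stem(w) for w in words}
print(stems)  # four surface forms collapse into a single vocabulary entry
```

Note that the resulting stem does not have to be a real word. The rules only need to send related forms to the same string, which is how stemming shrinks the vocabulary.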
Lemmatization: a process that performs a morphological analysis, using reference dictionaries, to map each word to its dictionary form (its lemma) and so create equivalence classes between words.
For example, for the token “eclipses”, a stemming rule would return the term “eclips”, while through lemmatization we would get the term “eclipse”.
4/5
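The dictionary side of lemmatization can be sketched with a plain lookup table. This is a simplified illustration under stated assumptions: the tiny `LEMMA_TABLE` and the helper names are hypothetical, and a real lemmatizer would also rely on morphological analysis (and usually part-of-speech information) rather than a lookup alone.

```python
from collections import defaultdict

# Hypothetical, hand-written reference dictionary: surface form -> lemma.
LEMMA_TABLE = {
    "eclipses": "eclipse",
    "eclipsed": "eclipse",
    "eclipsing": "eclipse",
    "mice": "mouse",
    "was": "be",
    "were": "be",
}


def toy_lemmatize(token):
    """Return the lemma from the table, or the lowercased token if unknown."""
    token = token.lower()
    return LEMMA_TABLE.get(token, token)


def equivalence_classes(tokens):
    """Group the tokens that share a lemma into one class per lemma."""
    classes = defaultdict(set)
    for token in tokens:
        classes[toy_lemmatize(token)].add(token.lower())
    return dict(classes)


print(toy_lemmatize("eclipses"))  # eclipse
print(equivalence_classes(["mice", "mouse", "was", "were", "Eclipses"]))
```

Unlike a stemming rule, the lookup always returns a valid word, and it can handle irregular forms such as “mice” that no suffix rule would cover.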
Finally, let me share a quick example of the use of these two NLP techniques (with spaCy and Python):
https://t.co/Qm0Fa4cGaV
5/5