A lemmatizer determines the token's part of speech (POS tagging), then maps it to a lemma via a morphological dictionary or inflection rules. It requires linguistic resources, so it is slower and more language-dependent than stemming.
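The two-step mapping described above can be sketched as follows. This is a minimal illustration, not a real lemmatizer: the `EXCEPTIONS` dictionary, the `RULES` table, the tag names, and the `lemmatize` function are hypothetical, and the POS tags are assumed to come from an upstream tagger.

```python
# Hypothetical toy resources; a real lemmatizer would load a full lexicon.
EXCEPTIONS = {
    ("was", "VERB"): "be",
    ("is", "VERB"): "be",
    ("mice", "NOUN"): "mouse",
}

# (suffix, replacement) inflection rules per POS tag, tried in order.
RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}


def lemmatize(token: str, pos: str) -> str:
    """Map a token to a lemma: dictionary first, then inflection rules."""
    form = token.lower()
    # Step 1: irregular forms are resolved by dictionary lookup.
    if (form, pos) in EXCEPTIONS:
        return EXCEPTIONS[(form, pos)]
    # Step 2: regular forms fall back to POS-specific suffix rules.
    for suffix, replacement in RULES.get(pos, []):
        if form.endswith(suffix) and len(form) > len(suffix) + 1:
            return form[: len(form) - len(suffix)] + replacement
    # No rule applies: treat the form as its own lemma.
    return form


# The tags here are written by hand; in a pipeline a POS tagger supplies them.
tagged = [("Mice", "NOUN"), ("was", "VERB"), ("walking", "VERB"), ("cities", "NOUN")]
lemmas = [lemmatize(tok, pos) for tok, pos in tagged]
```

The dictionary is consulted before the rules so that irregular forms such as "was" never reach the suffix-stripping fallback.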
Stemming can yield non-word stems and conflate unrelated forms (for example, "university" and "universe" both become "univers"). Lemmatization, using morphological knowledge, correctly merges inflected forms ("was", "is", "will be" → "be") while preserving interpretability.
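To make the contrast concrete, here is a small sketch that compares a naive suffix stripper with a dictionary lookup. The suffix list, the `LEMMAS` table, and both function names are illustrative assumptions and do not reproduce any specific stemmer or lexicon.

```python
SUFFIXES = ("ity", "ing", "e", "s")  # hypothetical stripping rules

# Hypothetical miniature morphological dictionary.
LEMMAS = {
    "was": "be",
    "is": "be",
    "university": "university",
    "universe": "universe",
}


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, with no linguistic knowledge."""
    form = token.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so very short words are left alone.
        if form.endswith(suffix) and len(form) - len(suffix) >= 2:
            return form[: -len(suffix)]
    return form


def lookup_lemma(token: str) -> str:
    """Return the dictionary lemma, or the lowercased form if unknown."""
    form = token.lower()
    return LEMMAS.get(form, form)


words = ["university", "universe", "was", "is"]
stems = [naive_stem(w) for w in words]
lemmas = [lookup_lemma(w) for w in words]
```

In this toy setup the stemmer maps "university" and "universe" to the same non-word stem and fails to relate "was" to "is", while the dictionary keeps the two nouns apart and merges both verb forms into "be".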
Without a part-of-speech tag, an ambiguous form such as "left" cannot be lemmatized reliably: as a verb it maps to "leave", as an adjective it stays "left".
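The effect of a missing tag can be shown with a POS-keyed lookup. This is a sketch under stated assumptions: `LEXICON` and `lemmatize_with_pos` are hypothetical, and falling back to a noun reading when no tag is given is an assumed default (NLTK's `WordNetLemmatizer.lemmatize` behaves similarly with its default `pos="n"`).

```python
from typing import Optional

# Hypothetical lexicon keyed by (surface form, POS tag).
LEXICON = {
    ("left", "VERB"): "leave",
    ("left", "ADJ"): "left",
}


def lemmatize_with_pos(token: str, pos: Optional[str] = None) -> str:
    """Look up a lemma; without a tag, fall back to an assumed noun reading."""
    form = token.lower()
    if pos is None:
        pos = "NOUN"  # assumption: default tag when the caller supplies none
    # Unknown (form, POS) pairs are returned unchanged.
    return LEXICON.get((form, pos), form)


# "She left early": the verb reading is only recovered when the tag is passed.
without_tag = lemmatize_with_pos("left")
as_verb = lemmatize_with_pos("left", "VERB")
as_adjective = lemmatize_with_pos("left", "ADJ")
```

Without the tag the verb in "She left early" is returned as "left" rather than "leave", which is the failure described above.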
A full pipeline (tokenization, POS tagging, dictionary lookup) is markedly slower than a stemmer's suffix-stripping rules.
Time complexity: O(n) plus the cost of POS tagging, where n is the number of tokens. Space complexity: O(|L|) for the morphological dictionary, where |L| is the number of dictionary entries.
Whether the lemmatizer receives a part-of-speech tag; this is critical for grammatically ambiguous words.
Source of morphological knowledge: a dictionary (WordNet), a statistical/neural model (spaCy), or rules (Morfologik for Polish).
Dictionary lookup and morphological rules are CPU-bound; when POS tagging uses a neural model it may benefit from a GPU.