The algorithm applies a sequence of suffix-stripping rules (e.g. Porter stemmer: -ing, -ed, -s). It operates purely on the surface, based on character patterns, with no grammatical analysis or dictionary. The result (stem) need not be a valid word.
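To make the rule-sequence idea concrete, here is a minimal sketch of a suffix stripper. It is not the Porter algorithm: the rule list `SUFFIXES`, the guard `MIN_STEM_LEN`, and the function name `toy_stem` are assumptions made up for illustration.

```python
# Toy suffix-stripping stemmer. NOT the Porter algorithm, only an illustration
# of "apply surface rules in order, with no dictionary or grammar".
SUFFIXES = ("ing", "ed", "s")  # hypothetical rule list, tried in this order
MIN_STEM_LEN = 3               # assumed guard so short words are left alone


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if enough characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= MIN_STEM_LEN:
            return word[:remaining]
    return word  # no rule applied
```

With these assumed rules, `toy_stem("jumping")`, `toy_stem("jumped")` and `toy_stem("jumps")` all return `"jump"`, while `toy_stem("running")` returns `"runn"`, which shows that a stem need not be a valid word.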
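Bag-of-Words and TF-IDF treat "runs", "ran", "running" as separate terms, fragmenting the statistics. Stemming collapses regular inflections such as "runs" and "running" into one stem ("run"), shrinking the vocabulary and typically improving recall. Irregular forms such as "ran" stay separate, because no suffix rule maps them to "run".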
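The effect on term counts can be sketched as follows. The stemmer here is a hypothetical toy (`toy_stem`, with an assumed three-rule suffix list), standing in for a real stemmer only to show how the vocabulary shrinks.

```python
from collections import Counter


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer: strips one of three assumed suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = "walks walked walking walk talks talked".split()

# Without stemming every surface form is its own term.
raw_counts = Counter(tokens)

# With stemming the counts are pooled per stem.
stemmed_counts = Counter(toy_stem(t) for t in tokens)

print(len(raw_counts), "terms before,", len(stemmed_counts), "terms after")
```

Six distinct terms with a count of 1 each become two stems, `walk` (4) and `talk` (2), so a query for one form now matches documents containing the others.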
Over-stemming merges unrelated words ("universal", "university" → "univers"); under-stemming fails to merge related ones.
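Both failure modes can be made visible by grouping words by their stem. The sketch below uses a deliberately aggressive, made-up rule list (`AGGRESSIVE_SUFFIXES`); the helper names `toy_stem` and `group_by_stem` are assumptions, and the rules are not those of any real stemmer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

AGGRESSIVE_SUFFIXES = ("ity", "al", "e", "s")  # hypothetical rule list


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in AGGRESSIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stem to the list of input words that produced it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["universal", "university", "run", "ran"])
```

Here `groups["univers"]` holds both "universal" and "university" (over-stemming), while "run" and "ran" end up in separate groups even though they are related (under-stemming).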
The Porter stemmer was designed for English; for Polish it yields poor results.
Time complexity: O(n) in the number of tokens (each token is processed in time proportional to its length). Space complexity: O(1) per token.
Choice of stemmer: Porter (mild), Snowball/Porter2 (improved, multilingual), Lancaster (aggressive).
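One way to choose is to run the candidates side by side on a sample of your own vocabulary. The sketch below assumes NLTK is installed and uses its `PorterStemmer`, `SnowballStemmer` and `LancasterStemmer` classes; the helper names `build_nltk_stemmers` and `compare_stemmers` are made up for this example.

```python
from typing import Callable, Dict, Iterable, List

Stemmer = Callable[[str], str]


def build_nltk_stemmers() -> Dict[str, Stemmer]:
    """Return the three stemmers by name. Assumes NLTK is installed."""
    # Imported lazily so the comparison helper works without NLTK.
    from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

    return {
        "porter": PorterStemmer().stem,
        "snowball": SnowballStemmer("english").stem,
        "lancaster": LancasterStemmer().stem,
    }


def compare_stemmers(
    words: Iterable[str], stemmers: Dict[str, Stemmer]
) -> Dict[str, List[str]]:
    """Apply every stemmer to every word, keeping the input order."""
    words = list(words)
    return {name: [stem(w) for w in words] for name, stem in stemmers.items()}
```

Calling `compare_stemmers(sample_words, build_nltk_stemmers())` on a few hundred words from the target corpus gives a quick view of how aggressively each stemmer conflates them; for other languages, `SnowballStemmer` accepts a different language name.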
Rule-based string manipulation — no hardware acceleration needed.