cos(θ) = (A · B) / (||A|| · ||B||). The dot product of vectors A and B divided by the product of their Euclidean norms. Result in [-1, 1] (for non-negative vectors such as TF-IDF, in [0, 1]): 1 = identical direction, 0 = orthogonality (no shared features).
Euclidean distance between document vectors is dominated by their length — a longer document is "farther" despite identical topic. Cosine similarity normalises this by looking only at the angle.
A zero vector (e.g. a document with no known terms) causes division by zero in the norm.
Recomputing norms for the same vectors repeatedly wastes time.
Time complexity: O(d) na parę, O(n·d) batch. Space complexity: O(d) na wektor.
Whether vectors are pre-normalised to unit length — then cosine = dot product.
Batched cosine similarity is matrix multiplication — ideal for GPUs with dense embeddings.
For sparse vectors (TF-IDF) a SIMD-capable CPU is efficient.