1) Offline phase: the base LLM's weight matrices are decomposed via SVD; lightweight expert vectors 'Z' are trained via RL, and each vector specialises in a task category (e.g., math, coding, reasoning).
2) Inference phase, pass 1 (dispatch): the system analyses the prompt and identifies the task type.
3) Inference phase, pass 2 (execute): expert vectors matching the task are dynamically mixed and applied to the singular values of the weights, yielding a model tailored to the specific prompt without updating the original weights.
Classical fine-tuning and PEFT methods such as LoRA produce static adapters that are fixed after training and cannot adapt to arbitrary unseen tasks at runtime. Transformer² addresses this by dynamically composing expert vectors at inference time.
Each base LLM weight matrix W is decomposed via SVD as W = U·Σ·Vᵀ. The singular values (the diagonal of Σ) are the application point for expert vectors.
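As a minimal sketch of the decomposition described above, the following NumPy snippet factors a small random matrix and checks that the factors reassemble into the original. The matrix `W` is a hypothetical stand-in for one weight matrix, not an actual model layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))  # hypothetical stand-in for one weight matrix

# Thin SVD: U is (6, 4), sigma holds the 4 singular values, Vt is (4, 4).
U, sigma, Vt = np.linalg.svd(W, full_matrices=False)

# Reassembling U · diag(sigma) · Vt recovers W up to floating-point error.
W_rebuilt = U @ np.diag(sigma) @ Vt
reconstruction_ok = np.allclose(W, W_rebuilt)
```

The vector `sigma` is the only part of the factorisation that the expert vectors touch; `U` and `Vt` stay fixed.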
Lightweight task-specialised vectors trained via Reinforcement Learning. Each vector has one entry per singular value and scales the singular values Σ element-wise during inference.
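To illustrate how a vector can modulate singular values, here is a small sketch assuming element-wise scaling of Σ. The function name `apply_expert` and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_expert(W, z):
    """Return U · diag(sigma * z) · Vt, leaving W itself untouched."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    # Broadcasting scales column j of U by sigma[j] * z[j].
    return (U * (sigma * z)) @ Vt

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 3))       # hypothetical weight matrix
z = np.array([1.3, 1.0, 0.7])         # hypothetical expert vector, one entry per singular value

W_adapted = apply_expert(W, z)
W_identity = apply_expert(W, np.ones(3))  # z = 1 everywhere reproduces W
```

Under this assumption an expert vector needs only min(m, n) parameters for an m × n matrix, which is why such vectors can stay lightweight.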
Lightweight classifier that analyses the prompt in the first pass and selects the appropriate set of expert vectors.
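The dispatch step can be pictured with a deliberately simple toy. The keyword lists and the `dispatch` function below are hypothetical stand-ins for a real classifier and are not taken from the source; the point is only the interface: a prompt goes in, mixing weights over task categories come out.

```python
# Hypothetical keyword lists; a real dispatcher would be a trained classifier.
TASK_KEYWORDS = {
    "math": ("solve", "integral", "equation", "prove"),
    "coding": ("function", "bug", "compile", "python"),
    "reasoning": ("why", "explain", "infer", "deduce"),
}

def dispatch(prompt):
    """Return normalised mixing weights over the task categories."""
    words = [w.strip(".,?!:;") for w in prompt.lower().split()]
    counts = {
        task: sum(w in keywords for w in words)
        for task, keywords in TASK_KEYWORDS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No evidence for any category: fall back to a uniform mix.
        return {task: 1.0 / len(counts) for task in counts}
    return {task: count / total for task, count in counts.items()}

weights = dispatch("Solve this equation.")
```

Returning weights rather than a single label lets the second pass blend several expert vectors when a prompt spans more than one category.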
If the dispatcher misidentifies the task type, the wrong expert vectors are selected and output quality can degrade significantly.
RL training of Z vectors can be unstable in the presence of sparse or noisy rewards.
Number of trained Z vectors covering different task categories.
Number of singular values retained during weight decomposition; trades adaptation capacity against cost.
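The capacity-versus-cost trade-off can be seen in a small experiment: keeping only the top k singular values shrinks the factorisation but increases reconstruction error. This is a generic truncated-SVD sketch on a random matrix, with the helper `truncate` being a hypothetical name; it is not a measurement on a real model.

```python
import numpy as np

def truncate(W, k):
    """Rank-k approximation of W from its k largest singular values."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 6))  # hypothetical weight matrix with 6 singular values

# Frobenius-norm reconstruction error for k = 1 .. 6.
errors = [float(np.linalg.norm(W - truncate(W, k))) for k in range(1, 7)]
```

The error shrinks as k grows and vanishes (up to rounding) at full rank, while the number of stored values and the length of each expert vector both grow with k.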
Reward function used to train Z vectors (typically a task-specific reward).
Conceptually similar to MoE, but routing operates in SVD space rather than across FFN blocks.
The first pass classifies the task; the second applies a mix of expert vectors to the singular values of the weights.
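The two passes can be sketched end to end on a single matrix. The expert vectors, the mixing weights `alphas`, and the function `adapt_weights` are hypothetical; the sketch assumes a linear blend of expert vectors followed by element-wise scaling of the singular values.

```python
import numpy as np

def adapt_weights(W, experts, alphas):
    """Blend expert vectors with weights alphas and apply them to W's singular values."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    z_mix = sum(alphas[name] * experts[name] for name in experts)
    return (U * (sigma * z_mix)) @ Vt

rng = np.random.default_rng(3)
W = rng.standard_normal((5, 3))  # hypothetical weight matrix

# Hypothetical expert vectors, one entry per singular value.
experts = {
    "math": np.array([1.2, 1.0, 0.8]),
    "coding": np.array([0.9, 1.1, 1.0]),
}

# Pass 1 (dispatch) would produce these mixing weights from the prompt.
alphas = {"math": 0.75, "coding": 0.25}

# Pass 2 (execute) builds the prompt-specific weights; W is not modified.
W_adapted = adapt_weights(W, experts, alphas)
```

Because `adapt_weights` returns a new array, the same base weights can be re-adapted for the next prompt with a different mix.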
The second inference pass depends on the first pass (sequential dispatch → execute), but expert execution itself is fully parallel.
Both SVD decomposition and LLM inference rely on dense matrix operations well supported by tensor cores.