A large language model (LLM) is a Transformer trained on tokens from a text corpus to predict the next token given all preceding tokens (autoregressive modeling). At sufficient scale (parameters, data, compute), emergent capabilities arise, such as reasoning, in-context learning, and instruction following.
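To make the autoregressive loop concrete, here is a minimal sketch. It assumes a toy bigram counter as a stand-in for the Transformer; this is not how an LLM is trained. It only shows the key idea that each predicted token is appended to the sequence and fed back as input for the next prediction. The names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token (toy stand-in for a Transformer)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], start: str, max_new_tokens: int) -> list[str]:
    """Autoregressive loop: each predicted token is appended and becomes the next input."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the toy model has never seen a continuation for this token
        out.append(followers.most_common(1)[0][0])  # greedy: pick the most frequent next token
    return out


corpus = "the cat sat on the mat the cat sat".split()
model = train_bigram(corpus)
print(generate(model, "the", 4))  # ['the', 'cat', 'sat', 'on', 'the']
```

A real LLM replaces the frequency table with a Transformer that conditions on the whole preceding context, not just the last token, and usually samples from the predicted distribution instead of always taking the most likely token.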
Earlier NLP models were narrowly specialized, with separate models for translation, classification, and QA. LLMs unify these language tasks within a single general-purpose model.
LLMs generate fluent text even when they lack knowledge of a given fact: instead of saying "I do not know", the model fabricates details (hallucination). This is a critical risk in medical, legal, and financial applications.
LLMs have a finite context window (4k to 1M tokens, depending on the model). When the input exceeds it, the model loses the earlier information. Long documents therefore require chunking plus RAG (retrieval-augmented generation), or summarization.
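Chunking can be sketched as splitting a token sequence into overlapping windows that each fit within the context limit. This is a minimal sketch under stated assumptions: whitespace splitting stands in for a real tokenizer, the function name `chunk_tokens` is hypothetical, and the retrieval step of RAG (embedding and searching the chunks) is not shown.

```python
def chunk_tokens(tokens: list[str], max_tokens: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of at most max_tokens; consecutive windows share `overlap` tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap  # how far each new window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


# Whitespace split is a stand-in for a real tokenizer (assumption).
words = "one two three four five six seven".split()
for chunk in chunk_tokens(words, max_tokens=3, overlap=1):
    print(chunk)
```

The overlap keeps a little shared context at each boundary, so a sentence cut between two chunks still appears partly in both.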
Malicious data in the agent's environment (e.g., webpage content or an email) can override system instructions and hijack the agent (prompt injection). This is especially dangerous for agents with tool access.
OpenAI publishes GPT-3 (175B parameters), demonstrating few-shot learning and emergent language capabilities.
OpenAI releases ChatGPT (InstructGPT/GPT-3.5), combining an LLM with RLHF (reinforcement learning from human feedback). The conversational interface reaches mass adoption.
Meta releases LLaMA, initiating the era of open-weights large language models.
LLM training and inference rely on Transformer matrix operations, which NVIDIA hardware (A100, H100, GB200) accelerates natively with Tensor Cores, programmed through CUDA.
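To show which matrix operations are meant, here is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, the core Transformer operation, written in pure Python. It is a minimal, unoptimized sketch: the two `matmul` calls are the kind of dense matrix products that GPU Tensor Cores execute in hardware, whereas here they run as plain loops. The helper names are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product; on a GPU this is the work Tensor Cores accelerate."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # matrix product #1
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)  # matrix product #2


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```

Because almost all of the arithmetic sits inside these matrix products (plus the feed-forward layers, which are also matrix products), hardware built for fast matrix multiply-accumulate dominates LLM training and inference.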
Google uses TPUs to train Gemini and PaLM models.