An engineered prompt is a constructed input text containing: (1) system/role instructions, (2) optional few-shot examples (question-answer), (3) the current user query. Techniques like CoT ask the model to show reasoning ('think step by step'), improving answer quality on complex tasks. Prompt format directly influences which training patterns the model activates.
Language models are sensitive to input formulation: the same model can return significantly different results depending on how a question is phrased, the order of examples, or whether role instructions are added.
LLMs are sensitive to word order, punctuation and formatting. A prompt working on GPT-4 may produce poor results on Claude or Gemini without modification.
A prompt optimized for model X stops working after a model update. Lack of prompt versioning leads to production regressions.
Untrusted input (e.g. webpage content) can override system instructions and hijack the agent — so-called prompt injection.
Brown et al. demonstrate that GPT-3 can perform tasks via in-context examples, launching prompt engineering as a field.
Wei et al. show that asking models to reason step-by-step dramatically improves performance on arithmetic and reasoning tasks.
Chat APIs (OpenAI, Anthropic) standardize system prompt separation, enabling persona and instruction injection.
Works like APE and DSPy propose automatic prompt optimization, reducing reliance on manual engineering.