
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Hyunjong Ok, Jaeho Lee
arXiv ID: 2601.14152
Published: January 20, 2026

Abstract

Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing the context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to the context, creating an information bottleneck in which the context becomes invisible to the options.
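
To make the mechanism concrete, here is a minimal sketch of how a causal mask produces the bottleneck the abstract describes. This is an illustration under assumed token labels ("Q", "O1", "C1", etc.), not the paper's code: under causal attention, position i can only attend to positions j <= i, so in QOC order the option tokens can never attend to the context tokens that follow them.

```python
# Minimal sketch (illustrative, not the paper's code): under a causal
# mask, position i attends only to positions j <= i. In QOC order the
# context tokens come last, so option tokens never see them.
tokens_qoc = ["Q", "O1", "O2", "C1", "C2"]   # question, options, then context
tokens_cqo = ["C1", "C2", "Q", "O1", "O2"]   # context first

def visible_context(tokens):
    """For each token, list the context tokens (C*) it can attend to
    under a causal mask, i.e. those at positions j <= i."""
    return {
        tok: [t for t in tokens[: i + 1] if t.startswith("C")]
        for i, tok in enumerate(tokens)
    }

print("QOC:", visible_context(tokens_qoc))  # O1, O2 see no context tokens
print("CQO:", visible_context(tokens_cqo))  # O1, O2 see C1 and C2
```

Running the sketch shows the asymmetry directly: in QOC, the option tokens' visible-context lists are empty, while in CQO every option token can attend to the full context.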

Keywords

large language models, prompt structure, multiple-choice question answering, causal attention, causal mask, information bottleneck
