CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang
Published: February 5, 2026
Authors: 4
Word Count: 7,707

CoPE effectively improves LLM performance on long-context tasks.

Abstract

Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoPE frequencies to accommodate unseen positions, and (2) semantic modeling, which posits that the attention scores computed with RoPE should always prioritize semantically similar tokens. In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely CoPE: soft clipping of the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping. Extensive experiments demonstrate that simply applying our soft clipping strategy to RoPE yields significant performance gains that scale up to 256k context length, validating our theoretical analysis and establishing CoPE as a new state of the art for length generalization. Our code, data, and models are available at https://github.com/hrlics/CoPE.
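The OOD argument above can be made concrete. RoPE rotates dimension pair i at frequency theta_i = base^(-2i/d), so that pair's wavelength is 2*pi/theta_i. Pairs whose wavelength exceeds the training context never complete a full rotation during training, so the angles they produce at longer positions were never seen. The NumPy sketch below flags those low-frequency pairs and then contrasts two windows over the pair index: an abrupt 1-to-0 cutoff (hard clipping) and a smooth raised-cosine taper (soft clipping). The taper shape, the width parameter, and all function names are hypothetical choices for illustration. They are not the schedule from the paper, and the sketch does not say how the weights enter attention. The authors' repository linked above is the reference for CoPE itself.

```python
import numpy as np


def rope_inv_freq(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE frequencies theta_i = base**(-2i/d), one per dimension pair."""
    return base ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)


def ood_mask(inv_freq: np.ndarray, train_len: int) -> np.ndarray:
    """True for pairs whose wavelength 2*pi/theta_i exceeds the training length.

    Such pairs never complete a full rotation during training, so the angles
    they produce at longer positions are out of distribution.
    """
    return 2 * np.pi / inv_freq > train_len


def hard_clip_weights(inv_freq: np.ndarray, train_len: int) -> np.ndarray:
    """Rectangular window: 1.0 for in-distribution pairs, 0.0 for OOD pairs."""
    return (~ood_mask(inv_freq, train_len)).astype(np.float64)


def soft_clip_weights(inv_freq: np.ndarray, train_len: int, width: int = 4) -> np.ndarray:
    """HYPOTHETICAL smooth window (not the paper's formula).

    A raised-cosine ramp over the `width` pairs just before the first OOD pair
    replaces the abrupt 1 -> 0 step; every OOD pair still gets weight 0.
    """
    n = inv_freq.shape[0]
    mask = ood_mask(inv_freq, train_len)
    if not mask.any():
        return np.ones(n)  # nothing to clip
    cut = int(np.argmax(mask))  # frequencies decrease with i, so OOD pairs form a suffix
    idx = np.arange(n)
    t = np.clip((idx - (cut - width)) / width, 0.0, 1.0)  # 0 before the ramp, 1 from `cut` on
    return 0.5 * (1.0 + np.cos(np.pi * t))
```

For example, with head_dim=128, base=10000 and a 4096-token training context, ood_mask flags the last 18 of the 64 pairs (indices 46 to 63). The hard window drops them all at once. The hypothetical soft window reaches the same zeros through a gradual ramp over pairs 43 to 45.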

Key Takeaways

  • CoPE improves LLM performance on long-context tasks.

  • Soft clipping of low-frequency RoPE components mitigates out-of-distribution (OOD) behavior at unseen positions.

  • CoPE prevents spectral leakage and semantic decay.
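The semantic side of these takeaways can be illustrated with the attention logit itself. Under RoPE, dimension pair i contributes q_i^T R((n - m) theta_i) k_i to the logit between a query at position m and a key at position n, so each pair's content similarity is modulated by an oscillation in relative distance. The sketch below assumes that "clipping" a pair means setting its theta_i to 0, which is a hard clip. CoPE's soft clip is not reproduced here. Under that assumption, a clipped pair contributes the plain dot product of its q and k components at every distance. The interleaved pairing (x[2i], x[2i+1]) is also an assumption. Many implementations pair the two halves of the head dimension instead, which does not change the property being checked. All function names are hypothetical.

```python
import numpy as np


def clipped_inv_freq(head_dim: int, train_len: int, base: float = 10000.0) -> np.ndarray:
    """RoPE frequencies with OOD low-frequency pairs hard-clipped to zero (assumption)."""
    inv_freq = base ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)
    # Pairs whose wavelength exceeds the training length get no rotation at all.
    inv_freq[2 * np.pi / inv_freq > train_len] = 0.0
    return inv_freq


def rope_rotate(x: np.ndarray, pos: int, inv_freq: np.ndarray) -> np.ndarray:
    """Rotate interleaved pairs (x[2i], x[2i+1]) by the angle pos * inv_freq[i]."""
    ang = pos * inv_freq
    cos, sin = np.cos(ang), np.sin(ang)
    x0, x1 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x0 * cos - x1 * sin
    out[1::2] = x0 * sin + x1 * cos
    return out


def pair_scores(q, k, m, n, inv_freq):
    """Per-pair terms of the logit <rope(q, m), rope(k, n)>; they sum to the logit."""
    qr, kr = rope_rotate(q, m, inv_freq), rope_rotate(k, n, inv_freq)
    return qr[0::2] * kr[0::2] + qr[1::2] * kr[1::2]


rng = np.random.default_rng(0)
q, k = rng.standard_normal(128), rng.standard_normal(128)
inv = clipped_inv_freq(128, train_len=4096)
clipped = inv == 0.0

near = pair_scores(q, k, 10, 0, inv)       # query 10 tokens after the key
far = pair_scores(q, k, 200_000, 0, inv)   # far beyond the training length

# Clipped pairs give identical terms at both distances: pure content similarity.
print(np.array_equal(near[clipped], far[clipped]))  # True
# Every pair still depends only on relative distance, as in standard RoPE.
print(np.allclose(pair_scores(q, k, 110, 100, inv), near))  # True
```

The unclipped, higher-frequency pairs still encode relative position, while the clipped pairs carry a distance-independent content term. Whether a smooth transition between the two regimes behaves better is the question the abstract's spectral-leakage argument addresses, and this sketch does not test it.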

Limitations

  • Requires careful tuning of clipping parameters.

  • May not fully resolve all long-context challenges.

Keywords

Rotary Positional Embedding, context scaling, Large Language Models, out-of-distribution mitigation, Semantic Modeling, attention scores, spectral leakage, soft clipping, length generalization
