Large Language Models

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

Or Shafran, Shaked Ronen, Omri Fahn, Shauli Ravfogel, Atticus Geiger, Mor Geva
Published: February 2, 2026
Authors: 6
Word Count: 16,136
Code: Includes code

Language models organize concepts through local geometric regions, not single global directions.

Abstract

Activation decomposition methods in language models are tightly coupled to geometric assumptions about how concepts are realized in activation space. Existing approaches search for individual global directions, implicitly assuming linear separability, which overlooks concepts with nonlinear or multi-dimensional structure. In this work, we leverage Mixture of Factor Analyzers (MFA) as a scalable, unsupervised alternative that models the activation space as a collection of Gaussian regions, each with its own local covariance structure. MFA decomposes activations into two compositional geometric objects: the region's centroid in activation space, and the local variation from that centroid. We train large-scale MFAs for Llama-3.1-8B and Gemma-2-2B, and show they capture complex, nonlinear structures in activation space. Moreover, evaluations on localization and steering benchmarks show that MFA outperforms unsupervised baselines, is competitive with supervised localization methods, and often achieves stronger steering performance than sparse autoencoders. Together, our findings position local geometry, expressed through subspaces, as a promising unit of analysis for scalable concept discovery and model control, accounting for complex structures that isolated directions fail to capture.
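To make the decomposition concrete, here is a minimal sketch of MFA-style inference, not the paper's implementation: given toy, made-up parameters (centroids `mus`, low-rank factor loadings `Ws`, and isotropic noise `sigma2` are all hypothetical), an activation is assigned to the Gaussian region that best explains it and then split into that region's centroid plus a local variation expressed in the region's factor coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MFA parameters (illustrative only): K regions in a D-dimensional
# activation space, each with a centroid mu_k and a local low-rank
# factor loading matrix W_k of shape (D, d).
D, d, K = 8, 2, 3
mus = rng.normal(size=(K, D))          # region centroids
Ws = rng.normal(size=(K, D, d)) * 0.5  # local factor loadings
sigma2 = 0.01                          # isotropic noise variance

def assign_region(x):
    """Pick the region whose Gaussian assigns x the highest density.
    Region k has covariance W_k W_k^T + sigma2 * I."""
    best_k, best_ll = 0, -np.inf
    for k in range(K):
        cov = Ws[k] @ Ws[k].T + sigma2 * np.eye(D)
        diff = x - mus[k]
        _, logdet = np.linalg.slogdet(cov)
        ll = -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

def decompose(x):
    """Split x into a region centroid and the local variation from it,
    in the region's low-dimensional factor coordinates."""
    k = assign_region(x)
    W, mu = Ws[k], mus[k]
    # Posterior mean of factors z given x (standard factor-analysis
    # inference): z = (W^T W + sigma2 I)^{-1} W^T (x - mu)
    z = np.linalg.solve(W.T @ W + sigma2 * np.eye(d), W.T @ (x - mu))
    return k, mu, z

# Sample an activation from region 1, then recover its decomposition
# and reconstruct it as centroid + local variation.
z_true = rng.normal(size=d)
x = mus[1] + Ws[1] @ z_true + np.sqrt(sigma2) * rng.normal(size=D)
k, mu, z = decompose(x)
recon = mu + Ws[k] @ z
print(k, np.linalg.norm(x - recon))
```

The reconstruction `mu + Ws[k] @ z` mirrors the two geometric objects the abstract describes: a region centroid and a local, subspace-valued offset from it. A real MFA would fit `mus`, `Ws`, and the noise variance with EM over a large corpus of activations rather than using fixed toy values.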

Key Takeaways

  1. Language model concepts are scattered across multiple directions, not encoded as single coherent directions in activation space.

  2. A Mixture of Factor Analyzers partitions activation space into regions with local low-dimensional structure, improving interpretability.

  3. Local coordinate systems within semantic clusters capture activation variation better than global directional assumptions.

Limitations

  • The approach requires choosing the number of components, which may not be obvious across different models.

  • The method was only tested on two models, limiting generalizability claims to broader language model architectures.

Keywords

Mixture of Factor Analyzers, activation space, Gaussian regions, local covariance structure, concept discovery, model control, sparse autoencoders
