How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng
Published: March 3, 2026
Authors: 11

Abstract

Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
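The three-domain, three-level structure described in the abstract can be pictured as a small evaluation grid. The sketch below is illustrative only: the domain names and level labels come from the abstract, but the data structures and function names are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of SteerEval's hierarchy. Domain names and level
# labels are taken from the abstract; everything else is an assumption.
from dataclasses import dataclass

LEVELS = {
    "L1": "what to express",
    "L2": "how to express",
    "L3": "how to instantiate",
}
DOMAINS = ["language features", "sentiment", "personality"]

@dataclass
class SteeringSpec:
    """One cell of the domain-by-level evaluation grid."""
    domain: str
    level: str        # "L1", "L2", or "L3"
    description: str  # what this specification level asks of the model

def build_grid() -> list[SteeringSpec]:
    """Enumerate all domain-level combinations to be evaluated."""
    return [
        SteeringSpec(domain, level, desc)
        for domain in DOMAINS
        for level, desc in LEVELS.items()
    ]

grid = build_grid()
print(len(grid))  # → 9 (three domains × three specification levels)
```

Walking the grid from L1 down to L3 mirrors the benchmark's finding that steering methods tend to degrade as the specification becomes finer-grained.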

Keywords

Large Language Models, controllability, hierarchical benchmark, language features, sentiment, personality, steering methods, behavioral intent, textual output
