Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
Anton Korznikov, Andrey Galichin, et al.
Sparse Autoencoders (SAEs) have emerged as a promising tool for interpreting neural networks by decomposing their activations into sparse sets of human-interpretable features. Recent work has introduc...