OpenAutoNLU: Open Source AutoML Library for NLU

Grigory Arshinov, Aleksandr Boriskin, Sergey Senichev, Ayaz Zaripov, Daria Galimzianova, Daniil Karpov, Leonid Sanochkin
Published: March 2, 2026
Authors: 7
Word Count: 5,150

Open-source AutoML library that automatically selects NLU training methods based on your data.

Abstract

OpenAutoNLU is an open-source automated machine learning library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existing solutions, OpenAutoNLU introduces data-aware training regime selection that requires no manual configuration from the user. The library also provides integrated data quality diagnostics, configurable out-of-distribution (OOD) detection, and large language model (LLM) features, all within a minimal low-code API. The demo app is available at https://openautonlu.dev.

Key Takeaways

  1. OpenAutoNLU automatically selects optimal training methods based on dataset size without manual configuration.

  2. The library integrates out-of-distribution detection, data quality diagnostics, and LLM augmentation in one unified API.

  3. Data-aware regime selection uses empirically determined thresholds to transition between AncSetFit, SetFit, and fine-tuning approaches.
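
The threshold-based regime selection described in the takeaways can be pictured with a minimal Python sketch. This is not the OpenAutoNLU API: the function name `select_regime`, the two threshold constants and their values, and the use of the smallest per-class example count as the size measure are all assumptions made for illustration. The ordering (AncSetFit for the smallest datasets, SetFit for mid-sized ones, fine-tuning for the largest) is likewise inferred from the order in which the methods are listed.

```python
from collections import Counter
from typing import Sequence

# Hypothetical placeholder thresholds (examples per class).
# The empirically determined values used by the library are not given here.
ANCSETFIT_MAX_PER_CLASS = 5
SETFIT_MAX_PER_CLASS = 80


def select_regime(
    labels: Sequence[str],
    ancsetfit_max: int = ANCSETFIT_MAX_PER_CLASS,
    setfit_max: int = SETFIT_MAX_PER_CLASS,
) -> str:
    """Pick a training regime from the size of the rarest class.

    Assumption: the rarest class drives the decision, since few-shot
    methods are most useful when at least one class is scarce.
    """
    if not labels:
        raise ValueError("labels must not be empty")

    # Count training examples per class and take the smallest count.
    min_per_class = min(Counter(labels).values())

    if min_per_class <= ancsetfit_max:
        return "ancsetfit"    # very few examples per class
    if min_per_class <= setfit_max:
        return "setfit"       # moderate number of examples per class
    return "fine-tuning"      # enough data for full fine-tuning


if __name__ == "__main__":
    tiny = ["billing"] * 3 + ["support"] * 4
    medium = ["billing"] * 40 + ["support"] * 60
    large = ["billing"] * 500 + ["support"] * 700
    for name, data in [("tiny", tiny), ("medium", medium), ("large", large)]:
        print(name, "->", select_regime(data))
```

Keeping the thresholds as parameters makes it easy to substitute values measured on one's own benchmarks while leaving the decision logic unchanged.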

Limitations

  • LLM-based approaches may outperform the library on zero-shot tasks but incur higher computational and monetary costs.

  • The library requires Python and is optimized for text classification and NER tasks only.

Keywords

automated machine learning, natural language understanding, text classification, named entity recognition, data-aware training regime selection, data quality diagnostics, out-of-distribution detection, large language models
