Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Warit Sirichotedumrong, Adisai Na-Thalang, Potsawee Manakul, Pittawat Taveekitworachai, Sittipong Sripaisarnmongkol, Kunat Pipatanakul
Published: January 19, 2026
Authors: 6
Word Count: 5,646

Compact model achieves real-time Thai ASR with high accuracy.

Abstract

Large encoder-decoder models like Whisper achieve strong offline transcription but remain impractical for streaming applications due to high latency. Yet because pre-trained checkpoints of these models are readily accessible, the open Thai ASR landscape remains dominated by offline architectures, leaving a critical gap in efficient streaming solutions. We present Typhoon ASR Real-time, a 115M-parameter FastConformer-Transducer model for low-latency Thai speech recognition. We demonstrate that rigorous text normalization can match the impact of model scaling: our compact model achieves a 45x reduction in computational cost compared to Whisper Large-v3 while delivering comparable accuracy. Our normalization pipeline resolves systemic ambiguities in Thai transcription, including context-dependent number verbalization and repetition markers (mai yamok, ๆ), creating consistent training targets. We further introduce a two-stage curriculum learning approach for Isan (northeastern) dialect adaptation that preserves Central Thai performance. To address reproducibility challenges in Thai ASR, we release the Typhoon ASR Benchmark, a gold-standard, human-labeled dataset with transcriptions that follow established Thai linguistic conventions, providing standardized evaluation protocols for the research community.
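The latency argument above rests on how a transducer decodes. Whisper encodes audio in fixed 30-second windows and then generates text autoregressively, whereas a transducer makes its emission decisions frame by frame, so text can be returned while audio is still arriving. The Python sketch below illustrates standard greedy transducer decoding; it is not the Typhoon ASR implementation. The encoder frames, the `toy_joint` function standing in for the prediction and joint networks, the three-token vocabulary, and the `max_symbols_per_frame` cap are all hypothetical.

```python
from typing import Callable, Iterator, Sequence

BLANK = 0  # index 0 is reserved for the blank symbol (assumed convention)
VOCAB = ["<blank>", "สวัส", "ดี"]  # hypothetical 3-token vocabulary

JointFn = Callable[[Sequence[float], int], list[float]]


def toy_joint(frame: Sequence[float], last_token: int) -> list[float]:
    """Hypothetical stand-in for the prediction + joint networks.

    Returns the frame's logits, pushing down the token that was just emitted
    so the toy decoder does not emit it again on the same frame.
    """
    logits = list(frame)
    if last_token != BLANK:
        logits[last_token] -= 10.0
    return logits


def stream_decode(
    encoder_frames: Sequence[Sequence[float]],
    joint: JointFn,
    max_symbols_per_frame: int = 3,
) -> Iterator[list[int]]:
    """Greedy transducer decoding that yields the tokens emitted at each frame."""
    last_token = BLANK  # prediction-network context starts from blank
    for frame in encoder_frames:
        emitted: list[int] = []
        for _ in range(max_symbols_per_frame):
            logits = joint(frame, last_token)
            best = max(range(len(logits)), key=logits.__getitem__)
            if best == BLANK:
                break  # blank means "advance to the next encoder frame"
            emitted.append(best)
            last_token = best  # condition the next step on the new token
        yield emitted  # this frame's tokens are final and can be shown immediately


if __name__ == "__main__":
    frames = [[1.0, 2.0, 0.5], [3.0, 0.0, 0.0], [1.0, 0.2, 1.5]]
    for t, tokens in enumerate(stream_decode(frames, toy_joint)):
        print(t, [VOCAB[k] for k in tokens])
```

Because each yielded list is never revised, a caller can display partial transcripts as frames arrive; a real system would replace `toy_joint` with learned networks and decode batched tensors.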

Key Takeaways

  1. Data quality is crucial for low-resource ASR performance.

  2. The compact model outperforms larger baselines in real-time tasks.

  3. The normalization pipeline improves Thai transcription consistency.
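The normalization takeaway, together with the abstract's mention of number verbalization and mai yamok, can be made concrete with a small Python sketch of two rules that turn written Thai into spoken-form targets. It is an illustration under stated assumptions, not the paper's pipeline: it covers only non-negative cardinals and digit-by-digit reading (the paper's context rules for choosing between the two are not described here), it assumes input already segmented into words because ๆ repeats the preceding word, and every function name is hypothetical.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]  # units .. hundred-thousands
MAI_YAMOK = "\u0e46"  # Thai repetition mark ๆ


def _below_million(n: int) -> str:
    """Read 1 <= n < 1,000,000 by place value."""
    digits = str(n)
    words = []
    for i, ch in enumerate(digits):
        d, place = int(ch), len(digits) - i - 1
        if d == 0:
            continue                       # zeros are silent inside a number
        if place == 1 and d == 1:
            words.append("สิบ")            # 1x is "sip", not "nueng sip"
        elif place == 1 and d == 2:
            words.append("ยี่สิบ")          # 2x is "yi sip"
        elif place == 0 and d == 1 and n > 1:
            words.append("เอ็ด")            # trailing 1 after a higher digit is "et"
        else:
            words.append(DIGITS[d] + PLACES[place])
    return "".join(words)


def verbalize_cardinal(n: int) -> str:
    """Read a non-negative integer as Thai words, grouping by ล้าน (10^6)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return DIGITS[0]
    millions, rest = divmod(n, 1_000_000)
    words = ""
    if millions:
        words += verbalize_cardinal(millions) + "ล้าน"
    if rest:
        words += _below_million(rest)
    return words


def verbalize_digit_string(s: str) -> str:
    """Read each ASCII digit separately (assumed use: phone or ID numbers)."""
    return "".join(DIGITS[int(c)] for c in s if c in "0123456789")


def expand_mai_yamok(tokens: list[str]) -> list[str]:
    """Replace ๆ with an explicit copy of the preceding word."""
    out: list[str] = []
    for tok in tokens:
        if tok.endswith(MAI_YAMOK):
            base = tok[: -len(MAI_YAMOK)].rstrip()
            if base:                       # attached form: "ดีๆ" -> "ดี", "ดี"
                out.extend([base, base])
            elif out:                      # standalone form: "เด็ก", "ๆ" -> "เด็ก", "เด็ก"
                out.append(out[-1])
            # a leading ๆ with nothing to repeat is dropped
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    print(verbalize_cardinal(2521))
    print(verbalize_digit_string("081"))
    print("".join(expand_mai_yamok(["เด็ก", "ๆ", "เล่น"])))
```

The point of applying one fixed set of rules to both training transcripts and references is that the same utterance always maps to the same target string, whichever written form the annotator happened to use.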

Limitations

  • The model is optimized for Thai-dominant speech, not code-switched speech.

  • Strict normalization means transcripts may need post-processing for readability.
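The readability limitation follows from spoken-form targets: output such as เด็กเด็ก is consistent for training but is commonly written เด็กๆ. A minimal post-processing sketch in Python, assuming word-segmented output and a hypothetical `collapse_repeats` helper, folds immediate repetitions back into mai yamok. Treating every adjacent duplicate as reduplication is a simplifying assumption; distinct words that happen to repeat would need a lexicon or context check.

```python
MAI_YAMOK = "\u0e46"  # Thai repetition mark ๆ


def collapse_repeats(tokens: list[str]) -> list[str]:
    """Fold an immediately repeated word back into word + mai yamok (hypothetical helper)."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == tok:
            out.append(tok + MAI_YAMOK)  # "เด็ก", "เด็ก" -> "เด็กๆ"
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


if __name__ == "__main__":
    print("".join(collapse_repeats(["เด็ก", "เด็ก", "เล่น", "กัน"])))
```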

Keywords

FastConformer-Transducer, streaming applications, text normalization, curriculum learning, Thai ASR, speech recognition, computational cost, model scaling, linguistic conventions, benchmark dataset
