STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

Shui-Hsiang Hsu, Tsung-Hsiang Chou, Chen-Jui Yu, Yao-Chung Fan
Published: January 22, 2026
Authors: 4
Word Count: 3,423

STAR enhances table retrieval with semantic alignment.

Abstract

Table retrieval is the task of retrieving the most relevant tables from large-scale corpora given natural language queries. However, structural and semantic discrepancies between unstructured text and structured tables make embedding alignment particularly challenging. Recent methods such as QGpT attempt to enrich table semantics by generating synthetic queries, yet they still rely on coarse partial-table sampling and simple fusion strategies, which limit semantic diversity and hinder effective query-table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first applies header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to construct a diverse partial table. It then generates cluster-specific synthetic queries to comprehensively cover the table's semantic space. Finally, STAR employs weighted fusion strategies to integrate table and query embeddings, enabling fine-grained semantic alignment. This design enables STAR to capture complementary information from structured and textual sources, improving the expressiveness of table representations. Experiments on five benchmarks show that STAR achieves consistently higher Recall than QGpT on all datasets, demonstrating the effectiveness of semantic clustering and adaptive weighted fusion for robust table representation. Our code is available at https://github.com/adsl135789/STAR.
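The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how rows are embedded or how "header-aware" is realized, so this sketch assumes each cell is paired with its column header before embedding, and it swaps in a hypothetical hashed bag-of-words function (`toy_embed`) for a real text encoder. The helper names (`serialize_row`, `kmeans`, `representative_rows`) are illustrative only.

```python
import math
import random
import zlib


def serialize_row(headers, row):
    # Assumption: "header-aware" means each cell is paired with its column header.
    return " | ".join(f"{h}: {v}" for h, v in zip(headers, row))


def toy_embed(text, dim=64):
    # Hypothetical stand-in for a real encoder: hashed bag-of-words, L2-normalized.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's algorithm with seeded random initialization.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def representative_rows(headers, rows, k):
    # Return indices of the rows closest to each cluster centroid.
    k = min(k, len(rows))
    vecs = [toy_embed(serialize_row(headers, r)) for r in rows]
    centroids, assign = kmeans(vecs, k)
    picked = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            picked.append(min(members, key=lambda i: sq_dist(vecs[i], centroids[c])))
    return sorted(picked)


headers = ["city", "country", "population"]
rows = [
    ["Paris", "France", "2.1M"],
    ["Lyon", "France", "0.5M"],
    ["Osaka", "Japan", "2.7M"],
    ["Kyoto", "Japan", "1.5M"],
]
partial_table = [rows[i] for i in representative_rows(headers, rows, k=2)]
```

The selected rows form the partial table; in the pipeline the abstract describes, synthetic queries would then be generated per cluster, a step this sketch leaves out.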

Key Takeaways

  • STAR improves table retrieval through semantic clustering and weighted fusion.

  • Header-aware clustering selects diverse, representative table rows.

  • Adaptive weighted fusion enhances query-table alignment.
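The fusion step can be sketched in the same hedged spirit. The summary does not give STAR's weighting formulas, so both functions below are assumptions for illustration: `fixed_fusion` mixes the table embedding with the mean query embedding using a constant `alpha`, and `similarity_weighted_fusion` weights each synthetic query by a softmax over its cosine similarity to the table embedding. The function names and the `alpha` and `temperature` parameters are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fixed_fusion(table_vec, query_vecs, alpha=0.5):
    # Constant-weight mix of the table embedding and the mean query embedding.
    mean_q = [sum(col) / len(query_vecs) for col in zip(*query_vecs)]
    return normalize([alpha * t + (1 - alpha) * q
                      for t, q in zip(table_vec, mean_q)])


def similarity_weighted_fusion(table_vec, query_vecs, alpha=0.5, temperature=1.0):
    # Assumed scheme: queries closer to the table embedding get larger weights.
    t = normalize(table_vec)
    qs = [normalize(q) for q in query_vecs]
    scores = [dot(t, q) / temperature for q in qs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * q[i] for w, q in zip(weights, qs)) for i in range(len(t))]
    return normalize([alpha * a + (1 - alpha) * b for a, b in zip(t, pooled)])


table_vec = [1.0, 0.0]
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
fused_fixed = fixed_fusion(table_vec, query_vecs)
fused_adaptive = similarity_weighted_fusion(table_vec, query_vecs)
```

In this toy case the similarity-weighted variant leans further toward the query that agrees with the table, which is the intuition behind weighting queries unequally rather than averaging them.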

Limitations

  • Relies on informative table headers for clustering.

  • Introduces computational overhead with synthetic query generation.

Keywords

semantic clustering, weighted fusion, table retrieval, semantic representation, synthetic queries, K-means clustering, partial table, embedding alignment
