
TabularFM: An Open Framework For Tabular Foundational Models

1Vietnam National University, Ho Chi Minh City, Vietnam
2IBM Research, Yorktown, US
3IBM Research, Dublin, Ireland


*Equal Contribution

Abstract

Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, such as time series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs across tabular data tasks. In response to this gap, we introduce TabularFM, a framework that incorporates state-of-the-art methods for developing FMs specifically for tabular data, including variations of neural architectures such as GANs, VAEs, and Transformers. We have curated millions of tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.

Leaderboards

| Rank | Model | Contact | #Params | Paper | Code | Shape score | Trend score | Overall score |
|---|---|---|---|---|---|---|---|---|
| 1 | GReaT | Vadim Borisov | 81,912,576 | paper | github | 0.73 (0.16) | 0.50 (0.25) | 0.61 (0.18) |
| 2 | CTGAN | Lei Xu | 97,243,685 | paper | github | 0.70 (0.16) | 0.49 (0.26) | 0.59 (0.19) |
| 3 | STVAE | Quan Tran | 9,315,214 | N/A | github | 0.54 (0.16) | 0.45 (0.28) | 0.50 (0.19) |
| 4 | TVAE | Lei Xu | N/A | paper | github | 0.47 (0.13) | 0.49 (0.27) | 0.48 (0.17) |
| 5 | STVAEM | Quan Tran | N/A | N/A | github | 0.43 (0.17) | 0.40 (0.24) | 0.42 (0.16) |

| Rank | Model | Contact | #Params | Paper | Code | Shape score | Trend score | Overall score |
|---|---|---|---|---|---|---|---|---|
| 1 | GReaT | Vadim Borisov | 81,912,576 | paper | github | 0.72 (0.14) | 0.56 (0.23) | 0.64 (0.16) |
| 2 | CTGAN | Lei Xu | 97,243,685 | paper | github | 0.69 (0.12) | 0.53 (0.24) | 0.62 (0.15) |
| 3 | STVAE | Quan Tran | 9,315,214 | N/A | github | 0.48 (0.13) | 0.43 (0.25) | 0.46 (0.13) |
| 4 | TVAE | Lei Xu | N/A | paper | github | 0.39 (0.13) | 0.45 (0.27) | 0.42 (0.17) |
| 5 | STVAEM | Quan Tran | N/A | N/A | github | 0.43 (0.11) | 0.39 (0.22) | 0.41 (0.13) |

| Rank | Model | Contact | #Params | Paper | Code | Shape score | Trend score | Overall score |
|---|---|---|---|---|---|---|---|---|
| 1 | CTGAN | Lei Xu | 97,243,685 | paper | github | 0.67 (0.12) | 0.55 (0.24) | 0.61 (0.14) |
| 2 | GReaT | Vadim Borisov | 81,912,576 | paper | github | 0.68 (0.22) | 0.56 (0.27) | 0.61 (0.19) |
| 3 | STVAE | Quan Tran | 9,315,214 | N/A | github | 0.55 (0.12) | 0.59 (0.27) | 0.57 (0.13) |
| 4 | TVAE | Lei Xu | N/A | paper | github | 0.45 (0.13) | 0.49 (0.27) | 0.47 (0.15) |
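From the reported values, the overall score appears to be roughly the arithmetic mean of the shape and trend scores (e.g., CTGAN in the first leaderboard: (0.70 + 0.49) / 2 ≈ 0.59). The sketch below illustrates this aggregation under that assumption; it is not guaranteed to be TabularFM's exact procedure, since the reported numbers are averages over many datasets and per-dataset averaging before rounding can shift the final digit. The "Shape" and "Trend" names also resemble the Column Shapes and Column Pair Trends properties of SDMetrics-style quality reports, but that correspondence is likewise an assumption here.

```python
# Sketch: combine per-model shape and trend scores into an overall score,
# ASSUMING the overall score is the plain arithmetic mean of the two.
# Mean values below mirror the first leaderboard (std in parentheses omitted).
leaderboard = {
    "GReaT":  {"shape": 0.73, "trend": 0.50},
    "CTGAN":  {"shape": 0.70, "trend": 0.49},
    "STVAE":  {"shape": 0.54, "trend": 0.45},
    "TVAE":   {"shape": 0.47, "trend": 0.49},
    "STVAEM": {"shape": 0.43, "trend": 0.40},
}

def overall_score(shape: float, trend: float) -> float:
    """Average the two quality scores with equal weight."""
    return (shape + trend) / 2

for model, s in leaderboard.items():
    print(f"{model}: {overall_score(s['shape'], s['trend']):.3f}")
```

Small discrepancies against the table (e.g., STVAE's 0.495 vs. the listed 0.50) are consistent with the scores being averaged per dataset before rounding.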

Transferability performance of pretrained vs. trained-from-scratch models

Transferability Analysis

BibTeX

TBU