
Optimizing Tabular Foundation Models: From Intelligent Sampling to Efficient Attention

Authors: Yentl Collin & Clément Dureuil
Context: Deep Learning Semester Project


Overview

This project investigates two complementary strategies for improving Tabular Foundation Models (TFMs) based on nanoTabPFN:

| Contribution | Goal | Method | Gain |
|---|---|---|---|
| Part I – Intelligent Context Sampling | Better use of the fixed context budget | GBDT-based learned retrieval | ≈4× context compression, +0.18 AUC |
| Part II – Efficient Attention | Break the O(N²) memory/time bottleneck | ISAB / Sparse / α-Entmax | Linear scaling, +0.04 Acc at k=150 |

The benchmark dataset is Bank Marketing (OpenML id=1461), a binary classification task; the data loader balances the classes and subsamples to ~10k examples.
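The loader's preprocessing (class balancing, then keeping the top features by mutual information, as in `process_data(..., method="mi")` below) can be sketched on synthetic data; this is an illustration, not the repository's implementation:

```python
# Sketch of top-k feature selection by mutual information (illustrative;
# the repo's process_data helper may differ in details).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=20, n_informative=4,
                           random_state=0)
mi = mutual_info_classif(X, y, random_state=0)   # one score per feature
top5 = np.argsort(mi)[::-1][:5]                  # indices of 5 most informative
X_sel = X[:, top5]
print(X_sel.shape)  # (1000, 5)
```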


Repository Structure

Projet_TabPFN_yentl/
│
├── nanoTabPFN/                      # Core model implementations
│   ├── model.py                     # Classic NanoTabPFN (baseline, O(N²) attention)
│   ├── model_isab.py                # ISAB variant: O(N·M) row attention (inducing points)
│   ├── model_sparse.py              # Top-K sparse row attention: O(N·k)
│   ├── model_splash.py              # α-Entmax attention with FIXED alpha (hyperparameter)
│   ├── model_splash_learned.py      # α-Entmax attention with LEARNED α per head (nn.Parameter)
│   ├── train.py                     # Training loop + PriorDumpDataLoader (HDF5)
│   ├── experiment.ipynb             # Reproduce original nanoTabPFN paper results
│   └── README.md                    # Original nanoTabPFN README
│
├── TabICL/                      # In-Context Learning evaluation framework
│   ├── sampling.py              # Random and k-NN context sampling (baselines)
│   ├── retrival_1.py            # LearnedContextSelector: GBDT-based utility predictor
│   ├── eval.py                  # TabICLEvaluator: runs and plots benchmark results
│   └── main copy.ipynb          # Interactive notebook for ICS experiments
│
├── src/
│   ├── data/
│   │   └── openml_bank_marketing.py   # Data loading, balancing, feature selection
│   └── eval/
│       ├── benchmark.py               # Scalability & performance benchmarks
│       └── metrics.py                 # Binary classification metrics (Acc, AUC, F1, LogLoss)
│
├── Rendu/
│   ├── poster/
│   │   ├── poster.tex           # Beamer poster (Gemini theme)
│   │   └── image/               # Figures: tabpfn.png, learned_retrieval.png,
│   │                            #          isab.png, inference_time.png
│   └── report/
│       └── report.tex           # 5-page academic report (this document)
│
├── main.ipynb                   # Main experiment notebook
├── benchmark.ipynb              # Scalability benchmark notebook
├── pre_training.ipynb           # Pre-training experiments
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Quick Start

1. Install Dependencies

pip install -r requirements.txt
# Additional for Part I (learned retrieval):
pip install tabicl
# Additional for training nanoTabPFN:
pip install schedulefree h5py

requirements.txt includes:

  • numpy, scipy, pandas — scientific stack
  • scikit-learn, openml — datasets and ML baselines
  • torch>=2.0 — deep learning (CPU / Apple Silicon / CUDA compatible)
  • matplotlib, seaborn — visualization
  • tqdm, psutil — progress and memory tracking

2. Download the Pre-training Data

cd nanoTabPFN
curl http://ml.informatik.uni-freiburg.de/research-artifacts/nanoTabPFN/300k_150x5_2.h5 \
     --output 300k_150x5_2.h5

This HDF5 file contains 300k synthetic tabular datasets (each 150 rows × 5 features), generated from Gaussian Process / Bayesian Network priors.
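If you want to sanity-check an HDF5 dump with h5py, the pattern looks like this. The toy file and its dataset names (`"X"`, `"y"`) below are invented for illustration; the real dump's internal layout is handled by `PriorDumpDataLoader`:

```python
import h5py
import numpy as np

# Create a toy dump mimicking the 150-rows x 5-features shape, then read
# it back. Dataset names here are illustrative, not the real layout.
with h5py.File("toy_prior.h5", "w") as f:
    f.create_dataset("X", data=np.random.randn(300, 150, 5))
    f.create_dataset("y", data=np.random.randint(0, 3, size=(300, 150)))

with h5py.File("toy_prior.h5", "r") as f:
    keys = sorted(f.keys())
    shape = f["X"].shape
print(keys, shape)  # ['X', 'y'] (300, 150, 5)
```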

3. Pre-train a Model

from nanoTabPFN.model import NanoTabPFNModel, NanoTabPFNClassifier
from nanoTabPFN.train import PriorDumpDataLoader, train, get_default_device

device = get_default_device()   # auto-detects CUDA > MPS > CPU

model = NanoTabPFNModel(
    embedding_size=128,
    num_attention_heads=8,
    mlp_hidden_size=512,
    num_layers=4,
    num_outputs=3
)
prior = PriorDumpDataLoader("nanoTabPFN/300k_150x5_2.h5", num_steps=2500, batch_size=32)
model, history = train(model, prior, lr=4e-3, device=device, steps_per_eval=25)

Pre-training takes ~5–10 minutes on an A100 GPU or M-series Apple Silicon.

4. Run Inference (sklearn-like API)

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

clf = NanoTabPFNClassifier(model, device)
clf.fit(X_train, y_train)
prob = clf.predict_proba(X_test)
print("AUC:", roc_auc_score(y_test, prob[:, 1]))
print("Acc:", accuracy_score(y_test, prob.argmax(axis=1)))

Part I — Intelligent Context Sampling

Motivation

When the training pool $N$ exceeds the context budget $k$, random sampling wastes slots on uninformative rows. Our Learned Retrieval strategy trains a GBDT to predict the marginal utility of each candidate example for a given query, enabling a 4× context compression.
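The core idea fits in a few lines: featurize (query, candidate) pairs, regress a utility score with a GBDT, and keep the top-k scorers. The sketch below uses synthetic data and placeholder utilities; the repository's `LearnedContextSelector` mines real utilities in an offline phase:

```python
# Toy sketch of learned retrieval: a GBDT regressor maps (query, candidate)
# pair features to a utility score; the top-k candidates form the context.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
queries = rng.normal(size=(200, 5))
cands = rng.normal(size=(200, 5))
pair_feats = np.hstack([queries, cands, np.abs(queries - cands)])
utility = rng.normal(size=200)       # placeholder for mined utilities

gbdt = GradientBoostingRegressor().fit(pair_feats, utility)

# At inference: score every candidate in the pool against one query.
pool = rng.normal(size=(1000, 5))
q = rng.normal(size=(1, 5))
feats = np.hstack([np.repeat(q, len(pool), 0), pool, np.abs(q - pool)])
topk = np.argsort(gbdt.predict(feats))[::-1][:16]   # 16 "golden" examples
print(topk.shape)  # (16,)
```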

Usage

from TabICL.retrival_1 import LearnedContextSelector
from src.data.openml_bank_marketing import load_data, process_data
from sklearn.model_selection import train_test_split

# 1. Load and preprocess Bank Marketing
X, y = load_data()
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train, X_test = process_data(X_train_raw, X_test_raw, y_train, top_k_features=5, method="mi")

# 2. Train the GBDT selector (offline mining phase — takes a few minutes)
selector = LearnedContextSelector(n_candidates_mining=20)
selector.fit(X_train, y_train, n_training_queries=50)

# 3. At inference time: retrieve k "golden" examples for any query
query = X_test.iloc[0]
X_ctx, y_ctx = selector.select(query, k=16, X_pool=X_train, y_pool=y_train)

Benchmarking (Random vs k-NN vs Learned Retrieval)

from TabICL.eval import TabICLEvaluator
from TabICL.sampling import get_random_context, get_knn_context

evaluator = TabICLEvaluator(X_train, y_train, X_test, y_test, n_eval=200)

evaluator.run("Random",   get_random_context,          context_sizes=[4, 8, 16, 32, 64])
evaluator.run("KNN",      get_knn_context,              context_sizes=[4, 8, 16, 32, 64])
evaluator.run("Learned",  selector.select,              context_sizes=[4, 8, 16, 32, 64])

evaluator.plot_results()   # Plots AUC, Accuracy, Log-Loss vs context size

Part II — Efficient Attention Variants

All variants share the same sklearn-like interface and training pipeline as the classic model.

Available Models

| File | Model class | Classifier class | Row-attention | Complexity |
|---|---|---|---|---|
| model.py | NanoTabPFNModel | NanoTabPFNClassifier | Standard softmax | O(N²) |
| model_isab.py | ISABTabPFNModel | ISABTabPFNClassifier | Induced Set (M=32) | O(N·M) |
| model_sparse.py | SparseTabPFNModel | SparseTabPFNClassifier | Top-K mask (k=16) | O(N·k) |
| model_splash.py | SplashTabPFNModel | SplashTabPFNClassifier | α-entmax, fixed α | O(N²)* |
| model_splash_learned.py | LearnedSplashTabPFNModel | LearnedSplashTabPFNClassifier | α-entmax, learned α/head | O(N²)* |

* Same asymptotic complexity as the classic model, but the attention distributions it produces are sparser and less noisy. Realizing memory savings would require a custom sparse CUDA kernel (not included).
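For intuition, α-entmax can be computed by root-finding on a threshold τ. Below is a minimal numpy sketch using plain bisection (the models use a Halley-bisection variant, and this function is an illustration rather than the repository's implementation):

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, iters=50):
    # alpha-entmax: p_i = max(0, (alpha-1)*z_i - tau) ** (1/(alpha-1)),
    # with tau chosen by bisection so the probabilities sum to 1.
    # alpha -> 1 recovers softmax; alpha = 2 is sparsemax.
    s = (alpha - 1.0) * np.asarray(z, dtype=float)
    lo, hi = s.max() - 1.0, s.max()     # sum >= 1 at lo, sum = 0 at hi
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        p = np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() > 1.0:
            lo = tau
        else:
            hi = tau
    p = np.clip(s - (lo + hi) / 2.0, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()                  # tidy up residual bisection error

p = entmax_bisect(np.array([2.0, 1.0, 0.2, -1.0]), alpha=1.5)
print(p)  # sparse: the last entry is exactly 0
```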

Instantiating an Efficient Variant

# ISAB variant (recommended for large N)
from nanoTabPFN.model_isab import ISABTabPFNModel
model_isab = ISABTabPFNModel(
    embedding_size=128,
    num_attention_heads=8,
    mlp_hidden_size=512,
    num_layers=4,
    num_outputs=3,
    num_inducing_points=32    # M — controls accuracy/speed trade-off
)

# Sparse variant
from nanoTabPFN.model_sparse import SparseTabPFNModel
model_sparse = SparseTabPFNModel(
    embedding_size=128,
    num_attention_heads=8,
    mlp_hidden_size=512,
    num_layers=4,
    num_outputs=3,
    k_neighbors=16            # k — number of attended neighbors per query
)

# α-Entmax, fixed alpha (same for all heads)
from nanoTabPFN.model_splash import SplashTabPFNModel
model_splash = SplashTabPFNModel(
    embedding_size=128,
    num_attention_heads=8,
    mlp_hidden_size=512,
    num_layers=4,
    num_outputs=3,
    alpha=1.5,        # fixed entmax exponent: 1→softmax, 1.5→mild sparse, 2→sparsemax
    entmax_iters=5,   # Halley-bisection iterations for τ root-finding
)

# α-Entmax, one learned α per attention head (nn.Parameter, trained with the model)
from nanoTabPFN.model_splash_learned import LearnedSplashTabPFNModel
model_splash_learned = LearnedSplashTabPFNModel(
    embedding_size=128,
    num_attention_heads=8,
    mlp_hidden_size=512,
    num_layers=4,
    num_outputs=3,
    init_alpha=1.5,   # starting value; each head's α is optimised independently
)

# After training, inspect which heads converged to dense vs. sparse:
print(model_splash_learned.get_learned_alphas())
# {'layer_0': {'feature': tensor([1.43, 1.78, ...]), 'datapoint': tensor([...])}, ...}
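For intuition, the inducing-point mechanism behind the ISAB variant (Lee et al., 2019) can be sketched with stock PyTorch attention. This is a toy single-block sketch; `model_isab.py` may differ:

```python
import torch
import torch.nn as nn

class ISAB(nn.Module):
    # Induced Set Attention Block: the N rows attend to M learned inducing
    # points instead of to each other, so cost is O(N*M) rather than O(N^2).
    def __init__(self, dim, num_heads, num_inducing):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.mab1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mab2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                  # x: (B, N, dim)
        i = self.inducing.expand(x.size(0), -1, -1)
        h, _ = self.mab1(i, x, x)          # (B, M, dim): points summarize rows
        out, _ = self.mab2(x, h, h)        # (B, N, dim): rows read the summary
        return out

x = torch.randn(2, 500, 64)
out = ISAB(64, 8, 32)(x)
print(out.shape)  # torch.Size([2, 500, 64])
```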

All variants use the same train() function from train.py — just swap the model.
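The Top-K masking used by the sparse variant can likewise be sketched in a few lines (illustrative single-head version, not the repository code; here the score matrix is still formed densely for simplicity, whereas a real O(N·k) implementation restricts the candidate set, e.g. via k-NN):

```python
import torch

def topk_row_attention(q, k, v, kk=16):
    # Each query row attends only to its kk highest-scoring keys, so the
    # softmax and weighted sum involve kk terms per row instead of N.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (N, N)
    vals, idx = scores.topk(kk, dim=-1)                    # (N, kk)
    w = torch.softmax(vals, dim=-1)                        # softmax over top-k only
    return (w.unsqueeze(-1) * v[idx]).sum(dim=-2)          # gather + combine

q = k = v = torch.randn(500, 64)
out = topk_row_attention(q, k, v)
print(out.shape)  # torch.Size([500, 64])
```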


Benchmarking

Performance Benchmark (multi-context)

# In benchmark.ipynb or src/eval/benchmark.py
from src.eval.benchmark import run_performance_eval, measure_scalability
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
models_dict = {
    "Classic": model_classic,
    "ISAB":    model_isab,
    "Sparse":  model_sparse,
    "Entmax":  model_splash,
}

# Performance on real datasets
df_perf = run_performance_eval(models_dict, device)
print(df_perf.groupby(["Dataset", "Model"])[["Accuracy", "AUC"]].mean())

# Scalability (time + peak RAM vs N)
df_scale = measure_scalability(models_dict, device, n_ranges=[100, 250, 500, 1000, 1500, 2000])

Metrics Available

src/eval/metrics.py provides a clean, torch-compatible interface:

from src.eval.metrics import compute_binary_metrics

metrics = compute_binary_metrics(y_true, prob_pos=prob[:, 1])
# Returns dict: {"accuracy", "f1", "auc", "logloss"}

Key Results

| Method | k=25 Acc | k=25 AUC | k=150 Acc | k=150 AUC | Complexity |
|---|---|---|---|---|---|
| Random Forest | 0.66 | 0.72 | 0.75 | 0.84 | n/a |
| XGBoost | 0.66 | 0.74 | 0.78 | 0.86 | n/a |
| ClassicPFN | 0.66 | 0.74 | 0.78 | 0.86 | O(N²) |
| ISABPFN | 0.68 | 0.74 | 0.77 | 0.85 | O(N·M) |
| SparsePFN | 0.70 | 0.77 | 0.76 | 0.85 | O(N·k) |
| EntmaxPFN | 0.68 | 0.74 | 0.82 | 0.87 | O(N²) |
| RetrievalPFN | 0.84 | 0.92 | 0.88 | 0.94 | n/a |

Reproducing the Poster & Report

# Compile with tectonic (recommended — no LaTeX installation required)
cd Rendu/poster  && tectonic poster.tex   # → poster.pdf
cd ../report     && tectonic report.tex   # → report.pdf

# Or with a standard LaTeX installation
pdflatex poster.tex
pdflatex report.tex

References

  1. Hollmann et al. (2023). TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. ICLR 2023.
  2. Pfefferle et al. (2025). nanoTabPFN: A Lightweight Reimplementation of TabPFN. arXiv:2511.03634.
  3. Rubin, Herzig & Berant (2022). Learning To Retrieve Prompts for In-Context Learning. NAACL 2022.
  4. Lee et al. (2019). Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks. ICML 2019.
  5. Gonçalves, Treviso & Martins (2025). AdaSplash: Adaptive Sparse Flash Attention. ICML 2025 (Oral).
