7 Comments
Nikhil Kapila:

Eye opening read!

I do wonder how they created a generalized dataset for TabPFN, though. CARTE makes a lot of sense from a model architecture perspective!

Your post motivates me to look more into tabular models now 😄

Maxime @Storen:

“Wait a minute. Can’t normal LLMs work with tabular data?” => On a similar note, I stumbled upon Time-LLM, a paper describing a method for using LLMs for time series forecasting (https://arxiv.org/abs/2310.01728). The overall performance is actually quite good!

As time series can be seen as a type of tabular data (with data points collected over time), there might be some interesting work to do.
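Purely to make that concrete, here's a minimal sketch of that reframing: a toy series is turned into lag-feature rows and fed to an ordinary tabular model (XGBoost here; none of this comes from the Time-LLM paper, it just shows the tabular view of a series):

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Toy series: 500 points of a noisy daily-seasonal signal (illustrative only).
t = np.arange(500)
y = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(500)

# "Featurize" the series: each row holds a fixed window of past values,
# which turns forecasting into a plain tabular regression problem.
df = pd.DataFrame({"y": y})
for lag in range(1, 25):
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()

X, target = df.drop(columns="y"), df["y"]
split = int(len(df) * 0.8)  # keep the train/test split chronological

model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X.iloc[:split], target.iloc[:split])
print(model.predict(X.iloc[split:])[:5])  # one-step-ahead forecasts
```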

Btw, thanks for this newsletter, I enjoyed reading it!

Marie Brayer:

Time series are a different beast, I think, as many papers have pointed out that the benchmarks are 100% poisoned (i.e., included in LLM training data).

I would speak with Valeryi, who is super opinionated about this, or read his content if you want to know more (https://www.linkedin.com/posts/activity-7300552983686053889-MC4i?utm_source=share&utm_medium=member_desktop&rcm=ACoAAA65rMgBj8nHAQG4OcEV7JJoYG1UnxmosQM)

TabPFN has a branch that does time series in an interesting way btw (they "featurized" time series): https://www.youtube.com/watch?v=qFnYgM2Yvfs
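To illustrate what "featurizing" a time series could mean here: timestamp-derived columns are fed to a tabular regressor instead of a raw sequence. A rough sketch of the idea only, assuming the tabpfn package's sklearn-style TabPFNRegressor; the actual time-series branch almost certainly does something more sophisticated:

```python
import numpy as np
import pandas as pd
# Assumes the `tabpfn` package, whose TabPFNRegressor follows the sklearn
# fit/predict convention. The real time-series branch differs in detail.
from tabpfn import TabPFNRegressor

idx = pd.date_range("2024-01-01", periods=300, freq="h")
y = np.sin(2 * np.pi * idx.hour / 24) + 0.1 * np.random.randn(len(idx))

# "Featurize" timestamps into plain tabular columns.
X = pd.DataFrame({
    "hour": idx.hour,
    "dayofweek": idx.dayofweek,
    "t": np.arange(len(idx)),  # running index captures trend
})

split = 240  # chronological train/test split
model = TabPFNRegressor()
model.fit(X.iloc[:split], y[:split])
forecast = model.predict(X.iloc[split:])
```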

Another great person working on a foundation model for time series is Geoffrey Negiar (https://www.linkedin.com/in/geoffrey-negiar/en), who is building on his Amazon experience.

Yohan Obadia:

Very interesting. XGBoost has always been a pain to work with and risks overfitting on small datasets. CARTE seems really smart. What a long way from the first GNNs.

Omar Hedeya:

Very well written, thanks a lot! Quick question: if I understood correctly, this means XGBoost is still better in two use cases:

- Very large tables with a lot of rows

- When explainability is required

Is that correct?

If yes, what will be the killer use-case of tabular AI in your opinion?

VM:

You could argue that a boosted-tree model with 300+ weak learners is not very explainable either. You need black-box methods like permutation importance or Shapley values to extract some feature importances, and those come with major theoretical limitations.
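To make that concrete, here is a minimal sketch of extracting those importances, assuming the shap, scikit-learn, and xgboost packages (illustrative only, not from the post):

```python
import shap
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = XGBClassifier(n_estimators=300).fit(X, y)  # 300 weak learners

# Permutation importance: shuffle one feature at a time and measure the
# drop in score. Cheap, but misleading under correlated features.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(perm.importances_mean)

# Shapley values via TreeExplainer: efficient for tree ensembles, but the
# additive attribution still rests on strong assumptions about the features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(abs(shap_values).mean(axis=0))  # mean |SHAP| per feature
```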

Marie Brayer:

I don’t think we’ve seen proper scale yet. It’s a bit like the time when deep learning x NLP was only good for things like SwiftKey and sentiment guessing.
