The Conformal Prediction advocates (especially a certain prominent Twitter account) tend to rehash old frequentist-vs-Bayesian arguments with more heated rhetoric than strictly necessary. That fight has been going on for almost a century now. The Bayesian counterargument (in caricature form) would be that MLE frequentists just choose an arbitrary (flat) prior, and that penalty hyperparameters (common in NN) are a de facto prior. The formal guarantees only have bite in the asymptotic setting or require convoluted statements about probabilities over repeated experiments; and asymptotically, the choice of prior doesn't matter anyway.

(I'm a moderate who uses both approaches, seeing them as part of a general hierarchical modeling method, which means I get mocked by both sides for lack of purity.)

Bayesians are losing ground at the moment because their computational methods haven't benefited from the GPU revolution as much, for reasons having to do with the difficulty of parallelization. But there is serious practical work (especially using JAX) to catch up, and the whole normalizing flow literature might just get us past the limitations of MCMC for hard problems.

But having said that, Conformal Prediction works as advertised for UQ as a wrapper on any point-estimating model. If you've got the data for it - and in the ML setting you do - and you don't care about things like missing-data imputation, errors in inputs, non-iid spatio-temporal and hierarchical structures, mixtures of models, evidence decay, or unbalanced data where small-data islands coexist with big data - all the complicated situations where Bayesian methods just automatically work and other methods require elaborate workarounds - then yup, use Conformal Prediction.
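
For anyone who hasn't seen the wrapper written out, here's a rough sketch of split (inductive) conformal prediction; the random forest, the synthetic data, and the 95% level are placeholders I'm assuming for illustration, not anything specific from this thread:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    alpha = 0.05  # target miscoverage -> 95% prediction intervals

    # Stand-in data; swap in your own X, y.
    X, y = np.random.randn(2000, 5), np.random.randn(2000)
    X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.25)

    # Any point estimator works here; the wrapper doesn't care what it is.
    model = RandomForestRegressor().fit(X_train, y_train)

    # Conformity scores on the held-out calibration set: absolute residuals.
    scores = np.abs(y_cal - model.predict(X_cal))

    # Finite-sample-corrected rank of the score to use.
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]

    # Prediction interval for new points: point estimate +/- q.
    X_new = np.random.randn(10, 5)
    preds = model.predict(X_new)
    lower, upper = preds - q, preds + q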

Calibration is also a pretty magical way to improve just about any estimator. It's cheap to do and it works (although it's hard to guarantee much about it in the general case...)
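
By calibration I mean the usual post-hoc recalibration of a model's scores on held-out data. A minimal sketch with isotonic regression; the logistic classifier and the synthetic data are placeholders, not a recommendation:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-in binary classification data.
    X, y = np.random.randn(3000, 10), np.random.randint(0, 2, 3000)
    X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3)

    clf = LogisticRegression().fit(X_train, y_train)

    # Learn a monotone map from raw scores to calibrated probabilities
    # on data the model hasn't seen.
    raw_cal = clf.predict_proba(X_cal)[:, 1]
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_cal, y_cal)

    # At prediction time, push raw scores through the calibrator.
    raw_new = clf.predict_proba(X_cal[:5])[:, 1]
    calibrated = calibrator.predict(raw_new)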

And don't forget quantile regression penalties! Awkward to apply in the NN setting, but an easy and effective way to do UQ in the XGBoost world.
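
Concretely, you just fit the pinball (quantile) loss at a couple of quantile levels. The sketch below uses sklearn's GradientBoostingRegressor as a stand-in for the XGBoost-style workflow, with made-up data and levels:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Stand-in data.
    X, y = np.random.randn(2000, 5), np.random.randn(2000)

    # One model per quantile; the 5% and 95% models together give a 90% interval.
    lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
    hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

    X_new = np.random.randn(10, 5)
    interval = np.column_stack([lo.predict(X_new), hi.predict(X_new)])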






"Bayesian counterargument (in caricature form) would be that MLE frequentists just choose an arbitrary (flat) prior, and penalty hyperparameters (common in NN) are a de facto prior."

This has been my view for a while now. Is this not correct?

In general, I think the idea of a big "frequentist vs Bayesian" debate is silly. I think it is very useful to take frequentist ideas and see what they look like from a Bayesian point of view, and vice versa (when applicable). This is pretty much the standard stance among most people in the field: it's expected that one understands that regularization methods correspond to certain priors, for instance, and can relate the two perspectives wherever possible.
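
The textbook example of that correspondence is ridge regression: the L2 penalty gives exactly the MAP estimate under a Gaussian prior on the weights. Quick numerical check, with everything synthetic and purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    sigma = 0.3
    y = X @ w_true + sigma * rng.normal(size=200)

    lam = 5.0  # ridge penalty strength

    # Frequentist view: minimize ||y - Xw||^2 + lam * ||w||^2.
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

    # Bayesian view: posterior mode with likelihood N(Xw, sigma^2 I)
    # and prior w ~ N(0, (sigma^2 / lam) I). Same normal equations.
    tau2 = sigma**2 / lam
    w_map = np.linalg.solve(X.T @ X / sigma**2 + np.eye(3) / tau2, X.T @ y / sigma**2)

    print(np.allclose(w_ridge, w_map))  # True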


Yeah, I know the account you are talking about; it really is a bit over the top. It's a shame - I've met a bunch of people who mentioned that they were actually turned away from Conformal Prediction because of it.

> But having said that, Conformal Prediction works as advertised for UQ as a wrapper on any point-estimating model. If you've got the data for it - and in the ML setting you do - and you don't care about things like missing-data imputation, errors in inputs, non-iid spatio-temporal and hierarchical structures, mixtures of models, evidence decay, or unbalanced data where small-data islands coexist with big data - all the complicated situations where Bayesian methods just automatically work and other methods require elaborate workarounds - then yup, use Conformal Prediction.

Many of these things can actually work really well with Conformal Prediction, but the algorithms require extensions (much as, if you are doing Bayesian inference, you would also need to update your model accordingly!). They generally end up being some form of reweighting to compensate for the distribution shifts (excluding the Online Conformal Prediction literature, which is another beast entirely). Also worth noting: if you have iid data, Conformal Prediction is remarkably data-efficient; as few as 20 samples are enough for it to start working for 95% predictive intervals, and with 50 samples (and almost surely unique conformity scores) it's going to match 95% coverage fairly tightly.
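
That "about 20 samples" figure falls straight out of the split conformal quantile rule: you take the ceil((n+1)(1-alpha))-th smallest conformity score, and that rank only stays inside the calibration sample once n >= 19 at alpha = 0.05. Quick check:

    import math

    alpha = 0.05
    for n in (10, 18, 19, 20, 50):
        k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to use
        print(n, k, k <= n)  # k > n means the interval is the whole real line
    # n=18 -> k=19: infinite interval; n=19 or 20 -> k=n: uses the max score;
    # n=50 -> k=49, and the interval starts tightening around the 95% target.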


Are we talking about NN Taleb? I am curious about the Twitter persona.


