ICML 2026

Singular Bayesian
Neural Networks

Low-rank Bayesian neural networks with singular posterior geometry, reduced complexity scaling, and scalable uncertainty quantification.

Mame Diarra Touré, David A. Stephens
McGill University
Low-rank Bayesian posteriors concentrate on rank-constrained manifolds.
TL;DR

We parameterize Bayesian neural network weights through low-rank factors $W = AB^\top$, inducing a singular posterior geometry concentrated on the rank-$r$ manifold. This reduces variational complexity from $O(mn)$ to $O(r(m+n))$ while maintaining competitive uncertainty-aware performance across MLPs, LSTMs, and Transformers.

01

Core idea

From mean-field to structured low-rank geometry

Standard Bayesian neural networks often rely on fully factorized posteriors that ignore structured correlations between weights. We instead introduce low-rank variational factors:

$$A \in \mathbb{R}^{m \times r}, \qquad B \in \mathbb{R}^{n \times r}, \qquad W = AB^\top.$$

Although the factors themselves can be mean-field, the induced posterior over $W$ becomes highly structured, introducing correlations through shared latent factors.
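As a concrete illustration, a minimal NumPy sketch of one reparameterized low-rank Bayesian layer is given below. The Gaussian mean-field factors, the softplus link for standard deviations, and all shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4  # layer shape and rank (illustrative values)

# Mean-field variational parameters for the factors A (m x r) and B (n x r).
mu_A, rho_A = rng.normal(size=(m, r)), np.full((m, r), -3.0)
mu_B, rho_B = rng.normal(size=(n, r)), np.full((n, r), -3.0)

def sample_weight(rng):
    """Reparameterized posterior sample: W = A B^T with factor-wise Gaussians."""
    sigma_A = np.log1p(np.exp(rho_A))  # softplus keeps std. deviations positive
    sigma_B = np.log1p(np.exp(rho_B))
    A = mu_A + sigma_A * rng.normal(size=(m, r))
    B = mu_B + sigma_B * rng.normal(size=(n, r))
    return A @ B.T

W = sample_weight(rng)
# Every posterior sample lies on the rank-r manifold in the m x n ambient space.
print(W.shape, np.linalg.matrix_rank(W))  # (64, 32) 4
```

Even though $A$ and $B$ are fully factorized, every sampled $W$ has rank at most $r$, which is exactly the singular support described below.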

  • Complexity: $O(r(m+n))$ instead of $O(mn)$
  • Posterior support: rank-$r$ manifold, singular in the ambient matrix space
  • Architectures: MLP · LSTM · Transformer, as drop-in Bayesian layers
Posterior geometry. Mean-field posteriors occupy a full-dimensional volume, whereas low-rank posteriors concentrate on the rank-constrained manifold.
02

Theory

Singular posterior support

$$q_W(\mathcal{R}_r)=1, \qquad \lambda(\mathcal{R}_r)=0, \qquad q_W \perp \lambda.$$

The induced posterior is singular with respect to Lebesgue measure on the ambient weight space.

Reduced complexity scaling

$$\sqrt{ \frac{r(m+n)}{mn} }$$

When $r \ll \min(m,n)$, this factor is small, so low-rank posteriors tighten the dominant term in the PAC-Bayes and Gaussian-complexity bounds.
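As a quick numerical sanity check (not a result from the paper), the factor can be evaluated directly:

```python
import math

def complexity_ratio(m, n, r):
    """Square-root ratio of low-rank (r(m+n)) to full-rank (mn) parameter counts."""
    return math.sqrt(r * (m + n) / (m * n))

# A 256 x 256 layer at rank 16: 8192 low-rank vs 65536 full-rank parameters.
print(complexity_ratio(256, 256, 16))  # ~0.354
```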

Structured correlations

Shared latent factors induce non-trivial covariance structure between weights, unlike fully factorized posteriors.
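A small Monte Carlo experiment makes this visible. The rank-1 setup below, with nonzero factor means chosen purely for illustration, shows that two weights sharing a latent factor are correlated, while mean-field weights are not.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000  # Monte Carlo samples

# Rank-1 toy case: W[i, j] = A[i] * B[j] with independent Gaussian factors.
# Nonzero factor means are an illustrative choice that makes the effect visible.
A0 = rng.normal(1.0, 0.5, size=N)
B0 = rng.normal(1.0, 0.5, size=N)
B1 = rng.normal(1.0, 0.5, size=N)

# Two weights in the same row share the latent factor A0 ...
corr_shared = np.corrcoef(A0 * B0, A0 * B1)[0, 1]

# ... whereas a fully factorized (mean-field) posterior draws them independently.
W00 = rng.normal(1.0, 0.5, size=N)
W01 = rng.normal(1.0, 0.5, size=N)
corr_meanfield = np.corrcoef(W00, W01)[0, 1]

print(round(corr_shared, 2), round(corr_meanfield, 2))
```

The shared-factor correlation comes out near $0.44$ in this toy setting, while the mean-field correlation is indistinguishable from zero.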

Induced correlation structure in low-rank Bayesian weights.
PAC-Bayes and Gaussian complexity perspectives on low-rank scaling.
03

Experimental highlights

MIMIC-III ICU mortality

  • Best reported OOD detection metrics in the paper’s clinical-shift setting
  • Competitive in-domain performance with explicit uncertainty tradeoffs
  • 70% fewer parameters than Full-Rank BBB; 88% fewer than Deep Ensemble

Beijing PM$_{2.5}$ forecasting

  • Best coverage among compared LSTM uncertainty methods
  • 17.4% MAE reduction at 80% retention in selective prediction
  • 64% fewer parameters than Full-Rank BBB

SST-2 Transformer

  • Best AUPR-In and second-best AUROC-OOD among compared Transformer methods
  • 1.5M Low-Rank BBB parameters
  • 13× fewer parameters than Full-Rank BBB; 33× fewer than Deep Ensemble
Figures: radar plot · selective prediction · LSTM metrics · Transformer radar.

Parameter-count intuition

For a single $256 \times 256$ layer, the full-rank parameterization has $256 \times 256 = 65{,}536$ weights. A rank-$16$ factorization needs only $16 \times (256 + 256) = 8{,}192$ parameters: 8× fewer.
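The same parameter-count arithmetic can be reproduced for several ranks of a $256 \times 256$ layer:

```python
m = n = 256  # single square layer, as in the example above

full = m * n  # 65536 full-rank parameters
for r in (4, 16, 64):
    low = r * (m + n)
    print(f"r={r:>2}: {low:>6} params ({full // low}x fewer)")
```

At rank 16 this recovers the 8× reduction quoted above; halving or quadrupling the rank scales the savings linearly.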

04

Resources

  • arXiv: paper abstract
  • PDF: full paper
  • GitHub: code and repositories
  • Slides: interactive ICML presentation
  • Homepage: main academic page

Citation

@inproceedings{toure2026singular,
  title     = {Singular Bayesian Neural Networks},
  author    = {Toure, Mame Diarra and Stephens, David A.},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}