
K. Fisher and Y. M. Marzouk. Can Bayesian Neural Networks Make Confident Predictions? Preprint, 2025.

L. Cao, J. Chen, M. Brennan, T. O’Leary-Roseberry, Y. M. Marzouk and O. Ghattas. LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport. Preprint, 2024.

R. Baptista, A.-A. Pooladian, M. Brennan, Y. M. Marzouk and J. Niles-Weed. Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps. Preprint, 2024.

F. Y. Li, R. Baptista and Y. M. Marzouk. Expected information gain estimation via density approximations: Sample allocation and dimension reduction. Preprint, 2024.

H. Lu, L. Salo-Salgado, Y. M. Marzouk and R. James. Uncertainty Quantification of Fluid Leakage and Fault Instability in Geologic CO2 Storage. Preprint, 2024.

A. Belhadji, Q. J. Zhu and Y. M. Marzouk. On the design of scalable, high-precision spherical-radial Fourier features. Preprint, 2024.

M. T. C. Li, T. Cui, F. Y. Li, Y. M. Marzouk and O. Zahm. Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities. Preprint, 2024.

M. Le Provost, J. Glaubitz and Y. M. Marzouk. Preserving linear invariants in ensemble filtering methods. Preprint, 2024.

F. Y. Li, A. Belhadji and Y. M. Marzouk. Nonlinear Bayesian optimal experimental design using logarithmic Sobolev inequalities. Preprint, 2024.

G. Gottwald, F. Y. Li, Y. M. Marzouk and S. Reich. Stable generative modeling using Schrödinger bridges. Preprint, 2024.

X. Zhang, J. Blanchet, Y. M. Marzouk, V. A. Nguyen and S. Wang. Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes. Preprint, 2023.

Z. O. Wang, R. Baptista, Y. M. Marzouk, L. Ruthotto and D. Verma. Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference. Preprint, 2023.

M. Le Provost, R. Baptista, J. D. Eldredge and Y. M. Marzouk. An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization. Preprint, 2023.

P.-B. Rubio, Y. M. Marzouk and M. Parno. A transport approach to sequential simulation-based inference. Preprint, 2023.

R. Baptista, B. Hosseini, N. B. Kovachki, Y. M. Marzouk and A. Sagiv. An Approximation Theory Framework for Measure-Transport Sampling Algorithms. Preprint, 2023.

B. J. Zhang, Y. M. Marzouk and K. Spiliopoulos. Transport map unadjusted Langevin algorithms. Preprint, 2023.

J. Pidstrigach, Y. M. Marzouk, S. Reich and S. Wang. Infinite-Dimensional Diffusion Models for Function Spaces. Preprint, 2023.

S. Wang and Y. M. Marzouk. On minimax density estimation via measure transport. Preprint, 2022.

X. Zhang, J. Blanchet, Y. M. Marzouk, V. A. Nguyen and S. Wang. Distributionally robust Gaussian process regression and Bayesian inverse problems. Preprint, 2022.

R. Baptista, Y. M. Marzouk and O. Zahm. Gradient-based data and parameter dimension reduction for Bayesian models: an information theoretic perspective. Preprint, 2022.

R. Baptista, L. Cao, J. Chen, O. Ghattas, F. Y. Li, Y. M. Marzouk and J. T. Oden. Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport. Preprint, 2022.

B. J. Zhang, T. Sahai and Y. M. Marzouk. Computing eigenfunctions of the multidimensional Ornstein-Uhlenbeck operator. Preprint, 2021.

A. Scarinci, M. Fehler and Y. M. Marzouk. Bayesian inference under model misspecification using transport-Lagrangian distances: an application to seismic inversion. Preprint, 2021.

R. Baptista, Y. M. Marzouk, R. Morrison and O. Zahm. Learning non-Gaussian graphical models via Hessian scores and triangular transport. Preprint, 2021.

C. Feng and Y. M. Marzouk. A layered multiple importance sampling scheme for focused optimal Bayesian experimental design. Preprint, 2019.

F. Augustin and Y. M. Marzouk. A trust region method for derivative-free nonlinear constrained stochastic optimization. Preprint, 2017.

X. Huan and Y. M. Marzouk. Sequential Bayesian optimal experimental design via approximate dynamic programming. Preprint, 2016.

N. Lowry, R. Mangoubi, M. Desai, Y. M. Marzouk and P. Sammak. Bayesian level sets for image segmentation. Preprint, 2015.

F. Augustin and Y. M. Marzouk. NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints. Preprint, 2014.
Can Bayesian Neural Networks Make Confident Predictions?
Bayesian inference promises a framework for principled uncertainty quantification of neural network predictions. Barriers to adoption include the difficulty of fully characterizing posterior distributions on network parameters and the interpretability of posterior predictive distributions. We demonstrate that under a discretized prior for the inner layer weights, we can exactly characterize the posterior predictive distribution as a Gaussian mixture. This setting allows us to define equivalence classes of network parameter values which produce the same likelihood (training error) and to relate the elements of these classes to the network’s scaling regime – defined via ratios of the training sample size, the size of each layer, and the number of final layer parameters. Of particular interest are distinct parameter realizations that map to low training error and yet correspond to distinct modes in the posterior predictive distribution. We identify settings that exhibit such predictive multimodality, and thus provide insight into the accuracy of unimodal posterior approximations. We also characterize the capacity of a model to "learn from data" by evaluating contraction of the posterior predictive in different scaling regimes.
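A minimal numerical sketch of the construction (illustrative, assuming a tanh inner layer and a Gaussian prior on the final linear layer; not the paper's code): conditional on each discretized inner-weight configuration, the model is Bayesian linear regression, so the exact posterior predictive is a Gaussian mixture whose weights are proportional to each configuration's marginal likelihood.

import numpy as np

def posterior_predictive_mixture(X, y, x_star, W_grid, sigma2=0.1, tau2=1.0):
    # Enumerate a discretized prior over inner-layer weight matrices W.
    # Conditional on W, y = tanh(X W) beta + noise with beta ~ N(0, tau2 I)
    # is Bayesian linear regression, so the predictive at x_star is Gaussian;
    # mixing over W with weights proportional to the marginal likelihood
    # gives the posterior predictive exactly as a Gaussian mixture.
    means, variances, log_w = [], [], []
    n = len(y)
    for W in W_grid:
        Phi = np.tanh(X @ W)                     # inner-layer features, (n, m)
        phi_s = np.tanh(x_star @ W)              # features at the test point
        A = Phi.T @ Phi / sigma2 + np.eye(Phi.shape[1]) / tau2
        A_inv = np.linalg.inv(A)                 # posterior covariance of beta
        means.append(phi_s @ (A_inv @ Phi.T @ y) / sigma2)
        variances.append(sigma2 + phi_s @ A_inv @ phi_s)
        C = sigma2 * np.eye(n) + tau2 * Phi @ Phi.T   # marginal cov. of y | W
        _, logdet = np.linalg.slogdet(C)
        log_w.append(-0.5 * (logdet + y @ np.linalg.solve(C, y)))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    return np.array(means), np.array(variances), w / w.sum()

Distinct inner-weight configurations with comparable marginal likelihood but different component means are exactly the predictive multimodality studied in the paper.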
BibTeX entry
@article{fisher-bayesiannets-2025, author = "Fisher, K. and Marzouk, Y. M.", journal = "Preprint", month = "1", title = "Can Bayesian Neural Networks Make Confident Predictions?", year = "2025", doi = "10.48550/arXiv.2501.11773", abstract = "Bayesian inference promises a framework for principled uncertainty quantification of neural network predictions. Barriers to adoption include the difficulty of fully characterizing posterior distributions on network parameters and the interpretability of posterior predictive distributions. We demonstrate that under a discretized prior for the inner layer weights, we can exactly characterize the posterior predictive distribution as a Gaussian mixture. This setting allows us to define equivalence classes of network parameter values which produce the same likelihood (training error) and to relate the elements of these classes to the network's scaling regime -- defined via ratios of the training sample size, the size of each layer, and the number of final layer parameters. Of particular interest are distinct parameter realizations that map to low training error and yet correspond to distinct modes in the posterior predictive distribution. We identify settings that exhibit such predictive multimodality, and thus provide insight into the accuracy of unimodal posterior approximations. We also characterize the capacity of a model to "learn from data" by evaluating contraction of the posterior predictive in different scaling regimes.", keywords = "bayesian neural networks,model misspecification,Bayesian inference" }
LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport
We present LazyDINO, a transport map variational inference method for fast, scalable, and efficiently amortized solutions of high-dimensional nonlinear Bayesian inverse problems with expensive parameter-to-observable (PtO) maps. Our method consists of an offline phase in which we construct a derivative-informed neural surrogate of the PtO map using joint samples of the PtO map and its Jacobian. During the online phase, when given observational data, we seek rapid posterior approximation using surrogate-driven training of a lazy map [Brennan et al., NeurIPS, (2020)], i.e., a structure-exploiting transport map with low-dimensional nonlinearity. The trained lazy map then produces approximate posterior samples or density evaluations. Our surrogate construction is optimized for amortized Bayesian inversion using lazy map variational inference. We show that (i) the derivative-based reduced basis architecture [O’Leary-Roseberry et al., Comput. Methods Appl. Mech. Eng., 388 (2022)] minimizes the upper bound on the expected error in surrogate posterior approximation, and (ii) the derivative-informed training formulation [O’Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] minimizes the expected error due to surrogate-driven transport map optimization. Our numerical results demonstrate that LazyDINO is highly efficient in cost amortization for Bayesian inversion. We observe one to two orders of magnitude reduction of offline cost for accurate posterior approximation, compared to simulation-based amortized inference via conditional transport and conventional surrogate-driven transport. In particular, LazyDINO outperforms Laplace approximation consistently using fewer than 1000 offline samples, while other amortized inference methods struggle and sometimes fail at 16,000 offline samples.
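One ingredient of the offline phase can be sketched in a few lines. The snippet below is an illustration under simplifying assumptions (it omits the prior-covariance weighting used in the actual derivative-informed subspace construction): it builds a reduced parameter basis from sampled Jacobians of the PtO map.

import numpy as np

def derivative_informed_basis(jacobians, rank):
    # Form the expected Gram matrix E[J^T J] from sampled Jacobians of the
    # parameter-to-observable map (each of shape (q, d)) and keep its
    # dominant eigenvectors as a reduced basis for the parameter space.
    Js = np.asarray(jacobians)                   # shape (n_samples, q, d)
    H = np.einsum('nqi,nqj->ij', Js, Js) / Js.shape[0]
    eigval, eigvec = np.linalg.eigh(H)
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order[:rank]]               # (d, rank) reduced basis

A neural surrogate trained on these reduced coordinates, using the same Jacobian samples in its loss, is the "derivative-informed" element that the paper's analysis ties to the accuracy of surrogate-driven lazy map training.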
BibTeX entry
@article{cao-lazydino-2024, author = "Cao, L. and Chen, J. and Brennan, M. and O'Leary-Roseberry, T. and Marzouk, Y. M. and Ghattas, O.", journal = "Preprint", month = "11", title = "LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport", year = "2024", doi = "10.48550/arXiv.2411.12726", abstract = "We present LazyDINO, a transport map variational inference method for fast, scalable, and efficiently amortized solutions of high-dimensional nonlinear Bayesian inverse problems with expensive parameter-to-observable (PtO) maps. Our method consists of an offline phase in which we construct a derivative-informed neural surrogate of the PtO map using joint samples of the PtO map and its Jacobian. During the online phase, when given observational data, we seek rapid posterior approximation using surrogate-driven training of a lazy map [Brennan et al., NeurIPS, (2020)], i.e., a structure-exploiting transport map with low-dimensional nonlinearity. The trained lazy map then produces approximate posterior samples or density evaluations. Our surrogate construction is optimized for amortized Bayesian inversion using lazy map variational inference. We show that (i) the derivative-based reduced basis architecture [O'Leary-Roseberry et al., Comput. Methods Appl. Mech. Eng., 388 (2022)] minimizes the upper bound on the expected error in surrogate posterior approximation, and (ii) the derivative-informed training formulation [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] minimizes the expected error due to surrogate-driven transport map optimization. Our numerical results demonstrate that LazyDINO is highly efficient in cost amortization for Bayesian inversion. We observe one to two orders of magnitude reduction of offline cost for accurate posterior approximation, compared to simulation-based amortized inference via conditional transport and conventional surrogate-driven transport. In particular, LazyDINO outperforms Laplace approximation consistently using fewer than 1000 offline samples, while other amortized inference methods struggle and sometimes fail at 16,000 offline samples.", keywords = "transport,dimension reduction,surrogates,amortized inference" }
Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps
Conditional simulation is a fundamental task in statistical modeling: Generate samples from the conditionals given finitely many data points from a joint distribution. One promising approach is to construct conditional Brenier maps, where the components of the map pushforward a reference distribution to conditionals of the target. While many estimators exist, few, if any, come with statistical or algorithmic guarantees. To this end, we propose a non-parametric estimator for conditional Brenier maps based on the computational scalability of \emph{entropic} optimal transport. Our estimator leverages a result of Carlier et al. (2010), which shows that optimal transport maps under a rescaled quadratic cost asymptotically converge to conditional Brenier maps; our estimator is precisely the entropic analogue of these converging maps. We provide heuristic justifications for choosing the scaling parameter in the cost as a function of the number of samples by fully characterizing the Gaussian setting. We conclude by comparing the performance of the estimator to other machine learning and non-parametric approaches on benchmark datasets and Bayesian inference problems.
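A minimal numpy sketch in the spirit of the estimator (not the paper's implementation): run Sinkhorn with the rescaled quadratic cost of Carlier et al. and return the entropic barycentric projection. The parameters eta and eps below are illustrative choices; a log-domain Sinkhorn would be needed for small eta in practice.

import numpy as np

def sinkhorn(C, eps=0.05, n_iter=500):
    # Basic Sinkhorn iterations for entropic OT between two uniform
    # empirical measures with cost matrix C; returns the coupling.
    n, m = C.shape
    a, b = np.ones(n) / n, np.ones(m) / m
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def entropic_conditional_map(Xs, Ys, Xt, Yt, eta=0.01, eps=0.05):
    # Rescale the cost on the conditioning variable x by 1/eta so that, as
    # eta -> 0, transport is forced to act within slices of constant x
    # (Carlier et al., 2010). The barycentric projection of each source
    # point estimates the conditional map at the source samples.
    Cx = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    Cy = ((Ys[:, None, :] - Yt[None, :, :]) ** 2).sum(-1)
    P = sinkhorn(Cx / eta + Cy, eps=eps)
    return (P @ Yt) / P.sum(axis=1, keepdims=True)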
BibTeX entry
@article{baptista-nonparametricbrenier-2024, author = "Baptista, R. and Pooladian, A.-A. and Brennan, M. and Marzouk, Y. M. and Niles-Weed, J.", journal = "Preprint", month = "11", title = "Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps", year = "2024", doi = "10.48550/arXiv.2411.07154", abstract = "Conditional simulation is a fundamental task in statistical modeling: Generate samples from the conditionals given finitely many data points from a joint distribution. One promising approach is to construct conditional Brenier maps, where the components of the map pushforward a reference distribution to conditionals of the target. While many estimators exist, few, if any, come with statistical or algorithmic guarantees. To this end, we propose a non-parametric estimator for conditional Brenier maps based on the computational scalability of \emph{entropic} optimal transport. Our estimator leverages a result of Carlier et al. (2010), which shows that optimal transport maps under a rescaled quadratic cost asymptotically converge to conditional Brenier maps; our estimator is precisely the entropic analogues of these converging maps. We provide heuristic justifications for choosing the scaling parameter in the cost as a function of the number of samples by fully characterizing the Gaussian setting. We conclude by comparing the performance of the estimator to other machine learning and non-parametric approaches on benchmark datasets and Bayesian inference problems.", keywords = "optimal transport,brenier maps,conditional simulation,entropic optimal transport" }
Expected information gain estimation via density approximations: Sample allocation and dimension reduction
Computing expected information gain (EIG) from prior to posterior (equivalently, mutual information between candidate observations and model parameters or other quantities of interest) is a fundamental challenge in Bayesian optimal experimental design. We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings, compatible with both standard and implicit Bayesian models. These schemes are representative of two-stage methods for estimating or bounding EIG using marginal and conditional density estimates. In this setting, we analyze the optimal allocation of samples between training (density estimation) and approximation of the outer prior expectation. We show that with this optimal sample allocation, the MSE of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme. We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces. Minimizing these upper bounds yields projectors and hence low-dimensional EIG approximations that outperform approximations obtained via other linear dimension reduction schemes. Numerical experiments on a PDE-constrained Bayesian inverse problem also illustrate a favorable trade-off between dimension truncation and the modeling of non-Gaussianity, when estimating EIG from finite samples in high dimensions.
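For reference, the standard nested Monte Carlo (NMC) estimator against which the paper's two-stage, density-based schemes are compared can be sketched as follows. Here sample_prior, simulate, and log_likelihood are placeholders for a user's model; note that NMC is biased at any finite inner sample size, which motivates the sample-allocation analysis above.

import numpy as np

def eig_nested_mc(sample_prior, simulate, log_likelihood, design,
                  n_outer=500, n_inner=500, rng=None):
    # EIG = E_{theta, y}[ log p(y | theta, d) - log p(y | d) ], with the
    # evidence p(y | d) estimated by an inner Monte Carlo average over
    # fresh prior draws.
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_outer):
        theta = sample_prior(rng)
        y = simulate(theta, design, rng)         # y ~ p(y | theta, d)
        log_lik = log_likelihood(y, theta, design)
        inner = np.array([log_likelihood(y, sample_prior(rng), design)
                          for _ in range(n_inner)])
        log_evidence = np.logaddexp.reduce(inner) - np.log(n_inner)
        total += log_lik - log_evidence
    return total / n_outer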
BibTeX entry
@article{li-eigdimensionreduction-2024, author = "Li, F. Y. and Baptista, R. and Marzouk, Y. M.", journal = "Preprint", month = "11", title = "Expected information gain estimation via density approximations: Sample allocation and dimension reduction", year = "2024", doi = "10.48550/arXiv.2411.08390", abstract = "Computing expected information gain (EIG) from prior to posterior (equivalently, mutual information between candidate observations and model parameters or other quantities of interest) is a fundamental challenge in Bayesian optimal experimental design. We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings, compatible with both standard and implicit Bayesian models. These schemes are representative of two-stage methods for estimating or bounding EIG using marginal and conditional density estimates. In this setting, we analyze the optimal allocation of samples between training (density estimation) and approximation of the outer prior expectation. We show that with this optimal sample allocation, the MSE of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme. We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces. Minimizing these upper bounds yields projectors and hence low-dimensional EIG approximations that outperform approximations obtained via other linear dimension reduction schemes. Numerical experiments on a PDE-constrained Bayesian inverse problem also illustrate a favorable trade-off between dimension truncation and the modeling of non-Gaussianity, when estimating EIG from finite samples in high dimensions.", keywords = "expected information gain,optimal experimental design,density estimation,dimension reduction" }
Uncertainty Quantification of Fluid Leakage and Fault Instability in Geologic CO2 Storage
Geologic CO2 storage is an important strategy for reducing greenhouse gas emissions to the atmosphere and mitigating climate change. In this process, coupling between mechanical deformation and fluid flow in fault zones is a key determinant of fault instability, induced seismicity, and CO2 leakage. Using a recently developed methodology, PREDICT, we obtain probability distributions of the permeability tensor in faults from the stochastic placement of clay smears that accounts for geologic uncertainty. We build a comprehensive set of fault permeability scenarios from PREDICT and investigate the effects of uncertainties from the fault zone internal structure and composition on forecasts of CO2 permanence and fault stability. To tackle the prohibitively expensive computational cost of the large number of simulations required to quantify uncertainty, we develop a deep-learning-based surrogate model capable of predicting flow migration, pressure buildup, and geomechanical responses in CO2 storage operations. We also compare our probabilistic estimation of CO2 leakage and fault instability with previous studies based on deterministic estimates of fault permeability. The results highlight the importance of including uncertainty and anisotropy in models of complex fault structures, and their value for improved management of geologic CO2 storage projects.
BibTeX entry
@article{lu-co2storage-2024, author = "Lu, H. and Salo-Salgado, L. and Marzouk, Y. M. and James, R.", journal = "Preprint", month = "10", title = "Uncertainty Quantification of Fluid Leakage and Fault Instability in Geologic CO2 Storage", year = "2024", doi = "10.48550/arXiv.2411.08039", abstract = "Geologic CO2 storage is an important strategy for reducing greenhouse gas emissions to the atmosphere and mitigating climate change. In this process, coupling between mechanical deformation and fluid flow in fault zones is a key determinant of fault instability, induced seismicity, and CO2 leakage. Using a recently developed methodology, PREDICT, we obtain probability distributions of the permeability tensor in faults from the stochastic placement of clay smears that accounts for geologic uncertainty. We build a comprehensive set of fault permeability scenarios from PREDICT and investigate the effects of uncertainties from the fault zone internal structure and composition on forecasts of CO2 permanence and fault stability. To tackle the prohibitively expensive computational cost of the large number of simulations required to quantify uncertainty, we develop a deep-learning-based surrogate model capable of predicting flow migration, pressure buildup, and geomechanical responses in CO2 storage operations. We also compare our probabilistic estimation of CO2 leakage and fault instability with previous studies based on deterministic estimates of fault permeability. The results highlight the importance of including uncertainty and anisotropy in modeling of complex fault structures and improved management of geologic CO2 storage projects.", keywords = "co2 storage,uncertainty propagation,surrogate modeling" }
On the design of scalable, high-precision spherical-radial Fourier features
Approximation using Fourier features is a popular technique for scaling kernel methods to large-scale problems, with myriad applications in machine learning and statistics. This method replaces the integral representation of a shift-invariant kernel with a sum using a quadrature rule. The design of the latter is meant to reduce the number of features required for high-precision approximation. Specifically, for the squared exponential kernel, one must design a quadrature rule that approximates the Gaussian measure on ℝ^d. Previous efforts in this line of research have faced difficulties in higher dimensions. We introduce a new family of quadrature rules that accurately approximate the Gaussian measure in higher dimensions by exploiting its isotropy. These rules are constructed as a tensor product of a radial quadrature rule and a spherical quadrature rule. Compared to previous work, our approach leverages a thorough analysis of the approximation error, which suggests natural choices for both the radial and spherical components. We demonstrate that this family of Fourier features yields improved approximation bounds.
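For context, the Monte Carlo baseline that such designed rules replace is the standard random Fourier feature construction, sketched below for the squared-exponential kernel. A spherical-radial rule would substitute deterministic weighted nodes, built as a tensor product of a radial rule and points on the sphere, for the i.i.d. Gaussian frequencies.

import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, rng=None):
    # Random Fourier features: k(x, x') ~ z(x) . z(x'), with frequencies
    # drawn from the Gaussian spectral measure of the kernel on R^d.
    rng = np.random.default_rng(rng)
    n, d = X.shape
    omega = rng.standard_normal((d, n_features)) / lengthscale
    proj = X @ omega
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

# sanity check against the exact squared-exponential kernel
X = np.random.default_rng(0).standard_normal((5, 3))
Z = rff_features(X, n_features=5000, rng=1)
K_exact = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
print(np.abs(Z @ Z.T - K_exact).max())   # small for many features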
BibTeX entry
@article{belhadji-sphericalfourier-2024, author = "Belhadji, A. and Zhu, Q. J. and Marzouk, Y. M.", journal = "Preprint", month = "8", title = "On the design of scalable, high-precision spherical-radial Fourier features", year = "2024", doi = "10.48550/arXiv.2408.13231", abstract = "Approximation using Fourier features is a popular technique for scaling kernel methods to large-scale problems, with myriad applications in machine learning and statistics. This method replaces the integral representation of a shift-invariant kernel with a sum using a quadrature rule. The design of the latter is meant to reduce the number of features required for high-precision approximation. Specifically, for the squared exponential kernel, one must design a quadrature rule that approximates the Gaussian measure on ℝd. Previous efforts in this line of research have faced difficulties in higher dimensions. We introduce a new family of quadrature rules that accurately approximate the Gaussian measure in higher dimensions by exploiting its isotropy. These rules are constructed as a tensor product of a radial quadrature rule and a spherical quadrature rule. Compared to previous work, our approach leverages a thorough analysis of the approximation error, which suggests natural choices for both the radial and spherical components. We demonstrate that this family of Fourier features yields improved approximation bounds." }
Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities
Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure π as a perturbation of a given reference measure μ along a few significant directions of ℝ^d. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback–Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.
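A minimal sketch of the gradient-based diagnostic that this analysis sharpens (assuming samples of g = ∇ log(dπ/dμ) are available, and taking the reference's log-Sobolev constant to be 1): the dominant eigenvectors of H = E[g gᵀ] span the informative directions, and the trailing eigenvalues give the classical LSI-type majorant on the KL error.

import numpy as np

def informed_subspace(grad_log_ratio_samples, rank):
    # Build H = E_pi[g g^T] from samples of g = grad log(dpi/dmu) and take
    # its dominant eigenvectors. Under a log-Sobolev inequality for mu
    # (constant 1 here), the trailing eigenvalues bound the KL error of
    # perturbing mu only along the returned directions.
    G = np.asarray(grad_log_ratio_samples)       # shape (n_samples, d)
    H = G.T @ G / G.shape[0]
    eigval, eigvec = np.linalg.eigh(H)
    order = np.argsort(eigval)[::-1]
    U_r = eigvec[:, order[:rank]]                # informed directions
    kl_bound = 0.5 * eigval[order[rank:]].sum()  # LSI-type majorant
    return U_r, kl_bound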
BibTeX entry
@article{li-lowdimdetection-2024, author = "Li, M. T. C. and Cui, T. and Li, F. Y. and Marzouk, Y. M. and Zahm, O.", journal = "Preprint", month = "6", title = "Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities", year = "2024", doi = "10.48550/arXiv.2406.13036", abstract = "Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure π as a perturbation of a given reference measure μ along a few significant directions of ℝd. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.", keywords = "Bayesian inference, dimensional logarithmic Sobolev inequality, dimensional Poincaré inequality, feature detection, gradient-based dimension reduction, reaction coordinates." }
Preserving linear invariants in ensemble filtering methods
Formulating dynamical models for physical phenomena is essential for understanding the interplay between the different mechanisms and predicting the evolution of physical states. However, a dynamical model alone is often insufficient to address these fundamental tasks, as it suffers from model errors and uncertainties. One common remedy is to rely on data assimilation, where the state estimate is updated with observations of the true system. Ensemble filters sequentially assimilate observations by updating a set of samples over time. They operate in two steps: a forecast step that propagates each sample through the dynamical model and an analysis step that updates the samples with incoming observations. For accurate and robust predictions of dynamical systems, discrete solutions must preserve their critical invariants. While modern numerical solvers satisfy these invariants, existing invariant-preserving analysis steps are limited to Gaussian settings and are often not compatible with classical regularization techniques of ensemble filters, e.g., inflation and covariance tapering. The present work focuses on preserving linear invariants, such as mass, stoichiometric balance of chemical species, and electrical charges. Using tools from measure transport theory (Spantini et al., 2022, SIAM Review), we introduce a generic class of nonlinear ensemble filters that automatically preserve desired linear invariants in non-Gaussian filtering problems. By specializing this framework to the Gaussian setting, we recover a constrained formulation of the Kalman filter. Then, we show how to combine existing regularization techniques for the ensemble Kalman filter (Evensen, 1994, J. Geophys. Res.) with the preservation of the linear invariants. Finally, we assess the benefits of preserving linear invariants for the ensemble Kalman filter and nonlinear ensemble filters.
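An illustrative baseline, not the paper's transport construction: run a stochastic EnKF analysis step, then orthogonally project each analysis member onto the affine set {x : Ax = c}, so the linear invariants hold exactly. All inputs below are placeholders for a user's filtering problem.

import numpy as np

def enkf_analysis_with_invariants(E, y, H, R, A, c, rng=None):
    # Perturbed-observation EnKF analysis step followed by an orthogonal
    # projection of each member onto {x : A x = c}, restoring linear
    # invariants (e.g., total mass) exactly.
    rng = np.random.default_rng(rng)
    d, n = E.shape                          # state dim x ensemble size
    Ef = E - E.mean(axis=1, keepdims=True)
    C = Ef @ Ef.T / (n - 1)                 # sample forecast covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n).T
    Ea = E + K @ (Y - H @ E)                # analysis ensemble
    # project: x <- x - A^T (A A^T)^{-1} (A x - c)
    M = A.T @ np.linalg.inv(A @ A.T)
    return Ea - M @ (A @ Ea - c[:, None])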
BibTeX entry
@article{leprovost-invariantfiltering-2024, author = "Le Provost, M. and Glaubitz, J. and Marzouk, Y. M.", journal = "Preprint", month = "4", title = "Preserving linear invariants in ensemble filtering methods", year = "2024", doi = "10.48550/arXiv.2404.14328", abstract = "Formulating dynamical models for physical phenomena is essential for understanding the interplay between the different mechanisms and predicting the evolution of physical states. However, a dynamical model alone is often insufficient to address these fundamental tasks, as it suffers from model errors and uncertainties. One common remedy is to rely on data assimilation, where the state estimate is updated with observations of the true system. Ensemble filters sequentially assimilate observations by updating a set of samples over time. They operate in two steps: a forecast step that propagates each sample through the dynamical model and an analysis step that updates the samples with incoming observations. For accurate and robust predictions of dynamical systems, discrete solutions must preserve their critical invariants. While modern numerical solvers satisfy these invariants, existing invariant-preserving analysis steps are limited to Gaussian settings and are often not compatible with classical regularization techniques of ensemble filters, e.g., inflation and covariance tapering. The present work focuses on preserving linear invariants, such as mass, stoichiometric balance of chemical species, and electrical charges. Using tools from measure transport theory (Spantini et al., 2022, SIAM Review), we introduce a generic class of nonlinear ensemble filters that automatically preserve desired linear invariants in non-Gaussian filtering problems. By specializing this framework to the Gaussian setting, we recover a constrained formulation of the Kalman filter. Then, we show how to combine existing regularization techniques for the ensemble Kalman filter (Evensen, 1994, J. Geophys. Res.) with the preservation of the linear invariants. Finally, we assess the benefits of preserving linear invariants for the ensemble Kalman filter and nonlinear ensemble filters.", keywords = "linear invariants · measure transport · nonlinear filtering · ensemble Kalman filter" }
Nonlinear Bayesian optimal experimental design using logarithmic Sobolev inequalities
We study the problem of selecting experiments from a larger candidate pool, where the goal is to maximize mutual information (MI) between the selected subset and the underlying parameters. Finding the exact solution to this combinatorial optimization problem is computationally costly, not only due to the complexity of the combinatorial search but also to the difficulty of evaluating MI in nonlinear/non-Gaussian settings. We propose greedy approaches based on new computationally inexpensive lower bounds for MI, constructed via log-Sobolev inequalities. We demonstrate that our method outperforms random selection strategies, Gaussian approximations, and nested Monte Carlo (NMC) estimators of MI in various settings, including optimal design for nonlinear models with non-additive noise.
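The greedy skeleton itself is simple; the substance of the paper lies in the bound. In the sketch below, mi_lower_bound is a placeholder for a cheap MI lower bound such as the paper's log-Sobolev-based bounds.

def greedy_design(candidates, mi_lower_bound, budget):
    # At each step, add the candidate whose inclusion maximizes a cheap
    # lower bound on the MI between the selected observations and the
    # parameters.
    selected = []
    remaining = set(range(len(candidates)))
    for _ in range(budget):
        best = max(remaining,
                   key=lambda j: mi_lower_bound(selected + [candidates[j]]))
        selected.append(candidates[best])
        remaining.discard(best)
    return selected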
BibTeX entry
@article{li-sobolevoed-2024, author = "Li, F. Y. and Belhadji, A. and Marzouk, Y. M.", journal = "Preprint", month = "2", title = "Nonlinear Bayesian optimal experimental design using logarithmic Sobolev inequalities", year = "2024", doi = "10.48550/arXiv.2402.15053", abstract = "We study the problem of selecting experiments from a larger candidate pool, where the goal is to maximize mutual information (MI) between the selected subset and the underlying parameters. Finding the exact solution is to this combinatorial optimization problem is computationally costly, not only due to the complexity of the combinatorial search but also the difficulty of evaluating MI in nonlinear/non-Gaussian settings. We propose greedy approaches based on new computationally inexpensive lower bounds for MI, constructed via log-Sobolev inequalities. We demonstrate that our method outperforms random selection strategies, Gaussian approximations, and nested Monte Carlo (NMC) estimators of MI in various settings, including optimal design for nonlinear models with non-additive noise.", keywords = "optimal experimental design;log-sobolev" }
Stable generative modeling using Schrödinger bridges
We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling and Bayesian inference. In this paper, we propose a generative model combining Schrödinger bridges and Langevin dynamics. Schrödinger bridges over an appropriate reversible reference process are used to approximate the conditional transition probability from the available training samples, which is then implemented in a discrete-time reversible Langevin sampler to generate new samples. By setting the kernel bandwidth in the reference process to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with the time-stepping of stiff stochastic differential equations. Moreover, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.
BibTeX entry
@article{li-schroedinger-2024, author = "Gottwald, G. and Li, F. Y. and Marzouk, Y. M. and Reich, S.", journal = "Preprint", month = "7", title = "Stable generative modeling using Schrödinger bridges", year = "2024", doi = "10.48550/arXiv.2401.04372", abstract = "We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling and Bayesian inference. In this paper, we propose a generative model combining Schrödinger bridges and Langevin dynamics. Schrödinger bridges over an appropriate reversible reference process are used to approximate the conditional transition probability from the available training samples, which is then implemented in a discrete-time reversible Langevin sampler to generate new samples. By setting the kernel bandwidth in the reference process to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with the time-stepping of stiff stochastic differential equations. Moreover, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.", keywords = "Langevin dynamics, Schrödinger bridges, generative modeling, Bayesian inference" }
Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes
We study minimax risk bounds for estimators of the spectral measure in multivariate linear factor models, where observations are linear combinations of regularly varying latent factors. Non-asymptotic convergence rates are derived for the multivariate Peak-over-Threshold estimator in terms of the p-th order Wasserstein distance, and information-theoretic lower bounds for the minimax risks are established. The convergence rate of the estimator is shown to be minimax optimal under a class of Pareto-type models analogous to the standard class used in the setting of one-dimensional observations known as the Hall-Welsh class. When the estimator is minimax inefficient, a novel two-step estimator is introduced and demonstrated to attain the minimax lower bound. Our analysis bridges the gaps in understanding trade-offs between estimation bias and variance in multivariate extreme value theory.
BibTeX entry
@article{zhang-wassersteinminimax-2023, author = "Zhang, X. and Blanchet, J. and Marzouk, Y. M. and Nguyen, V. A. and Wang, S.", journal = "Preprint", title = "Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes", year = "2023", doi = "10.48550/arXiv.2312.09862", abstract = "We study minimax risk bounds for estimators of the spectral measure in multivariate linear factor models, where observations are linear combinations of regularly varying latent factors. Non-asymptotic convergence rates are derived for the multivariate Peak-over-Threshold estimator in terms of the p-th order Wasserstein distance, and information-theoretic lower bounds for the minimax risks are established. The convergence rate of the estimator is shown to be minimax optimal under a class of Pareto-type models analogous to the standard class used in the setting of one-dimensional observations known as the Hall-Welsh class. When the estimator is minimax inefficient, a novel two-step estimator is introduced and demonstrated to attain the minimax lower bound. Our analysis bridges the gaps in understanding trade-offs between estimation bias and variance in multivariate extreme value theory.", keywords = "optimal transport;minimax estimation" }
Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference
We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
BibTeX entry
@article{wang-conditionalot-2023, author = "Wang, Z. O. and Baptista, R. and Marzouk, Y. M. and Ruthotto, L. and Verma, D.", journal = "Preprint", title = "Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference", year = "2023", doi = "10.48550/arXiv.2310.16975", abstract = "We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference–particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.", keywords = "measure transport, generative modeling, optimal transport, Bayesian inference, inverse problems, uncertainty quantification" }
An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization
Heavy tails are a common feature of filtering distributions, arising from nonlinear dynamical and observation processes as well as uncertainty in physical sensors. In these settings, the Kalman filter and its ensemble version, the ensemble Kalman filter (EnKF), both designed under Gaussian assumptions, suffer degraded performance. t–distributions are a parametric family of distributions whose tail-heaviness is modulated by a degree of freedom ν. Interestingly, Cauchy and Gaussian distributions correspond to the extreme cases of a t–distribution for ν = 1 and ν = ∞, respectively. Leveraging tools from measure transport (Spantini et al., SIAM Review, 2022), we present a generalization of the EnKF whose prior-to-posterior update leads to exact inference for t–distributions. We demonstrate that this filter is less sensitive to outlying synthetic observations generated by the observation model for small ν. Moreover, it recovers the Kalman filter for ν = ∞. For nonlinear state-space models with heavy-tailed noise, we propose an algorithm to estimate the prior-to-posterior update from samples of the joint forecast distribution of the states and observations. We rely on a regularized expectation-maximization (EM) algorithm to estimate the mean, scale matrix, and degree of freedom of heavy-tailed t–distributions from limited samples (Finegold and Drton, arXiv preprint, 2014). Leveraging the conditional independence of the joint forecast distribution, we regularize the scale matrix with an l1 sparsity-promoting penalization of the log-likelihood at each iteration of the EM algorithm. This l1 regularization draws upon the graphical lasso algorithm (Friedman et al., Biostatistics, 2008) for estimating sparse covariance matrices in the Gaussian setting. By sequentially estimating the degree of freedom at each analysis step, our filter has the appealing feature of adapting the prior-to-posterior update to the tail-heaviness of the data. This new filter intrinsically embeds an adaptive and data-dependent multiplicative inflation mechanism, complemented with an adaptive localization through the l1-penalization of the estimated scale matrix. We demonstrate the benefits of this new ensemble filter on challenging filtering problems with heavy-tailed noise.
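A sketch of the classical EM iteration that the filter builds on, for fixed degrees of freedom ν and without the l1 penalization or the adaptive estimation of ν described above:

import numpy as np

def t_em(X, nu=4.0, n_iter=50):
    # Classical EM for the mean and scale matrix of a multivariate
    # t-distribution with fixed degrees of freedom nu.
    # E-step: latent precision weights w_i = (nu + d)/(nu + delta_i), with
    # delta_i the squared Mahalanobis distance; M-step: weighted moments.
    n, d = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X.T)
    for _ in range(n_iter):
        diff = X - mu
        delta = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (nu + d) / (nu + delta)                   # E-step
        mu = (w[:, None] * X).sum(axis=0) / w.sum()   # M-step: mean
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n      # M-step: scale matrix
    return mu, Sigma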
BibTeX entry
@article{le-provost-heavytailedfilter-2023, author = "Le Provost, M. and Baptista, R. and Eldredge, J. D. and Marzouk, Y. M.", journal = "Preprint", title = "An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization", year = "2023", doi = "10.48550/arXiv.2310.08741", abstract = "Heavy tails is a common feature of filtering distributions that results from the nonlinear dynamical and observation processes as well as the uncertainty from physical sensors. In these settings, the Kalman filter and its ensemble version — the ensemble Kalman filter (EnKF) — that have been designed under Gaussian assumptions result in degraded performance. t–distributions are a parametric family of distributions whose tail-heaviness is modulated by a degree of freedom ν. Interestingly, Cauchy and Gaussian distributions correspond to the extreme cases of a t–distribution for ν = 1 and ν = ∞, respectively. Leveraging tools from measure transport (Spantini et al., SIAM Review, 2022), we present a generalization of the EnKF whose prior-to-posterior update leads to exact inference for t–distributions. We demonstrate that this filter is less sensitive to outlying synthetic observations generated by the observation model for small ν. Moreover, it recovers the Kalman filter for ν = ∞. For nonlinear state-space models with heavy-tailed noise, we propose an algorithm to estimate the prior-to-posterior update from samples of joint forecast distribution of the states and observations. We rely on a regularized expectation-maximization (EM) algorithm to estimate the mean, scale matrix, and degree of freedom of heavy-tailed t–distributions from limited samples (Finegold and Drton, arXiv preprint, 2014). Leveraging the conditional independence of the joint forecast distribution, we regularize the scale matrix with an l1 sparsity-promoting penalization of the log-likelihood at each iteration of the EM algorithm. This l1 regularization draws upon the graphical lasso algorithm (Friedman et al., Biostatistics, 2008) to estimate sparse covariance matrix in the Gaussian setting. By sequentially estimating the degree of freedom at each analysis step, our filter has the appealing feature of adapting the prior-to-posterior update to the tail-heaviness of the data. This new filter intrinsically embeds an adaptive and data-dependent multiplicative inflation mechanism complemented with an adaptive localization through the l1-penalization of the estimated scale matrix. We demonstrate the benefits of this new ensemble filter on challenging filtering problems with heavy-tailed noise.", keywords = "ensemble Kalman filter;t–distribution;transport maps" }
A transport approach to sequential simulation-based inference
We present a new transport-based approach to efficiently perform sequential Bayesian inference of static model parameters. The strategy is based on the extraction of conditional distributions from the joint distribution of parameters and data, via the estimation of structured (e.g., block triangular) transport maps. This gives explicit surrogate models for the likelihood functions and their gradients, allowing gradient-based characterization of the posterior density via transport maps in a model-free, online phase. This framework is well suited to parameter estimation with complex noise models, including nuisance parameters, and to settings where the forward model is known only as a black box. The method is applied numerically to the characterization of ice thickness from conductivity measurements.
BibTeX entry
@article{rubio-sequentialtransport-2023, author = "Rubio, P.-B. and Marzouk, Y. M. and Parno, M.", journal = "Preprint", month = "8", title = "A transport approach to sequential simulation-based inference", year = "2023", doi = "10.48550/arXiv.2308.13940", abstract = "We present a new transport-based approach to efficiently perform sequential Bayesian inference of static model parameters. The strategy is based on the extraction of conditional distribution from the joint distribution of parameters and data, via the estimation of structured (e.g., block triangular) transport maps. This gives explicit surrogate models for the likelihood functions and their gradients. This allow gradient-based characterizations of posterior density via transport maps in a model-free, online phase. This framework is well suited for parameter estimation in case of complex noise models including nuisance parameters and when the forward model is only known as a black box. The numerical application of this method is performed in the context of characterization of ice thickness with conductivity measurements.", keywords = "transport, ssm, inference" }
An Approximation Theory Framework for Measure-Transport Sampling Algorithms
This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling – a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance (or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback–Leibler divergence. Specialized rates for approximations of the popular triangular Knothe-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
BibTeX entry
@article{baptista-theorytransportsample-2023, author = "Baptista, R. and Hosseini, B. and Kovachki, N. B. and Marzouk, Y. M. and Sagiv, A.", journal = "Preprint", month = "2", title = "An Approximation Theory Framework for Measure-Transport Sampling Algorithms", year = "2023", doi = "10.48550/arXiv.2302.13965", abstract = "This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling -- a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance~(or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback--Leibler divergence. Specialized rates for approximations of the popular triangular Knothe-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.", keywords = "Transport map, generative models, stability analysis, approximation theory" }
Transport map unadjusted Langevin algorithms
Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. This connection suggests more systematic ways of learning metrics, and also yields alternative discretizations of the RMLD described by the map, which we study. Moreover, we show that under certain conditions, when the transport map is used in conjunction with ULA, we can improve the geometric rate of convergence of the output process in the 2–Wasserstein distance. Illustrative numerical results complement our theoretical claims.
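A minimal sketch for the special case of a fixed affine map T(z) = Lz + m (for instance, from a Laplace approximation of the target). The Jacobian correction is constant in this case, and the scheme coincides with Langevin dynamics under the constant metric LLᵀ; the paper treats general maps and the induced Riemannian dynamics.

import numpy as np

def affine_transport_ula(grad_log_target, L, m, z0, step=1e-2,
                         n_steps=5000, rng=None):
    # ULA preconditioned by a fixed affine transport map x = L z + m:
    # run ULA on the pulled-back density pi_T(z), proportional to
    # pi(L z + m) for affine maps, and push each iterate through T.
    rng = np.random.default_rng(rng)
    z = np.array(z0, dtype=float)
    samples = []
    for _ in range(n_steps):
        g = L.T @ grad_log_target(L @ z + m)   # pullback of the score
        z = z + step * g + np.sqrt(2 * step) * rng.standard_normal(z.shape)
        samples.append(L @ z + m)              # pushforward sample
    return np.array(samples)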
BibTeX entry
@article{zhang-transportula-2023, author = "Zhang, B. J. and Marzouk, Y. M. and Spiliopoulos, K.", journal = "Preprint", month = "2", title = "Transport map unadjusted Langevin algorithms", year = "2023", doi = "10.48550/arXiv.2302.07227", abstract = "Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. This connection suggests more systematic ways of learning metrics, and also yields alternative discretizations of the RMLD described by the map, which we study. Moreover, we show that under certain conditions, when the transport map is used in conjunction with ULA, we can improve the geometric rate of convergence of the output process in the 2--Wasserstein distance. Illustrative numerical results complement our theoretical claims.", keywords = "Langevin dynamics, transport maps, Bayesian inference, Markov chain Monte Carlo" }
Infinite-Dimensional Diffusion Models for Function Spaces
We define diffusion-based generative models in infinite dimensions, and apply them to the generative modeling of functions. By first formulating such models in the infinite-dimensional limit and only then discretizing, we are able to obtain a sampling algorithm that has \emph{dimension-free} bounds on the distance from the sample measure to the target measure. Furthermore, we propose a new way to perform conditional sampling in an infinite-dimensional space and show that our approach outperforms previously suggested procedures.
BibTeX entry
@article{pidstrigach-infinitedimdiffusion-2023, author = "Pidstrigach, J. and Marzouk, Y. M. and Reich, S. and Wang, S.", journal = "Preprint", month = "2", title = "Infinite-Dimensional Diffusion Models for Function Spaces", year = "2023", doi = "10.48550/arXiv.2302.10130", abstract = "We define diffusion-based generative models in infinite dimensions, and apply them to the generative modeling of functions. By first formulating such models in the infinite-dimensional limit and only then discretizing, we are able to obtain a sampling algorithm that has \emph{dimension-free} bounds on the distance from the sample measure to the target measure. Furthermore, we propose a new way to perform conditional sampling in an infinite-dimensional space and show that our approach outperforms previously suggested procedures." }
On minimax density estimation via measure transport
We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators, by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over Hölder classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded Hölder-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.
BibTeX entry
@article{wang-tde-2022, author = "Wang, S. and Marzouk, Y. M.", journal = "Preprint", title = "On minimax density estimation via measure transport", year = "2022", abstract = "We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators, by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over Hölder classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded Hölder-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators." }
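In one dimension the Knothe-Rosenblatt map to a uniform reference is simply the CDF, so maximum-likelihood estimation of the map coincides with fitting a log-polynomial density. The sketch below illustrates this reduction on Beta(2,5) samples; the cubic parametrization, optimizer, and sample sizes are illustrative assumptions, not the estimators analyzed in the paper.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=2000)        # draws from an "unknown" density

grid = np.linspace(0.0, 1.0, 2049)          # quadrature grid on [0, 1]

def neg_log_lik(coef):
    # density p(x) = exp(f(x)) / Z with f a cubic polynomial; since the
    # domain has unit length, Z is approximately the grid mean of exp(f)
    f_data = np.polynomial.polynomial.polyval(data, coef)
    f_grid = np.polynomial.polynomial.polyval(grid, coef)
    return -(f_data.mean() - np.log(np.mean(np.exp(f_grid))))

coef = minimize(neg_log_lik, np.zeros(4), method="Nelder-Mead").x
dens = np.exp(np.polynomial.polynomial.polyval(grid, coef))
dens /= dens.mean()                          # normalize to integrate to 1
S = np.cumsum(dens) / dens.size              # estimated KR map to U[0,1]: the CDF
print("estimated mode:", grid[np.argmax(dens)])   # Beta(2,5) has mode 0.2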
Distributionally robust Gaussian process regression and Bayesian inverse problems
We study a distributionally robust optimization formulation (i.e., a min-max game) for two representative problems in Bayesian nonparametric estimation: Gaussian process regression and, more generally, linear inverse problems. Our formulation seeks the best mean-squared error predictor, in an infinite-dimensional space, against an adversary who chooses the worst-case model in a Wasserstein ball around a nominal infinite-dimensional Bayesian model. The transport cost is chosen to control features such as the degree of roughness of the sample paths that the adversary is allowed to inject. We show that the game has a well-defined value (i.e., strong duality holds in the sense that max-min equals min-max) and that there exists a unique Nash equilibrium which can be computed by a sequence of finite-dimensional approximations. Crucially, the worst-case distribution is itself Gaussian. We explore properties of the Nash equilibrium and the effects of hyperparameters through a set of numerical experiments, demonstrating the versatility of our modeling framework.
BibTeX entry
@article{zhang-dro-2022, author = "Zhang, X. and Blanchet, J. and Marzouk, Y. M. and Nguyen, V. A. and Wang, S.", journal = "Preprint", title = "Distributionally robust Gaussian process regression and Bayesian inverse problems", year = "2022", abstract = "We study a distributionally robust optimization formulation (i.e., a min-max game) for two representative problems in Bayesian nonparametric estimation: Gaussian process regression and, more generally, linear inverse problems. Our formulation seeks the best mean-squared error predictor, in an infinite-dimensional space, against an adversary who chooses the worst-case model in a Wasserstein ball around a nominal infinite-dimensional Bayesian model. The transport cost is chosen to control features such as the degree of roughness of the sample paths that the adversary is allowed to inject. We show that the game has a well-defined value (i.e., strong duality holds in the sense that max-min equals min-max) and that there exists a unique Nash equilibrium which can be computed by a sequence of finite-dimensional approximations. Crucially, the worst-case distribution is itself Gaussian. We explore properties of the Nash equilibrium and the effects of hyperparameters through a set of numerical experiments, demonstrating the versatility of our modeling framework." }
Gradient-based data and parameter dimension reduction for Bayesian models: an information theoretic perspective
We consider the problem of reducing the dimensions of parameters and data in non-Gaussian Bayesian inference problems. Our goal is to identify an "informed" subspace of the parameters and an "informative" subspace of the data so that a high-dimensional inference problem can be approximately reformulated in low-to-moderate dimensions, thereby improving the computational efficiency of many inference techniques. To do so, we exploit gradient evaluations of the log-likelihood function. Furthermore, we use an information-theoretic analysis to derive a bound on the posterior error due to parameter and data dimension reduction. This bound relies on logarithmic Sobolev inequalities, and it reveals the appropriate dimensions of the reduced variables. We compare our method with classical dimension reduction techniques, such as principal component analysis and canonical correlation analysis, on applications ranging from mechanics to image processing.
BibTeX entry
@article{dimred2022, author = "Baptista, R. and Marzouk, Y. M. and Zahm, O.", journal = "Preprint", title = "Gradient-based data and parameter dimension reduction for Bayesian models: an information theoretic perspective", year = "2022", abstract = "We consider the problem of reducing the dimensions of parameters and data in non-Gaussian Bayesian inference problems. Our goal is to identify an ``informed'' subspace of the parameters and an ``informative'' subspace of the data so that a high-dimensional inference problem can be approximately reformulated in low-to-moderate dimensions, thereby improving the computational efficiency of many inference techniques. To do so, we exploit gradient evaluations of the log-likelihood function. Furthermore, we use an information-theoretic analysis to derive a bound on the posterior error due to parameter and data dimension reduction. This bound relies on logarithmic Sobolev inequalities, and it reveals the appropriate dimensions of the reduced variables. We compare our method with classical dimension reduction techniques, such as principal component analysis and canonical correlation analysis, on applications ranging from mechanics to image processing.", keywords = "Bayesian inference, gradient-based dimension reduction, logarithmic Sobolev inequalities, conditional mutual information, low-dimensional subspaces, coordinate selection" }
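As a concrete illustration of the gradient-based diagnostic, the sketch below averages outer products of log-likelihood gradients over prior samples and reads the informed parameter subspace off the spectrum, using a linear-Gaussian model where the answer (rank equal to the data dimension) is known. All names and sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, m = 20, 5                                   # parameter and data dimensions
G = rng.standard_normal((m, d)) / np.sqrt(d)   # forward operator of rank m
sigma2 = 0.1                                   # observational noise variance
y = G @ rng.standard_normal(d) + np.sqrt(sigma2) * rng.standard_normal(m)

def grad_loglik(x):
    # gradient of log L(y | x) for the model y = Gx + N(0, sigma2 I)
    return G.T @ (y - G @ x) / sigma2

X = rng.standard_normal((2000, d))             # samples from the N(0, I) prior
H = np.mean([np.outer(g, g) for g in map(grad_loglik, X)], axis=0)
evals, evecs = np.linalg.eigh(H)
print("leading eigenvalues:", np.round(evals[::-1][:8], 2))
# only m = 5 eigenvalues are significantly nonzero: the data inform a
# 5-dimensional parameter subspace, spanned by the leading eigenvectors
informed_basis = evecs[:, ::-1][:, :m]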
Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport
We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with the construction of summary statistics for the image data. We also show that expected information gains (EIGs) from the observed data about the model parameters can be computed with no significant additional cost. Lastly, we present a numerical case study based on the Ohta–Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy- and Fourier-based summary statistics, and quantify their informativeness using EIG. We demonstrate the power of the proposed approach to study the effect of data corruptions and experimental designs on the calibration results.
BibTeX entry
@article{baptista-copoly-2022, author = "Baptista, R. and Cao, L. and Chen, J. and Ghattas, O. and Li, F. Y. and Marzouk, Y. M. and Oden, J. T.", journal = "Preprint", title = "Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport", year = "2022", doi = "10.48550/arXiv.2206.11343", abstract = "We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with the construction of summary statistics for the image data. We also show that expected information gains (EIGs) from the observed data about the model parameters can be computed with no significant additional cost. Lastly, we present a numerical case study based on the Ohta--Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy- and Fourier-based summary statistics, and quantify their informativeness using EIG. We demonstrate the power of the proposed approach to study the effect of data corruptions and experimental designs on the calibration results.", keywords = "block copolymers, material self-assembly, uncertainty quantification, likelihood-free inference, summary statistics, measure transport, expected information gain, Ohta--Kawasaki model" }
Computing eigenfunctions of the multidimensional Ornstein-Uhlenbeck operator
We discuss approaches to computing eigenfunctions of the Ornstein–Uhlenbeck (OU) operator in more than two dimensions. While the spectrum of the OU operator and theoretical properties of its eigenfunctions have been well characterized in previous research, the practical computation of general eigenfunctions has not been resolved. We review special cases for which the eigenfunctions can be expressed exactly in terms of commonly used orthogonal polynomials. Then we present a tractable approach for computing the eigenfunctions in general cases and comment on its dimension dependence.
BibTeX entry
@article{zhang-eigenfunctions-2021, author = "Zhang, B. J. and Sahai, T. and Marzouk, Y. M.", journal = "Preprint", title = "Computing eigenfunctions of the multidimensional Ornstein-Uhlenbeck operator", year = "2021", doi = "10.48550/arXiv.2110.09229", abstract = "We discuss approaches to computing eigenfunctions of the Ornstein--Uhlenbeck (OU) operator in more than two dimensions. While the spectrum of the OU operator and theoretical properties of its eigenfunctions have been well characterized in previous research, the practical computation of general eigenfunctions has not been resolved. We review special cases for which the eigenfunctions can be expressed exactly in terms of commonly used orthogonal polynomials. Then we present a tractable approach for computing the eigenfunctions in general cases and comment on its dimension dependence." }
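One of the classical special cases reviewed in the paper can be checked in a few lines: for the OU operator L f = -x·∇f + Δf with symmetric drift, tensor products of probabilists' Hermite polynomials are eigenfunctions with eigenvalue -(n1 + ... + nd). The finite-difference verification below is purely illustrative.

import numpy as np
from numpy.polynomial.hermite_e import hermeval

def f(x1, x2, n=(2, 3)):
    # He_2(x1) * He_3(x2), a candidate eigenfunction with eigenvalue -5
    c1 = np.zeros(n[0] + 1); c1[-1] = 1.0
    c2 = np.zeros(n[1] + 1); c2[-1] = 1.0
    return hermeval(x1, c1) * hermeval(x2, c2)

h = 1e-4
x = np.array([0.7, -0.3])
e = np.eye(2) * h
# finite-difference gradient and Laplacian of f at x
grad = np.array([(f(*(x + e[i])) - f(*(x - e[i]))) / (2 * h) for i in range(2)])
lap = sum((f(*(x + e[i])) - 2 * f(*x) + f(*(x - e[i]))) / h**2 for i in range(2))
Lf = -x @ grad + lap
print(Lf / f(*x))    # prints approximately -(2 + 3) = -5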
Bayesian inference under model misspecification using transport-Lagrangian distances: an application to seismic inversion
Model misspecification constitutes a major obstacle to reliable inference in many inverse problems. Inverse problems in seismology, for example, are particularly affected by misspecification of wave propagation velocities. In this paper, we focus on a specific seismic inverse problem, full-waveform moment tensor inversion, and develop a Bayesian framework that seeks robustness to velocity misspecification. A novel element of our framework is the use of transport-Lagrangian (TL) distances between observed and model-predicted waveforms to specify a loss function, and the use of this loss to define a generalized belief update via a Gibbs posterior. The TL distance naturally disregards certain features of the data that are more sensitive to model misspecification, and therefore produces less biased or dispersed posterior distributions in this setting. To make the latter notion precise, we use several diagnostics to assess the quality of inference and uncertainty quantification, namely continuous ranked probability scores and rank histograms. We interpret these diagnostics in the Bayesian setting and compare the results to those obtained using more typical Gaussian noise models and squared-error loss, under various scenarios of misspecification. Finally, we discuss potential generalizability of the proposed framework to a broader class of inverse problems affected by model misspecification.
BibTeX entry
@article{scarinci-tl-2021, author = "Scarinci, A. and Fehler, M. and Marzouk, Y. M.", journal = "Preprint", title = "Bayesian inference under model misspecification using transport-Lagrangian distances: an application to seismic inversion", year = "2021", abstract = "Model misspecification constitutes a major obstacle to reliable inference in many inverse problems. Inverse problems in seismology, for example, are particularly affected by misspecification of wave propagation velocities. In this paper, we focus on a specific seismic inverse problem, full-waveform moment tensor inversion, and develop a Bayesian framework that seeks robustness to velocity misspecification. A novel element of our framework is the use of transport-Lagrangian (TL) distances between observed and model-predicted waveforms to specify a loss function, and the use of this loss to define a generalized belief update via a Gibbs posterior. The TL distance naturally disregards certain features of the data that are more sensitive to model misspecification, and therefore produces less biased or dispersed posterior distributions in this setting. To make the latter notion precise, we use several diagnostics to assess the quality of inference and uncertainty quantification, namely continuous ranked probability scores and rank histograms. We interpret these diagnostics in the Bayesian setting and compare the results to those obtained using more typical Gaussian noise models and squared-error loss, under various scenarios of misspecification. Finally, we discuss potential generalizability of the proposed framework to a broader class of inverse problems affected by model misspecification." }
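A minimal sketch of the generalized (Gibbs) belief update, with a squared-error loss standing in for the paper's transport-Lagrangian waveform distance; the prior, the tempering parameter beta, and the sampler settings are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
data = 1.5 + 0.3 * rng.standard_normal(50)      # observations, true theta = 1.5

def loss(theta):
    # stand-in for the paper's transport-Lagrangian distance between waveforms
    return np.mean((data - theta)**2)

def log_gibbs(theta, beta=25.0):
    # Gibbs posterior: prior(theta) * exp(-beta * loss), here with a N(0,1) prior
    return -0.5 * theta**2 - beta * loss(theta)

theta, chain = 0.0, []
for _ in range(20000):                          # random-walk Metropolis
    prop = theta + 0.3 * rng.standard_normal()
    if np.log(rng.uniform()) < log_gibbs(prop) - log_gibbs(theta):
        theta = prop
    chain.append(theta)
print("Gibbs posterior mean:", np.mean(chain[5000:]))   # close to 1.5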
Learning non-Gaussian graphical models via Hessian scores and triangular transport
Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most existing methods fail to consistently recover the graph structure for non-Gaussian data. Here we propose an algorithm for learning the Markov structure of continuous and non-Gaussian distributions. To characterize conditional independence, we introduce a score based on integrated Hessian information from the joint log-density, and we prove that this score upper bounds the conditional mutual information for a general class of distributions. To compute the score, our algorithm SING estimates the density using a deterministic coupling, induced by a triangular transport map, and iteratively exploits sparse structure in the map to reveal sparsity in the graph. For certain non-Gaussian datasets, we show that our algorithm recovers the graph structure even with a biased approximation to the density. Among other examples, we apply SING to learn the dependencies between the states of a chaotic dynamical system with local interactions.
BibTeX entry
@article{baptista-lng-2021, author = "Baptista, R. and Marzouk, Y. M. and Morrison, R. and Zahm, O.", journal = "Preprint", title = "Learning non-Gaussian graphical models via Hessian scores and triangular transport", year = "2021", abstract = "Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most existing methods fail to consistently recover the graph structure for non-Gaussian data. Here we propose an algorithm for learning the Markov structure of continuous and non-Gaussian distributions. To characterize conditional independence, we introduce a score based on integrated Hessian information from the joint log-density, and we prove that this score upper bounds the conditional mutual information for a general class of distributions. To compute the score, our algorithm SING estimates the density using a deterministic coupling, induced by a triangular transport map, and iteratively exploits sparse structure in the map to reveal sparsity in the graph. For certain non-Gaussian datasets, we show that our algorithm recovers the graph structure even with a biased approximation to the density. Among other examples, we apply SING to learn the dependencies between the states of a chaotic dynamical system with local interactions.", keywords = "Undirected graphical models, structure learning, non-Gaussian distributions, conditional mutual information, transport map, sparsity" }
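To illustrate the score itself (separately from the transport-based density estimate that SING builds), the sketch below computes Omega_ij = E|∂²_ij log pi| for a closed-form three-node chain by finite differences; the near-zero (1,3) entry flags the conditional independence. All choices are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):
    # three-node nonlinear chain: x1 -> x2 -> x3
    return -0.5*x[0]**2 - 0.5*(x[1] - x[0]**2)**2 - 0.5*(x[2] - x[1]**2)**2

def mixed_second(x, i, j, h=1e-3):
    # central finite difference for the (i, j) entry of the log-density Hessian
    ei, ej = np.eye(3)[i] * h, np.eye(3)[j] * h
    return (log_pi(x+ei+ej) - log_pi(x+ei-ej)
            - log_pi(x-ei+ej) + log_pi(x-ei-ej)) / (4 * h * h)

# exact samples from the chain, used to take the expectation
z = rng.standard_normal((2000, 3))
X = np.empty_like(z)
X[:, 0] = z[:, 0]
X[:, 1] = X[:, 0]**2 + z[:, 1]
X[:, 2] = X[:, 1]**2 + z[:, 2]

score = np.array([[np.mean([abs(mixed_second(x, i, j)) for x in X])
                   for j in range(3)] for i in range(3)])
print(np.round(score, 2))   # the (1,3) entry is near 0: X1 and X3 are
                            # conditionally independent given X2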
A layered multiple importance sampling scheme for focused optimal Bayesian experimental design
We develop a new computational approach for "focused" optimal Bayesian experimental design with nonlinear models, with the goal of maximizing expected information gain in targeted subsets of model parameters. Our approach considers uncertainty in the full set of model parameters, but employs a design objective that can exploit learning trade-offs among different parameter subsets. We introduce a new layered multiple importance sampling scheme that provides consistent estimates of expected information gain in this focused setting. This sampling scheme yields significant reductions in estimator bias and variance for a given computational effort, making optimal design more tractable for a wide range of computationally intensive problems.
BibTeX entry
@article{art_77, author = "Feng, C. and Marzouk, Y. M.", journal = "Preprint", title = "A layered multiple importance sampling scheme for focused optimal Bayesian experimental design", year = "2019", abstract = "We develop a new computational approach for ``focused'' optimal Bayesian experimental design with nonlinear models, with the goal of maximizing expected information gain in targeted subsets of model parameters. Our approach considers uncertainty in the full set of model parameters, but employs a design objective that can exploit learning trade-offs among different parameter subsets. We introduce a new layered multiple importance sampling scheme that provides consistent estimates of expected information gain in this focused setting. This sampling scheme yields significant reductions in estimator bias and variance for a given computational effort, making optimal design more tractable for a wide range of computationally intensive problems.", keywords = "optimal experimental design, Bayesian inference, Monte Carlo methods, multiple importance sampling, expected information gain, mutual information" }
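For orientation, the sketch below implements the plain nested Monte Carlo EIG estimator that schemes like this improve upon; the focused setting additionally marginalizes nuisance parameters inside both density estimates, which is omitted here. The linear-Gaussian model admits a closed-form EIG for comparison; all settings are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                                    # observational noise std

def log_lik(y, theta, d):
    return -0.5 * ((y - d * theta) / sigma)**2 - np.log(sigma * np.sqrt(2*np.pi))

def eig_nested(d, n_outer=2000, n_inner=2000):
    theta = rng.standard_normal(n_outer)       # prior draws, theta ~ N(0, 1)
    y = d * theta + sigma * rng.standard_normal(n_outer)
    outer = log_lik(y, theta, d)               # log p(y | theta, d)
    theta_in = rng.standard_normal(n_inner)    # inner draws for log p(y | d)
    ll = log_lik(y[:, None], theta_in[None, :], d)
    m = ll.max(axis=1, keepdims=True)          # stable log-mean-exp
    inner = (m + np.log(np.mean(np.exp(ll - m), axis=1, keepdims=True))).ravel()
    return np.mean(outer - inner)              # estimate of E[log p(y|theta)/p(y)]

for d in (0.5, 1.0, 2.0):
    exact = 0.5 * np.log(1 + d**2 / sigma**2)  # closed form for this model
    print("design", d, "EIG:", round(eig_nested(d), 3), "| exact:", round(exact, 3))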
A trust region method for derivative-free nonlinear constrained stochastic optimization
We present the algorithm SNOWPAC for derivative-free constrained stochastic optimization. The algorithm builds on a model-based approach for deterministic nonlinear constrained derivative-free optimization that introduces an "inner boundary path" to locally convexify the feasible domain and ensure feasible trial steps. We extend this deterministic method via a generalized trust region approach that accounts for noisy evaluations of the objective and constraints. To reduce the impact of noise, we fit consistent Gaussian processes to past objective and constraint evaluations. Our approach incorporates a wide variety of probabilistic risk or deviation measures in both the objective and the constraints. Numerical benchmarking demonstrates SNOWPAC's efficiency and highlights the accuracy of the optimization solutions found.
BibTeX entry
@article{art_55, author = "Augustin, F. and Marzouk, Y. M.", journal = "Preprint", title = "A trust region method for derivative-free nonlinear constrained stochastic optimization", year = "2017", abstract = "We present the algorithm SNOWPAC for derivative-free constrained stochastic optimization. The algorithm builds on a model-based approach for deterministic nonlinear constrained derivative-free optimization that introduces an ``inner boundary path'' to locally convexify the feasible domain and ensure feasible trial steps. We extend this deterministic method via a generalized trust region approach that accounts for noisy evaluations of the objective and constraints. To reduce the impact of noise, we fit consistent Gaussian processes to past objective and constraint evaluations. Our approach incorporates a wide variety of probabilistic risk or deviation measures in both the objective and the constraints. Numerical benchmarking demonstrates SNOWPAC's efficiency and highlights the accuracy of the optimization solutions found." }
Sequential Bayesian optimal experimental design via approximate dynamic programming
The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dynamic program. Batch and greedy designs are shown to result from special cases of this formulation. We then focus on sOED for parameter inference, adopting a Bayesian formulation with an information theoretic design objective. To make the problem tractable, we develop new numerical approaches for nonlinear design with continuous parameter, design, and observation spaces. We approximate the optimal policy by using backward induction with regression to construct and refine value function approximations in the dynamic program. The proposed algorithm iteratively generates trajectories via exploration and exploitation to improve approximation accuracy in frequently visited regions of the state space. Numerical results are verified against analytical solutions in a linear-Gaussian setting. Advantages over batch and greedy design are then demonstrated on a nonlinear source inversion problem where we seek an optimal policy for sequential sensing.
BibTeX entry
@article{huan-sequential-2016, author = "Huan, X. and Marzouk, Y. M.", journal = "Preprint", title = "Sequential Bayesian optimal experimental design via approximate dynamic programming", year = "2016", abstract = "The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dynamic program. Batch and greedy designs are shown to result from special cases of this formulation. We then focus on sOED for parameter inference, adopting a Bayesian formulation with an information theoretic design objective. To make the problem tractable, we develop new numerical approaches for nonlinear design with continuous parameter, design, and observation spaces. We approximate the optimal policy by using backward induction with regression to construct and refine value function approximations in the dynamic program. The proposed algorithm iteratively generates trajectories via exploration and exploitation to improve approximation accuracy in frequently visited regions of the state space. Numerical results are verified against analytical solutions in a linear-Gaussian setting. Advantages over batch and greedy design are then demonstrated on a nonlinear source inversion problem where we seek an optimal policy for sequential sensing.", keywords = "sequential experimental design, Bayesian experimental design, approximate dynamic programming, feedback control policy, lookahead, approximate value iteration, information gain" }
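The numerical core, backward induction with regression over sampled states, can be sketched on a generic finite-horizon control problem (not the sOED formulation itself): fit each stage's value function by least squares on polynomial features, then act greedily against the regressed continuation value. Everything below is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
N_STAGES, N_SAMP = 3, 4000
ACTIONS = np.linspace(-1, 1, 21)

def step(x, d):
    # toy dynamics: move by d, plus process noise
    return x + d + 0.1 * rng.standard_normal(np.shape(x))

def features(x):
    # quadratic polynomial features for the value-function regression
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

coefs = [None] * (N_STAGES + 1)
xs = rng.uniform(-2, 2, N_SAMP)                # exploration states
coefs[N_STAGES] = np.linalg.lstsq(features(xs), -xs**2, rcond=None)[0]  # terminal reward -x^2
for k in range(N_STAGES - 1, -1, -1):
    # one-step lookahead against the regressed next-stage value
    q = np.stack([features(step(xs, d)) @ coefs[k + 1] for d in ACTIONS])
    coefs[k] = np.linalg.lstsq(features(xs), q.max(axis=0), rcond=None)[0]

def policy(x, k, n_mc=200):
    # greedy action against the regressed continuation value
    scores = [features(step(np.full(n_mc, x), d)) @ coefs[k + 1] for d in ACTIONS]
    return ACTIONS[int(np.argmax([s.mean() for s in scores]))]

x = 1.7
for k in range(N_STAGES):
    x = float(step(x, policy(x, k)))
print("final state (target 0):", round(x, 3))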
Bayesian level sets for image segmentation
This paper presents a new algorithm for image segmentation and classification, Bayesian Level Sets (BLS). BLS harnesses the advantages of two well-known algorithms: variational level sets and finite mixture model EM (FMM-EM). Like FMM-EM, BLS has a simple, probabilistic implementation which natively extends to an arbitrary number of classes. Via a level set-inspired geometric prior, BLS returns smooth, regular segmenting contours that are robust to noise. In practice, BLS is also observed to be robust to fairly lenient initial conditions. A comparative analysis of the three algorithms (BLS, level set, FMM-EM) is presented, and the advantages of BLS are quantitatively demonstrated on realistic applications such as pluripotent stem cell colonies, brain MRI phantoms, and stem cell nuclei.
BibTeX entry
@article{art_39, author = "Lowry, N. and Mangoubi, R. and Desai, M. and Marzouk, Y. M. and Sammak, P.", journal = "Preprint", title = "Bayesian level sets for image segmentation", year = "2015", abstract = "This paper presents a new algorithm for image segmentation and classification, Bayesian Level Sets (BLS). BLS harnesses the advantages of two well-known algorithms: variational level sets and finite mixture model EM (FMM-EM). Like FMM-EM, BLS has a simple, probabilistic implementation which natively extends to an arbitrary number of classes. Via a level set-inspired geometric prior, BLS returns smooth, regular segmenting contours that are robust to noise. In practice, BLS is also observed to be robust to fairly lenient initial conditions. A comparative analysis of the three algorithms (BLS, level set, FMM-EM) is presented, and the advantages of BLS are quantitatively demonstrated on realistic applications such as pluripotent stem cell colonies, brain MRI phantoms, and stem cell nuclei." }
NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints
This paper proposes the algorithm NOWPAC (Nonlinear Optimization With Path-Augmented Constraints) for nonlinear constrained derivative-free optimization. The algorithm uses a trust region framework based on fully linear models for the objective function and the constraints. A new constraint-handling scheme based on an inner boundary path allows for the computation of feasible trial steps using models for the constraints. We prove that the iterates computed by NOWPAC converge to a local first order critical point. We also discuss the convergence of NOWPAC in situations where evaluations of the objective function or the constraints are inexact, e.g., corrupted by numerical errors. For this, we determine a rate of decay that the magnitude of these numerical errors must satisfy, while approaching the critical point, to guarantee convergence. In settings where adjusting the accuracy of the objective or constraint evaluations is not possible, as is often the case in practical applications, we introduce an error indicator to detect these regimes and prevent deterioration of the optimization results.
BibTeX entry
@article{augustin-nowpac-2014, author = "Augustin, F. and Marzouk, Y. M.", journal = "Preprint", title = "NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints", year = "2014", abstract = "This paper proposes the algorithm NOWPAC (Nonlinear Optimization With Path-Augmented Constraints) for nonlinear constrained derivative-free optimization. The algorithm uses a trust region framework based on fully linear models for the objective function and the constraints. A new constraint-handling scheme based on an inner boundary path allows for the computation of feasible trial steps using models for the constraints. We prove that the iterates computed by NOWPAC converge to a local first order critical point. We also discuss the convergence of NOWPAC in situations where evaluations of the objective function or the constraints are inexact, e.g., corrupted by numerical errors. For this, we determine a rate of decay that the magnitude of these numerical errors must satisfy, while approaching the critical point, to guarantee convergence. In settings where adjusting the accuracy of the objective or constraint evaluations is not possible, as is often the case in practical applications, we introduce an error indicator to detect these regimes and prevent deterioration of the optimization results." }
Announcements
January 2025
Congratulations to Dimitris Konomis, who successfully defended his thesis "Max-Stable Processes, Measure Transport, and Conditional Sampling". We wish him luck at the Voleon Group!
October 2024
Congratulations to Jan Glaubitz for his new position as assistant professor in scientific computing at Linköping University!
September 2024
Congratulations to Fengyi Li for starting at LinkedIn as an AI Engineer!
August 2024
Congratulations to Hannah Lu, now an Assistant Professor at the Oden Institute at the University of Texas at Austin!
August 2024
Congratulations to Mirjeta Pasha, now at Virginia Tech as an Assistant Professor of Data and High Performance Computational Mathematics in the Department of Mathematics!
August 2024
Congratulations to Timo Schorlepp, now a Courant Instructor at the Courant Institute, New York University!
July 2024
Congratulations to Michael Brennan for becoming a Senior Researcher at Solea Energy!